Ontology

What is ontology?

In computer science, the term ontology describes a formal framework for the unambiguous representation and communication of knowledge in a given domain. In addition to a uniform terminology, this knowledge also includes relationships, hierarchies, rules and terms.

The aim of an ontology in computer science is to provide information and knowledge clearly and unambiguously, leaving no room for interpretation, through this "common language". This network of relationships is implemented primarily in information systems, artificial intelligence and databases.

As early as the beginning of the 1990s, the concept of ontology was used in connection with artificial intelligence, and from there it spread to many areas of computer science.

Term development

Ontology has its origins in philosophy, where it refers to the "doctrine of being". In the philosophical definition of the term, too, one of the questions asked is how so-called entities (a being or a concrete or abstract object) can be categorised or related to each other. The term metaphysics is often used synonymously with ontology. It goes back to the Greek philosopher Aristotle and, by its literal meaning, describes that which comes after physics.

On the basis of this definition, questions arise about being, nothingness, finiteness and infinity, among others, which are also addressed in all religions. In addition to Aristotle, the German philosopher Immanuel Kant also dealt extensively with metaphysics. While the term has its origins in philosophy, other scientific disciplines, such as psychology, sociology and medicine, are increasingly taking it up in their research.

Examples of ontologies

An example of the application of knowledge representation in computer science is the so-called Semantic Web. This idea from World Wide Web founder Tim Berners-Lee is based on extending the conventional World Wide Web so that the meaning and significance of information can be clearly assigned.

A further motivation for this endeavour is to facilitate communication and collaboration between people and machines. In addition to uniform rules, data models and syntax, the development of the ontology language Web Ontology Language (OWL) also provided a remedy. A concrete application example is giving the word "Washington" an unambiguous, conflict-free meaning in its respective context. Since Washington can stand not only for a city but also for a federal state, a name or a warship, among other things, a more precise definition is necessary.
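
Such a distinction can be sketched with the Python library rdflib; the namespace, class names and labels below are invented purely for illustration and do not come from any real ontology:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")   # hypothetical namespace for this sketch
    g = Graph()

    # Two distinct classes make the two senses of "Washington" machine-distinguishable.
    g.add((EX.City, RDF.type, RDFS.Class))
    g.add((EX.FederalState, RDF.type, RDFS.Class))

    g.add((EX.Washington_DC, RDF.type, EX.City))
    g.add((EX.Washington_DC, RDFS.label, Literal("Washington")))
    g.add((EX.Washington_State, RDF.type, EX.FederalState))
    g.add((EX.Washington_State, RDFS.label, Literal("Washington")))

    # A consumer can now ask specifically for the city called "Washington".
    query = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?s WHERE { ?s a <http://example.org/City> ; rdfs:label "Washington" . }
    """
    for row in g.query(query):
        print(row.s)   # -> http://example.org/Washington_DC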

Another application of ontologies in computer science lies in the field of artificial intelligence, primarily for machine-interpretable knowledge representation. With the help of the ontology's normalisation, rules and specifications, an inference engine can draw logical conclusions.
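
As a rough illustration of how such conclusions are drawn, the following hand-written forward-chaining sketch (not a real inference engine; the facts and the rule are invented) derives a new statement from a class hierarchy:

    # Facts as (subject, predicate, object) triples; the rule "x is_a C and
    # C subclass_of D  =>  x is_a D" is applied until nothing new is derived.
    facts = {
        ("Socrates", "is_a", "Human"),
        ("Human", "subclass_of", "Mortal"),
    }

    def closure(facts):
        derived = set(facts)
        while True:
            new = {
                (s, "is_a", d)
                for (s, p, o) in derived if p == "is_a"
                for (c, p2, d) in derived if p2 == "subclass_of" and c == o
            }
            if new <= derived:
                return derived
            derived |= new

    print(("Socrates", "is_a", "Mortal") in closure(facts))   # True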

In medicine, for example, it is applied in the Gene Ontology project. The aim is to provide and further develop databases that offer uniform information on the function of genes in biomedicine.

In psychology, the representation of relations is particularly widespread in the sub-area of psychosociology. With its help, attempts are made to grasp and categorise social phenomena such as groups, families, bonds, but also personalities, using uniform terms, and to describe their interactions.

Differences from taxonomy and epistemology

Ontology vs. taxonomy

While an ontology focuses on the network of connections and relations, a taxonomy describes structured hierarchical relationships. The term taxonomy is derived from ancient Greek and translates roughly as "law of order". Taxonomy originated in the natural sciences and is still used extensively there to describe species, genera and orders. Taxonomy is also used in computer science to represent hierarchical relationships and inheritance.
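
In programming terms, such a hierarchy corresponds to inheritance; the following minimal Python sketch (the example classes are invented) shows how behaviour flows from the general to the specific level:

    class Animal:                 # most general level of the taxonomy
        def breathe(self):
            return "breathing"

    class Mammal(Animal):         # inherits everything an Animal can do
        def nurse(self):
            return "nursing"

    class Dog(Mammal):            # most specific level
        def bark(self):
            return "woof"

    d = Dog()
    print(d.breathe(), d.nurse(), d.bark())   # inherited behaviour is available downwards
    print([c.__name__ for c in Dog.__mro__])  # ['Dog', 'Mammal', 'Animal', 'object']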

Ontology vs. epistemology

The term epistemology also comes from ancient Greek and describes the study of knowledge. Epistemology is often also referred to as the theory of knowledge and asks how knowledge comes about and how it is justified. While epistemology deals with the fundamental acquisition of knowledge, ontology focuses on the nature of being or reality.

Open Data

What is Open Data?

Open data is data that can be used, shared and processed by the general public. Open data is often framed as a demand and is strongly promoted by the Open Knowledge Foundation. In summary, the foundation defines open data as follows:

  • Duplicating the data must not incur any costs. This means, for example, that the data formats in which the files are saved are chosen appropriately and that data sets are essentially complete; in short, it is "made easy" for data consumers to share the content.
  • In addition to sharing, the type of reuse also plays a role. In order to merge data with other sources, the data must be provided in an interoperable format (CSV or JSON have become established for this). Ideally, the provider also offers interfaces based on common protocols such as SOAP or REST so that the databases are readable by both humans and machines; a short consumption sketch follows after this list.
  • While the technical criteria above apply to open data, no social conditions may be attached to it. Everyone must be able to use, share or further process the data; certain groups of people or fields of application must not be excluded.
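
As a sketch of how such interoperable formats and interfaces can be consumed, the following Python snippet reads one CSV data set and one JSON endpoint and merges them. The URLs and column names are placeholders, not real portal endpoints:

    import pandas as pd
    import requests

    # Hypothetical open data sources: a CSV download and a REST/JSON endpoint.
    csv_df = pd.read_csv("https://example.org/open-data/population.csv")
    resp = requests.get("https://example.org/api/v1/unemployment", timeout=10)
    resp.raise_for_status()
    json_df = pd.DataFrame(resp.json())

    # Interoperable formats make merging trivial, e.g. on a shared region key.
    merged = csv_df.merge(json_df, on="region_id", how="inner")
    print(merged.head())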

It is not without reason that the federal government and the Länder are committed to initiatives such as Open Government for open data. Public authorities can work more efficiently, private companies benefit from easy access to knowledge, and societal security increases thanks to information transparency.

Legal backing through the Open Data Act

To ensure that this progressive concept is actually implemented in practice, the first Open Data Act came into force in 2017. This legal framework obliges authorities to provide their data in machine-readable form. Together with the Federal Government's Open Data Strategy, Germany thus laid the foundation for a solid ecosystem. The Open Data Act is being developed further together with Austria and Switzerland and is intended to ensure even more responsible, innovative and public-benefit-oriented data use in the future.

The most important German Open Data databases

  • The nationwide metadatabase GovData contains, in addition to the administrative data itself, a lot of information about the data, e.g. who created it, when and where (a query sketch follows after this list).
  • GENESIS-Online, the database of the Federal Statistical Office, covers a broad range of topics of official statistics and is deeply structured by category. As with the Regional Database Germany and the Municipal Education Database, the data are based on GENESIS-Online, and various interfaces are available to process the data efficiently.
  • The Open Data Platform Open.NRW is intended to serve as an information portal and, under the guiding principle of "Open Government in North Rhine-Westphalia", provides a constantly growing database consisting of administrative data on projects in the state.
  • The Ministry of Regional Development and Housing in Baden-Württemberg relies on geodata for Open Government and, with the Geoportal Baden-Württemberg, provides a comprehensive tool for interested associations, business representatives and citizens. A further point of contact for geographic reference data is the Geoportal Hesse. Geodata are also the main focus in the state of Lower Saxony: the State Office for Geoinformation and Land Surveying of Lower Saxony (LGLN) provides the Open Geo Data platform.
  • Among other things, interesting information about the German capital Berlin can be found on the Berlin Open Data portal, where data sets from thematic fields such as education, health or transport can be viewed. The state of Schleswig-Holstein follows a similar approach: its web application Open Data Schleswig-Holstein provides open data on socially relevant topics such as culture, energy or the economy.
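
The metadata search mentioned for GovData can be sketched as follows, assuming a CKAN-style API behind the portal; the endpoint path, parameters and search term are assumptions and may differ in practice, so treat this purely as an illustration:

    import requests

    # Assumed CKAN-style search endpoint of GovData (path may differ in practice).
    resp = requests.get(
        "https://www.govdata.de/ckan/api/3/action/package_search",
        params={"q": "luftqualitaet", "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    for dataset in resp.json()["result"]["results"]:
        print(dataset["title"])   # titles of the first matching data sets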

Open Source

What is Open Source?

The term open source (OS or OSS for short) denotes software whose source code is publicly accessible and can be viewed, changed and used by anyone. Most open source software can be used free of charge.

The opposite of open source is closed source. The source code of this software is not publicly accessible and may not be changed, used or passed on. It is distributed commercially with the help of licences.

Difference between open source and freeware

Freeware refers to software that is made available free of charge by the author. However, the source code is not freely accessible and may not be modified or distributed. Therein lies the difference to OSS.

Advantages of Open Source

Low costs

Most open source software is accessible free of charge. And even paid OSS is for the most part much cheaper than closed-source alternatives.

Independence from commercial providers

Software from commercial providers serves their corporate goals. This creates a certain dependency, which can cause problems, especially when the software no longer brings in enough profit. In most cases, the software is then no longer supported or offered by the manufacturer and the customer has to look for alternatives.

With open source, this problem does not exist to the same extent, because there is little or no monetisation involved.

Individuality

Since the code can be edited at any time, it can also be customised at any time. Unnecessary functions can be removed or missing ones added. In this way, individually suitable solutions can be developed and refined.

This is possible either with the help of one's own expertise, with the support of the community or with commercial experts.

Compatibility

Excluding competitors through proprietary data formats and systems runs counter to the open source idea, which is why much emphasis is placed on interoperability (the ability of a system to cooperate with other systems). As a result, there are far fewer compatibility problems with open source than with closed source.

Security

Because the code is reviewed many times by many developers, some of them highly skilled, errors and security gaps are noticed quickly. With closed source, this usually takes longer.

Disadvantages of Open Source

Dependence on an active community

There is no entitlement to a guarantee or to manufacturer support, as is the case with closed-source applications. Therefore, with open source there is a certain dependence on an active community for support and further development.

High training and knowledge expenditure

OSS is usually not as well-known and beginner-friendly for laypersons as the widely used commercial products. Thus the use of OSS often requires more familiarisation, training and expertise.

Which open source software is popular as a business solution?

For ETL, Reporting, OLAP/Analysis and Data Mining

Pentaho by Hitachi Vantara offers a collection of business intelligence software that is free of charge in the basic version. Solutions are provided for ETL, reporting, OLAP/analysis and data mining.

As an ETL tool, Pentaho Data Integration (PDI for short) offers connection options to various databases. Through further plug-ins, connections to other systems are also possible, for example to SAP with the help of ProERPconn and to Navision with the NaviX Table plug-in. Pentaho Data Integration also counts big data processing among its strengths.
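
To make the ETL idea concrete, here is a generic, Pentaho-independent sketch in Python; the database files, table and column names are invented for the example:

    import sqlite3
    import pandas as pd

    # Extract: read raw records from the source system.
    source = sqlite3.connect("source.db")                       # hypothetical source database
    orders = pd.read_sql_query("SELECT order_id, amount, country FROM orders", source)

    # Transform: clean and aggregate in memory.
    orders = orders[orders["amount"] > 0]
    revenue = orders.groupby("country", as_index=False)["amount"].sum()

    # Load: write the result into the target system (e.g. a data warehouse).
    target = sqlite3.connect("warehouse.db")
    revenue.to_sql("revenue_by_country", target, if_exists="replace", index=False)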

Pentaho BI Suite offers one of the few business intelligence solutions in the open source sector.

For Data Virtualization

Data Virtualisation can be seen as the opposite of the ETL process, as the data remains in its original systems and the virtualisation component accesses it directly and makes it available for use.

Denodo Express from Denodo Technologies Inc. offers an open source solution for data virtualisation. It connects and integrates local and cloud-based data sources, as well as Big Data, with each other. This data is made available to end users, enterprise applications, dashboards, portals, intranet, search and other tools.

OpenLooKeng from Huawei has also been available as open source since mid-2020 and offers uniform SQL interfaces for accessing different data sources.

For data labelling

Data labelling is essential for machine learning. It provides the existing data with the required characteristics, for example whether a picture shows a person or not.
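
What such a label can look like is sketched below; the record schema, file names and classes are invented for illustration, since every labelling tool uses its own format:

    import json

    # One labelled image: an invented record linking the raw file to its annotations.
    label = {
        "image": "photos/0001.jpg",
        "annotations": [
            {"label": "person",  "bbox": [34, 50, 210, 480]},    # x, y, width, height
            {"label": "bicycle", "bbox": [250, 300, 120, 160]},
        ],
    }

    with open("0001.json", "w") as f:
        json.dump(label, f, indent=2)   # training code later reads these records back in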

There are several data labelling tools available as open source. Some are specialised in certain file formats and others can process all of them.

Examples of data labelling tools for images only:

  • bbox-visualizer
  • CVAT
  • hover
  • Labelme
  • Yolo-mark

Examples of data labelling tools for text only:

  • dataqa
  • doccano

Examples of data labelling tools for audio, images and text:

  • awesome-data-labelling
  • Label Studio

OpenGPT-X

What is OpenGPT-X?

OpenGPT-X describes a European project in which a large language model is to be developed. Language models are used, for example, for chatbots, but also for writing texts, understanding complex texts or conducting conversations. GPT stands for "Generative Pretrained Transformer"; the "X" that follows is a placeholder for the version.

A consortium of well-known European companies, institutes and universities is participating in the project under the leadership of the Fraunhofer Institute. It was created, among other things, to build European sovereignty in the field of large language models and to minimise dependence on the USA and China. With GPT-3 (Generative Pretrained Transformer 3), developed by the company OpenAI in May 2020, the major third-generation language model was introduced in the USA. In June 2021, China responded to the US pioneer with Wu Dao 2.0 ("Understanding the Laws of Nature") in its second version.

What are the goals of the European joint project?

The primary aim of the project is to preserve European digital sovereignty and independence with an AI language model of Europe's own. European characteristics in the areas of data protection, values and linguistic diversity are to be taken into account in this model.

The OpenGPT-X project is designed to enable data-based business solutions in the GAIA-X ecosystem. GAIA-X is a project to create a networked and secure data infrastructure in Europe to use and share data in a decentralised way. The name Gaia is derived from Greek mythology and describes a deity who is regarded as the personified earth.

OpenGPT-X is responsible for building a node for large AI language models and innovative language application services in the GAIA-X project.

What differentiates OpenGPT-X from other language models such as GPT-3?

In OpenGPT-X, special attention is paid to the European context of the AI language model. This primarily concerns the integration of the many European languages as well as European ethical values and culture.

In addition, OpenGPT-X is intended to meet European standards of data protection. These aspects are often cited as points of criticism of alternatives such as GPT-3 or Wu Dao 2.0 and are to be improved by the European solution, thus protecting the economic interests of Europe as a business location.

Furthermore, this approach also allows governmental and legal concerns to be taken into account in the development of language applications, such as the observance of European values, the European cultural context and regulations. OpenGPT-X is also intended to be used in the decentralised cloud solution GAIA-X and thus form a building block within the European data infrastructure.

OpenAI

What is OpenAI?

OpenAI is engaged in artificial intelligence research. OpenAI LP is a company controlled by the non-profit organisation OpenAI Inc. It is funded by major backers such as Elon Musk and Microsoft.

The goal of OpenAI is to develop artificial intelligence on an open source basis, in such a way that society benefits from it without harm being done. The organisation is open to free cooperation with other institutions and researchers and makes its patents and research results available to the public. The company has already received over one billion dollars in donations.

OpenAI explores the important question of the existential threat posed by artificial intelligence potentially surpassing and replacing human intelligence. The background to this work is the fears of scientists such as Stephen Hawking and Stuart Jonathan Russell, who anticipate that artificial intelligence could one day acquire the ability to improve itself on its own.

This could lead to a veritable intelligence explosion. Given such superiority, the displacement of the human species by a superintelligent artificial intelligence, bound to no human values, would be conceivable. Elon Musk also sees such artificial intelligence as the greatest existential danger for humanity. To counteract this risk in the development of artificial intelligence, OpenAI was founded and conceived as an open-source non-profit organisation.

What potential does artificial intelligence offer?

Artificial intelligence can revolutionise communication between humans and computers and learn to fully understand users' natural language. Programs that are given a natural language interface can be operated automatically by artificial intelligence. OpenAI Codex, for example, can translate natural language into code.
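
How such a natural language interface can be addressed programmatically is sketched below with the openai Python client (v1-style interface); the model name and prompt are placeholders, and an API key is assumed to be set in the environment:

    from openai import OpenAI

    client = OpenAI()   # reads the API key from the OPENAI_API_KEY environment variable

    # Natural language in, generated code out (the model name is a placeholder).
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Write a Python function that reverses a string."}],
    )
    print(response.choices[0].message.content)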

Practically everywhere human intelligence is used, artificial intelligence could eventually be used as well. The OpenAI Gym platform deals with reinforcement learning. In addition, systems can cooperate with each other and develop joint algorithms; this was demonstrated with OpenAI Five. GPT-2 and GPT-3 are extensive projects that provide text generators, for example, with GPT-3 working with 175 billion parameters.
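
A minimal reinforcement learning loop with OpenAI Gym looks roughly as follows, assuming a recent Gym/Gymnasium-style API (older Gym versions return slightly different values from reset and step); the random policy is only a stand-in for a learned agent:

    import gym

    env = gym.make("CartPole-v1")            # classic control benchmark environment
    observation, info = env.reset(seed=0)

    for _ in range(100):
        action = env.action_space.sample()   # random policy as a stand-in for an agent
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:          # episode over: start a new one
            observation, info = env.reset()

    env.close()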