Word2vec

What is Word2vec?

Word2vec is a neural network used for text analysis by means of word embedding. To do this, Word2vec converts the words of a text into numerical vectors and uses these vectors to calculate mathematically how the words relate to each other and in which contexts they occur. The training targets are taken from the text itself, so no manually labelled data is required. With more training text, Word2vec becomes better at recognising contexts and at expressing them as analogies, such as:

"Sun" is to "day" what "moon" is to "night".

or

"Berlin" is to "Germany" what "Tokyo" is to "Japan".

In addition, words that belong together, such as "spoon", "fork" and "knife", are recognised and grouped together.
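The analogy and grouping behaviour described above can be sketched with simple vector arithmetic. The following is a minimal illustration only: the five-dimensional vectors are hand-made assumptions, whereas real Word2vec vectors are learned from text and usually have a few hundred dimensions. The names TOY_VECTORS, cosine and analogy are hypothetical helpers, not part of any library.

```python
import math

# Hand-made toy vectors (an assumption for illustration; not learned by Word2vec).
# Dimensions: [capital, country, Europe, Asia, cutlery]
TOY_VECTORS = {
    "Berlin":  [1.0, 0.0, 1.0, 0.0, 0.0],
    "Germany": [0.0, 1.0, 1.0, 0.0, 0.0],
    "Tokyo":   [1.0, 0.0, 0.0, 1.0, 0.0],
    "Japan":   [0.0, 1.0, 0.0, 1.0, 0.0],
    "spoon":   [0.0, 0.0, 0.0, 0.0, 1.0],
    "fork":    [0.0, 0.0, 0.1, 0.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Answer 'a is to b what c is to ?' using the vector b - a + c."""
    query = [vb - va + vc
             for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded from the candidates.
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(query, vectors[word]))


print(analogy("Berlin", "Germany", "Tokyo"))  # Japan
```

In the same toy space, "spoon" and "fork" point in almost the same direction, which is how related words end up grouped together.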

Word2vec was introduced in 2013 by a team of researchers at Google led by Tomas Mikolov. The research paper, entitled "Efficient Estimation of Word Representations in Vector Space", describes two possible methods for learning the context:

Continuous Bag-of-Words Model (CBOW)

The target word is predicted from the adjacent context words. The context consists of a few words before and after the target (middle) word. It is called a bag-of-words model because the order of the words in the context is not relevant. The CBOW model is particularly good at capturing syntactic relations between two words.

Continuous Skip-Gram Model

In this model, several context words are predicted from one input word. It essentially works the other way around from the CBOW model. The Continuous Skip-Gram model is better than the CBOW model at capturing semantic relations between two words.
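The difference between the two models is easiest to see in the training examples each one is given. The following is a minimal sketch of how such examples could be built from a tokenised sentence. The function names and the window parameter are assumptions for illustration, and the actual neural network training is not shown.

```python
def cbow_examples(tokens, window=2):
    """Return (context_words, target_word) pairs: the context predicts the middle word."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # words before the target
        right = tokens[i + 1:i + 1 + window]     # words after the target
        examples.append((left + right, target))
    return examples


def skipgram_examples(tokens, window=2):
    """Return (input_word, context_word) pairs: the middle word predicts each neighbour."""
    return [(target, context_word)
            for context, target in cbow_examples(tokens, window)
            for context_word in context]


tokens = "the cat sat on the mat".split()
print(cbow_examples(tokens, window=1)[1])       # (['the', 'sat'], 'cat')
print(skipgram_examples(tokens, window=1)[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

CBOW gets one example per position, with the whole context as input. Skip-Gram gets one example per pair of input word and neighbouring word.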

An example using the word "fish"

The CBOW model will return a syntactically related form, such as the plural "fishes", as the nearest vector. The Skip-Gram model, on the other hand, will find an independent but semantically relevant word such as "fishing rod".

This difference makes the Skip-Gram model the more popular of the two, as it is more useful for most applications.

What are the applications of Word2vec?

Word2vec, like other word embeddings, can be used for many online applications. It forms the basis for search engine suggestions and for recommendations in online shops. Context analysis makes it possible to offer users optimised suggestions and thus the best possible results. This makes Word2vec valuable for areas such as e-commerce and customer relationship management. It is also very helpful for creating content and for scientific research.

Word2vec in Python

To use Word2vec in Python, the two modules gensim and nltk are typically used.

Gensim is an open-source Python library used for semantic text analysis and document comparison. The abbreviation nltk stands for "Natural Language Toolkit", a collection of computational linguistics libraries and programs for Python.

Either the Continuous Bag-of-Words model or the Continuous Skip-Gram model can be used for training. Many pre-trained models can also be found online.

Knowledge-based systems

What are knowledge-based systems?

Knowledge-based systems, often abbreviated to KBS, are the generic term for intelligent information systems that evaluate knowledge and make it usable with methods such as knowledge representation and/or knowledge modelling. They are considered a subfield of AI research and are used wherever tasks require human intelligence.

To solve problems, such systems use both factual knowledge and knowledge with uncertainties, which is often referred to as subjective knowledge. Knowledge-based systems include expert systems, software agents and rule-based systems.

How are they structured?

Knowledge-based systems consist of several core components that together form a complex structure. The knowledge base is the foundation. This component stores facts, rules, case-specific knowledge and generic knowledge. In simple terms, it could also be called a knowledge database.

The second core component is the inference component. Here, the available knowledge is processed and new rules and facts are derived. The inference component is therefore the working area of a knowledge-based system. The user interface is the last core component; it handles general communication with the user.
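The interplay of knowledge base and inference component can be sketched in a few lines. This is a minimal illustration under stated assumptions: the facts and rules are invented, the rules are simple if-then pairs, and the inference strategy shown is forward chaining, which is one common approach and not necessarily the one used by a given system.

```python
# Knowledge base: known facts plus if-then rules (all invented for illustration).
facts = {"has_feathers", "cannot_fly"}
rules = [
    # (conditions that must all hold, conclusion that is then derived)
    ({"is_bird", "cannot_fly"}, "is_flightless_bird"),
    ({"has_feathers"}, "is_bird"),
]


def infer(facts, rules):
    """Inference component: apply the rules until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


print(sorted(infer(facts, rules)))
# ['cannot_fly', 'has_feathers', 'is_bird', 'is_flightless_bird']
```

The first rule only fires on the second pass, once "is_bird" has been derived. That is why the loop repeats until nothing changes.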

In complex applications, two further components are added, which can be regarded as secondary components. These secondary components are often found in expert systems, for example.

The knowledge acquisition component is the first additional component in this extended form. It makes it possible to expand the knowledge base both manually and automatically. It establishes a direct connection between the user interface and the knowledge base, so that the knowledge base can be constantly supplied with new or changed information.

The second additional component is the explanation component. It tells the user how a solution was found, so that the solution can be better understood. In short, it answers the user's questions of how and why.
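The two additional components can be illustrated with a small, self-contained sketch. The class and method names are hypothetical and the rule format (a set of conditions plus one conclusion) is an assumption. The point is only to show where new knowledge enters the knowledge base and where the "how and why" of a result is recorded.

```python
class KnowledgeBasedSystem:
    """Toy system with a knowledge base, inference, acquisition and explanation."""

    def __init__(self):
        self.facts = set()        # knowledge base: facts
        self.rules = []           # knowledge base: (conditions, conclusion) rules
        self.explanations = {}    # fact -> how it became known

    # Component that extends the knowledge base with new knowledge.
    def add_fact(self, fact):
        self.facts.add(fact)
        self.explanations[fact] = "given as a fact"

    def add_rule(self, conditions, conclusion):
        self.rules.append((frozenset(conditions), conclusion))

    # Inference component: derive new facts and remember why.
    def infer(self):
        changed = True
        while changed:
            changed = False
            for conditions, conclusion in self.rules:
                if conditions <= self.facts and conclusion not in self.facts:
                    self.facts.add(conclusion)
                    self.explanations[conclusion] = (
                        "derived from " + ", ".join(sorted(conditions)))
                    changed = True

    # Explanation component: answer the question of how a fact was obtained.
    def explain(self, fact):
        return self.explanations.get(fact, "unknown fact")


kbs = KnowledgeBasedSystem()
kbs.add_fact("has_feathers")
kbs.add_rule({"has_feathers"}, "is_bird")
kbs.infer()
print(kbs.explain("is_bird"))  # derived from has_feathers
```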

What are examples of use in practice?

Knowledge-based systems can be used very flexibly and are therefore found in monitoring, in general planning and in data interpretation. Medical informatics is another typical field of application. Here, these systems use patient data to develop a solution to a problem. In this way, a diagnosis and a possible form of therapy can be derived, which is then applied to the patient.

How are knowledge-based systems used in artificial intelligence?

Knowledge-based systems are programs that belong to artificial intelligence as a scientific discipline. These programs use application-specific knowledge to develop different approaches to solutions. In doing so, they act completely independently and separately from the rest of the system.

However, the task of such systems is not only to process the specific data but also to bring order to the often poorly structured areas of knowledge. For this purpose, a systematic inventory is carried out, which builds up a new structure with the help of the available expertise. In this way, gaps in knowledge can be identified and, if necessary, closed.

This new structure can then be used to develop new theories and models. Through this principle, solution paths can be created for the respective tasks that could not have been created purely based on the expertise provided at the beginning. In AI research, such systems are thus not only to be seen as a possible aid for solving problems, but also offer entirely new technical possibilities due to their structure. The computational abilities are combined here with human ways of thinking, which leads to completely new possibilities and approaches to solutions.

Knowledge modelling

What is knowledge modelling?

Knowledge modelling deals with the representation of knowledge in a form that can be interpreted by both humans and machines and that is used in knowledge-based systems. It is a sub-area of knowledge management and of artificial intelligence.

Several steps are required in order to represent knowledge. The first is knowledge acquisition, which distinguishes between explicit knowledge, clearly expressed through language and signs, and tacit knowledge, which becomes visible through action. In the second step, knowledge representation, the acquired knowledge must be formalised. This is usually done by building a knowledge database or new links in the brain. In a third step, approaches to solving the problems are found before the knowledge is represented in the fourth step.

What are practical examples of knowledge modelling?

Knowledge modelling is mainly applied in the manufacturing industry, which already relies heavily on machine learning. It is used, for example, in the aviation and automotive industries. Both branches rely on the processing and communication of information for the continuous optimisation of processes and products. This includes, among other things, the design, manufacturing and innovation processes.

With the help of knowledge-based engineering, new products are manufactured quickly and efficiently and brought to market. In doing so, the companies can draw on existing knowledge from production. Of course, this requires very good knowledge management and precise knowledge modelling.

However, much of this knowledge is implicit, i.e. it is not available in a formalised form and is therefore difficult to retrieve and update. It is often also called tacit or personal knowledge. This is where knowledge modelling comes in. It is used to select the types of knowledge that are needed for specific processes.
For this purpose, knowledge is summarised in a structured way by means of various procedures and methods, which makes a formalised representation possible. This creates knowledge that can be processed and retrieved by computers and machines, i.e. the knowledge base. This in turn allows new processes and workflows to be implemented efficiently.

Word embedding

What is word embedding?

Word embedding establishes a connection between one or more input words and the vocabulary of a neural network. The term "embedding" has its origin in mathematics; put simply, it refers to a subset of a larger set. As a sub-area of natural language processing (NLP) and of machine learning, this form of mapping is used to process natural language.

A typical use case for NLP, and thus also for word embedding, is machine translation, as performed by translators such as Google Translate or DeepL Translate. The quality of the translated text depends on the procedures used in the algorithm.

What forms of word embedding are there?

The simplest form of word embedding is the bag-of-words approach. In this method, a set of words is defined in which each word is represented as a natural number and the order does not matter. Words can be repeated; repetitions are counted. Combined with a statistical classification procedure, it can be used, for example, to determine whether an email is spam by analysing the frequency of certain words.
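A bag of words can be sketched with the Python standard library alone. The example below is a minimal illustration: the sample email text and the helper names bag_of_words and to_vector are assumptions, and the statistical classifier that would decide between spam and non-spam is not shown.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each word occurs; word order is discarded."""
    return Counter(text.lower().split())


def to_vector(bag, vocabulary):
    """Turn a bag into a list of counts, one natural number per vocabulary word."""
    return [bag[word] for word in vocabulary]


mail = "win money now win a prize"   # invented example text
bag = bag_of_words(mail)
vocabulary = sorted(bag)

print(vocabulary)                  # ['a', 'money', 'now', 'prize', 'win']
print(to_vector(bag, vocabulary))  # [1, 1, 1, 1, 2]

# The same words in a different order produce the same bag.
print(bag_of_words("prize a win now money win") == bag)  # True
```

A classifier would then work on count vectors like this one, for example by learning that a high count for certain words is typical of spam.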

As an extension of the bag-of-words approach, Word2vec represents each word as a multi-dimensional vector. This representation makes it possible to visualise the proximity of words to each other, for example in a three-dimensional space. In this way, connections between words can be recognised and taught to the artificial intelligence. With the help of Word2vec, for example, a gap in a sentence can be filled by determining the best-matching word, i.e. the one with the highest probability of being the missing word. The multi-dimensional vector representation can also be used to teach the neural network new words. To do this, the missing word in a sentence is not searched for; instead, two alternatives are offered to fill the gap. The neural network then learns the new words by means of so-called features, which is reflected in their position in the embedding space.

Furthermore, contextual word embedding is an essential part of word embedding. The aim is to recognise and correctly represent the different meanings of homonyms. In practice, this goal was first pursued with long short-term memory (LSTM), a module that was originally intended to advance the development of artificial intelligence, and did so. But LSTM also quickly reached its limits. To handle similar linguistic forms, neural networks require so-called attention mechanisms. In order to run these in parallel and therefore fast enough, BERT, which is based on the Transformer architecture, was released in 2018. BERT was trained using masked language modelling and Next Sentence Prediction (NSP) and, thanks to this training, can also learn contextual embeddings.

Which technologies are used to implement word embedding?

For the much-discussed topic of machine translation, and for machine learning in general, there are numerous libraries; most of them are based on Python, the preferred programming language for artificial intelligence. There are several reasons for this: firstly, as a high-level language, Python is relatively easy to understand and has a low barrier to entry; secondly, there are many other useful libraries in addition to the machine learning libraries; and thirdly, Python is very flexible, which can be seen, for example, in the fact that the code can be executed on virtually any platform. When it comes to concrete implementation, two libraries deserve particular attention: Keras serves as an easy-to-use API with which applications can be built in an uncomplicated manner, while TensorFlow, the backend behind it, is a library with a complex architecture. Keras thus acts as a wrapper around backend libraries, which besides TensorFlow have included Theano, PlaidML or MXNet.

Waymo

Waymo is a subsidiary of Alphabet Inc., Google's parent company, that develops autonomous driving technology. It operates a commercial self-driving taxi service called Waymo One in the greater Phoenix, Arizona region. The company has fully mapped the Chandler area in Arizona. Towards the end of 2020, the service was opened to the public, making it the first commercial service to operate self-driving cars without human drivers as a backup.

The company is also developing its technology for use in other vehicles, including delivery vans, logistics vehicles and Class 8 tractor-trailers. It has also partnered with leading vehicle manufacturers, including Jaguar Land Rover, Fiat Chrysler, Nissan-Renault, Daimler AG and Volvo.

Technology

Waymo uses a combination of chips, sensors and cameras to improve the vehicle's view of its surroundings. It also has wipers to keep the lidar system clear. Much of the self-driving hardware used by the company is developed in-house. This hardware includes an enhanced camera vision system, several sensors, a laser-based lidar system and improved radar systems. The sensors give the vehicle a 360-degree view of the objects around it.

The long-range lasers can detect objects up to 300 metres away. There are also short-range lasers that focus on objects close to the vehicle, while the radar can track objects as they move. The interior of the car has no steering wheel, only a few buttons, including pull over, help, lock and start ride.

Engineers at Waymo have created a simulator program called Carcraft. This program simulates driving conditions to help the vehicles determine the optimum settings for different situations. The software has helped the vehicles navigate various cities where the company operates or intends to start its service. It boasts over 6 billion miles travelled by the end of 2020.

Are Self-Driving Vehicles Safe?

So far, the self-driving cars have been involved in only a few minor, non-fatal collisions. Most of these accidents were caused by other vehicles breaking traffic rules or otherwise being at fault. Generally, the vehicles are safe to ride in. To show confidence in the technology, the company publishes regular reports on road incidents that involve its vehicles. It also keeps testing its technology in harsher conditions such as winter roads, crowded locations and off-road terrain.

The company expects that self-driving technology will be the new norm in the future. In its view, it is only a matter of time before the technology becomes available around the world.