TinyML

What is TinyML?

TinyML stands for "Tiny Machine Learning" and describes the application of machine learning on small electronic components and devices such as microcontrollers, IoT devices, or embedded systems. The goal is to enable these devices to run machine learning models themselves.

The conditions on such devices impose different prerequisites for machine learning than other use cases. Since the devices normally operate on very low energy budgets, the same constraint applies to the computing capacity available for solving machine learning problems. The defining characteristic of this approach is therefore the resource constraint: limited working memory as well as limited computing power of the microcontrollers.

There is no uniform definition of TinyML devices. However, the term is often applied to devices with less than one megabyte of working memory and a power consumption of less than one milliwatt.

How does the model work?

Since training machine learning models is usually computationally intensive, the training must be carried out in an external environment. After external training, the model is transferred once to the TinyML device. In the optimal case, the device is then capable of performing inference, i.e. productive operation, autonomously and without any cloud communication. Common machine learning frameworks and tools such as standard TensorFlow are unsuitable for use on TinyML devices, which is why special libraries and frameworks have been developed for this purpose.
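
A minimal sketch of this workflow, assuming TensorFlow with its TensorFlow Lite converter (the model architecture, file name and quantisation settings are illustrative, not a prescribed setup):

```python
import tensorflow as tf

# Train (or load) an ordinary Keras model in an external environment.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# ... model.fit(...) would run here on the training machine ...

# Convert to TensorFlow Lite with default optimisations (e.g. quantisation)
# so the model fits the tight memory budget of a microcontroller.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The resulting flat buffer is what gets transferred once to the device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```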

Advantages of TinyML

The advantages of TinyML lie primarily in the possible autonomy of the system. Because data can be processed locally, the need for a high-bandwidth internet connection is eliminated, which also reduces latency. Since no data has to be transferred between devices or systems, TinyML also meets high demands on privacy and data protection. And due to its low energy requirements, TinyML can run on battery-powered devices.

Use cases and examples

The fields of application of TinyML are very diverse. Applications on microcontrollers are particularly widespread because they can be used universally. They often work in combination with other applications, as in the so-called keyword spotting or wake-word detection of voice assistants. If a keyword (e.g. "Hey Siri" or "Okay Google") is recognised, the main CPU of the device switches on, while the TinyML component is responsible for continuously listening for the activation commands. A similar example is the monitoring of accelerometers and gyroscopes in smartphones by TinyML, which detects when a device is picked up and then activates the main CPU.

Other use cases include object and image recognition on surveillance cameras and sound analysis. In the field of fire prevention, TinyML can be used to train smoke detectors to distinguish real fires from false alarms. TinyML is also used in microdrone applications, enabling the drones to navigate their environment independently. Due to the special requirements of TinyML, dedicated frameworks and libraries are needed for implementation. Examples include TensorFlow Lite, uTensor and CMSIS-NN (Common Microcontroller Software Interface Standard - Neural Network).

Transfer Learning

What is Transfer Learning?

Transfer learning describes a method in the field of machine learning that is mainly applied to classification tasks. It is based on the approach of taking a pre-trained model (e.g. an artificial neural network) from a similar or even unrelated use case and supplementing or extending it with additional layers that are aimed at the specific application.

The difference between transfer learning and other machine learning methods is that the model is not trained from scratch but builds on what has already been learned from other data. The pre-trained model is normally adopted unchanged, which is why one also speaks of "frozen layers". The trained model is then transferred to the model of the specific use case and subsequently specialised by adding further layers. This approach saves time and resources, as a certain part of the training can be skipped and a pre-trained model used instead.

The method is often used in object recognition and image data analysis. As a tangible example, the case is often mentioned of a pre-trained model that can already identify dogs in photos and distinguish them from other animals or objects. In the context of transfer learning, the classification layer is extended so that the model learns to distinguish between different breeds of dogs. Something similar can also happen in a previous step, in which models were trained to distinguish dogs from cats, for example, or living beings from objects in general.

What are applications and examples?

The areas of application of transfer learning are manifold, with the currently largest area of application lying in image data analysis. Since videos are merely a sequence of individual images, the method can also be applied to videos. In order for machines to be able to "see" or understand the information in images, the images must first be made machine-readable, a task attributed to the field of computer vision.

A well-known model in the field of image classification is the Vision Transformer (ViT), which lays the foundation for further data processing. Another area of application is language analysis and the processing of text data. In the sub-area of natural language processing (NLP), human language is made understandable for machines, which lays the foundation for further applications of transfer learning. A very large and useful field of application is medical image classification and analysis. Transfer learning is used above all in computer-assisted tissue recognition in connection with imaging examination methods such as CT or MRI. With the help of this learning method, abnormalities in tissue images can be identified and categorised or classified, which can indicate tumours, cancer or other diseases, for example.

Some program libraries and frameworks such as Keras or PyTorch offer simple ways to load pre-trained models that serve as starting points for transfer learning. In the field of image recognition, ResNet from Microsoft or Inception from Google can be mentioned as examples. These models are loaded from the library and then modified or extended in the course of so-called "fine-tuning" so that the model can be used for the specific application.
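
A minimal Keras sketch of this pattern, assuming TensorFlow/Keras with the bundled ResNet50 ImageNet weights (the input size, the 10-class dog-breed head and the layer choices are illustrative): the pre-trained base is frozen and only the newly added classification head is trained.

```python
import tensorflow as tf

# Load ResNet50 pre-trained on ImageNet, without its original classifier head.
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False  # "frozen layers": keep the pre-trained weights unchanged

# Add new layers aimed at the specific application, e.g. 10 dog breeds.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5) would train only the new head.
```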

Transformer (Machine Learning)

What is a Transformer?

Transformers in the area of machine learning are a form of neural network that makes use of a so-called "attention mechanism". Here, one part of an input (for example a word or word syllable of a sentence, or a pixel of an image) is related to the remaining parts of that input.

The aim of this method is for each individual word or pixel to contribute to the understanding of the data as a whole by being combined with the remaining components. For example, in search queries on the internet, understanding pronouns and prepositions in connection with nouns is elementary, as only then can the meaning of the overall search be grasped.

Transformers are predominantly applied in the field of deep learning for text recognition and text processing, as well as for image recognition.

Architecture in Deep Learning

The structure of a transformer in machine learning is basically divided into an encoder and a decoder. The data passes through the encoder and decoder in the following sequence.

In the first step of the process, the input is transferred by "embedding" into processable data in the form of vectors. In the next step, the position of the vectors (or words in a sentence) is communicated to the transformer by "positional encoding" in the form of an indexing. This is followed by the first attention mechanism. In this multi-head attention layer, the transformer compares the data currently being processed (e.g. a word) with all other data (e.g. the remaining words in the sentence) and determines its relevance. Because of this self-comparison, this layer is also called "self-attention".
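
A minimal NumPy sketch of a single head of scaled dot-product self-attention (the dimensions and random weights are illustrative, and the input is assumed to be already embedded and position-encoded; real transformers run several such heads in parallel):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for a sequence X of shape (n, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # project into queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[1])     # compare every token with every other
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: relevance per token
    return weights @ V                         # weighted mix of all tokens

rng = np.random.default_rng(0)
n, d = 4, 8                                    # 4 tokens, 8-dimensional embeddings
X = rng.normal(size=(n, d))                    # embedded + position-encoded input
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # -> (4, 8)
```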

Now follows the "Add & Norm" step, in which the original data, copied unchanged before the multi-head attention layer, is added to the processed data from that layer and normalised. The last layer of the encoder is the feed-forward layer, represented by a neural network with an input, a hidden and an output layer, whose non-linear activation function (such as ReLU) maps values into the range from 0 to infinity. The encoder processing is completed with a repetition of the "Add & Norm" step described above.
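
Continuing the NumPy sketch above (it reuses the numpy import, the self_attention helper and the toy inputs from there; the hidden size is illustrative), one encoder block chains self-attention, the residual "Add & Norm" step and the feed-forward layer:

```python
def layer_norm(X, eps=1e-6):
    # Normalise each token vector to zero mean and unit variance.
    mean = X.mean(axis=-1, keepdims=True)
    std = X.std(axis=-1, keepdims=True)
    return (X - mean) / (std + eps)

def feed_forward(X, W1, b1, W2, b2):
    # Two-layer network; ReLU maps the hidden values into [0, infinity).
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

def encoder_block(X, attn_params, ffn_params):
    # "Add & Norm": the residual copy of X is added to the attention output.
    X = layer_norm(X + self_attention(X, *attn_params))
    # Feed-forward layer followed by the second "Add & Norm" step.
    return layer_norm(X + feed_forward(X, *ffn_params))

hidden = 16
ffn_params = (rng.normal(size=(d, hidden)), np.zeros(hidden),
              rng.normal(size=(hidden, d)), np.zeros(d))
print(encoder_block(X, (W_q, W_k, W_v), ffn_params).shape)  # -> (4, 8)
```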

In the next process step, the decoder starts by initialising and positioning an output sequence. This is done analogously to the encoder by "output embedding" and "positional encoding". This step is followed by the "masked multi-head attention" layer, which is particularly relevant in the training phase of the model: here the decoder learns to generate or approximate a target output from actual training data. Because of the parallel mode of operation of the transformer, every position of the output sequence would already be available to the decoder during training, which is why the future positions of the output sequence are masked, i.e. hidden. This is also where the layer gets its name.

This layer is followed by another "Add & Norm" step before the data is passed on to the multi-head attention layer. This layer is also referred to as "encoder-decoder attention", as it establishes a connection between the encoder and the decoder. It connects the input sequences processed in the encoder with the previously generated output sequences and is therefore also called "cross-attention". This mechanism is required, for example, when translating a text into another language, to calculate which word of the target language should come next in the sentence. This layer is followed by another "Add & Norm" step, a feed-forward layer similar to the one in the encoder, and a final "Add & Norm" step. In the penultimate step, the vectors processed so far are projected in the linear layer into a larger vector, in order to be able to represent, for example, the entire vocabulary of a target language in the context of a translation. The final softmax function then calculates a probability between 0 and 1 for each possible output and thus determines the most probable final output.
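
A minimal sketch of this final projection step, again reusing numpy and the rng from the sketches above (the toy vocabulary and random weights are illustrative): the decoder state is mapped to one score per vocabulary entry and normalised to probabilities.

```python
def output_distribution(h, W_vocab, b_vocab):
    # Linear layer: project the decoder state onto the whole target vocabulary.
    logits = h @ W_vocab + b_vocab
    # Softmax: turn scores into probabilities between 0 and 1 that sum to 1.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

vocab = ["<eos>", "the", "cat", "sits"]        # toy target-language vocabulary
h = rng.normal(size=(8,))                      # final decoder state (d = 8)
probs = output_distribution(h, rng.normal(size=(8, 4)), np.zeros(4))
print(vocab[int(np.argmax(probs))])            # most probable next output token
```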

Temporal Difference Learning

What is Temporal Difference Learning?

Temporal Difference Learning (also called TD learning) describes a variant of reinforcement learning, which is one of the three learning paradigms of machine learning alongside supervised learning and unsupervised learning.

As with other reinforcement learning methods, Temporal Difference Learning does not require training data to give the learning algorithm a starting point. The system, or a software agent, learns through a trial-and-error process in which it receives a reward for a sequence of decisions/actions and aligns and adjusts its future strategy accordingly. The model of the algorithm is based on the Markov decision process, in which the benefit for a software agent results from a sequence of actions.

Unlike other learning methods, in TD learning the evaluation function is updated with the corresponding reward after each individual action rather than only after a completed sequence of actions. In this way, the strategy iteratively approaches the optimal function. This process is called bootstrapping and aims to reduce the variance in finding a solution.
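
A minimal sketch of the tabular TD(0) update (the state names, learning rate alpha and discount factor gamma are illustrative): after every single action, the value estimate of the state just left is nudged towards the observed reward plus the bootstrapped estimate of the next state.

```python
from collections import defaultdict

V = defaultdict(float)        # value estimate per state
alpha, gamma = 0.1, 0.9       # learning rate and discount factor

def td0_update(state, reward, next_state):
    # TD error: observed reward + discounted estimate of the next state,
    # minus the current estimate of the state just left.
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error  # update after each individual action

td0_update("s0", 1.0, "s1")   # one single transition is enough to learn from
print(V["s0"])                # -> 0.1
```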

What algorithms exist in TD learning?

Within Temporal Difference Learning, several algorithms exist to implement the method.

In Q-learning, the software agent evaluates the utility of an action to be performed instead of the utility of a state and, based on the current evaluation function, chooses the action with the greatest increase in utility. Q-learning is therefore said to use an "action-value function" instead of a "state-value function".
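
A minimal sketch of the tabular Q-learning update (the action set and parameters are illustrative): as an off-policy method it bootstraps from the best action in the next state, regardless of which action the agent actually takes next.

```python
from collections import defaultdict

Q = defaultdict(float)            # action-value estimate per (state, action)
actions = ["left", "right"]       # illustrative action set
alpha, gamma = 0.1, 0.9           # learning rate and discount factor

def q_learning_update(s, a, reward, s_next):
    # Off-policy: bootstrap from the maximum Q-value over all next actions.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
```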

SARSA (an abbreviation of "state-action-reward-state-action") is likewise an algorithm with an action-value function. Beyond this commonality with Q-learning, SARSA differs in that Q-learning is an off-policy algorithm, whereas SARSA is an on-policy algorithm. An off-policy algorithm considers only the next state when determining the value of an action, whereas an on-policy algorithm takes into account both the next state and the action it will actually take there, so the agent remains true to its strategy when calculating the subsequent action. The algorithms considered so far only take into account the immediate reward of the next action.
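
For contrast, a minimal sketch of the corresponding SARSA update (reusing the illustrative Q table, alpha and gamma from the Q-learning sketch above): as an on-policy method it bootstraps from the action the agent actually chooses in the next state.

```python
def sarsa_update(s, a, reward, s_next, a_next):
    # On-policy: bootstrap from the Q-value of the action actually taken next.
    Q[(s, a)] += alpha * (reward + gamma * Q[(s_next, a_next)] - Q[(s, a)])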

With so-called TD n-step methods, on the other hand, the rewards of the next n steps are included.

TD-Lambda, TD(λ), is an extension of the temporal difference learning algorithm. Here, not only a single state can lead to an adjustment of the evaluation function; the values of several states within a sequence can be adjusted. The decay rate λ regulates the extent of the possible change for each individual state, decreasing exponentially the further a state lies back from the state currently under consideration. TD-Lambda can also be applied to Q-learning and SARSA.
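
A minimal sketch of TD(λ) with eligibility traces (a common tabular formulation; the parameters are illustrative): every visited state keeps a trace that decays by γλ per step, so a single TD error updates several recent states at once.

```python
from collections import defaultdict

V = defaultdict(float)              # value estimate per state
traces = defaultdict(float)         # eligibility trace per state
alpha, gamma, lam = 0.1, 0.9, 0.8   # lam is the decay rate lambda

def td_lambda_update(state, reward, next_state):
    traces[state] += 1.0            # mark the state just visited
    td_error = reward + gamma * V[next_state] - V[state]
    for s in list(traces):
        V[s] += alpha * td_error * traces[s]  # credit recent states as well
        traces[s] *= gamma * lam              # exponential decay of the trace
```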

What are these algorithms used for in practice?

The areas of application of Temporal Difference Learning, as a reinforcement learning method, are manifold. A striking example of its use is TD-Gammon, a program that plays the game backgammon and was developed using a TD-Lambda algorithm. The same applies to AlphaGo, which is based on the board game Go.

One application of Q-learning can be found in autonomous driving in road traffic, where the system independently learns collision-free overtaking manoeuvres and lane changes and then maintains a constant speed.

SARSA, on the other hand, can be used to detect credit card fraud, for example. The SARSA method provides the algorithm for detecting fraud, while the classification and regression method of a random forest optimises the accuracy of the credit card default prediction.

Text recognition (Optical Character Recognition)

What is text recognition?

Optical Character Recognition (OCR) converts analogue text into editable digital text. For example, a printed form is scanned and converted by the OCR software into a text document on the computer, which can then be searched, edited and saved.

Modern OCR text recognition is able to correctly recognise over 99% of the text information. Words that are not recognised are marked by the program and corrected by the user.

To further improve the results, OCR text recognition is often supplemented with methods of context analysis, known as Intelligent Character Recognition (ICR). For example, if the text recognition software has recognised "2immer", the "2" is corrected to a "Z", resulting in the output "Zimmer" (German for "room"), which makes sense in context.

There is also Intelligent Word Recognition (IWR), which is intended to solve the problem of recognising cursive handwriting.

Some examples of free and paid optical character recognition software (in alphabetical order):

  • ABBYY FineReader PDF
  • ABBYY FlexiCapture
  • Adobe Acrobat Pro DC
  • Amazon Textract
  • Docparser
  • FineReader
  • Google Document AI
  • IBM Datacap
  • Klippa
  • Microsoft OneNote
  • Nanonets
  • OmniPage Ultimate
  • PDF Reader
  • Readiris
  • Rossum
  • SimpleOCR
  • Softworks OCR
  • Soda PDF
  • Veryfi

Write an OCR text recogniser yourself with Python or C#

It is possible to incorporate text recognition into your own scripts using the programming languages Python or C#. This requires the free OCR library Tesseract, which works on Linux and Windows.

This approach provides a customisable text recognition solution for both scans and photos.
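
A minimal Python sketch using the pytesseract wrapper (this assumes the Tesseract binary and the Pillow library are installed; the file name is illustrative):

```python
from PIL import Image          # pip install pillow pytesseract
import pytesseract             # also requires the Tesseract binary itself

# Load a scan or photo and let Tesseract extract the text.
image = Image.open("scanned_page.png")
text = pytesseract.image_to_string(image, lang="eng")
print(text)
```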

How does Optical Character Recognition software work?

The basis is a raster graphic (an image copy of the text), created with a scanner or camera from the physically existing text, for example a book page. Text recognition on a photo is usually more difficult than on a scan, where the image copy provides consistently good conditions. With a photo, the exposure and the angle at which the document was captured can cause problems, but these can be corrected through the use of AI.

After that, the OCR software works in 3 steps:

1. Recognition of the page and layout structure

The scanned graphic is analysed for dark and light areas. Normally, the dark areas are identified as characters to be recognised and the light areas as background.

2. Pattern or feature recognition

This is followed by further processing of the dark areas to find alphabetic letters or numeric digits. The various OCR solutions differ in whether they recognise one character, one word or one block of text at a time. The characters are identified using pattern recognition or feature recognition:

Pattern recognition: The OCR program compares the characters to be checked with its database of text examples in different fonts and formats and recognises matching patterns.

Feature recognition: The OCR program applies rules regarding the features of a particular letter or number. Features can be, for example, the number of angled lines, crossed lines or curves in a character.

For example, the description of the letter "F" consists of one long vertical line and two short horizontal lines.

3. Coding in the output format and error control

Depending on the area of application and the software used, the document is saved in different formats. For example, it is output as a Word or PDF file, or saved directly in a database.

In addition, the last step also involves error checking by the user to manually correct words or characters that are not recognised.

How does AI support text recognition?

On the one hand, Artificial Intelligence (AI) supports text recognition already during the optimisation of the raster graphic, especially with photos. If the document to be read is bent or creased, the text may be too slanted or distorted, which causes problems for the OCR software during processing. With photos, poor exposure and an unsuitable shooting angle can also create bad conditions for the OCR software.

With the help of AI, the document's structure can be "smoothed", the lighting optimised and the angle corrected, again providing good conditions for text recognition.

On the other hand, AI improves the results of the text recognition itself. The artificial intelligence learns with every text and every corrected error. In this way, errors in text recognition are steadily minimised and the OCR software delivers ever better results.