Computer vision studies how computers can be programmed to understand digital video or images. Computer vision is essential in areas such as object recognition and autonomous driving. Both in NLP as well as in computer vision have Deep Learning approaches proved to be extremely efficient.
Visual perception is one of the most complex processes. What is so simple and natural to us, without us even thinking about it, requires the constant processing of incredible amounts of data. We use our eyes in everything we do in life, and vision is arguably the most important sense - and an extremely sophisticated one that takes a long time to develop.
Teaching a computer to see like a human being has been on the minds of researchers for a long time. Although there had already been advances and successes in this field, the state of research changed when artificial neural networks (KNN) were developed. Thanks to their ability to model the complex relationships, they led to a leap in performance in the fields of object recognition and computer vision.
Thus, the creation of AI systems suddenly moved the automation of extremely complex tasks that seemed impossible some 20 years ago, such as autonomous driving, into the realm of possibility sooner rather than later.
For us, an image is the interplay of millions of visual inputs and objects whose interrelationships make up the whole and only then ultimately make sense to us. For a computer, an image is just another arrangement of zeros and ones put together in a strange way.
So it is not surprising that if we were to train, say, a GLM (Generalised Linear Model) to recognise an object in an image, we would not get very far. There are far too many correlations and they work in ways that are too difficult to be recognised by a linear model.
But deep neural networks, with their millions of neurons and billions of connections between them, are able to see through such clutter. Their ability to generate and recognise abstract features in a data set enables them to recognise edges, features and ultimately objects in an image.
This has enabled significant advances in computer vision. Today, neural networks have been trained to perform incredible tasks, such as recognising all types of images in a picture or recognising a person's sexual orientation from facial features.