[at] trainee program data.camp workshop: demand matching with NLP in an agile setting

by Julian Germek | 11. May 2021 | Basics

[at] trains its own data and AI experts

We at [at] have been running a trainee program for several years now, training data scientists and data engineers in-house. Throughout the year-long program, the trainees rotate through several [at] teams to work on different projects and improve their skills in a wide variety of topics. As a novelty in 2020, the data.camp was introduced – a four-day workshop where all trainees work together on a common project of their choosing to boost teamwork and gain more practical experience.
To avoid being distracted by the usual office business, the data.camp normally takes place in the [at] project rooms in Austria. Due to Covid, the first data.camp in October 2020 had to take place in the Munich office. Luckily, the team was able to use the new [at] office rooms (in the same building) to get at least some kind of 'camp' feeling.
Follow these links if you want to find out more about the data science program or the data engineering program.

at data.camp
© [at]

Topic of the first data.camp: automated googling

The topic of the first data.camp was embedded within a real and ongoing [at] project. We chose the topic of automated googling as it allowed us to work in three different fields: analytics, engineering and visualization. The original setting was as follows:
Whenever employees at an [at] customer needed a supplier for a certain demand, they would send their requirements to the supplier management department, where another employee would search the internet for a fitting supplier. For example, a demand for AI trainings would be forwarded so that the supplier management department could find out which companies offer them. The assignment for [at] was to automate this process.
At the beginning of the data.camp, our [at] colleagues had already developed an algorithm that took a demand text as input and returned a list of potential suppliers as output (see below). Under the hood, the algorithm first extracted key phrases from the demand text and used them to create search queries. These queries were sent to Google to retrieve a list of potential companies. To verify that these candidates were actual companies, the algorithm checked them against the company database Crunchbase. For each verified company, a description was retrieved from the web (again via a Google search) and compared to the original demand text using natural language processing. Finally, the algorithm output all companies whose descriptions were sufficiently similar to the demand text.

nlp project
© [at]
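
Since the project's actual components are not public, here is a hedged, self-contained sketch of that pipeline in Python. Every helper is an illustrative stand-in, and the final comparison uses a simple TF-IDF cosine similarity as a placeholder for the real NLP step – only the overall flow mirrors the algorithm described above.

```python
# Illustrative skeleton of the pipeline described above. All helpers are
# stand-ins; only the overall flow mirrors the article.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_key_phrases(demand_text: str) -> list[str]:
    # Stand-in for the key-phrase extraction step.
    return demand_text.lower().split()

def search_companies(phrases: list[str]) -> list[str]:
    # Stand-in for the Google search returning candidate company names.
    return ["ExampleCorp", "AI Trainings GmbH"]

def verify_company(name: str) -> bool:
    # Stand-in for the check against the Crunchbase company database.
    return True

def fetch_description(name: str) -> str:
    # Stand-in for the second web search retrieving a company description.
    return f"{name} offers trainings in machine learning and AI."

def find_suppliers(demand_text: str, threshold: float = 0.2):
    """Return (company, similarity) pairs whose description matches the demand."""
    candidates = [c for c in search_companies(extract_key_phrases(demand_text))
                  if verify_company(c)]
    descriptions = [fetch_description(c) for c in candidates]
    # Placeholder NLP step: TF-IDF vectors plus cosine similarity.
    vectors = TfidfVectorizer().fit_transform([demand_text] + descriptions)
    scores = cosine_similarity(vectors[0:1], vectors[1:]).ravel()
    return [(c, round(float(s), 2)) for c, s in zip(candidates, scores)
            if s >= threshold]

print(find_suppliers("We need a supplier for machine learning trainings"))
```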

Project setup: Agile, of course

As data projects tend to change constantly with new developments, we used Scrum as a framework to work in an agile way. We appointed a Scrum Master, a Product Owner and five Developers, and held several Scrum meetings every day to synchronize. This made the data.camp a great opportunity to try out Scrum and all its concepts.
Behind the scenes, we had experienced [at] experts at our disposal as a resource for our questions. This enabled us to advance very quickly and develop professional solutions.

the at data.camp
© [at]

Improvements in Analytics, Engineering and Visualization

Anyone who has ever participated in a hackathon knows how much can be achieved within a few days. Here is what we did:
One workstream improved the natural language processing skills of the algorithm. As the demand texts generally concerned technologies, we taught the algorithm the kinds of terms we are interested in (such as "augmented reality" or "machine learning"). This enabled us to create better search queries from the demand texts.
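
To make this concrete, here is a minimal sketch of vocabulary-driven phrase spotting, assuming spaCy's PhraseMatcher – the actual NLP stack and term list used in the project are not named in this article, so both are illustrative.

```python
# Minimal sketch: spot known technology terms in a demand text using spaCy.
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer only; no trained model needed for this step

# Hypothetical list of technology terms the algorithm was taught to look for.
TECH_TERMS = ["augmented reality", "machine learning", "artificial intelligence"]

matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive matching
matcher.add("TECH", [nlp.make_doc(term) for term in TECH_TERMS])

def extract_tech_terms(demand_text: str) -> list[str]:
    """Return all known technology terms found in a demand text."""
    doc = nlp(demand_text)
    return [doc[start:end].text for _, start, end in matcher(doc)]

print(extract_tech_terms("Supplier needed for Machine Learning trainings"))
# -> ['Machine Learning']
```

The matched terms can then be fed directly into the search-query construction step instead of relying on generic keyword extraction.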
The runtime of the whole algorithm was around 30 minutes when we started. As this not only results in a bad customer experience but also makes development much more difficult (imagine waiting 30 minutes each time you want to try something out), another workstream aimed to reduce the runtime. We increased the computational power by moving the project to the cloud and downloaded the Crunchbase database (for company verification) to use it locally instead of making an API call for each verification. Both measures together reduced the runtime to only 2.5 minutes.
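
The local-lookup idea can be sketched as follows: load a downloaded copy of the Crunchbase data once at startup, then verify each candidate in memory. The file name and column name below are assumptions for illustration, not the project's actual export format.

```python
# Hedged sketch: replace one HTTP round trip per candidate with a local lookup.
import csv

def load_company_index(path: str = "crunchbase_organizations.csv") -> set[str]:
    """Read the downloaded database once and index company names for O(1) lookups."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["name"].strip().lower() for row in csv.DictReader(f)}

COMPANIES = load_company_index()  # loaded once at startup, not per request

def is_verified_company(candidate: str) -> bool:
    # Previously one API call per candidate; now a local set membership test.
    return candidate.strip().lower() in COMPANIES
```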
Generally, there is a lot going on behind the scenes of a machine learning service – just think of all the steps our algorithm takes to convert a simple demand text into a company list. While hiding all this complexity can be very convenient for the end user, some users may need to understand how the final result was obtained. Therefore, the last workstream exposed all intermediate results and displayed them in an app in real time. Have a look at the screenshots at the bottom of this article.
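
A minimal sketch of such a transparency view could look like the following, assuming a Streamlit app (the article does not name the framework used, and both helpers are hypothetical stand-ins). Save it as app.py and start it with streamlit run app.py.

```python
# Sketch: show each pipeline step's intermediate result in a simple web app.
import streamlit as st

def extract_key_phrases(text: str) -> list[str]:
    # Hypothetical stand-in for the key-phrase extraction step.
    known = ["machine learning", "augmented reality", "ai trainings"]
    return [term for term in known if term in text.lower()]

def search_candidates(phrases: list[str]) -> list[str]:
    # Hypothetical stand-in for the Google search step.
    return [f"Example supplier for '{p}'" for p in phrases]

st.title("Demand matching – intermediate results")
demand = st.text_area("Demand text")

if demand:
    phrases = extract_key_phrases(demand)
    st.subheader("Step 1: Extracted key phrases")
    st.write(phrases)

    st.subheader("Step 2: Candidate companies")
    st.write(search_candidates(phrases))
```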

Summary and Take-Aways

“We learnt a lot whilst having a lot of fun” is a very good summary of the first data.camp. On top of the technical achievements in analytics, engineering and visualization, we gained highly valuable experiences with Scrum. Moreover, working together like this gave a big boost to the trainee team spirit from which we will benefit long after this data.camp.

data.camp


The trainees of October 2020

Daniel, Louis, Thanos, Simon, Sebastian, Julian, Luca (on the laptop screen)


Julian Germek

Julian Germek holds a degree in industrial engineering and has been working as a Data Scientist at Alexander Thamm GmbH since 2020. There, he mainly focuses on developing and productionizing machine learning applications such as recommendation services and time series forecasts. His academic background allows him to integrate advanced technical or business domain knowledge into his applications.
