Pandas (Python)

What is Pandas?

Pandas is a widely used open-source library for data analysis and manipulation in the Python programming language. Thanks to its fast, flexible and expressive data structures and its efficient data analysis tools, it is frequently used in data science, machine learning and deep learning.

It provides a fast and expressive way to manipulate and analyse structured data, and it is easy to learn for anyone familiar with Python programming. Its integration with other libraries such as NumPy, Matplotlib and Seaborn also makes it a complete solution for data analysis and visualisation in Python.

The name "Pandas" is derived from the term "Panel Data", which refers to multi-dimensional data structures often used in econometrics. The library provides two primary data structures: Series (1-dimensional) and DataFrame (2-dimensional), both of which allow the display and manipulation of labelled data.

Examples of the use of Pandas (Python)

Data cleansing

Pandas offers various functions for cleaning and pre-processing data, such as handling missing values, converting data types, removing duplicates and handling outliers.

import pandas as pd
# Load a sample dataset with missing values
df = pd.read_csv("data.csv")
# Replace missing values with the mean of each numeric column
df = df.fillna(df.mean(numeric_only=True))
# Convert a column's data type from string to integer
df['column_name'] = df['column_name'].astype(int)
# Remove duplicate rows from the DataFrame
df = df.drop_duplicates()
# Cap outliers at the 95th percentile of the column
upper_bound = df['column_name'].quantile(0.95)
df['column_name'] = df['column_name'].clip(upper=upper_bound)

Data aggregation

It is possible to perform various operations to aggregate and summarise data, such as groupby, pivot tables and resampling. These operations can be helpful in transforming raw data into useful insights.

import pandas as pd
# Load a sample dataset
df = pd.read_csv("data.csv")
# Group the data by a categorical column and calculate the mean of a numeric column
grouped = df.groupby('column_name')['numeric_column'].mean()
# Create a pivot table to aggregate the data
pivot = df.pivot_table(index='column_name', values='numeric_column', aggfunc='mean')
# Resampling requires a datetime column
df['date_column'] = pd.to_datetime(df['date_column'])
# Resample the data to aggregate it by week
resampled = df.resample('W', on='date_column')['numeric_column'].sum()

Data visualisation

Pandas integrates well with popular data visualisation libraries such as Matplotlib, Seaborn and Plotly. This makes it easy to create bar charts, histograms, scatter plots and more to visualise and communicate insights from your data.

import pandas as pd
import matplotlib.pyplot as plt
# Load a sample dataset
df = pd.read_csv("data.csv")
# Create a bar plot to visualize the distribution of a categorical column
df['column_name'].value_counts().plot(kind='bar')
plt.show()
# Create a histogram to visualize the distribution of a numeric column
df['numeric_column'].plot(kind='hist')
plt.show()
# Create a scatter plot to visualize the relationship between two numeric columns
df.plot(x='numeric_column_1', y='numeric_column_2', kind='scatter')
plt.show()

Pandas vs. NumPy

Pandas differs in some respects from NumPy, another popular library for numerical computation in Python. While NumPy provides basic numerical operations, Pandas offers more advanced data analysis and manipulation capabilities. NumPy works mainly with homogeneous arrays, while Pandas works with Series and DataFrames, which are labelled and allow mixed data types. Unlike NumPy, Pandas also offers built-in handling of missing values.
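
For illustration, a small sketch of the contrast (values invented): a NumPy array is unlabelled and homogeneous, while a Pandas Series carries an index and skips missing values automatically.

import numpy as np
import pandas as pd
arr = np.array([1.0, 2.0, 3.0])  # unlabelled, homogeneous array
s = pd.Series([1.0, None, 3.0], index=["a", "b", "c"])  # labelled Series
print(arr.mean())  # 2.0
print(s.mean())    # 2.0 -- the missing value is skipped automatically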

Pandas vs. SQL

A significant difference between Pandas and SQL is that Pandas is a library for in-memory data processing, while SQL is a language for accessing and manipulating data stored in databases. SQL is better suited for working with large, persistently stored data sets, while Pandas is more flexible for fast and efficient data manipulation, exploration and analysis.
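
To illustrate the overlap, the following sketch shows a typical SQL aggregation and its Pandas equivalent on a small invented table:

import pandas as pd
# Hypothetical in-memory table of orders
df = pd.DataFrame({"customer": ["a", "b", "a"], "amount": [10.0, 20.0, 5.0]})
# SQL equivalent: SELECT customer, SUM(amount) FROM orders GROUP BY customer;
totals = df.groupby("customer")["amount"].sum()
print(totals)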

Application Programming Interface (API)

What is an application programming interface (API)?

API stands for Application Programming Interface and refers to a programming interface that enables communication between different applications. Via an API, external programs can gain access to certain components of a piece of software and exchange data with it.

Unlike a binary interface, the connection between programs takes place at source-code level. Operations are carried out via standard commands, so compatibility with different programming languages is guaranteed. Among other things, an API can provide access to databases, hard disks, graphics cards and user interfaces.

The advantage of a programming interface is the easy integration of new application components into an existing system. In addition, APIs are usually documented in detail together with their associated parameters.

How does an API work?

Programming interfaces (APIs) are used in particular by developers to allow their programs to connect to one another. A programming interface specifies how data can be received and sent. The commands and data types that an API accepts are defined in protocols, which the corresponding components use for uniform communication.

A basic distinction is made between internal/private APIs and external/open APIs. Private programming interfaces can only be used by developers within an organisation. This streamlines work on internal company processes, and such interfaces are protected from unauthorised access by security measures. External APIs are made available to the public in directories for integration into other systems; however, the use of an API is sometimes restricted or subject to a fee.

Application areas for programming interfaces

APIs can be used to connect a wide range of processes:

Weather forecast

Global weather data from a wide range of international sources are retrieved via programming interfaces and can be displayed to the user in an app on their smartphone.

Appointment booking

Service providers can use APIs to enable their customers to make bookings on online portals or search for specific services. These can be, for example, appointment information at doctors' surgeries or the comparison of flight prices. The website connects to the programming interfaces of the respective service providers and generates an overview with the most suitable options.

E-commerce

Retailers use APIs to control the inventory of their products and provide customers with information about availability.

What is the difference between API and REST API?

REST is an abbreviation for Representational State Transfer and refers to a software architecture guided by the principles and behaviour of the World Wide Web. A REST API is a specific form of API used for data transfer in distributed systems. Compared to a general API, the REST architecture prescribes the following six design principles, which developers must adhere to:

Uniform programming interface

The resources are accessible via a specific Uniform Resource Identifier (URI). Different operations can be performed using HTTP methods via the same URI. Suitable formats for resources are, for example, JSON, XML or text.
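
For illustration, the following sketch performs different operations on the same URI via HTTP methods, using the third-party requests library; the endpoint and fields are invented:

import requests
# Hypothetical REST resource identified by a single URI
url = "https://api.example.com/users/42"
response = requests.get(url)             # GET: read the resource
user = response.json()                   # JSON is a common resource format
requests.put(url, json={"name": "Ada"})  # PUT: replace the same resource
requests.delete(url)                     # DELETE: remove it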

Independence of client and server

Client and server applications must be decoupled from each other. The client should need nothing more than the URI of the respective resource.

Cache

To increase the scalability of the server and improve the performance of the client, resources can be stored in the cache.

Statelessness

REST APIs do not require information about sessions. If the server requires data about the client session, this is sent via a separate request.

Multi-layer system architecture

Between the client and the server there may be a number of other applications that communicate with each other. The client cannot tell how many servers the response has passed through.

Code on demand (optional)

In most cases, static resources are transferred via REST APIs. Sometimes, however, the response can also contain executable code, such as Java applets; this should only be executed on demand.

PyTorch

PyTorch is an open-source framework for machine learning that is based on the Python programming language and the Torch library. It was released in 2016 by Facebook's artificial intelligence research team to make the development and deployment of research prototypes more efficient. PyTorch computes with tensors, which can be accelerated by graphics processing units (GPUs). Over 200 different mathematical operations can be used with the framework.
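
A minimal sketch of working with tensors, including optional GPU acceleration:

import torch
# Create a tensor and apply a mathematical operation
x = torch.rand(3, 3)
y = x @ x.T  # matrix multiplication
# Move the computation to a GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
y = y.to(device)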

Today, PyTorch is one of the most popular platforms for deep learning research and is mainly used for artificial intelligence (AI), data science and research. PyTorch is becoming increasingly popular because it makes it comparatively easy to create models for artificial neural networks (ANNs). PyTorch can also be used for reinforcement learning. It can be downloaded free of charge as open source from GitHub.

What is PyTorch Lightning?

PyTorch Lightning is an open-source library for Python that provides a high-level interface for PyTorch. Its focus is on flexibility and performance, enabling researchers, data scientists and machine learning engineers to build suitable and, above all, scalable ML systems. PyTorch Lightning is also available as open source on GitHub.

What are the features and benefits of PyTorch?

Dynamic graph calculation

The behaviour of the network can be changed on the fly; the computational graph is built as the code runs, so it does not have to be defined in full in advance.
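
A small sketch of this, in which ordinary Python control flow determines the shape of the graph at run time (the network itself is invented for illustration):

import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # The number of layers applied is decided while the code runs
        n_steps = torch.randint(1, 4, (1,)).item()
        for _ in range(n_steps):
            x = torch.relu(self.linear(x))
        return x

out = DynamicNet()(torch.rand(2, 4))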

Automatic differentiation

Using backward passes through the network, the derivative of a function is calculated automatically.
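
For illustration, a minimal example of computing a derivative automatically:

import torch
x = torch.tensor(2.0, requires_grad=True)  # track operations on x
y = x ** 3                                 # y = x^3
y.backward()                               # backward pass computes dy/dx
print(x.grad)                              # 3 * x^2 = tensor(12.)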

User-friendly interface

It is called TorchScript and makes seamless switching between eager and graph execution modes possible. It offers functionality, speed, flexibility and ease of use.
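
A minimal sketch of compiling an invented eager-mode model with TorchScript:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2), nn.ReLU())
scripted = torch.jit.script(model)  # compile the eager-mode model to TorchScript
scripted.save("model.pt")           # deployable without a Python interpreter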

Python support

Since PyTorch is based on Python, it is easy to learn and program, and all libraries compatible with Python, such as NumPy or SciPy, can be used. Furthermore, uncomplicated debugging with Python tools is possible.

Scalability

PyTorch is well supported on the major cloud platforms and is therefore easy to scale.

Dataset and DataLoader

It is possible to create your own Dataset for PyTorch to hold all the necessary data. The Dataset is accessed by means of a DataLoader, which, among other things, can iterate over the data, manage batches and transform the data.
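
A minimal sketch of a custom Dataset served in batches by a DataLoader (the data are random placeholders):

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    # A minimal dataset wrapping in-memory tensors
    def __init__(self, features, labels):
        self.features, self.labels = features, labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

data = MyDataset(torch.rand(100, 4), torch.randint(0, 2, (100,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
for features, labels in loader:  # iterates over shuffled batches of 16
    pass  # a training step would go here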

In addition, PyTorch can export learning models in the Open Neural Network Exchange (ONNX) standard format and offers an optional C++ front-end interface.
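
A minimal sketch of such an export (the model and file name are invented):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
dummy_input = torch.rand(1, 4)  # example input that fixes the tensor shapes
torch.onnx.export(model, dummy_input, "model.onnx")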

What are examples of the use of PyTorch?

  • Object detection
  • Segmentation (semantic segmentation)
  • LSTM (Long Short-Term Memory)
  • Transformer

PyTorch vs. TensorFlow

TensorFlow is also a deep learning framework and was developed by Google. It has been around longer than PyTorch and therefore has a larger developer community and more documentation. Both frameworks have their advantages and disadvantages, as they are intended for different kinds of projects.

While TensorFlow traditionally defines computational graphs statically, PyTorch takes a dynamic approach: dynamic graphs can be manipulated in real time with PyTorch, whereas with TensorFlow this is only possible once the graph is complete. PyTorch is therefore particularly suitable for uncomplicated prototyping and research work thanks to its simple handling. TensorFlow, on the other hand, is particularly suitable for projects that require scalable production models.

PyTorch vs. scikit-learn

Scikit-learn (also called sklearn) is a free library for Python that specialises in machine learning. It offers a range of classification, regression and clustering algorithms, such as random forests, support vector machines and k-means. Scikit-learn enables efficient and straightforward data analysis and is particularly suitable for applying ready-made algorithms, but is rather unsuitable for the end-to-end training of deep neural networks, for which PyTorch, in turn, is very well suited.
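
For illustration, a typical scikit-learn workflow with one of the algorithms mentioned, on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:5]))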

Pythia (Software)

The computer program Pythia is used in particle physics to simulate, or generate, collision events at particle accelerators such as those at CERN. The software is also the most frequently used Monte Carlo event generator. Its calculations are based on the methods of probability theory.

With the Pythia software, random samples are drawn from a distribution by means of random experiments. This is particularly useful for finding out which signals would stand out at the particle accelerator if a physics model deviates from the Standard Model. It therefore makes sense to simulate such models numerically in advance with Pythia.

For which simulations is the Pythia software used?

A classic field of application for the Pythia software is particle physics with its most diverse areas of application. For example, if a physics model predicts a new particle, assumptions can be made in advance. Before the experimental phase, they help to obtain clues about the signals to be looked for in the experiment. If necessary, the detectors can be optimised for this in the simulation.

Basic considerations that can be made in advance:

  • Which particles can be produced and how should they decay in the model?
  • How complex and limited is the measurability of decay products?

The simulation with the Pythia software produces a clear signal, which describes the number and momenta of the particles emerging from the collision. The aim is for the detectors to register the particles created as part of the acceleration experiment.

Experiments in particle accelerators are among the most important sources for discovering new physical phenomena. The difficulty is that the experiments have grown bigger and bigger over the decades. This is where programs like the Pythia software help in the search for new particles. The Pythia software has an extensive portfolio of settings and functionalities and enables the simulation of a wide range of scenarios in the exploration of physical events.
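
As a sketch of what event generation can look like in practice, the following assumes the Python bindings shipped with Pythia 8 (the calls mirror the library's C++ interface) and uses invented, illustrative settings:

import pythia8  # Python bindings shipped with Pythia 8

pythia = pythia8.Pythia()
# Illustrative settings: proton-proton collisions at 13 TeV, hard QCD processes
pythia.readString("Beams:eCM = 13000.")
pythia.readString("HardQCD:all = on")
pythia.init()
for _ in range(100):            # generate 100 events
    if not pythia.next():       # skip events that fail to generate
        continue
    print(pythia.event.size())  # number of entries in the event record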

In a typical application, protons are accelerated to enormous speeds in particle accelerators and collide with each other inside one of the detectors. In the process, the energy contained in the particles is converted into new particles within an extremely short time. When these hit the detectors, the data analysis begins: from here on, tracks are evaluated and read in order to reconstruct the events of the collision.

The amounts of data generated in the process are enormous. For years, physics has therefore been using artificial intelligence to classify, sort and order them.

First-order logic

What is first-order predicate logic?

First-order logic (FOL) is a mathematically grounded method for assigning properties to objects unambiguously. Each sentence or statement is decomposed into its subject and its predicate. In first-order predicate logic, the relationship between them is expressed as P(x), where P stands for the predicate and the variable x for the corresponding subject.

It should be noted that the predicates used here refer to only one subject at a time. Unlike in linguistics, a predicate is not necessarily a verb; it merely provides relevant information about the subject in question. Predicates also allow relations to be established, for example through comparisons (greater/less than, equal to, etc.).

In first-order predicate logic, the quantifiers are represented by the symbols ∀ (universal quantifier; read: "for all") and ∃ (existential quantifier; read: "there exists" or "for some"). Formulas in first-order logic are written with mathematical symbols and consist of:

  • Terms [human, animal, plant, etc.]: names of objects. In the linguistic sense, these can be both objects and subjects.
  • Variables [a, b, c, ..., x, y, z, etc.]: stand for objects that are not yet known.
  • Predicates [red, fragrant, is a flower, etc.]: stand for properties and relations that are linguistically comparable to verbs or attributes.
  • Quantifiers [∀, ∃]: allow statements about sets of objects for which a predicate applies.
  • Connectives [∧ (and), ∨ (or), → (implies), ⇒ (follows from), ⇔ (is equivalent to), = (equality)]: express logical relations between statements.

Example of first-order predicate logic

The rose is red.

Red(rose)

The rose is fragrant.

Fragrant(rose)

The rose is a flower.

Flower(rose)

We learn about the rose that it is red, is fragrant and is a flower.

Applying the universal quantifier ∀ to these statements yields:

All roses are red.

All roses are fragrant.

All roses are flowers.

However, not all roses are red and not every rose is fragrant.

That all roses are flowers, on the other hand, is a true statement.

∀x (Rose(x) → Flower(x))

To express the other two statements correctly, the existential quantifier is now used.

From the two statements

"All roses are red." and "All roses are fragrant."

the use of ∃ yields:

"Some roses are red." and "Some roses are fragrant."

To translate this into a first-order formula, we define a variable x, a predicate Rose(x) stating that x is a rose, and predicates Red(x) and Fragrant(x) stating that x is red or fragrant, respectively.

∃x (Rose(x) ∧ Red(x))

resp.

∃x (Rose(x) ∧ Fragrant(x))

Note that the existential quantifier is combined with a conjunction (∧) rather than an implication: ∃x (Rose(x) → Red(x)) would be true even if no red rose existed, because the implication is satisfied by anything that is not a rose.

This states that there are roses that are red and roses that are fragrant. Together with the earlier observation that not all roses are red and not every rose is fragrant, it follows that there are also roses that are not red or not fragrant.
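
Over a finite domain, the two quantifiers correspond directly to Python's all() and any(); a small sketch with invented data:

# A tiny finite model of the rose examples above
roses = [
    {"red": True,  "fragrant": False},
    {"red": False, "fragrant": True},
]
# Universal quantifier (for all) corresponds to all(...)
print(all(r["red"] for r in roses))       # False: not all roses are red
# Existential quantifier (there exists) corresponds to any(...)
print(any(r["red"] for r in roses))       # True: some roses are red
print(any(r["fragrant"] for r in roses))  # True: some roses are fragrant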