What is Computational Statistics?

Computational statistics is the point of contact between information technology and statistics. Behind the term is an essential area of data science which currently enjoys a great deal of attention in a wide variety of application fields and is likely to continue to do so, be it for Google PageRank, spam filters in e-mail inboxes or big data analyses.

In addition to data science, computational statistics also belongs to simulation science, which generally involves recreating experiments on a computer in order to reduce the amount of work involved in research or to make experiments possible in the first place.

Computational statistics is often equated with statistical computing, but the two differ in emphasis. Computational statistics is mainly about designing and implementing algorithms for statistical methods; statistical computing goes the other way around and applies concepts from computer science to statistics.

Important methods:

  • The Markov chain is a stochastic process that is used in a wide variety of fields: economists use it to optimise traffic systems, in financial mathematics it is used to model share prices, and online marketers use it to generate texts; even the popular board game Monopoly can be understood as a Markov chain. In simplified terms, this mathematical method looks at the development of random systems over time, where the next state depends only on the current state and not on the path that led to it. In Monopoly, each dice roll is independent of the previous one, but the square a player lands on depends on the square they started from and on the new roll. The method can therefore be used to determine how likely certain game scenarios are.
  • The Monte Carlo simulation makes it possible to carry out statistical studies that would be impossible or very costly in other ways. If, for example, the average height of a person is to be determined, one could measure every person on earth and divide the sum by the world population, which is practically impossible. In a Monte Carlo simulation, a smaller number of people is selected at random instead and their average serves as an estimate, which keeps the workload low. The more measurements are made, the closer the estimate tends to get to the true value; the reason for this is the law of large numbers. Monte Carlo simulation is also used in many areas: weather and climate models use it for forecasts, companies use it to weigh up risks, and production processes in manufacturing plants are optimised with its help.
  • The maximum likelihood method is a widely applicable estimation method; in bioinformatics it is considered a standard procedure. It addresses the case where the parameters of a statistical model are unknown and cannot be measured directly: given the observed data, the method chooses the parameter values under which those observations would have been most probable. Like Monte Carlo simulation, it keeps the effort low, because the parameters are estimated from the available sample instead of being determined by exhaustive measurement.
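
The three methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the eight-square circular board is a hypothetical stand-in for Monopoly, the population of heights is synthetic, the coin-flip data (7 heads in 10 flips) is invented for the example, and all function names are made up here.

```python
import math
import random
import statistics


def board_visit_frequencies(n_squares=8, n_turns=100_000, seed=0):
    """Markov chain: a token moves around a small circular board.

    The next square depends only on the current square and the new
    die roll, not on how the token got there.
    """
    rng = random.Random(seed)
    visits = [0] * n_squares
    position = 0
    for _ in range(n_turns):
        position = (position + rng.randint(1, 6)) % n_squares
        visits[position] += 1
    # Share of turns that ended on each square.
    return [count / n_turns for count in visits]


def monte_carlo_mean(population, sample_size, seed=0):
    """Monte Carlo: estimate the population mean from a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample)


def coin_bias_mle(heads, flips):
    """Maximum likelihood: pick the heads probability p under which the
    observed flips are most probable (simple grid search over p)."""
    tails = flips - heads

    def log_likelihood(p):
        return heads * math.log(p) + tails * math.log(1 - p)

    candidates = [i / 100 for i in range(1, 100)]  # 0.01 ... 0.99
    return max(candidates, key=log_likelihood)


if __name__ == "__main__":
    print("visit frequencies:", board_visit_frequencies())

    # Synthetic heights in cm (assumed values, for illustration only).
    rng = random.Random(42)
    heights = [rng.gauss(170, 10) for _ in range(100_000)]
    print("true mean:     ", statistics.fmean(heights))
    print("sample of 1000:", monte_carlo_mean(heights, 1_000))

    print("estimated coin bias:", coin_bias_mle(heads=7, flips=10))
```

On this simplified board every square is visited about equally often in the long run; real Monopoly rules such as "Go to Jail" would skew the frequencies. The grid search is chosen for readability: for coin flips the maximum likelihood estimate is simply the observed share of heads, which the search recovers.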

What role does computational statistics play in the development of new technologies?

Computer-aided statistics is made up of various components. Based on the mathematical principles of probability, distributions, estimation and inference, methods such as the Markov chain are used to process data. Those who work in this field must master both the procedures of statistics and their digital implementation.

In the future, computational statistics will play a greater role than ever. Areas of digitisation in particular are usually supported by computer-aided statistics. In the field of autonomous driving, for example, there is an urgent need for statistics: since safety is the primary concern in public road traffic, computer-aided statistics is essential. Nanotechnology and the medical sector in general will continue to rely on methods such as the maximum likelihood method, for example to conduct research on DNA strands.

Other fields of technological development, be it virtual reality, blockchain or artificial intelligence, likewise require analysis by computer-aided statistics.

An example of computer-aided statistics in the development of new technologies is an online platform for rental flats. Since the company was founded, it has faced the problem of the countless variables that make it difficult for landlords to set prices. From the beginning, it therefore relied on data science to calculate price suggestions for its clients. These suggestions reduce the workload for the landlord and thus make it easier to place an ad for a vacant flat. The resulting increase in turnover is in turn analysed statistically. The example shows how closely computer-aided statistics is interwoven with the development of new technologies.