What is NumPy?

NumPy (short for Numerical Python) is a powerful library for performing mathematical operations on large arrays of data. It is a Python library whose core is implemented in C, and it is easy to integrate with other Python libraries and tools. NumPy is widely used in the scientific and data science communities and is considered a fundamental tool for many data-intensive applications.

One of the main features of NumPy is the ability to work with arrays of data. An array is a data structure that stores a collection of elements of the same type in a contiguous block of memory. NumPy arrays are similar to Python lists, but they are much more efficient for certain types of operations, such as mathematical calculations.
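As a small illustration of the same-type rule, the following sketch builds arrays from Python lists and inspects their element type. The variable names are chosen here for illustration only.

```python
import numpy as np

# A Python list may mix types; a NumPy array stores a single element type.
values = [1, 2, 3, 4]
arr = np.array(values)

print(arr.dtype)     # an integer dtype (the exact width depends on the platform)
print(arr.itemsize)  # bytes used by each element
print(arr.nbytes)    # total bytes: number of elements times itemsize

# Mixing integers and floats converts every element to a common float type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```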

NumPy is often used to perform mathematical operations on large arrays of data. It is also commonly used for other tasks such as reshaping, flattening and appending arrays.
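The tasks mentioned above could look roughly like this. It is a minimal sketch with made-up data, using reshape(), flatten() and numpy.append().

```python
import numpy as np

# Reshape six values into 2 rows and 3 columns.
grid = np.arange(6).reshape(2, 3)

# flatten() returns a copy of the data as a 1D array.
flat = grid.flatten()

# np.append() returns a new array; the original array is left unchanged.
extended = np.append(flat, [6, 7])

print(grid)
print(flat)
print(extended)
```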

Examples of using NumPy

Creating a NumPy array

A NumPy array can be created from a Python list using the numpy.array() function. NumPy can also generate arrays directly. For example, the numpy.linspace() function generates an array of 10 evenly spaced values between 0 and 1.

import numpy as np
a = np.linspace(0, 1, 10)
print(a)
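For comparison, here is a short sketch of the list-based route with numpy.array(), using an arbitrary example list. It also checks that numpy.linspace() includes both endpoints by default.

```python
import numpy as np

# Build an array directly from a Python list.
numbers = [3, 1, 4, 1, 5]
a = np.array(numbers)
print(a)
print(a.shape)  # (5,)

# linspace() includes both the start and the stop value by default.
b = np.linspace(0, 1, 10)
print(b[0], b[-1])
```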

Transforming an array

The shape of a NumPy array can be changed with the reshape() method, as long as the total number of elements stays the same. For example, a 1D array with 10 elements can be converted into a 2D array with 5 rows and 2 columns.

import numpy as np
a = np.arange(10)
print(a)
b = a.reshape(5, 2)
print(b)
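Two details of reshape() may be worth a quick sketch: passing -1 lets NumPy infer one dimension, and a shape that does not match the number of elements raises a ValueError.

```python
import numpy as np

a = np.arange(10)

# -1 asks NumPy to infer this dimension from the array size.
b = a.reshape(-1, 2)
print(b.shape)  # (5, 2)

# 10 elements cannot fill a 3 x 4 array, so this raises a ValueError.
try:
    a.reshape(3, 4)
except ValueError as err:
    print("reshape failed:", err)
```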

Performing mathematical operations on arrays

Mathematical operations such as addition, subtraction, multiplication and division can be performed on arrays with the standard mathematical operators. The operations are applied element by element. For example, two arrays can be added and the sum of all elements in the resulting array can be calculated.

import numpy as np
a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])
c = a + b
print(c)
print(c.sum())
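The other operators work the same way. The sketch below reuses the two arrays from the example and also combines an array with a single number.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([5, 4, 3, 2, 1])

diff = a - b      # element-wise subtraction
prod = a * b      # element-wise multiplication
quot = a / b      # element-wise true division, the result is a float array
doubled = a * 2   # the scalar is applied to every element

print(diff)
print(prod)
print(quot)
print(doubled)
```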

Random choices

The numpy.random.choice() function randomly selects elements from an array or from any other 1-D array-like object, such as a list.

Here is an example to randomly select three elements from an array:

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = np.random.choice(a, size=3, replace=False)
print(b)

The above code randomly selects 3 elements from the array "a" without replacement, so no element can be picked twice, and assigns them to the variable "b".
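To make the effect of the replace parameter more concrete, here is a minimal sketch that contrasts both settings. The sample sizes are arbitrary.

```python
import numpy as np

a = np.arange(1, 11)  # the values 1 to 10

# Without replacement: every selected element is distinct.
without = np.random.choice(a, size=3, replace=False)
print(without)

# With replacement: elements may repeat, so size may exceed len(a).
with_repl = np.random.choice(a, size=15, replace=True)
print(with_repl)

# Asking for more distinct elements than exist raises a ValueError.
try:
    np.random.choice(a, size=11, replace=False)
except ValueError as err:
    print("choice failed:", err)
```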

The probability can also be specified for the selection of each element with the parameter p. For example, if three elements are to be randomly selected from the array "a", but the element with the value 5 is to have a higher probability of being selected, you can use the following code:

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
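p = [0.08, 0.08, 0.08, 0.08, 0.28, 0.08, 0.08, 0.08, 0.08, 0.08]  # must sum to 1
b = np.random.choice(a, size=3, p=p)
print(b)

In this example, the element with the value 5 has a probability of 28% of being selected on each draw, while every other element has a probability of 8%. The probabilities in p must sum to 1; otherwise numpy.random.choice() raises a ValueError. Because replace defaults to True, the same element can appear more than once in the result.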
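Since the values passed to p must sum to 1, one convenient approach is to start from relative weights and normalise them. The weights below are illustrative assumptions, not values from the example above.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so that repeated runs give the same result

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Relative weights: the value 5 is three times as likely as any other value.
weights = np.array([1, 1, 1, 1, 3, 1, 1, 1, 1, 1], dtype=float)
p = weights / weights.sum()  # normalise so that the probabilities sum to 1

samples = np.random.choice(a, size=10_000, p=p)
share_of_five = np.mean(samples == 5)
print(share_of_five)  # close to 3/12 = 0.25
```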

Another useful function is numpy.random.shuffle(), which randomly reorders an array. For a multidimensional array, it shuffles only along the first axis, so the rows change order but their contents stay intact. The function modifies its input in place and returns None.

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
np.random.shuffle(a)
print(a)
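The behaviour on the first axis is easier to see with a 2D array. In this sketch the rows are reordered while each row keeps its contents. The seed is an arbitrary choice to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for a repeatable order

m = np.arange(12).reshape(4, 3)  # 4 rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]
result = np.random.shuffle(m)    # shuffles the rows in place

print(m)
print(result)  # None, because the array itself was modified
```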

NumPy vs. Python

NumPy is faster than Python lists for numerical work because it uses a more efficient array storage layout that allows faster access to elements. In addition, NumPy offers a wide range of built-in mathematical functions, which are optimised for use with arrays.
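A rough way to check this on your own machine is to time the same operation on a list and on an array. This is only a sketch: the array size and repeat count are arbitrary, and the measured times depend on the hardware and the NumPy version.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# Double every element 20 times with each approach and measure the total time.
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)
numpy_time = timeit.timeit(lambda: np_arr * 2, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy array:        {numpy_time:.4f} s")
```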

NumPy vs. Pandas

Compared to Pandas, NumPy is a lower-level library that focuses on providing efficient array operations. Pandas builds on NumPy and offers additional functions for working with tabular data, such as data frames and series. While NumPy is useful for performing mathematical operations on large arrays of data, Pandas is suitable for data manipulation and analysis tasks.

NumPy is often considered harder to learn than Pandas, because Pandas provides a high-level interface for working with tabular data that is more user-friendly than NumPy's array operations. However, NumPy remains a fundamental tool for many data-intensive applications and is widely used in the scientific and data science communities.

In terms of performance, NumPy is generally faster than Pandas when it comes to performing pure array operations. The additional features that Pandas offers for working with tabular data add overhead, which can make it slower for certain types of operations.