NumPy, SciPy and SciKits
Python provides a framework on which numerical and scientific data processing can be built. As part of our short course on Python for Physics and Astronomy we will look at the capabilities of the NumPy, SciPy and SciKits packages. This is a brief overview with a few examples drawn primarily from the excellent but short introductory book SciPy and NumPy by Eli Bressert (O'Reilly 2012).
NumPy adds arrays and linear albegra to Python, with special functions, transformations, the ability to operate on all elements of an array in one stroke. Remember that in addition to the web resources, in interactive Python try help() for more information on a function:
will tell you that arange takes one required argument but has several optional ones:
numpy.arange = arange(...) arange([start,] stop[, step,], dtype=None Return evenly spaced values within a given interval.
Let's see how to create and manipulate arrays with arange and other functions.
Arrays are at the heart of NumPy. The program
import numpy as np mylist = [1, 3, 5, 7, 11, 13] myarray = np.array(mylist) print myarray
creates a list and makes an array from it. You can create an array of 30 32-bit floating point zeros with
myarray = np.zeros(30, dtype=np.float32)
The dtype argument is optional (defaults to float64) and can be
- unit8, 16, 32, 64
- int8, 16, 32, 64
- float16, 32, 64, 128
- complex64, 128
Arrays may also have character data, as in this
test = np.array(['a very','friendly','dog'])
which will be a dtype of
meaning a string of 8 characters (set in this instance by the "friendly" element of the list). Arrays must have all elements the same size.
Arrays can be arranged to be multi-dimensional. In our example of zeros, we could instead have
myarray = np.zeros((3,4,5)) print myarray
which will show
array([[[ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.]],
[[ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.]],
[[ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.]]])
Notice that the array is created with np.zeros(), and the argument is (3,4,5). This means an array of 3x4x5 elements, all zero (float64, by default), grouped as 3 elements, each of 4 elements, each of 5 elements.
An array can be reshaped after it is made. The array stays the same way in memory, but the grouping changes.
myarray = np.array( [1,3,5,7,11,13] )
makes a linear array of 6 elements, and
changes its shape, so it will be
array([[ 1, 3, 5], [ 7, 11, 13]])
To create an array of 1000 elements from 0 to 999, the function
my_1d_array = np.arange(1000)
It is reshaped into an array 10x10x10 with
my_3d_array = my_1d_array.reshape((10,10,10))
or an multi-dimensional array may be flattened
my_new_array = my_3d_array.ravel()
The original data remain unchanged, but they are sliced and indexed in different ways depending on the shape of the array and the choices for indexing it.
In summary, NumPy arrays may be created with
np.array() np.arange() np.zeros() np.empty()
written to files in a binary format, and read back with
np.save('myfile.npy', myarray) newarray = np.load('myfile.npy')
In the following we will see how to index within arrays, operate on arrays, and have arrays functionally combine with one another.
Once an array is created, you can refer to it as a whole, or to its elements one-by-one. Create a list, turn it into an array, reshape it, and print it with something like this
#Make a list mylist = [1,2,3,4,5,6,7,8,9,10,11,12] #Make an array from a list myarray = np.array(mylist)
#Reshape the array my3by4 = myarray.reshape(3,4) print my3by4
array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]])
The (2,3) element in the reshaped array is
Remember that indexing is from 0, so the "2" here means the 3 "group", and each group had 4 elements. The "3" means the fourth element of that group, which is 12.
Therefore, the notation is just like matrices, with the first index the row and the second index the column, enumerated from (0,0) at the upper left as printed.
Lists in Python are indexed similarly. Make a grouped list this way
alist = [[2,4],[6,8]] print alist
as expected. With NumPy you can also get access to a selected column --
arr = np.array(alist) print arr
shows "4". The way to have column 1 alone is
shows us a list of column 1,
shows the list of row 1,
It's also possible to find elements conditionally. Let's make an array of 16 elements from 0 to 15, shape it to 4x4, and find those elements greater than 5
# Create the array myarray = np.arange(16).reshape(4,4) print myarray
array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]])
# Index the array myindex = np.where(myarray <5) print(myindex)
(array([0, 0, 0, 0, 1]), array([0, 1, 2, 3, 0]))
mynewarray = myarray[myindex] print mynewarray
array([0, 1, 2, 3, 4])
Functions broadcasting over an array
The term broadcasting in NumPy refers to applying an operation to every element of an array. Consider this simple example
cube = np.zeros((2,2,2))
that makes a 2x2x2 cube of zeros. You can add 1 to every element, then divide them all by 3, with
newcube = (cube + 1.)/3. print newcube
array([[[ 0.33333333, 0.33333333], [ 0.33333333, 0.33333333]],
[[ 0.33333333, 0.33333333], [ 0.33333333, 0.33333333]]])
Perhaps more interestingly, create an array of multiples of Pi with
manypi = np.arange(8)*np.pi/4.
and then find the sine of these angles
manysines = np.sin(manypi)
The first one will produce the array of angles
array([ 0. , 0.78539816, 1.57079633, 2.35619449, 3.14159265, 3.92699082, 4.71238898, 5.49778714])
and the second will give the array of their sines
array([ 0.00000000e+00, 7.07106781e-01, 1.00000000e+00, 7.07106781e-01, 1.22464680e-16, -7.07106781e-01, -1.00000000e+00, -7.07106781e-01])
NumPy math has the usual trigonometric functions, and others. A full list is in the documentation, and some useful ones are for element-by-element
- sin(x), cos(x), tan(x)
- arcsin(x), arccos(x), arctan(x)
- arctan2(x,y) returns the angle whose tangent is the ratio of x and y
- sinh(x), cosh(x), tanh(x)
- arcsinh(x), arccosh(x), arctanh(x)
- degrees(x) or rad2deg(x) convert from radians
- radians(x) or deg2rad(x) convert from degrees
- rint(x) round to the nearest integer
- fix(x) round to the nearest integer toward zero
The broadcast feature of NumPy arrays can be combined with selective indexing in an interesting way. Consider this program to make an array of random numbers in a Gaussian distribution with mean 0, and sigma 1
a = np.random.randn(100)
Print this will show 100 values, some negative, some positive. We can select only those larger than a cutoff, work with them, and put them back in the array
index = a > 0.75 b = a[index] b= b ** 3 - 100. a[index] = b
Printing index will show an array of True and False that mask the original array. The array b will only have those elements that satisified the condition used to make the index. The index is used again, to put modified elements back into the original array.
Matrix and vector math in NumPy
We have seen that NumPy has its own set of math functions that broadcast over arrays. If you define an array
a = np.arange(0,1,0.01)*np.pi
you will have an array from 0 to Pi in steps of 0.01*Pi. You can operate on this array, element by element, with np.cos
b = np.cos(a)
and have a new array that contains the cosine of each element of the first one.
Before we move on to SciPy, which builds on NumPy to add a number of useful features, there are two things we should note.
First, NumPy includes matrix math, either by mapping its native arrives to conventional matrices, or by invoking matrix math operations on its arrays. We will only look at this second way, which handles most applications and creates very clear code.
The product of two matrices is done with the "dot" operation between two NumPy arrays.
a = np.array(([1,2],[3,4])) b = np.array(([0,1],[1,0]))
makes two arrays that may be thought of as 2x2 matrices. To multiply them in the usual sense of matrix multiplication,
c = a.dot(b)
array([[2, 1], [4, 3]])
d = b.dot(a)
array([[3, 4], [1, 2]])
A matrix inverse is found in the linalg package of NumPy
e = np.linalg.inv(a)
With this we can solve linear equations. Suppose that you have the system of 3 equations in 3 unknowns:
2x + y = 1 3y + z = 1 x + 4z =
We would write
a = np.array([[2, 1, 0], [0, 3, 1], [1, 0, 4]]) b = np.array([,,])
The linear algebra formulation is a.X = b, with solution X = a^-1 . b where a^-1 is the inverse of a. In NumPy we would find
x = np.linalg.inv(a).dot(b) print x
which as a program returns the solution
array([[ 0.36], [ 0.28], [ 0.16]])
NumPy also supports vector math with np.cross. For example if
a = [1, 1, 0] b = [0, 0, 1] c = np.cross(a, b) print c array([ 1, -1, 0])
d = np.dot(a, b) print d
Or, to illustrate the dot product in another example
a = np.array([1,0,0]) b = np.array([2,1,3]) d = np.dot(a,b) print d
For 1-D vectors this is the same as the inner product
d = np.inner(a,b) print d
Fourier Transforms in NumPy
NumPy provides Fourier Transforms in several functions, including the one-dimension discrete Fast Fourier Transform or FFT with the function fft(a), and the one-dimensional FFT of real data with rfft(a). It also has n-dimensional Fourier Transforms as well.
As a very simple example, let's consider an exponential decay modulating a sine wave adapted from our dampled cosine plotting example to use NumPy arrays:
# Import the plotting and math packages import matplotlib.pyplot as plt import numpy as np
# Define initial constants f0 = 5. a0 = 100. tdecay = 2. timestep = 0.01 timemax = 100.
time = np.arange(0,timemax,timestep) amplitude = np.exp(-time/tdecay)*np.cos(2.*np.pi*f0*time)
With NumPy we do not have to loop to calculate each point of the damped oscillator's path.
Now we add a Fourier Transform with lines
# Use an FFT to calculate its spectrum spectrum = np.fft.fft(amplitude) # Find the positive frequencies frequency = np.fft.fftfreq( spectrum.size, d = timestep ) index = np.where(frequency >= 0.) # Scale the real part of the FFT and clip the data clipped_spectrum = timestep*spectrum[index].real clipped_frequency = frequency[index]
The array "spectrum" has the spectrum of the original data, and "frequency" has the corresponding frequencies. We clip the negative frequencies.
# Create a figure fig = plt.figure()
# Adjust white space between plots fig.subplots_adjust(hspace=0.5)
# Create x-y plots of the amplitude and transform with labeled axes
data1 = fig.add_subplot(2,1,1) plt.xlabel('Time') plt.ylabel('Amplitude') plt.title('Damping') data1.plot(time,amplitude, color='red', label='Amplitude') plt.legend() plt.minorticks_on() plt.xlim(0., 10.)
data2 = fig.add_subplot(2,1,2) plt.xlabel('Frequency') plt.ylabel('Signal') plt.title('Spectrum of a Damped Oscillator') data2.plot(clipped_frequency,clipped_spectrum, color='blue', linestyle='solid', marker='None', label='FFT', linewidth=1.5) plt.legend() plt.minorticks_on() plt.xlim(3., 7.)
# Show the data plt.show()
This will make an interactive plot that looks like this