Difference between revisions of "NumPy, SciPy and SciKits"

From AstroEd
Jump to navigation Jump to search
Line 88: Line 88:
 
  my_new_array = my_3d_array.ravel()
 
  my_new_array = my_3d_array.ravel()
  
The data remain unchanged, but they are sliced and indexed in different ways depending on the shape of the array.
+
The original data remain unchanged, but they are sliced and indexed in different ways depending on the shape of the array and the choices for indexing it.
 +
 
 +
In summary, NumPy arrays may be created with
 +
 
 +
  np.arange()
 +
  np.zeros()
 +
  np.empty()
 +
 
 +
reshaped with
 +
 
 +
  myarray.reshape()
 +
  myarray.ravel()
 +
 
 +
written to files in a binary format, and read back with
 +
 
 +
  np.save('myfile.npy', myarray)
 +
  newarray = np.load('myfile.npy')
 +
 
 +
In the following we will see how to index within arrays, operate on arrays, and have arrays functionally combine with one another.
  
 
=== Indexing ===
 
=== Indexing ===

Revision as of 05:20, 21 February 2013

Python provides a framework on which numerical and scientific data processing can be built. As part of our short course on Python for Physics and Astronomy we will look at the capabilities of the NumPy, SciPy and SciKits packages. This is a brief overview with a few examples drawn primarily from the excellent but short introductory book SciPy and NumPy by Eli Bressert (O'Reilly 2012).

NumPy

NumPy adds arrays and linear albegra to Python, with special functions, transformations, the ability to operate on all elements of an array in one stroke.

Arrays

Arrays are at the heart of NumPy. The program

import numpy as np

mylist = [1, 3, 5, 7, 11, 13]
myarray = np.array(mylist)
print myarray

creates a list and makes an array from it. You can create an array of 30 32-bit floating point zeros with

myarray = np.zeros(30, dtype=np.float32)

The dtype argument is optional (defaults to float64) and can be

  • unit8, 16, 32, 64
  • int8, 16, 32, 64
  • float16, 32, 64, 128
  • complex64, 128

Arrays may also have character data, as in this

test = np.array(['a very','friendly','dog'])

which will a dtype of

test.dtype
dtype('|S8')

meaning a string of 8 characters (set in this instance by the "friendly" element of the list). Arrays must have all elements the same size.


Arrays can be arranged to be multi-dimensional. In our example of zeros, we could instead have

myarray = np.zeros((3,4,5))
print myarray

which will show

array([[[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]],
      [[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]],
      [[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]]])

Notice that the array is created with np.zeros(), and the argument is (3,4,5). This means an array of 3x4x5 elements, all zero (float64, by default), grouped as 3 elements, each of 4 elements, each of 5 elements.

An array can be reshaped after it is made. The array stays the same way in memory, but the grouping changes.

myarray = np.array( [1,3,5,7,11,13] )

makes a linear array of 6 elements, and

myarray.reshape(2,3)
 

changes its shape, so it will be

array([[ 1,  3,  5],
      [ 7, 11, 13]])

To create an array of 1000 elements from 0 to 999, the function

my_1d_array = np.arange(1000)

It is reshaped into an array 10x10x10 with

my_3d_array = my_1d_array.reshape((10,10,10))

or an multi-dimensional array may be flattened

my_new_array = my_3d_array.ravel()

The original data remain unchanged, but they are sliced and indexed in different ways depending on the shape of the array and the choices for indexing it.

In summary, NumPy arrays may be created with

 np.arange()
 np.zeros()
 np.empty()

reshaped with

 myarray.reshape()
 myarray.ravel()

written to files in a binary format, and read back with

 np.save('myfile.npy', myarray)
 newarray = np.load('myfile.npy')

In the following we will see how to index within arrays, operate on arrays, and have arrays functionally combine with one another.

Indexing

Once an array is created, you can refer to it as a whole, or to its elements one-by-one. Create a list, turn it into an array, reshape it, and print it with something like this

#Make a list
mylist = [1,2,3,4,5,6,7,8,9,10,11,12]

#Make an array from a list
myarray = np.array(mylist)
#Reshape the array
my3by4 = myarray.reshape(3,4)
print my3by4

to see

array([[ 1,  2,  3,  4],
      [ 5,  6,  7,  8],
      [ 9, 10, 11, 12]])

The (2,3) element in the reshaped array is

print my3by4[2][3]
12

Remember that indexing is from 0, so the "2" here means the 3 "group", and each group had 4 elements. The "3" means the fourth element of that group, which is 12.

Therefore, the notation is just like matrices, with the first index the row and the second index the column, enumerated from (0,0) at the upper left as printed.


Lists in Python are indexed similarly. Make a grouped list this way

alist = [[2,4],[6,8]]
print alist[0][1]

displays

4

as expected. With NumPy you can also get access to a selected column --

arr = np.array(alist)
print arr[0][1]

shows "4". The way to have column 1 alone is

print arr[:,1]

shows us a list of column 1,

[4 8]

and

print arr[1,:]

shows the list of row 1,

[6 8]


It's also possible to find elements conditionally. Let's make an array of 16 elements from 0 to 15, shape it to 4x4, and find those elements greater than 5

# Create the array 
myarray = np.arange(16).reshape(4,4)
print myarray 


array([[ 0,  1,  2,  3],
      [ 4,  5,  6,  7],
      [ 8,  9, 10, 11],
      [12, 13, 14, 15]])


# Index the array
myindex = np.where(myarray <5)
print(myindex)


(array([0, 0, 0, 0, 1]), array([0, 1, 2, 3, 0]))


mynewarray = myarray[myindex]
print mynewarray


array([0, 1, 2, 3, 4])

Functions broadcasting over an array

The term broadcasting in NumPy refers to applying an operation to every element of an array. Consider this simple example

cube = np.zeros((2,2,2))

that makes a 2x2x2 cube of zeros. You can add 1 to every element, then divide them all by 3, with

newcube = (cube + 1.)/3.
print newcube
array([[[ 0.33333333,  0.33333333],
       [ 0.33333333,  0.33333333]],
      [[ 0.33333333,  0.33333333],
       [ 0.33333333,  0.33333333]]])

Perhaps more interestingly, create an array of multiples of Pi with

manypi = np.arange(8)*np.pi/4.

and then find the sine of these angles

manysines = np.sin(manypi)

The first one will produce the array of angles

array([ 0.        ,  0.78539816,  1.57079633,  2.35619449,  3.14159265,
       3.92699082,  4.71238898,  5.49778714])

and the second will give the array of their sines

array([  0.00000000e+00,   7.07106781e-01,   1.00000000e+00,
        7.07106781e-01,   1.22464680e-16,  -7.07106781e-01,
       -1.00000000e+00,  -7.07106781e-01])


NumPy math has the usual trigonometric functions, and others. A full list is in the documentation, and some useful ones are for element-by-element

  • add(x1,x2)
  • sub(x1,x2)
  • multiply(x1,x2)
  • divide(x1,x2)
  • power(x1,x2)
  • square(x)
  • sqrt(x)
  • exp(x)
  • log(x)
  • log10(x)
  • absolute(x)
  • negative(x)
  • sin(x), cos(x), tan(x)
  • arcsin(x), arccos(x), arctan(x)
  • arctan2(x,y) returns the angle whose tangent is the ratio of x and y
  • sinh(x), cosh(x), tanh(x)
  • arcsinh(x), arccosh(x), arctanh(x)
  • degrees(x) or rad2deg(x) convert from radians
  • radians(x) or deg2rad(x) convert from degrees
  • rint(x) round to the nearest integer
  • fix(x) round to the nearest integer toward zero


The broadcast feature of NumPy arrays can be combined with selective indexing in an interesting way. Consider this program to make an array of random numbers in a Gaussian distribution with mean 0, and sigma 1

a = np.random.randn(100)

Print this will show 100 values, some negative, some positive. We can select only those larger than a cutoff, work with them, and put them back in the array

index = a > 0.75
b = a[index]
b= b ** 3 - 100.
a[index] = b

Printing index will show an array of True and False that mask the original array. The array b will only those elements that satisified the condition used to make the index. The index is used again, to put modified elements back into the original array.