Elements of Python programming

From AstroEdWiki

At this step in our short course on Python for Physics and Astronomy you have Python running, and have seen how it works interactively and with executable files. Let's explore what we can do with simple useful programming. Some essential topics are

  • Getting data into and out of a program
  • Storing data as numbers and text
  • Accessing data efficiently in lists, tuples, and dictionaries
  • Performing logical and mathematical operations on data
  • Controlling program flow (coming up in the next section)


Input and output

Python accepts data from the command line when an application starts, from locally stored files, from files or other input on the web, through ports (typically as serial or TCP/IP data), or from attached instruments that communicate through specialized device drivers.


Input from a console and keyboard

To have a Python program accept data from a user at a console, include lines like these in Python 3


newtext = input()
print(newtext)

to take the input as text and print it. You can prompt for the input too

newtext = input('Write what you like: ')
print('This is what I like: ', newtext)


In Python 2.7 there is also a command "input()" which treats incoming text as Python code. That is, if you input "1+2" it will return 3. In Python 3 this has changed, so some care is needed when moving code between versions. Python 3's "input()" behaves like Python 2's "raw_input()" and always returns text. To get the behavior of Python 2's "input()", which evaluated the expression, in Python 3 you would ask for it explicitly with eval(input()). Be aware that eval() executes whatever the user types, so use it only with input you trust. You can see why running older programs under Python 3 can raise some problems, though they are usually easy to fix.

The input is text, but suppose we want a number instead. If we know it's a number, then

newtext = input('Input a number >> ')
x = float(newtext)
print('My number was ', x)


should do it. But, if you try this and input text that is not a number, the program will generate an error and respond with something like this

python3 input_number.py
Input a number >> x
Traceback (most recent call last):
  File "input_number.py", line 2, in <module>
    x = float(newtext)
ValueError: could not convert string to float: 'x'

How would we know if we have a number, given arbitrary text in the data, and avoid this error? One way is to use isdigit() --

newtext = input('Input a number >> ')
if newtext.isdigit():
  x = float(newtext)
  print ('My number was ', x)
else:
  print ('That is not a number.')

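In this you see that isdigit() tests whether newtext consists only of digits. It returns True or False, which the "if" statement uses to decide what to do with the data. Note that this test is stricter than we might like: it rejects valid numbers such as "3.14" or "-2" because the decimal point and the minus sign are not digits. We will look at such flow control more later.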
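A common alternative to testing the text first is to attempt the conversion and catch the error. The following is a minimal sketch of that idea; the function name parse_number is only illustrative and is not part of Python.

```python
def parse_number(text):
    """Return text converted to a float, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        # float() raises ValueError for text such as 'x' or an empty string
        return None

# Try a few samples, including ones that isdigit() would reject
for sample in ['12', '3.14', '-2', '1e3', 'x']:
    print(repr(sample), '->', parse_number(sample))
```

In a real program the sample text would come from input(), and a None result could be used to ask the user to try again.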

You may also want to read data by splitting a line of text into pieces, for example with something like this --

newtext = input('Input the item name and quantity >> ')
print(newtext.split())

If you input "eggs 12" to this last one, newtext.split() will return ['eggs', '12']. That is, it makes a list of the items that are on the line. You can now go through that list and look for the information you want, one entry at a time.
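As a sketch of going through the split list, the hypothetical helper below (parse_item is an illustrative name) takes a line such as "eggs 12" and returns the name as text and the quantity as an integer. It assumes exactly two items on the line.

```python
def parse_item(line):
    """Split a line such as 'eggs 12' into a name and an integer quantity."""
    parts = line.split()
    if len(parts) != 2:
        raise ValueError('expected an item name and a quantity')
    name = parts[0]
    quantity = int(parts[1])   # the split pieces are text, so convert
    return name, quantity

name, quantity = parse_item('eggs 12')
print(name, quantity + 1)
```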


Now for Python 3 another change is that print becomes a function that takes an argument. Where in Python 2.x we would write

 print x

in Python 3 we write

 print (x)

From here on, if you see the older version of print without parentheses, that is a good clue the code was written for Python 2.

Style: Indentation, comments, and names of variables and functions

In Python it is conventional to use underscores to separate words or components in a name. You may use "my_number" for example to identify a single number, or "my_numbers" as a variable for a list or a tuple that includes more than one. Some programmers prefer so-called "camel case" names, such as "myNumber" instead. That is not recommended in Python unless you are working on a program that has already implemented it. The best guideline is to keep it clear and readable by yourself as you develop a program, and eventually by others when they try to figure out what you have done.

Indentation is enforced in Python. Start a line on the left side and avoid continuing it to the next line unless there are parentheses or brackets that clearly indicate that. The Python continuation character "\" can be used lightly and with care if the additional whitespace would not be misinterpreted. You could write

a = 1+\
  2+\
  3
print(a)
>>6

Notice the continuation character goes immediately after the symbol "+" and must be the last character on its line, so there is no ambiguity about your intent. Python ignores the indentation of the continued lines, but indenting them makes it clear to a reader that they belong to the definition of a.

You could also do it this way

print (1 +
  2+
  3)
>> 6

using the implied continuation with the help of the parentheses.

However, writing this

a = 1+
  2+
  3

will generate an error.

How much to indent is your choice. The style guide recommends 4 spaces. My preference is for just 2. Some programmers like 8, the standard spacing of a tab indentation. You'll find that if you have several levels of indentation, the loss of workspace and shifting content to the right can be very annoying and hard to read. Using 2 or 4 spaces makes a readable program where your eye can follow the indentation levels easily.

Lastly, add abundant comments as you work. A comment line is one that begins with a "#" symbol. Comment lines should be indented just like the program content they apply to. You can add a comment after a program element on the same line, though this is sometimes hard to read. However you do it, leaving comments will help you later when you return to an old program and have forgotten what it was doing, or why.

For the same reasons, a block of comments at the very beginning of a program should describe its function. These can be done with the same "#" comment flag, although it is common practice to use three quote marks at the start of a long block of text that is terminated with another set of three quotes. A long string placed this way at the start of a program, function, or class is called a "docstring" in Python. Since it can span many lines it is convenient to use, but its meaning is not the same as a comment line. A docstring is evaluated and stored, not ignored. It can be used to create a large block of text you may need later, or to define text that can be used to document (hence the name) the program. Unless that is your intent, use "#" and keep it short and simple.
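The difference between a comment and a docstring can be seen in a short sketch. The function kinetic_energy here is purely illustrative; the point is that the docstring is kept in the function's __doc__ attribute while the "#" comment leaves no trace.

```python
def kinetic_energy(mass, speed):
    """Return the kinetic energy in joules for mass in kg and speed in m/s."""
    # This comment is ignored by Python; the docstring above is stored.
    return 0.5 * mass * speed**2

print(kinetic_energy.__doc__)      # the stored documentation text
print(kinetic_energy(2.0, 3.0))    # 9.0
```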


Input from the command line

In Linux, MacOS, or other Unix-like environments you can pass information to a program on the command line. There's a straightforward way to do this in Python using sys:

import sys

usage = "Usage: convert_fits infile.fits outfile.fits newbitpix"

# sys.argv[0] is the program name, so three arguments give a length of 4
if len(sys.argv) != 4:
  sys.exit(usage)

infits = sys.argv[1]
outfits = sys.argv[2]
newbitpix = sys.argv[3]  # arrives as text; use int() if a number is needed

Alternatively there is argparse, a standard command-line parsing module. This example from the Python on-line tutorial computes x to the power y:

import argparse
parser = argparse.ArgumentParser(description="calculate X to the power of Y")
group = parser.add_mutually_exclusive_group()
group.add_argument("-v", "--verbose", action="store_true")
group.add_argument("-q", "--quiet", action="store_true")
parser.add_argument("x", type=int, help="the base")
parser.add_argument("y", type=int, help="the exponent")
args = parser.parse_args()
answer = args.x**args.y
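To try a parser like this without leaving the interpreter, parse_args() can be given a list of strings in place of the real command line. The sketch below repeats the parser above inside a helper (build_parser is an illustrative name) and then reports the answer according to the flags; the wording of the printed messages is an assumption.

```python
import argparse

def build_parser():
    """Build the same parser as above: two integers and two exclusive flags."""
    parser = argparse.ArgumentParser(description="calculate X to the power of Y")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-v", "--verbose", action="store_true")
    group.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("x", type=int, help="the base")
    parser.add_argument("y", type=int, help="the exponent")
    return parser

# Equivalent to running: python power.py -v 2 10
args = build_parser().parse_args(["-v", "2", "10"])
answer = args.x**args.y

if args.quiet:
    print(answer)
elif args.verbose:
    print("{} to the power {} equals {}".format(args.x, args.y, answer))
else:
    print("{}^{} == {}".format(args.x, args.y, answer))
```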


Printing to the display

When you have data or text to display, you'd use a "print" function to have the data appear on the console as the program executes. (In Windows, you may follow this with an input() so that the console will not disappear before you read it.) In Linux or MacOS, you would usually run Python from a console, and the printed information appears on the display and remains visible after the program has finished. In Unix-like environments "print" writes to the standard output, stdout, which may be redirected to a file. For example, running a Python program from the command line that generates output, you could write

python myprogram.py >> myfile.txt

and the output would be appended to myfile.txt instead of displaying (a single ">" would overwrite the file rather than append to it). Similarly, the output can be split to send the error information to a separate file

python myprogram.py 1> myfile.txt 2>myerrors.txt

sends the stdout to myfile.txt and stderr to myerrors.txt. The Windows command prompt accepts the same redirection operators.
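Within a program, the print function can be pointed at either stream through its file argument. As a small sketch, the illustrative function report below writes results to stdout and warnings to stderr, so that the redirection shown above would separate them.

```python
import sys

def report(value, out=sys.stdout, err=sys.stderr):
    """Print a result to out, and a warning to err if the value is negative."""
    print('result:', value, file=out)
    if value < 0:
        print('warning: negative value', file=err)

report(3.5)     # only a result line
report(-1.0)    # a result line and a warning line
```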


To print text the command is

print ('This will print on the screen.\n')

where the quoted text is sent (' and " have the same effect). The '\n' is a line feed. Since print already ends its output with a line feed, this one adds a blank line.


To print variables simply use them in the print function

x = 1
y = 2
z = 3
h = 'Help me!!'
print (x,y,z,h,'\n')

will display the values of x, y, z, and h regardless of whether they are numbers or text.

1 2 3 Help me!! 

Of course, printing can be formatted. If you are familiar with C or Fortran, you'll find similarities that will help creating formatted output. In this instance, it's also helpful to remember that the "print" function is converting internal data into displayed "text", so the formatting is really a way of controlling how some text and numerical data are mapped onto text that is then displayed.

Formatting is available in both versions 2.7 and 3 in two ways

  • Formatting expressions that are like C's printf are commonly used.
  • Formatting methods are unique to Python and use operators that act on text (strings).

Before we can really use these effectively we will need to explain what we mean by "strings", "integers", "floats" and other data types. But, here are a few examples that illustrate how this works. From Mark Lutz' Learning Python we have this summary


To format strings using expressions

  1. Insert a % "operator". To the left of it, put a format string containing one or more conversion codes, each of which begins with a %.
  2. To the right of the % operator, provide the objects that are inserted into the format string on the left.

Here's an example:

'That is %d %s cat!' % (1, 'fluffy')

which will print

That is 1 fluffy cat!
  

Here %d is an integer format in the style of C, and %s is a string format conversion code. Other common type codes

s String
d Decimal integer
i Integer (same as d)
x Hex (also X for capital letters)
e Floating point with exponent (also E)
f Floating point

The general structure with formatting commands is

%[(name)][flags][width][.precision]typecode

For example, in interactive Python try

>> x=1.2345678901234567890

If you ask for "x"

>> x

Python will respond with

1.2345678901234567

to the precision of its floating point storage. You can format this by

>> '%6.2f'%x

(or with spaces for clarity, '%6.2f' % x, but with no spaces inside the format code itself)

to which Python will respond

'  1.23'

You see that it left 6 places for the text, used a precision of 2 decimal places, and right-adjusted the text to the field. If you ask for more precision than you've allowed, Python will expand the field as needed. To have the data left-adjusted, put a minus sign in the formatting like this

>> '%-6.2f'%x
'1.23  '

In a program, rather than interactively, statements like these work in a print command

>> print ('%6.5e'%x)
1.23457e+00
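Width and adjustment are most useful when several lines have to line up in columns. Here is a minimal sketch with % expressions; the star names and masses are illustrative values chosen only to show the layout.

```python
# Illustrative values only
stars = [('Sun', 1.0), ('Sirius', 2.02), ('Rigel', 18)]

# %-8s left-adjusts the name in 8 places; %6.2f right-adjusts the number in 6
lines = ['%-8s %6.2f' % (name, mass) for name, mass in stars]

for line in lines:
    print(line)
```

Every line comes out 15 characters long, so the decimal points fall in the same column.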


The alternative is the newer "format method" scheme introduced with Python 3 and also available in Python 2.6 and 2.7. In this the method acts on a string object to create a new string. Here's a very simple example of what it looks like to format the division of 25 by 7

>> '{0:.4f}'.format(25. / 7.)
'3.5714'

The first "0" is a position, and often there will be many similar {} to hold the data supplied to format(). You see the familiar "f" character to tell Python how to treat the data.

>> '%.4f'% (25. / 7.)

would have the same effect.

Finally, there is the built-in format() function, which is perhaps clearer

>> format(3.5714, '.2f')
'3.57'

which is neat if there's only one variable.

The % expression works unchanged in both Python 2.7 and Python 3, so you will meet it often in existing code.

Input from a file and writing to files

It's more likely you will want to input data to a program from a file on your computer. Opening and reading a file in Python is very easy --

mydata = open('datafile.dat', 'r')

opens the file named datafile.dat for read-only, and assigns it to the object "mydata". You can read the data as text

mytext = mydata.read()

and the entire file is now contained in mytext. If you do this on the Python command line, and then enter "mytext", you'll see the contents of your file (with end of line characters like \n too).

As with any text, we can split it into parts with

mytext.split()

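which generates a list of whitespace-delimited items from the file, ignoring the ends of lines. Alternatively, instead of read() you can read the file as a list of lines with

mylines = mydata.readlines()

which returns a list with the lines as items, in order. (A file is read from the current position, so after a read() you would first return to the start with mydata.seek(0).) Each item is a string, so an individual line can be split with

mylines[0].split()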

When you are finished reading the file, you close it with

mydata.close()

Similarly, to write a file you would open it for writing

mynewdata = open('newdata.dat', 'w')

write text to it

mynewdata.write('This is a line of text.\n')

and continue with other lines

mynewdata.write('1 2 3\n')

until you are finished

mynewdata.close()

The close() is essential because, without it, the computer's buffers may not flush the contents of the file to the disk.

Whenever you are writing data to a file, it may be formatted with the same techniques used in formatting displayed data.
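Because forgetting close() is easy, Python also offers the "with" statement, which closes the file automatically when the indented block ends. The sketch below writes a small file and reads it back; it uses a temporary directory so that it can be run anywhere, and the file name is only an example.

```python
import os
import tempfile

# A scratch location so the example does not touch your own files
path = os.path.join(tempfile.mkdtemp(), 'newdata.dat')

# The file is closed automatically at the end of each with block
with open(path, 'w') as mynewdata:
    mynewdata.write('This is a line of text.\n')
    mynewdata.write('%d %d %d\n' % (1, 2, 3))   # formatted output

with open(path, 'r') as mydata:
    mylines = mydata.readlines()

# Each line is text; split the second one and convert the pieces to integers
numbers = [int(item) for item in mylines[1].split()]
print(mylines[0].rstrip())
print(numbers)
```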


Numbers, text, and data types

So far we have seen integers (e.g. 0, 1, 2 ...), floating point (e.g. 3.14159), and strings (e.g. "deer in the headlights"). How are these quantities stored, and how do we access them in whole or part?


Binary numbers

Computer data are stored as a sequence of bits which may be 1 or 0. A byte is a sequence of 8 bits, and is taken to represent a number that is the sum of powers of 2:

0 0 0 0 0 0 0 1 = 2**0 = 1
0 0 0 0 0 0 1 0 = 2**1 = 2
0 0 0 0 0 1 0 0 = 2**2 = 4
0 0 0 0 1 0 0 0 = 2**3 = 8
0 0 0 1 0 0 0 0 = 2**4 = 16
0 0 1 0 0 0 0 0 = 2**5 = 32
0 1 0 0 0 0 0 0 = 2**6 = 64
1 0 0 0 0 0 0 0 = 2**7 = 128

In this way, any value from 0 to 255 can be represented by turning on the bits in the byte:

0 0 0 0 0 0 1 1 = 2**0 + 2**1 = 3

Binary numbers are often referred to in hexadecimal or "hex" code, which counts up to 15 and is given in powers of 16 rather than powers of 2. The counting sequence is 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. So, 4 bits of a byte can be a hex code, and it takes two of them to represent an 8-bit byte. In this terminology, decimal 1 is binary 1 and hex 1. Decimal 255 is binary 11111111 and hex FF.

To see the hexadecimal form of an integer, use formatted print this way

print(format(12,'x'))
c

To see the binary form, use the 'b' format this way

print(format(12,'b'))
1100
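The conversion also works in the other direction. As a short sketch using only built-in functions, int() accepts a second argument giving the base of the text it is converting, and format() can pad the result with leading zeros to show a full byte.

```python
# Text in a given base to an integer
print(int('FF', 16))          # 255
print(int('11111111', 2))     # 255

# Integer to text, with the prefixes Python uses for literals
print(hex(255))               # 0xff
print(bin(12))                # 0b1100

# Pad to 8 binary digits to display a whole byte
print(format(12, '08b'))      # 00001100
```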

To make longer numbers we string together bytes, and for a 64-bit computer, we treat these in chunks of 8 bytes per word. You can think of computer memory as a linear space with bits one after another, organized as bytes and words, and sorted out by programs into the text and numbers that are stored in these bits. Normally, you would not even worry over such workings, unless you need to "set a bit", control a logical decision, or sometimes control an instrument by changing internal values in a word. There are some "gotchas" to be at least aware of.

Negative integers are indicated by setting a sign bit. We see in the above example we start counting at 0 and go up to 255, but what if we want to have a -1? The standard way is to use the "most significant bit", that is the one on the far left in the display above, to indicate that the value is negative. When that bit is set to 1, the number gets a minus sign. Clearly after we reach 127 (decimal) and go one step higher, we turn on the sign bit. In the usual "two's complement" convention 1 0 0 0 0 0 0 0 is taken to be -128, and as we turn on more bits, we count up from -128 to -1, which is 1 1 1 1 1 1 1 1. An integer stored in this way is said to be "signed" and runs from -128 to 127, just as one that runs from 0 to 255 is said to be "unsigned". The concept of signed and unsigned integers is not limited to 8-bit data, and applies equally to much larger integer storage.
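In the two's complement convention used by most hardware, an 8-bit pattern with the sign bit set stands for its unsigned value minus 256. The sketch below makes that explicit with an illustrative helper, to_signed8, and compares it with Python's own int.from_bytes.

```python
def to_signed8(value):
    """Interpret an unsigned 8-bit value (0 to 255) as a two's complement integer."""
    if not 0 <= value <= 255:
        raise ValueError('value does not fit in one byte')
    # Patterns with the most significant bit set (128 and up) are negative
    return value - 256 if value >= 128 else value

for bits in ['01111111', '10000000', '11111111']:
    unsigned = int(bits, 2)
    print(bits, unsigned, to_signed8(unsigned))

# The built-in conversion gives the same answer
print(int.from_bytes(bytes([128]), 'big', signed=True))   # -128
```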

Another gotcha is the order in which bits and bytes are arranged in memory to associate with the numbers they represent. Integers are stored in memory as a sequence of bytes, combined in the simplest ways

  • Little-endian -- increasing numeric significance with increasing memory addresses
  • Big-endian -- decreasing numeric significance with increasing memory addresses

The x86 processor architecture is little-endian: within a larger "word" the least significant byte has the lowest memory address. Consequently, if the bytes of a word are drawn like our pictorial representation of a binary number above, the memory address increases from right to left. Knowledge of how this works at the machine level is needed to program microcontrollers in instrumentation.
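Byte order can be inspected from Python itself. As a brief sketch, the integer method to_bytes lays a value out in either order, and sys.byteorder reports which one the machine running the program uses.

```python
import sys

value = 0x12345678

# The same 4-byte integer laid out in the two byte orders
little = value.to_bytes(4, byteorder='little')
big = value.to_bytes(4, byteorder='big')

print(sys.byteorder)    # 'little' on x86 machines
print(little.hex())     # 78563412, least significant byte first
print(big.hex())        # 12345678, most significant byte first

# Reading the bytes back requires knowing which order was used
print(int.from_bytes(little, byteorder='little') == value)
```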


Now that we have seen how integers are stored in memory, what about floating point? Fortunately, we rarely need to know the details, except that for most computers floating point numbers are stored in a succession of bytes, with most of the space allocated to the significant part of the number, in a power of 2 format standardized by IEEE. In the common base b=2, finite numbers are stored as three integers: s = a sign (zero or one), c = a significand or coefficient, and q = an exponent. The numerical value of a finite number is then

(-1)**s × c × b**q


Floating point with 64-bit storage uses 52 bits for c and has nearly 16 digits of precision. The exponent (of 2) must be in the range from -1022 to +1023, limiting the largest decimal values to about 10**308.
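These limits can be checked from Python. The sketch below assumes the usual IEEE 64-bit float, which is what Python uses on nearly all current machines; note that sys.float_info counts 53 significant bits because it includes a leading bit that is implied rather than stored.

```python
import math
import sys

print(sys.float_info.mant_dig)   # significant bits, including the implied one
print(sys.float_info.dig)        # decimal digits that are always preserved
print(sys.float_info.max)        # largest finite value, near 1.8e308

# frexp splits a float into a fraction and a power of 2: 8.0 = 0.5 * 2**4
fraction, exponent = math.frexp(8.0)
print(fraction, exponent)

# Finite precision means some decimal sums are not exact
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True
```

Comparing floats with math.isclose() rather than "==" is the usual way to allow for the rounding.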


Finally, there is the issue of how to store text, that is, how to encode text into numbers that are stored in memory. Each letter is assigned to a bit pattern, that is to an 8-bit integer. The old standard "ASCII" encoding uses only the first 128 values (0 to 127), but this has been extended to encompass symbols and characters used in various languages. You can see the full list at www.ascii-code.com. As a short list, here are a few

 0 00 00000000 Null
10 0A 00001010 Line feed (\n)
13 0D 00001101 Carriage return (\r)
27 1B 00011011 Escape
32 20 00100000 Space
48 30 00110000 0
49 31 00110001 1
65 41 01000001 A
66 42 01000010 B
97 61 01100001 a
98 62 01100010 b

In modern UTF-8, the Unicode encoding used for most of the World Wide Web, each character has a single assigned code regardless of language. It is backward compatible with ASCII, and stores each character in from one to four bytes (8 to 32 bits). Python 3 introduced a distinction between strings, which hold Unicode text that may be encoded with, for example, UTF-8, and bytes storing 8-bit data. This distinction can cause issues when older Python 2 code that manages strings is run in Python 3. However, it really helps when reading files that may be encoded, which in Python 2 would often cause confusion or failure.

For our purposes, text storage may be regarded as a sequence of bytes with each one representing a different character following the ASCII assignments. The number of bytes per character is left up to the encoding, and we have to be aware that data representing strings, and data representing bytes, are not equivalent.
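A short sketch shows the difference between a string and its bytes in Python 3. The text used here, the Greek letter alpha followed by " Cen", is just an example; it is written with an escape code so the program itself stays plain ASCII.

```python
text = '\u03b1 Cen'            # alpha, a space, and three ASCII letters
data = text.encode('utf-8')    # str -> bytes

print(len(text))    # 5 characters
print(len(data))    # 6 bytes, because alpha needs two bytes in UTF-8
print(data)

# ASCII characters keep their one-byte ASCII codes in UTF-8
print('A'.encode('utf-8') == bytes([65]))

# bytes -> str recovers the original text
print(data.decode('utf-8') == text)
```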


Integers

In Python the variables are dynamically typed. That is, their use determines the type of data they store. This is different from other languages, like C, where the type has to be stated before the variable is used. We have already seen this in the examples.

>> x = 1./3.
>> print (x)
0.3333333333333333
>> y = 1//3
>> print (y)
0
>> y = x
>> print (y)
0.3333333333333333

When we calculate x its type is set to floating point because it is the result of dividing two floating point numbers. Print x and you get all the precision the machine has. When we calculate y with the integer division operator "//" applied to two integers, we get 0 because the result is an integer. (In Python 2 the plain "/" between two integers behaved this way; in Python 3 "/" always returns a floating point result.) Yet, we can set y = x and now y is a floating point number taking on the value of x.

In all these cases the symbols refer to values stored in memory. They act like those values, not like "pointers" to the memory where the value is stored.

An integer may be found from a float by the int() operation on the float:

>> z = 11/9 
>> print (z)
1.2222222222222223
>> z = int(z)
>> print (z)
1

There are other basic operations on integers in Python, among them

 float() turns an integer or numeric string into a floating point number
 % means "modulo", that is the remainder after division, so 10%7 is 3
 - negation
 ** raise to a power
 int() creates an integer from a string or a float
 long() creates a long integer from a string in Python 2; in Python 3, int() handles integers of any size
 abs() returns the absolute value of a number
 factorial() returns the factorial of an integer, but you need the math package for that
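Here is a brief sketch that exercises several of these integer operations in Python 3. The numbers are arbitrary examples.

```python
import math

print(7 // 2)               # 3, integer division
print(7 % 2)                # 1, the remainder
print(divmod(7, 2))         # (3, 1), both at once
print(10 % 7)               # 3
print(abs(-5))              # 5
print(int('42') + 1)        # 43, text converted to an integer
print(math.factorial(5))    # 120

# Python 3 integers are not limited to 64 bits
print(2**100)
```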


Floating point and math

Floating point numbers, like other variables, are dynamically typed. They have the full precision of the machine, typically 64-bits. Many of the useful math operations on floating point numbers are in the math package, and would require

import math

to access the functions. With that, you have things like math.pi and math.e to return the values of pi and e, the trigonometric functions, and others less obvious but often needed. All of these need math. in front of them when you use them:

floor(x) the largest integer less than or equal to x
trunc(x) truncate x to an integer, discarding the fractional part

For example if x=-1.1

>> math.floor(x)
-2
>> math.trunc(x)
-1

Other functions in the package include

fabs(x) the absolute value
fsum() an accurate floating point sum of the items in an iterable
isnan() checks for a NaN, not a number
isinf() checks for positive or negative infinity
log(x[,base]) where [,base] is optional and it defaults to e
log10(x)
pow(x,y) returns x**y as a floating point number
sqrt(x) 
cos(x) and other similar ones
acos(x) and other similar ones
atan2(y,x) returns the angle with awareness of quadrant based on signs of y and x
degrees(x) returns angle in degrees given angle in radians
radians(x) returns angle in radians given angle in degrees
acosh(x) and other similar hyperbolic functions
erf(x) error function
erfc(x) complementary error function
gamma(x) gamma function
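
A few of these functions are easier to appreciate with numbers. The following sketch uses arbitrary example values to show how floor and trunc differ for a negative number, how atan2 picks the quadrant, and why fsum can be preferable to the built-in sum.

```python
import math

x = -1.1
print(math.floor(x), math.trunc(x))   # -2 -1

# atan2 uses the signs of both arguments to choose the quadrant
angle = math.atan2(1.0, -1.0)         # y > 0 and x < 0: second quadrant
print(math.degrees(angle))            # close to 135 degrees

# fsum keeps track of rounding errors that the built-in sum() accumulates
values = [0.1] * 10
print(sum(values), math.fsum(values))

print(math.sqrt(16.0), math.log10(1000.0))
```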


There is also a complex math library, which we will leave for another time.


Characters and text

Data may be strings, that is long sequences of characters. You do not have to allocate space for them before you make the assignment, unlike other languages. For example, you can write ...

>> mystring ='My kingdom for a horse (Richard III).'
>> print (mystring)
My kingdom for a horse (Richard III).

Strings are set off by single quotes ('), though a double quote (") will do too, and may be needed if the string includes an apostrophe. Triple quotes """ start a long string which continues until the next """ and are used when including blocks of text in a program.

When you create the string, Python allocates memory for it and then refers to that object with the symbol you use. As long as a symbol refers to it, the object exists. Once no symbol refers to it any longer, the object's memory space is freed by "garbage collection".

Individual characters in a string are accessed by referring to an index count. In the above example,

>> mystring[3]
'k'

while a range of characters is indicated with [3:7], where "3" is the starting place and "7" the character after the last one you are selecting

>> mystring[3:7]
'king'

The letter k is the fourth character in the string. Fourth? Yes, strings, like other sequences in Python, are zero-indexed. That is, the first element is [0], the second [1], and so on.


A long string can be separated into words with the split() function:

>> mystring = "A string with numbers like 1, 2, and 3."
>> mystring.split()
['A', 'string', 'with', 'numbers', 'like', '1,', '2,', 'and', '3.']

There are built-in Python functions to convert a character to its integer code and the reverse

>> ord('X')
88
>> chr(88)
'X'

While we can see single characters in a string with something like mystring[7], we cannot reassign mystring[7] to another character with "=". Strings are an immutable sequence, that is, they cannot be changed in place. To change a string, you create a new one and assign the old name to it. Garbage collection then frees the memory that had been used by the old sequence.

>> S = 'yellow'
>> S = S + 'cats'
>> S
'yellowcats'
>> S = S[0:6]+' '+'cats'
>> S
'yellow cats'
>>S[0:6]
'yellow'
>>S[6:]
' cats'
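
To see what immutable means in practice, the short sketch below tries to change one character in place, catches the error Python raises, and then builds a new string instead.

```python
S = 'yellow cats'

try:
    S[0] = 'Y'            # not allowed: strings cannot be changed in place
except TypeError as error:
    print('Cannot change in place:', error)

# Build a new string from a new first letter and a slice of the old one
S = 'Y' + S[1:]
print(S)                  # Yellow cats
```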

The index can be negative. Here, S[0] is the first element of the string, and S[-1] is the last element, S[-2] the second to last, and so on

>>S[-3]
'a'

You can replace parts of a string, really creating a new string from the old one and keeping the name, this way

>> S = 'interstellar dust'
>> S = S.replace('r d','r gas and d')
>> S
'interstellar gas and dust'

This brings us to the subject of lists and such.


Lists, tuples, dictionaries, and statements

Lists

In Python, a list is an ordered collection of objects that are referred to by an offset index, much like a string, but more powerful. Lists may contain numbers, strings, or other lists. They can be changed in place, and Python manages memory so that you do not have to think about that. While it seems that the objects are "in" the list, actually the list is a sequence of references to objects (like an array of pointers in C, but much easier to work on).

>> L = []

is an empty list.

>> L = [0, 10, 100, 1000]

is a list of 4 items indexed from 0 to 3

>> L[3]
1000
>> L[1:3]
[10, 100]

As with strings, when a range is specified in a list, the second value is the index for the entry after the last one.

>> L[2:4]
[100, 1000]

Notice that Python indicates lists with [ ] brackets in the assignment, and when it prints them.

Individual objects in a list can be changed in place. Given a list L = ['Earth','mass',1]

>> L[0] = 'Mars'
>> L[2] = '0.107'
>> L
['Mars', 'mass', '0.107']

Operations on lists include ones to find the length, and to repeat the list

>> L = [1, 2, 3]
>> len(L)
3
>> L * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]

You can grow a list with an append, which pushes an object onto the end of the list, as onto a stack

>> L.append(4)
>> L
[1, 2, 3, 4]

or by concatenating

>> L = L + [5]
>> L
[1, 2, 3, 4, 5]

and find the length of the list

>> len(L)
5

To remove the last item, pop it off the list. The function pop() returns the popped object and changes the list

>> L.pop()
5
>> L
[1, 2, 3, 4]

You can extend a list

>> L.extend([5, 6, 7, 8, 9, 10])
>> L
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Items can be removed from a list

>> L.remove(5)
>> L
[1, 2, 3, 4, 6, 7, 8, 9, 10]

You can test if an object is in a list

>> L = [1, 2, 3, 4, 5.82]
>> 3 in L
True
>> 5 in L
False
>> 5.82 in L
True

You can sort a list by value and by alphabetical ordering

>> L = [4, 7, 2.3, 75, 92, -10]
>> L.sort()
>> L
[-10, 2.3, 4, 7, 75, 92]
>> L = ['alpha', 'gamma', 'beta']
>> L.sort()
>> L
['alpha', 'beta', 'gamma']
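
Two points are worth a short sketch. First, sort() changes the list in place, while the built-in sorted() returns a new list and leaves the original alone. Second, because names refer to list objects, assigning a list to a second name does not copy it. The names alias and duplicate are only illustrative.

```python
L = [4, 7, 2.3, 75, 92, -10]

ordered = sorted(L)                    # a new sorted list; L is unchanged
descending = sorted(L, reverse=True)   # largest value first
print(ordered)
print(descending)

alias = L          # a second name for the same list object
duplicate = L[:]   # a slice makes a new list holding the same items

L.append(0)
print(alias[-1])        # 0, because alias and L are the same object
print(len(duplicate))   # 6, the duplicate did not grow
```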


Dictionaries

Dictionaries are data types that are indexed by a key, rather than an offset. Where in C you might program a loop to compare and search for a value or an item, with a dictionary in Python you simply ask for the item associated with a key. The item can be a value, a string, a list, or another dictionary. Dictionaries are simply collections of objects that you access by asking for the object's key rather than by its position. Here's a very simple example

>> messier = {1 : 'SNR', 2 : 'globular cluster', 51 : 'galaxy', 42 : 'nebula'}

creates a dictionary called "messier" with keys 1, 2, 51, and 42.

To find the data associated with the key, we query

>> messier[51]
'galaxy'

We add to it by assigning new keys

>> messier[41] = 'open cluster'
>> messier[41]
'open cluster'

In this example the key is an integer, but it could be a string. For example

>> starmass = {'Sun' : 1.0 , 'Sirius' : 2.02 , 'Rigel' : 18}
>> starmass['Sirius']
2.02

You can test if an entry is in the dictionary

>> 'Sirius' in starmass
True
>> 'Betelgeuse' in starmass
False

You can also change an entry as you go. For example, key 42 is already in the dictionary, and we may want to be more specific about Messier object 42, the Orion Nebula. Assigning to an existing key replaces its item

messier[42] = 'nebula in Orion'
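
A few more dictionary operations come up often enough to sketch here, using the same messier dictionary: asking for a key that may be missing, looping over the keys, and deleting an entry.

```python
messier = {1: 'SNR', 2: 'globular cluster', 51: 'galaxy', 42: 'nebula'}

# get() returns a default instead of raising KeyError for a missing key
print(messier.get(31, 'not in dictionary'))

# Loop over the keys in numerical order
for number in sorted(messier):
    print('M%d: %s' % (number, messier[number]))

# Remove an entry by key
del messier[2]
print(len(messier))   # 3
```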


Tuples

A "tuple" is an ordered, immutable group of objects. Like a string and a list, a tuple is accessed by offset. Tuples are distinguished by ()

>> T1 = (1,3,5,7,11,13)

They can be concatenated

>> T2 = (17,19)
>> T = T1 + T2
>> T
(1, 3, 5, 7, 11, 13, 17, 19)

Since a tuple is immutable, you can be sure that once it is created it will be the same in other parts of a program. It may be used as the key in a dictionary too. Compare this to a list, which is a data structure that may change.
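
As a sketch of a tuple used as a dictionary key, suppose we store values by (row, column) position. The dictionary image and its numbers are hypothetical and serve only to show the idea.

```python
# Hypothetical pixel values keyed by (row, column)
image = {(0, 0): 12.5, (0, 1): 13.1, (1, 0): 11.8}
print(image[(0, 1)])

# A tuple can be unpacked into separate names in one step
row, column = (1, 0)
print(row, column, image[(row, column)])

# Trying to change a tuple in place raises an error
T1 = (1, 3, 5, 7, 11, 13)
try:
    T1[0] = 2
except TypeError as error:
    print('Tuples cannot be changed in place:', error)
```

A list could not be used as a key in this way, because a list may change after it is stored.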


Examples

For examples of Python illustrating input, output, data types, lists, and dictionaries, see the examples section.


Assignments

For the assigned homework to use these ideas, see the assignments section.