Programming for Physics and Astronomy
Latest revision as of 04:54, 21 February 2018
We begin our short course on Python for Physics and Astronomy by considering the role of computing and the need for programming skills in current research.
Only 50 years ago, most physics and astronomy research relied on the analytical skills of the scientist, on the tools of classical mathematics that were taught to them as students, and in some cases on data management and numerical analysis done by hand. Today, cutting-edge research often requires high-speed computing for simulation and data analysis, interactive tools to extract relevant information from multi-parameter databases, access to automated and robotic instrumentation, and management of incomprehensibly large data sets. The issue for the researcher in training is not whether computing skills are needed, but which ones are most critical.
Broadly classed, there are several options:
- Packaged commercial, proprietary, licensed programs and tools (e.g. Excel, Maxim ...)
- Licensed proprietary programming environments (e.g. IDL, Matlab, Mathematica ...)
- Open source tools (e.g. GDL, ds9, Grace, Sage ...)
- Programming languages (e.g. C, C++, Fortran, Java, Python ...)
In order to decide which of these apply to your own research, consider a larger question of what role computer science plays in contemporary physics and astronomy, and in what direction your research field is headed. Then, pick the tools that solve the problem at hand, realizing that the skills you develop at each step raise you up to reach a solution for the next, unknown, problem. In some cases, continued reliance on an old, inefficient, but proven, method only delays the need to acquire new skills and knowledge.
An interesting perspective on the significance of large-database science was offered by Chris Mattmann in a Nature Commentary, in which he pointed out that the Square Kilometer Array (SKA), scheduled to have first light in 2020, will generate 22,000,000,000 terabytes (TB) of data per year! In the optical regime, the Large Synoptic Survey Telescope (LSST) has a 3.2 gigapixel (3200 megapixel) camera taking images in 15-second exposures throughout the night. The resulting images will offer a nightly record of nearly the entire sky in an open, publicly accessible database reaching 24th magnitude in single exposures, and 27th magnitude in stacked images of fields of 10 square degrees. The multi-dimensional data products from the LSST will require exceptional, specialized tools to use effectively.
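To put those figures in more familiar units, here is a minimal Python sketch of the arithmetic. The SKA volume, the LSST pixel count, and the exposure time come from the paragraph above. The 2 bytes per pixel and the 10-hour night with no readout overhead are assumptions made only for illustration, and all function names are hypothetical.

```python
# Rough data-volume estimates from the figures quoted above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

SKA_TB_PER_YEAR = 22_000_000_000  # quoted in the text
LSST_PIXELS = 3_200_000_000       # 3.2 gigapixels, quoted in the text
EXPOSURE_SECONDS = 15             # quoted in the text

# Assumptions for illustration only (not from the text):
BYTES_PER_PIXEL = 2               # 16-bit raw pixel values
NIGHT_HOURS = 10                  # observing time per night, no overheads


def ska_tb_per_second(tb_per_year=SKA_TB_PER_YEAR):
    """Average SKA data rate in terabytes per second."""
    return tb_per_year / SECONDS_PER_YEAR


def lsst_bytes_per_image(pixels=LSST_PIXELS, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one raw, uncompressed LSST image in bytes."""
    return pixels * bytes_per_pixel


def lsst_tb_per_night(night_hours=NIGHT_HOURS, exposure_seconds=EXPOSURE_SECONDS):
    """Raw LSST image volume per night in terabytes (1 TB = 1e12 bytes)."""
    exposures = night_hours * 3600 // exposure_seconds
    return exposures * lsst_bytes_per_image() / 1e12


if __name__ == "__main__":
    print(f"SKA:  {ska_tb_per_second():.0f} TB per second")
    print(f"LSST: {lsst_bytes_per_image() / 1e9:.1f} GB per raw image")
    print(f"LSST: {lsst_tb_per_night():.2f} TB of raw images per night")
```

Under these assumptions the SKA figure averages roughly 700 TB every second, and a single raw LSST image is several gigabytes. Changing the assumed pixel depth or night length rescales the LSST numbers proportionally.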
Current physics and astronomy research relies on several languages and computing environments, and there is no single choice that is optimal for every problem. Typically, we would consider first what prior work has been done that can be used, what programming skills are required to add to the prior work, or to develop new applications, and the support that's available for the individual researcher when, inevitably, they need help. Here are a few common ones.
Fortran (from "Formula Translating") made its first appearance in research use with the availability of IBM mainframe computers on university campuses and research centers in the 1960s. It is still a popular programming language, especially for high-performance computing. Its original version was constrained by the use of punched cards for input and output, and vestiges of that remain in the language today. There are optimized commercial compilers available for most computing systems, and the effective and well-maintained GNU open-source compiler (gfortran) is available for Linux, Windows, and MacOS.
Fortran's handling of text input, processing, and output is awkward, and there is no standard graphical user interface. Its strong point today is in massively parallel computing.
The C programming language was developed at AT&T Bell Labs around 1970, and has become one of the most widely used programming languages today. Derivatives like C++ and Java, and even Perl and Python, share features of its structure and syntax. The language is highly standardized, easily commented, and consequently readable if carefully annotated. Because it underpins most graphical user interfaces, there are libraries to utilize Motif, GTK, and Qt in C programs. Free open-source C compilers are available for Windows, Linux, and MacOS. In the Linux world, the standard compiler is the GNU Compiler Collection or GCC, "gcc" on the command line. It is included in, or easily added to, most Linux installations.
C is an excellent programming language for almost any application, and there are routines available in the public domain for many applications in physics and astronomy computing. Compiled C programs are readily optimized and execute with speeds that take advantage of the most recent hardware in desktop and large multi-core computing environments. For example, Nvidia's graphics processing unit (GPU) computing is fully supported, enabling thousands of separate processors or "cores" to be tasked to solve large problems.
The drawbacks to C are that development and debugging can be tedious in a write-compile-test-rewrite process, and that adding a GUI to an application is painstaking even in an integrated development environment (IDE). If an IDE such as Eclipse or NetBeans is used, then the resulting code cannot be easily read and debugged outside of the IDE, so the inherent advantages of a simple text file for each routine and readable code are often lost. The compiled code must be run in the environment for which it was built, so C programs have to be compiled, and often debugged, for each target operating system.
The Java programming language was developed by Sun Microsystems in the 1990s and made openly available under the GNU General Public License (GPL) in 2007. When Sun was acquired by Oracle, a public open source version of Java known as the Open Java Development Kit or OpenJDK became the community standard for collaborative development.
However, to date Java has not been widely used in astronomy, so when it is employed the programmer has to create tools to handle most key astronomical functions. Conveniently, Java and C are similar, and translation of common C code to Java is usually straightforward. Two new applications, AstroCC and AstroImageJ, from Karen Collins at the University of Louisville offer professional, verified code to handle fundamental astronomy, image processing, and photometry.
IDL and GDL
The Interactive Data Language or IDL is popular in astronomy and medical imaging. It is a proprietary system (originally developed for astronomy) that can be very expensive to license except as a student. However, it offers a variety of well-tested routines that have been contributed by the original developers and users.
The GNU Data Language (GDL) is a free implementation of the same programming command set that, for the most part, is equivalent to IDL. The IDL Astronomy User's Library at NASA works in both IDL and GDL. GDL has a useful interface to Python, so it can be utilized together with other comprehensive programming tools.
The primary disadvantage to IDL has been its cost, which makes it difficult for users not associated with well-supported research institutions to utilize. As GDL continues to improve, its IDL-like environment may see wider use in Physics and Astronomy.
Matlab, Mathematica, Sage, and now SymPy
Two very well established tools for computer-based algebra and analysis are Mathematica and Matlab. Both are proprietary, and costly. Mathematica is widely used in mathematics and to a lesser degree in physics, biophysics, chemistry, and engineering. Matlab is seen in engineering and to a lesser degree in physics. Matlab is modular, so that a user would purchase only the parts of the system needed.
Neither Matlab nor Mathematica has wide use in astronomy, and the available libraries for that discipline are limited. The open source Sage is a promising alternative available for free. The new SymPy symbolic mathematics system, built in pure Python, is an easily added package.
Mathematica offers very good support for symbolic algebra and calculus, and for interactive multi-dimensional graphics. Matlab interfaces with some instrumentation, and with the LabVIEW programming environment from National Instruments. Experience with Matlab and LabVIEW is very worthwhile for careers in engineering and related commercial disciplines.
The drawback to Mathematica or Matlab is their proprietary nature, which greatly restricts the distribution and reuse of code. Further, while they may be free to use or available at low cost to students at universities that subscribe to the licences, to others both may be very expensive. For this reason, for new work in astronomy and physics, other systems are preferable if they will meet the need. Think Python.
The Image Reduction and Analysis Facility or IRAF is a software system developed at the National Optical Astronomy Observatories. It is currently released as free software without licensing restrictions and has some community support from its NOAO website and an independent users group at iraf.net.
IRAF offers powerful, well-tested tools for working with astronomical data, and there are versions of it in use at many major observatories. The newly added Virtual Observatory (VO) toolbox makes it useful for the next generation of astronomical databases as well. Its strong point is the management of data from specific observatory instruments for which the developers have created unique software. It is less useful, and considerably more cumbersome, for the routine processing of astronomical image and spectroscopic data that were not acquired within that framework. Unlike the other languages mentioned here, IRAF is not a general purpose computing environment, but a collection of applications specific to astronomy.
Python (named after Monty Python's Flying Circus, not the Burmese snake) is a high-level programming language that is finding wide acceptance in astronomy, physics, engineering, and computer science. While it is used as a scripting language, it is also modular and can produce standalone executable code. Python is free, open source, and available for MacOS, Windows, and Linux. Most Linux operating systems come with Python installed. The official Python website has installation instructions and supporting documentation.
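As a small illustration of how one Python file can serve both as a script and as a reusable module, here is a minimal sketch. The file name `magnitudes.py` and the function `flux_ratio` are hypothetical names chosen for this example. The physics is the standard astronomical magnitude scale, in which a difference of 5 magnitudes corresponds to a factor of 100 in flux.

```python
"""magnitudes.py: a hypothetical module that can also be run as a script."""


def flux_ratio(delta_mag):
    """Return the flux ratio corresponding to a magnitude difference.

    On the standard magnitude scale, 5 magnitudes is a factor of 100 in flux,
    so the ratio is 10 ** (0.4 * delta_mag).
    """
    return 10 ** (0.4 * delta_mag)


def main():
    # Print the flux ratio for a few sample magnitude differences.
    for delta_mag in (1.0, 3.0, 5.0):
        ratio = flux_ratio(delta_mag)
        print(f"{delta_mag:.1f} mag corresponds to a flux ratio of {ratio:.2f}")


# This block runs only when the file is executed directly,
# not when it is imported by another program.
if __name__ == "__main__":
    main()
```

Saved as `magnitudes.py`, the file could be run with `python magnitudes.py`, or reused in other code with `from magnitudes import flux_ratio`.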
If you are looking for reasons to use Python, here are a few from an STScI tutorial on using Python in astronomy:
- Very general and powerful programming language, yet easy to learn.
- Strong, but optional, Object Oriented Programming support
- Very large user and developer community, very extensive and broad library base
- Free with a non-restrictive Open Source license
- Becoming the standard scripting language for astronomy
- Many books and on-line documentation resources available (for the language and its libraries)
- Extensible plotting framework (matplotlib)
- Usable within many windowing frameworks (GTK, Tk, WX, Qt)
- Powerful image handling (multiple simultaneous LUTS, optional resampling/rescaling, alpha blending, etc)
- Fast management of large data sets with Pandas, and conventional databases and spreadsheets
If that's not enough, then there's the Zen of Python:
- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Flat is better than nested.
- Sparse is better than dense.
- Readability counts.
- Special cases aren't special enough to break the rules.
- Although practicality beats purity.
- Errors should never pass silently.
- Unless explicitly silenced.
- In the face of ambiguity, refuse the temptation to guess.
- There should be one-- and preferably only one --obvious way to do it.
- Although that way may not be obvious at first unless you're Dutch.
- Now is better than never.
- Although never is often better than right now.
- If the implementation is hard to explain, it's a bad idea.
- If the implementation is easy to explain, it may be a good idea.
- Namespaces are one honking great idea. Let's do more of those!
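These aphorisms ship with Python itself in the standard-library module `this`, which prints them when imported. The sketch below is one possible way to get them as a list instead. It relies on the module keeping its text ROT13-encoded in the attribute `this.s`, which is an implementation detail of CPython and not a documented interface. The helper name `zen_lines` is hypothetical.

```python
import codecs
import contextlib
import io


def zen_lines():
    """Return the Zen of Python aphorisms as a list of strings."""
    # Importing `this` prints the text as a side effect; capture that
    # output so this function stays quiet on the first import.
    with contextlib.redirect_stdout(io.StringIO()):
        import this
    # The module stores its text ROT13-encoded in the attribute `s`.
    text = codecs.decode(this.s, "rot13")
    # Drop the title line and blank lines, keeping only the aphorisms.
    return [line for line in text.splitlines()[1:] if line.strip()]


if __name__ == "__main__":
    for number, line in enumerate(zen_lines(), start=1):
        print(f"{number:2d}. {line}")
```

For everyday use, typing `import this` at the interactive prompt is all that is needed.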