Programming for Physics and Astronomy

From AstroEd
Revision as of 20:46, 6 February 2013 by WikiSysop (talk | contribs) (→‎C)
Jump to navigation Jump to search

Only 50 years ago, most physics and astronomy research relied on the analytical skills of the scientist, on the tools of classical mathematics that were taught to them as students, and in some cases on data management and numerical analysis done by hand. Today, cutting edge research often requires high speed computing for simulation and data analysis, interactive tools to enhance extraction of relevant information from multi-parameter databases, access to automated and robotic instrumentation, and management of incomprehensibly large data sets. The issue for the researcher in training is not whether computing skills are needed, but which ones are most critical.

Broadly classed, there are several options:

  • Packaged commercial, proprietary, licensed programs and tools (e.g. Excel, Maxim ...)
  • Licensed proprietary programming environments (e.g. IDL, Matlab, Mathematica ...)
  • Open source tools (e.g. GDL, ds9, Grace, Sage ...)
  • Programming languages (e.g. C, C++, Fortran, Java, Python ...)
  • Web resources (e.g. HTML, Javascript, PHP, Perl ...)

In order to decide which of these apply to your own research, consider a larger question of what role computer science plays in contemporary physics and astronomy, and in what direction your research field is headed. Then, pick the tools that solve the problem at hand, realizing that the skills you develop at each step raise you up to reach a solution for the next, unknown, problem. In some cases, continued reliance on an old, inefficient, but proven, method only delays the need to acquire new skills and knowledge.

An interesting perspective on the significance of large data base sciences was offered by Chris Mattmann in a Nature Commentary, in which he pointed out that the Square Kilometer Array (SKA), scheduled to have first light in 2020, will generate 22,000,000,000 terabytes (TB) of data per year! In the optical regime, the Large Synoptic Survey Telescope (LSST) has a 3.2 giga-pixel (3200 mega-pixels) camera taking images in 15 second exposures throughout the night. The resulting images will offer a nightly record of nearly the entire sky in an open, publically accessible database reaching 24th magnitude in single exposures, and 27th magnitude in stacked images of fields of 10 square degrees. There will be multi-dimensional data products from the LSST that will require exceptional unique tools to use effectively.

Programming languages for physics and astronomy applications

Current physics and astronomy research relies on several languages and computing environments, and there is no single choice that is optimal for every problem. Typically, we would consider first what prior work has been done that can be used, what programming skills are required to add to the prior work, or to develop new applications, and the support that's available for the individual researcher when, inevitably, they need help. Here are a few common ones.


Fortran (from "Formula Translating") made its first appearance in research use with the availability of IBM mainframe computers on university campuses and research centers in the 1960's. It is still a popular programming language, especially for high performance computing. Its original version was constrained by the use of punched cards for input and output, and vestiges of that remain in the system today. There are commercial optimized compilers available for most computing systems, and the effective and well-maintained GNU open-source compiler (gfortran) is available for Linux, Windows, and MacOS.

Fortran's handling of text input and output is awkward, and there is no standard graphical user interface. It's strong point today is in massively parallel computing.


The C programming language was developed at AT&T Bell Labs around 1970, and has become the most widely used programming language today. Derivatives, like C++, Java and even Perl and Python share features of its structure and syntax. The language is highly standardized, easily commented, and consequently readable if carefully annotated. Because it underpins most graphical user interfaces, there are libraries to utilize Motif, GTK, and Qt in C programs. Free open-source C compilers are available for Windows, Linux, and MacOS. The the Linux world, the standard compiler is the GNU compiler collection or GCC, "gcc" on the command line. It is included in every base Linux installation.

C is an excellent programming language for almost any application, and there are routines available in the public domain for many applications in physics and astronomy computing. Compiled C programs are readily optimized and excecute with speeds that take advantage of the most recent hardware in desktop and large multi-core computing environments. For example Nvidia's graphical processing unit (GPU) computing is fully supported, enabling thousands of separate processors or "cores" to be tasked to solve large problems.

The drawbacks to C are that development and debugging can be tedious in a write-compile-test-rewrite process, and that adding a GUI to an application is painstaking even in an integrated development environment (IDE). If an IDE such as Eclipse or Netbeansis used, then the resulting code cannot be easily read and debugged outside of the IDE, so the inherent advantages of a simple text file for each routine and readable code is often lost. The compiled code must be run in the environment in which it was written, so that C programs have to be compiled, and often debugged, for each target operating system.



Matlab and Mathematica