Tools

Overview

I use a number of open source software tools in my work. The tools span documentation, programming, scientific computing, and statistics. But more important than the tools is what to do with them. For a very good tutorial on the use of tools for scientific computing, go to the Software Carpentry site, where both a lengthy tutorial and a video course in software engineering for scientific programming are available.

Bibliographic tools

I use Zotero to maintain bibliographic references as well as to keep notes on journal articles. Zotero is a Firefox add-on that can be integrated with MS Word, OpenOffice.org, and LaTeX (by generating BibTeX files). The Zotero website includes downloads, documentation, and instructional videos on its use. Zotero can also be installed through the Tools -> Add-ons menu in Firefox.

R

The R Project is a data analysis platform built on the S language that is often used for statistical analysis and computing. Because it has matrix capabilities through LAPACK, it can also be used for a variety of scientific computing purposes. I use the following packages regularly:
  • Sweave - Literate programming tools for LaTeX and R
  • RBGL - R interface to the Boost Graph Library
  • rgdal - R interface to OGR/GDAL libraries for transforming spatial data formats
  • RSQLite - R interface to the SQLite database system
  • glpk - An interface to GLPK. Note that it includes an interface into the GLPK API and use of GMPL, which the rglpk package does not.
  • sp, spatial - Spatial packages for working with spatial data
  • Rcmdr - R Commander. A GUI for R that is based on Tcl/Tk and is started from within R
  • HSAUR2 - Handbook of Statistical Analysis Using R, 2nd ed
  • rjags/R2jags - Interfaces to Just Another Gibbs Sampler (JAGS)
  • MCMCpack - Markov Chain Monte Carlo package

Getting Started

Tutorials and introductory material

More information on R can be found on the R website. In particular, there are a number of good tutorials in the Contributed Documentation section of the R website http://cran.r-project.org. The "Introduction to R" that is included with R is somewhat terse and intended for those involved in statistical computing, but it gives a reasonable introduction for getting started, and the Sample Session in the Appendix is a reasonable first step. After you get R installed and working, you may want to go to some of the materials below. The installation guide by John Maindonald covers installing R and setting up your environment. Simple R and R for Beginners are introductions to R with basic statistics as examples. Using R for Data Analysis and Graphics is a basic introduction to using R for data analysis. Using R for Scientific Computing looks at R as an analysis environment (rather than a purely statistical one) along the lines of Matlab, Python (with Pylab), IDL, etc.
  • Mathesaurus A site with quick reference cards for common tasks in R, Python (Numpy), Matlab/Octave and IDL.
  • Installation of R, of R packages, and editor Environments by John Maindonald. This document was written to get someone started. In particular, someone starting in R needs to install R, but also needs to learn how to install packages (libraries). In most cases, one will also want an editor for writing scripts and interacting with R. Tinn-R is the one it highlights for use on Windows, and it seems to give good results for those who want a Notepad+ style editor (which is most non-programmers).
  • Simple R by John Verzani. This is the early version of a book by the author. An introduction to statistics using R.
  • R for Beginners by Emmanuel Paradis. Another introduction to statistics using R.
  • Using R for Data Analysis and Graphics by John Maindonald. Just as it says. This is the same John Maindonald referenced in the installation guide at the beginning of this list. The datasets and some functions are in the DAAG package that can be installed into R.
  • Using R for Scientific Computing by Karline Soetaert. This is essentially using R as a data analysis environment a la Matlab instead of only for statistics. It is also an introduction to the marelac and marelacteaching libraries.
  • Tinn-R is a free and reasonably simple editor for R.
  • StatET is an Eclipse plugin for using R and Sweave. Being an Eclipse plugin, it is most useful for programmers.

Python

I use Python with the following extensions as a Matlab-like environment based around the Scipy libraries (AKA Pylab). Using these libraries, especially with the builds provided by the Enthought Python Distribution, allows for the speed of compiled C or Fortran with the syntax of Python (i.e. object-oriented, procedural, or functional programming, as best fits the problem at hand).
  • iPython - A command line interface similar to that found in Matlab, Mathematica, Maple, etc. As of version 0.13 it includes an option for a browser-based notebook interface.
  • Numpy - Numeric extensions including matrix operations (LAPACK)
  • Scipy - A number of other scientific libraries with interfaces to Python
  • Matplotlib - Plotting tools for Python
  • Pandas - High performance data structures and data analysis tools
  • Sympy - A symbolic math (computer algebra system) written in Python
  • Simpy - A discrete event simulation library
To get started learning Python, as well as tools for using Python for scientific programming (interactive analysis, high performance computing, interaction with C and Fortran, etc.), there are a number of tutorials from the 2009 Scipy Conference that can be found in the Internet Archive. In particular, you should start with the first tutorial, which covers using the interactive iPython prompt (it is available as a PDF as well).
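
As a small illustration of the Matlab-like use of Numpy described above, the following sketch solves a linear system with numpy.linalg.solve, which hands the work to compiled LAPACK routines. The matrix and right-hand side are made-up values chosen only for the example, and the code assumes Numpy is installed.

```python
import numpy as np

# Hypothetical 2x2 system A x = b, chosen only for illustration:
#   3x +  y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# numpy.linalg.solve calls a compiled LAPACK solver, so the heavy
# lifting happens in compiled code rather than in the Python interpreter.
x = np.linalg.solve(A, b)

# Check the answer by computing the residual A x - b, which should be ~0.
residual = A @ x - b

print("solution:", x)
print("largest residual:", np.abs(residual).max())
```

The same lines can be typed one at a time at the interactive iPython prompt, which is how this kind of exploratory work is usually done.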

I also use Sage, a mathematical environment that uses Python to interface between a number of open source projects, giving capabilities that combine those found in Mathematica, Maple, Matlab, and others, including:
  • Computer Algebra and calculus
  • Linear algebra
  • Number theory
  • Plotting
  • and others
You can try Sage out at its online notebook at http://sagenb.com and go through its tutorial.

Databases

There are two databases I use, SQLite and Postgres.

SQLite

SQLite is a public domain relational database which implements most (but not all) of the SQL specification. It can be used as a replacement for a client-server relational database, but its real use is as a more robust replacement for text data files and data formats, or as an embedded database inside another program. The design goal of SQLite is to be simple. Some resulting features include being self-contained, zero-configuration, serverless, and compact. As a side effect it is also fast. The whole database lives in a single file (including all tables and queries), which makes it easy to distribute. It can be accessed with little difficulty from almost any computing environment (ODBC, JDBC, and almost every programming language has an interface). This has led to it being embedded in a number of different programs such as Photoshop Lightroom, the Apple iLife family of products, Mozilla Firefox, and Google Gears. Wrappers and drivers for a variety of programming platforms are listed at the SQLite site.

It is also heavily tested, with more test code than actual lines of code, reaching 100% test coverage for the SQLite core. The easiest way to look at a SQLite database is to use the SQLite Manager Add-on for Mozilla Firefox. Information on the add-on can be found at the SQLite Manager website.
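
To make the single-file, serverless idea concrete, here is a minimal sketch using the sqlite3 module from the Python standard library. The table name, column names, and data values are hypothetical and exist only for this example; the database file is written to a temporary directory.

```python
import os
import sqlite3
import tempfile


def build_demo_db(path):
    """Create a single-file SQLite database holding a small, made-up table."""
    conn = sqlite3.connect(path)  # creates the file if it does not exist
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings (site TEXT, value REAL)"
        )
        conn.executemany(
            "INSERT INTO readings VALUES (?, ?)",
            [("a", 1.5), ("a", 2.5), ("b", 4.0)],  # hypothetical data
        )
    conn.close()


def mean_by_site(path):
    """Return (site, mean value) rows using ordinary SQL, with no server."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT site, AVG(value) FROM readings GROUP BY site ORDER BY site"
        ).fetchall()
    finally:
        conn.close()


# The whole database is one ordinary file that can be copied or emailed.
db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
build_demo_db(db_path)
result = mean_by_site(db_path)
print(result)
```

The resulting demo.sqlite file could then be opened from R through RSQLite or inspected in any SQLite viewer, which is what makes the format convenient as a replacement for text data files.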

PostgreSQL/PostGIS

I also use PostGIS, the spatially enabled version of PostgreSQL. PostGIS implements the Simple Features for SQL (SFSQL) specification, which is the same specification implemented by Oracle Spatial and DB2 Spatial Extender. Therefore, it can be used by any program that also requires SFSQL, such as ESRI ArcSDE.

Editors/IDEs

While Vim is embedded in my fingers and I find myself switching to it regularly, it is usually easier to work with a somewhat more featured editor.
  • jEdit runs on Java and takes plug-ins. I usually use the JDiff, spell checker, and Jython plugins. Code2HTML converts source code to HTML or LaTeX (useful for printing out code). Once upon a time there was a working LaTeX plugin, but it no longer seems to work very well. Since I usually have a terminal/console window open when I use LaTeX, this has not been an issue.
  • Eclipse is an Integrated Development Environment (IDE). It was originally created for Java, but a number of plug-ins have been developed for it. Some add-ons that I use include:
    • Pydev - Python IDE
    • TeXlipse - LaTeX editor
    • StatET - R and Sweave environment
    • C/C++ Development Tools (CDT) - C/C++ Development
    • BIRT - Business Intelligence Reporting Tools (database access and reporting)
    • uDig - Open source GIS

Version Control

Because all of us make mistakes, and because I like working with others, I am starting to use Mercurial for version control.