R Statistics

From OpenWetWare
Revision as of 02:22, 26 March 2007 by Jakob Suckale (talk | contribs) (using R, more links, intro)

R is free software for statistical analysis and graphics.
It runs on various UNIX platforms, Windows, and MacOS.
The latest version, 2.4.1, was released on 2006-12-18 and is rated 4.3 out of 5[1].
Since 1997, an international core team of about 15 people has developed R.

screenshot of R running on Unix

What is R?

(Taken from R for beginners)

R is a system for statistical analyses and graphics created by Ross Ihaka and Robert Gentleman. R is both a software and a language considered as a dialect of the S language created by the AT&T Bell Laboratories. S is available as the software S-PLUS commercialized by Insightful. There are important differences in the designs of R and of S: those who want to know more on this point can read the paper by Ihaka & Gentleman (1996) or the R-FAQ, a copy of which is also distributed with R. R is freely distributed under the terms of the GNU General Public Licence; its development and distribution are carried out by several statisticians known as the R Development Core Team.

R is available in several forms: the sources (written mainly in C and some routines in Fortran), essentially for Unix and Linux machines, or some pre-compiled binaries for Windows, Linux, and Macintosh. The files needed to install R, either from the sources or from the pre-compiled binaries, are distributed from the internet site of the Comprehensive R Archive Network (CRAN) where the instructions for the installation are also available. Regarding the distributions of Linux (Debian, ...), the binaries are generally available for the most recent versions; look at the CRAN site if necessary.

R has many functions for statistical analyses and graphics; the latter are visualized immediately in their own window and can be saved in various formats (jpg, png, bmp, ps, pdf, emf, pictex, xfig; the available formats may depend on the operating system). The results from a statistical analysis are displayed on the screen, some intermediate results (P-values, regression coefficients, residuals, ...) can be saved, written in a file, or used in subsequent analyses.
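As a minimal sketch of saving graphics and intermediate results, the following fits a regression on R's built-in cars data set, writes the plot to a PDF file, and writes the coefficients to a text file. The file names are temporary placeholders created with tempfile(), and the choice of data set and model is an assumption made only for illustration.

```r
# Temporary output files (placeholders; use your own paths in practice)
plot_file <- tempfile(fileext = ".pdf")
coef_file <- tempfile(fileext = ".txt")

# Simple linear regression on the built-in 'cars' data set
fit <- lm(dist ~ speed, data = cars)

# Save a graphic: open a PDF device, draw, then close the device
pdf(plot_file)
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
abline(fit)        # add the fitted regression line
dev.off()          # closing the device writes the file

# Save intermediate results: regression coefficients as a small table
coef_table <- data.frame(term = names(coef(fit)), estimate = coef(fit))
write.table(coef_table, file = coef_file, row.names = FALSE, quote = FALSE)

# Residuals stay available in memory for subsequent analyses
res <- residuals(fit)
```

Other formats work the same way by swapping the device function, for example png() or postscript() in place of pdf(), subject to what the operating system supports.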

The R language allows the user, for instance, to program loops to successively analyse several data sets. It is also possible to combine in a single program different statistical functions to perform more complex analyses.
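A short sketch of this idea, assuming three small simulated data sets: a hypothetical helper function, describe(), combines several statistical functions, and a loop applies it to each data set in turn. The names datasets, describe, and results are made up for this example.

```r
# Three small simulated data sets (hypothetical example data)
set.seed(1)
datasets <- list(
  a = rnorm(20, mean = 0),
  b = rnorm(20, mean = 1),
  c = rnorm(20, mean = 2)
)

# A small program combining several statistical functions
describe <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x))
}

# Loop over the data sets and collect one row of results per set
results <- matrix(NA, nrow = length(datasets), ncol = 3,
                  dimnames = list(names(datasets), c("mean", "sd", "median")))
for (name in names(datasets)) {
  results[name, ] <- describe(datasets[[name]])
}

print(results)
```

The same result could be obtained more compactly with sapply(datasets, describe); the explicit loop is used here because it mirrors the description in the text.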

R users may benefit from a large number of programs written for S and available on the internet; most of these programs can be used directly with R. At first, R could seem too complex for a non-specialist. This may not be true actually. In fact, a prominent feature of R is its flexibility. Whereas a classical software displays immediately the results of an analysis, R stores these results in an "object", so that an analysis can be done with no result displayed. The user may be surprised by this, but such a feature is very useful. Indeed, the user can extract only the part of the results which is of interest.
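To illustrate how results are stored in an object, here is a small sketch using R's built-in sleep data set and a two-sample t-test. The test and data set are chosen only as an example; the point is that nothing is displayed until the user asks for it, and single components can be pulled out of the result.

```r
# Run the analysis and store the result in an object; nothing is printed
res <- t.test(extra ~ group, data = sleep)

# List the components stored in the result object
names(res)

# Extract only the parts of interest
p <- res$p.value       # the P-value
est <- res$estimate    # the two group means
ci <- res$conf.int     # confidence interval for the difference in means

# Display the full result only when it is wanted
print(res)
```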

Install R

  • choose a download mirror: list of mirror sites for R download
  • download the right package for you (Linux/Windows/Mac)
  • install the package following the OS-specific instructions

Use R

To use R you will have to learn some R commands (see screenshot); unlike most Windows and Mac software, it is not fully menu-based. This might seem tedious, but you will soon realise that although it slows you down at first, it speeds up your work and improves it once you are past the initial learning period.
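As a rough idea of what a first session can look like, the commands below create a small vector of made-up values and summarise it. They are typed at the R prompt one line at a time; the values themselves are hypothetical.

```r
# Create a numeric vector (made-up values) and assign it to the name x
x <- c(4.1, 5.3, 6.2, 5.8, 4.9)

length(x)     # number of values
mean(x)       # arithmetic mean
sd(x)         # standard deviation
summary(x)    # minimum, quartiles, mean, and maximum

ls()          # list the objects created so far

# In an interactive session, ?mean or help(mean) opens the help page for mean()
```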

There is a lot of free documentation available:

Examples for commonly used statistics

Bioconductor & Microarray data Analysis

Links