R Statistics

From OpenWetWare

R for Statistical Computing

What is R?

R is a system for statistical analysis and graphics created by Ross Ihaka and Robert Gentleman (Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299-314). R is both a piece of software and a language, considered a dialect of the S language created at AT&T Bell Laboratories. S is available as the software S-PLUS, commercialized by Insightful (see http://www.insightful.com/products/splus/default.asp for more information). There are important differences in the designs of R and S: those who want to know more on this point can read the paper by Ihaka & Gentleman (1996) or the R-FAQ (http://cran.r-project.org/doc/FAQ/R-FAQ.html), a copy of which is also distributed with R.

R is freely distributed under the terms of the GNU General Public Licence (for more information: http://www.gnu.org/); its development and distribution are carried out by several statisticians known as the R Development Core Team. R is available in several forms: the sources (written mainly in C, with some routines in Fortran), essentially for Unix and Linux machines, or pre-compiled binaries for Windows, Linux, and Macintosh. The files needed to install R, either from the sources or from the pre-compiled binaries, are distributed from the internet site of the Comprehensive R Archive Network (CRAN, http://cran.r-project.org/), where the installation instructions are also available. For Linux distributions (Debian, ...), binaries are generally available for the most recent versions; look at the CRAN site if necessary.

R has many functions for statistical analysis and graphics; the latter are displayed immediately in their own window and can be saved in various formats (jpg, png, bmp, ps, pdf, emf, pictex, xfig; the available formats may depend on the operating system). The results of a statistical analysis are displayed on the screen; some intermediate results (P-values, regression coefficients, residuals, ...) can be saved, written to a file, or used in subsequent analyses. The R language allows the user, for instance, to program loops to successively analyse several data sets. It is also possible to combine different statistical functions in a single program to perform more complex analyses. R users may also benefit from a large number of programs written for S and available on the internet; most of these programs can be used directly with R.

At first, R could seem too complex for a non-specialist. This is not necessarily true. In fact, a prominent feature of R is its flexibility. Whereas classical software displays the results of an analysis immediately, R stores these results in an "object", so that an analysis can be done with no result displayed. The user may be surprised by this, but such a feature is very useful: the user can extract only the part of the results that is of interest. For example, if one runs a series of 20 regressions and wants to compare the different regression coefficients, R can display only the estimated coefficients: the results may then take a single line, whereas classical software could well open 20 results windows. We will see other examples illustrating the flexibility of a system such as R compared to traditional software.
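
The 20-regression example described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the data are simulated, and the names datasets, fits and coefs are hypothetical, not taken from the original text. It shows the fits being stored as objects with nothing printed, and only the coefficients being extracted afterwards.

```r
# Assumption: 20 simulated data sets, each with a predictor x and a response y.
set.seed(1)
datasets <- lapply(1:20, function(i) {
  x <- rnorm(30)
  data.frame(x = x, y = 2 + 0.5 * x + rnorm(30))
})

# Fit one linear regression per data set.
# Each result is stored in an object; nothing is displayed at this point.
fits <- lapply(datasets, function(d) lm(y ~ x, data = d))

# Extract only the estimated coefficients: one row per regression.
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")

# Display just the 20 estimated slopes, rounded to two decimals.
print(round(coefs[, "slope"], 2))
```

Each element of fits still holds the full result of its regression, so other parts (for instance residuals(fits[[1]]) or summary(fits[[1]])) can be pulled out later without refitting.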

Download and Install R

Links to tutorials

Examples for commonly used statistics

Bioconductor & Microarray data Analysis