User:Timothee Flutre/Notebook/Postdoc/2011/11/16

From OpenWetWare

(Difference between revisions)
(try pkg snpStats)
(About statistical modeling: add Jaynes 1985)
(17 intermediate revisions not shown.)
Line 6: Line 6:
| colspan="2"|
| colspan="2"|
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
-
==Entry title==
+
==About statistical modeling==
-
* try the R/Bioconductor package [http://www.bioconductor.org/packages/devel/bioc/html/snpStats.html snpStats]:
+
* '''intro courses''':
 +
** "OpenIntro Statistics" by Diez, Barr and Cetinkaya-Rundel (free [http://www.openintro.org/stat/textbook.php textbook])
 +
** "Statistics Done Wrong" by Alex Reinhart (free [http://www.refsmmat.com/statistics/ textbook])
 +
** "Mixed effects models for the population approach" by Marc Lavielle and the POPIX team at INRIA (free [http://popix.lixoft.net/index.php?title=Home_page wiki])
 +
** "Graphical Models" by Zoubin Ghahramani (2012, free [http://videolectures.net/mlss2012_ghahramani_graphical_models/ video & slides])
 +
** [http://swirlstats.com/ swirl], R package to learn stats and R simultaneously and interactively
-
library(snpStats)
+
* '''advanced courses''':
-
tmp <- matrix(c(1,3,2,1,3,0,1,3,0,1), ncol=2, dimnames=list(paste("snp", 1:5, sep=""), paste("ind", 1:2, sep="")))
+
** "Advanced Data Analysis from an Elementary Point of View" by Cosma Shalizi (free [http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ book])
-
tmp
+
** "A First Course in Bayesian Statistical Methods" by Peter Hoff (2010, [http://www.amazon.com/gp/product/0387922997 book])
-
tmp2 <- new("SnpMatrix", t(tmp))
+
** "Bayesian Data Analysis" by Andrew Gelman et al. (2013, free [http://www.stat.columbia.edu/~gelman/book/slides slides], [http://www.amazon.com/dp/1439840954 3rd edition] of the book)
-
tmp2
+
** "Statistical Decision Theory and Bayesian Analysis" by James Berger (1993, [https://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-96098-2 2nd edition] of the book)
-
summary(tmp2)
+
-
print(as(t(tmp2), 'character'))
+
-
print(as(t(tmp2), 'numeric'))
+
-
The numeric encoding above assumes 1=AA, 2=AB, 3=BB and 0=NC (no call). Unfortunately, it doesn't seem possible to build a SnpMatrix directly from a matrix of characters:
+
* '''mathematical aspects''':
 +
** "Introduction to Linear Algebra" by Gilbert Strang (free [http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/ videos], [http://www.amazon.com/dp/0980232716 book])
 +
** "Matrix Differential Calculus with Applications in Statistics and Econometrics" by Magnus and Neudecker (2007, free [http://www.janmagnus.nl/misc/mdc2007-3rdedition pdf] for the 3rd edition)
-
tmp <- matrix(c("A/A","B/B","A/B","A/A","B/B","","A/A","B/B","","A/A"), ncol=2, dimnames=list(paste("snp", 1:5, sep=""), paste("ind", 1:2, sep="")))
+
* '''practical, computational aspects''':
-
tmp
+
** "How to share data with a statistician" by Jeff Leek ([https://github.com/jtleek/datasharing procedure] on GitHub); see also the [http://simplystatistics.org/2014/02/03/the-three-tables-for-genomics-collaborations/ advice] on genomics metadata by Rafael Irizarry and "statistical consulting" by Karl Broman ([http://www.biostat.wisc.edu/~kbroman/teaching/misc/consulting.pdf slides])
-
tmp2 <- new("SnpMatrix", t(tmp))
+
** "Exploratory Data Analysis with R" by Jennifer Bryan (free [http://www.stat.ubc.ca/~jenny/STAT545A/2012-lectures/ course])
 +
** "Tutorial on Big Data with Python" by Marcel Caraciolo (free Python [https://github.com/marcelcaraciolo/big-data-tutorial notebooks])
 +
** interpreted languages: [http://openwetware.org/wiki/User:Timothee_Flutre/Notebook/Postdoc/2011/11/07 R] of course, but increasingly Python (the [https://en.wikipedia.org/wiki/Scipy SciPy] ecosystem with NumPy, Matplotlib and pandas; see also [https://en.wikipedia.org/wiki/Scikit-learn scikit-learn] and [http://statsmodels.sourceforge.net/ statsmodels]), as well as others ([https://en.wikipedia.org/wiki/Julia_%28programming_language%29 Julia])
 +
** C/C++: [http://en.wikipedia.org/wiki/GNU_Scientific_Library GSL], [http://en.wikipedia.org/wiki/Armadillo_%28C++_library%29 Armadillo], [http://en.wikipedia.org/wiki/Eigen_(C%2B%2B_library) Eigen], [http://www.rcpp.org/ Rcpp], [http://mc-stan.org/ Stan]
 +
** editor: [https://openwetware.org/wiki/User:Timothee_Flutre/Notebook/Postdoc/2012/07/25 Emacs] of course (language-agnostic, org-mode, etc.), but also [https://en.wikipedia.org/wiki/RStudio RStudio] (R only) and [https://en.wikipedia.org/wiki/Ipython IPython] (Python only)
-
Thus, given a matrix of Illumina genotype calls (whether written AA or A/A), we first need to convert it to the 1/2/3/0 encoding:
+
* '''visualizing, plotting''':
 +
** "Visualizing uncertainty about the future" by Spiegelhalter et al. (Science 2011, [http://dx.doi.org/10.1126/science.1191181 DOI])
 +
** "Let's practice what we preach: turning tables into graphs" by Gelman et al. (The American Statistician 2002, [http://dx.doi.org/10.1198/000313002317572790 DOI])
 +
** "Top ten worst graphs" by Karl Broman ([http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/ webpage])
-
tmp <- gsub("A/A", 1, tmp)
+
* '''philosophy, history, pragmatism''':
-
tmp <- gsub("A/B", 2, tmp)
+
** "Statistical analysis and the illusion of objectivity" by Berger and Berry (American Scientist 1988, [http://dx.doi.org/10.1016/0278-2316(88)90057-6 DOI], [http://www.medicine.mcgill.ca/epidemiology/joseph/courses/EPIB-675/Berger.Berry.pdf pdf])
-
tmp <- gsub("B/B", 3, tmp)
+
** "Bayesian methods: general background" by E. T. Jaynes (1985, free [http://bayes.wustl.edu/etj/articles/general.background.pdf pdf]) and "Where do we stand on maximum entropy?" by E. T. Jaynes (1978, free [http://bayes.wustl.edu/etj/articles/stand.on.entropy.pdf pdf])
-
tmp <- gsub("^$", 0, tmp)
+
** "Mathematical Models and Reality: A Constructivist Perspective" by Christian Hennig (Foundations of Science 2010, [http://dx.doi.org/10.1007/s10699-009-9167-x DOI])
-
tmp <- matrix(as.numeric(tmp), ncol=ncol(tmp), dimnames=list(rownames(tmp), colnames(tmp)))
+
** "Philosophy and the practice of Bayesian statistics" by Andrew Gelman and Cosma Shalizi (British Journal of Mathematical and Statistical Psychology 2013, [http://dx.doi.org/10.1111/j.2044-8317.2011.02037.x DOI])
-
tmp
+
** "Statistical Inference: The Big Picture" by Robert Kass (Statistical Science 2011, [http://dx.doi.org/10.1214/10-STS337 DOI], free [http://arxiv.org/pdf/1106.2895v2.pdf pdf] on arXiv)
-
tmp2 <- new("SnpMatrix", t(tmp))
+
** "In Praise of Simplicity not Mathematistry! Ten Simple Powerful Ideas for the Statistical Scientist" by Roderick Little (JASA 2013, [http://dx.doi.org/10.1080/01621459.2013.787932 DOI])
-
tmp2
+
** "Des spécificités de l’approche bayésienne et de ses justifications en statistique inférentielle" by Christian Robert (2013 chapter, in French, free [http://hal.archives-ouvertes.fr/docs/00/87/01/24/PDF/Bayes.pdf pdf] on HAL)
-
summary(tmp2)
+
-
Then, one can easily look at summary statistics, e.g. histograms of the minor allele frequencies or of the z-scores for HWE (Hardy-Weinberg equilibrium), and filter the data accordingly:
+
* '''classics''':
-
 
+
** [https://www.ceremade.dauphine.fr/~xian/M2classics.html list] from Christian Robert
-
hist(col.summary(tmp2)$MAF)
+
-
hist(col.summary(tmp2)$z.HWE)
+
 +
* '''literature, community''':
 +
** [http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos Annals of Statistics], [http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1467-9868 JRSSB], [http://www.tandfonline.com/toc/uasa20/current JASA], [http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoas Annals of Applied Statistics], [http://ba.stat.cmu.edu/ Bayesian Analysis], [http://jmlr.org/ JMLR], [http://books.nips.cc/ NIPS]
 +
** [http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1541-0420 Biometrics], [http://biostatistics.oxfordjournals.org/ Biostatistics], [http://biomet.oxfordjournals.org/ Biometrika]
 +
** [http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ss Statistical Science], [http://amstat.tandfonline.com/loi/tas#.UsrQx47_7Gg The American Statistician]
 +
** see also on [http://projecteuclid.org/ Project Euclid] and [http://arxiv.org/archive/stat arXiv]
 +
** blogs: [http://andrewgelman.com/ Andrew Gelman], [http://xianblog.wordpress.com/ Christian Robert], [http://normaldeviate.wordpress.com/ Larry Wasserman]
 +
** links with society: [http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291467-985X JRSSA], Statistique et Société (free [http://publications-sfds.fr/index.php/stat_soc/index pdfs])
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### -->

Revision as of 08:52, 9 February 2014


About statistical modeling

  • intro courses:
    • "OpenIntro Statistics" by Diez, Barr and Cetinkaya-Rundel (free textbook)
    • "Statistics Done Wrong" by Alex Reinhart (free textbook)
    • "Mixed effects models for the population approach" by Marc Lavielle and the POPIX team at INRIA (free wiki)
    • "Graphical Models" by Zoubin Ghahramani (2012, free video & slides)
    • swirl, R package to learn stats and R simultaneously and interactively
  • advanced courses:
    • "Advanced Data Analysis from an Elementary Point of View" by Cosma Shalizi (free book)
    • "A First Course in Bayesian Statistical Methods" by Peter Hoff (2010, book)
    • "Bayesian Data Analysis" by Andrew Gelman et al. (2013, free slides, 3rd edition of the book)
    • "Statistical Decision Theory and Bayesian Analysis" by James Berger (1993, 2nd edition of the book)
  • mathematical aspects:
    • "Introduction to Linear Algebra" by Gilbert Strang (free videos, book)
    • "Matrix Differential Calculus with Applications in Statistics and Econometrics" by Magnus and Neudecker (2007, free pdf for the 3rd edition)
  • practical, computational aspects:
    • "How to share data with a statistician" by Jeff Leek (procedure on GitHub); see also the advice on genomics metadata by Rafael Irizarry and "statistical consulting" by Karl Broman (slides)
    • "Exploratory Data Analysis with R" by Jennifer Bryan (free course)
    • "Tutorial on Big Data with Python" by Marcel Caraciolo (free Python notebooks)
    • interpreted languages: R of course, but increasingly Python (the SciPy ecosystem with NumPy, Matplotlib and pandas; see also scikit-learn and statsmodels), as well as others (Julia)
    • C/C++: GSL, Armadillo, Eigen, Rcpp, Stan
    • editor: Emacs of course (language-agnostic, org-mode, etc.), but also RStudio (R only) and IPython (Python only)
  • visualizing, plotting:
    • "Visualizing uncertainty about the future" by Spiegelhalter et al. (Science 2011, DOI)
    • "Let's practice what we preach: turning tables into graphs" by Gelman et al. (The American Statistician 2002, DOI)
    • "Top ten worst graphs" by Karl Broman (webpage)
  • philosophy, history, pragmatism:
    • "Statistical analysis and the illusion of objectivity" by Berger and Berry (American Scientist 1988, DOI, pdf)
    • "Bayesian methods: general background" by E. T. Jaynes (1985, free pdf) and "Where do we stand on maximum entropy?" by E. T. Jaynes (1978, free pdf)
    • "Mathematical Models and Reality: A Constructivist Perspective" by Christian Hennig (Foundations of Science 2010, DOI)
    • "Philosophy and the practice of Bayesian statistics" by Andrew Gelman and Cosma Shalizi (British Journal of Mathematical and Statistical Psychology 2013, DOI)
    • "Statistical Inference: The Big Picture" by Robert Kass (Statistical Science 2011, DOI, free pdf on arXiv)
    • "In Praise of Simplicity not Mathematistry! Ten Simple Powerful Ideas for the Statistical Scientist" by Roderick Little (JASA 2013, DOI)
    • "Des spécificités de l’approche bayésienne et de ses justifications en statistique inférentielle" by Christian Robert (2013 chapter, in French, free pdf on HAL)
  • classics:
    • list from Christian Robert

