# User:Timothee Flutre/Notebook/Postdoc/2011/11/16


## Revision as of 07:52, 9 February 2014


## About statistical modeling

• intro courses:
  • "OpenIntro Statistics" by Diez, Barr and Cetinkaya-Rundel (free textbook)
  • "Statistics Done Wrong" by Alex Reinhart (free textbook)
  • "Mixed effects models for the population approach" by Marc Lavielle and the POPIX team at INRIA (free wiki)
  • "Graphical Models" by Zoubin Ghahramani (2012, free video & slides)
  • swirl, R package to learn stats and R simultaneously and interactively
• advanced courses:
  • "Advanced Data Analysis from an Elementary Point of View" by Cosma Shalizi (free book)
  • "A First Course in Bayesian Statistical Methods" by Peter Hoff (2010, book)
  • "Bayesian Data Analysis" by Andrew Gelman et al. (2013, free slides, 3rd edition of the book)
  • "Statistical Decision Theory and Bayesian Analysis" by James Berger (1993, 2nd edition of the book)
• mathematical aspects:
  • "Introduction to Linear Algebra" by Gilbert Strang (free videos, book)
  • "Matrix Differential Calculus with Applications in Statistics and Econometrics" by Magnus and Neudecker (2007, free pdf for the 3rd edition)
• practical, computational aspects:
  • "How to share data with a statistician" by Jeff Leek (procedure on GitHub); see also the advice on genomics metadata by Rafael Irizarry and "statistical consulting" by Karl Broman (slides)
  • "Exploratory Data Analysis with R" by Jennifer Bryan (free course)
  • "Tutorial on Big Data with Python" by Marcel Caraciolo (free Python notebooks)
  • interpreted languages: R, of course, but increasingly Python (the SciPy stack of NumPy, Matplotlib and pandas, plus scikit-learn and statsmodels), as well as others such as Julia
  • C/C++: GSL, Armadillo, Eigen, Rcpp, Stan
  • editor: Emacs, of course (language-agnostic, org-mode, etc.), but also RStudio (R only) and IPython (Python only)
• visualizing, plotting:
  • "Visualizing uncertainty about the future" by Spiegelhalter et al. (Science 2011, DOI)
  • "Let's practice what we preach: turning tables into graphs" by Gelman et al. (The American Statistician 2002, DOI)
  • "Top ten worst graphs" by Karl Broman (webpage)
• philosophy, history, pragmatism:
  • "Statistical analysis and the illusion of objectivity" by Berger and Berry (American Scientist 1988, DOI, pdf)
  • "Bayesian methods: general background" by E. T. Jaynes (1985, free pdf) and "Where do we stand on maximum entropy?" by E. T. Jaynes (1978, free pdf)
  • "Mathematical Models and Reality: A Constructivist Perspective" by Christian Hennig (Foundations of Science 2010, DOI)
  • "Philosophy and the practice of Bayesian statistics" by Andrew Gelman and Cosma Shalizi (British Journal of Mathematical and Statistical Psychology 2013, DOI)
  • "Statistical Inference: The Big Picture" by Robert Kass (Statistical Science 2011, DOI, free pdf on arXiv)
  • "In Praise of Simplicity not Mathematistry! Ten Simple Powerful Ideas for the Statistical Scientist" by Roderick Little (JASA 2013, DOI)
  • "Des spécificités de l’approche bayésienne et de ses justifications en statistique inférentielle" by Christian Robert (2013 book chapter, in French; free pdf on HAL)
• classics:
  • list from Christian Robert