Dahlquist:BOSC ISMB 2016 Notes
From OpenWetWare
Notes from the Bioinformatics Open Source Conference (BOSC) 2016, NetBio SIG 2016, and Intelligent Systems for Molecular Biology (ISMB) 2016 held in Orlando, Florida from July 8-12, 2016.
BOSC/NetBio SIG Day 1 2016-07-08
NetBio SIG
- Megan Crow spoke at NetBio SIG on "Single-cell gene networks from co-expression"
- mentioned a review on single-cell RNA-seq: Grün, D., & van Oudenaarden, A. (2015). Design and analysis of single-cell sequencing experiments. Cell, 163(4), 799-810.
- barcode RNAs before PCR amplification
- MultiQC: Aggregate results from bioinformatics analyses across many samples into a single report
- Anastasia Baryshnikova: Systematic Functional Annotation of the 2016 Yeast Genetic Interaction Network (NetBio SIG)
- Database of Systematic Phenotypes in Yeast
- Baryshnikova, A. (2016). Systematic Functional Annotation and Visualization of Biological Networks. Cell systems.
- Baryshnikova, A. (2016). Exploratory Analysis of Biological Networks through Visualization, Clustering, and Functional Annotation in Cytoscape. Cold Spring Harbor Protocols, 2016(6), pdb-prot077644.
- SAFE: spatial analysis of functional enrichment
- sum of edges in a neighborhood a certain distance around a particular node
- different GO terms show different enrichment landscapes, region-specific vs. multiregional, helps to find GO terms at a certain level of specificity (could have applications for "intelligent" trimming of GO to a GO slim)
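A rough sketch of the neighborhood idea, for my own reference. This is a simplified illustration, not the SAFE implementation. It assumes an unweighted graph stored as a dict of neighbor lists, a hop-count distance threshold (`max_dist`), and a set of nodes annotated to one GO term. Each neighborhood is scored with a hypergeometric tail probability. The function names are hypothetical.

```python
from collections import deque
from math import comb


def neighborhood(adj, start, max_dist):
    """Return all nodes within max_dist hops of start (including start)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand past the distance threshold
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n of N nodes, K of which carry the annotation."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def neighborhood_enrichment(adj, annotated, max_dist=1):
    """Score each node by how over-represented `annotated` is in its neighborhood."""
    N, K = len(adj), len(annotated)
    scores = {}
    for node in adj:
        hood = neighborhood(adj, node, max_dist)
        k = len(hood & annotated)
        scores[node] = hypergeom_sf(k, N, K, len(hood))
    return scores
```

Small values flag neighborhoods where the term is over-represented.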
- Idea for GRNsight: use the weight parameters to influence the force graph layout of the network. Larger weights bring nodes closer together; smaller weights push nodes further apart
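A minimal sketch of the weight-to-layout idea. It assumes a plain spring-and-repulsion layout, not whatever GRNsight's current force graph actually uses. Each edge's preferred length is `base_length / (1 + |weight|)`, so larger weights pull nodes closer together, and a weak all-pairs repulsion keeps unconnected nodes apart. The edge format (a dict keyed by `(source, target)`) and all parameter values are hypothetical.

```python
import math
import random


def weighted_layout(nodes, edges, iterations=1000, step=0.05, base_length=1.0,
                    spring=1.0, repulsion=0.01, max_move=0.1, seed=0):
    """Spring layout in which a larger |weight| gives an edge a shorter rest length.

    nodes: list of node names; edges: dict (source, target) -> weight (hypothetical format).
    Returns a dict node -> [x, y].
    """
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    rest = {e: base_length / (1.0 + abs(w)) for e, w in edges.items()}

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}

        # Weak repulsion between every pair of nodes keeps the drawing spread out.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) + 1e-9
                f = repulsion / d ** 2
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d

        # Each edge acts as a spring pulling its endpoints toward its rest length.
        for (u, v), length in rest.items():
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) + 1e-9
            f = spring * (d - length)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d

        # Move each node along its net force, capping the step for stability.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0:
                scale = min(step, max_move / mag)
                pos[n][0] += scale * fx
                pos[n][1] += scale * fy
    return pos
```

Using absolute weights treats activation and repression alike. The sign could instead drive edge color or arrowhead style.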
- Olga Troyanskaya: Gene Function and Regulation in Biological Networks (NetBio SIG)
- Warning: think carefully about negative examples in machine learning. A model will learn the simplest pattern that separates positives from negatives, so the negatives need to be defined carefully so that the model can't take that shortcut.
- Barry Demchak: Cytoscape Cyberinfrastructure: Quality Network Analysis Done Quicker and Cheaper (NetBio SIG)
- microservice approach, late binding, scalability, reusability, language choice
- NDEx (the Network Data Exchange) is an open source software platform where scientists and organizations can share, store, manipulate and publish biological network knowledge. Users can take advantage of the free Public NDEx Server while companies and organizations can decide to install their own NDEx Server application locally. The NDEx Project is developed in tight association with the Cytoscape project.
- ELSA load balancing and queuer
BOSC
- Blog summary of BOSC at GigaScience
- BOSC Keynote: Jennifer Gardy, who is an Assistant Professor of Population and Public Health at the University of British Columbia and a Senior Scientist at the British Columbia Centre for Disease Control (BCCDC)
- Slides: The open-source outbreak: can data prevent the next pandemic?
- "We have a strategic plan. It’s called doing things." – Herb Kelleher (Southwest Airlines)
- Petersen, T. N., Rasmussen, S., Hasman, H., Carøe, C., Bælum, J., Schultz, A. C., ... & Aarestrup, F. M. (2015). Meta-genomic analysis of toilet waste from long distance flights; a step towards global surveillance of infectious diseases and antimicrobial resistance. Scientific reports, 5.
- "poopulation"
- EpiCollect.net: Mobile / Web Application for Smartphone data collection
- can we infer a transmission tree from a phylogenetic tree?
- Investigate handsontable, a JavaScript Excel-like spreadsheet library, as something that might be useful for GRNsight or GRNmap in the future.
- Biodocker.org
- Common Workflow Language
- NextFlow Workbench, Fabien Campagne slides
- researchobjects.org: The Genomics Knowledge Platform (GKP)
- GKP is a scalable knowledge architecture. At full scale, it offers a cost-effective way of integrating multiple and vast data sets into a comprehensive, modular, and extensible system. The Genomics Knowledge Platform has arisen in response to the issues facing the genomic research community today. Its manifesto includes:
- Horizontal integration of multiple, varied data sources
- Scientific frame of reference for problem definition
- An easy to operate, powerful query and analysis toolset that is highly adaptable
- Data management capable of handling the data volume associated with drug discovery
- Collaborative data access and results sharing
- Fully customizable componentry
- OpenStand: global advocates for open standards and technology development
- Investigate uptime robot for GRNsight
- GenomeSpace: Frictionless connection of bioinformatics tools
- workflow, pipeline
- they have written connectors that convert data formats between tools (not every pair of tools, but the ones that make sense; they take requests)
- MISO: An open source LIMS for small-to-large scale sequencing centres
- NGSEP: open source comprehensive solution for high-throughput sequencing data; poster/lightning talk by Jorge Duitama
- potential use with students, good tutorial
Other
- GOTrack
- tracks GO over time; can compare analyses run with different versions of GO to explore the stability and reliability of GO enrichment hit lists!
- Gillis, J., & Pavlidis, P. (2013). Assessing identity, redundancy and confounds in Gene Ontology annotations over time. Bioinformatics, bts727.
- found this when looking for GOTrack: Pascale Gaudet and Christophe Dessimoz, "Gene Ontology: Pitfalls, Biases, Remedies"
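A quick sketch of the kind of comparison GOTrack enables. Given the enrichment hit list obtained with each GO release, it measures how much each list overlaps the most recent one using the Jaccard index. This is my own simplified illustration, not GOTrack's method. The release labels and term names are hypothetical placeholders, and the labels are assumed to sort chronologically.

```python
def jaccard(a, b):
    """Jaccard index of two collections of GO term IDs (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hit_list_stability(hits_by_release):
    """Overlap of each GO release's hit list with the hit list from the latest release.

    hits_by_release: dict release label -> collection of enriched GO term IDs;
    labels are assumed to sort chronologically (e.g. "2012-01").
    """
    releases = sorted(hits_by_release)
    latest = hits_by_release[releases[-1]]
    return {r: jaccard(hits_by_release[r], latest) for r in releases}


# Hypothetical hit lists from re-running the same enrichment with three GO releases.
hits = {
    "2010-01": {"termA", "termB", "termC"},
    "2014-01": {"termB", "termC"},
    "2016-01": {"termB", "termC", "termD"},
}
print(hit_list_stability(hits))
```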
BOSC SIG Day 2 2016-07-09
- MetaR from Fabien Campagne lab
- potentially collaborate on a grant; broader impacts of teaching MetaR to undergrads
- GenePattern Notebooks
- an integrative analytical environment for genomic research; can do RNAseq analysis
- available on Indiana University supercomputer cluster
- ReportMD
- uses R markdown to generate linked HTML reports
- Talk by Nils Gehlenborg: Reproducible Research in the Cloud with the Refinery Platform
- Refinery Platform
- provenance graphs
- Beaulieu-Jones, B. K., & Greene, C. S. (2016). Reproducible Computational Workflows with Continuous Analysis. bioRxiv, 056473.
- Talk by Abigail Cabunoc Mayes: Collaborative Software Development: Lessons from Open Science
- Mozilla Science Lab, including link to fellowship
- Working Open Workshop
Inclusion and Diversity
ISMB Day 1 2016-07-10
Talks
- COSI/SIG SysMod (formerly BioPathways)
- iPath: Interactive Pathways Explorer
- Talk by Lars Juhl Jensen
- STRING-DB: when combining evidence from different protein-protein interaction data sources, raw quality scores need to be calibrated against a gold standard (KEGG); see von Mering et al. 2005
- text-mining snafu: the HUGO Gene Nomenclature Committee (HGNC) approved a human gene symbol "SDS", which collides with the common abbreviation for sodium dodecyl sulfate
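A simplified sketch of calibrating raw scores against a gold standard by binning. Within each raw-score bin, the calibrated score is the fraction of pairs that the gold standard (for example, pairs sharing a KEGG pathway) calls true. This is an illustration under assumptions, not STRING's actual procedure. A real benchmark would also restrict to pairs in which both proteins have gold-standard annotations, which is omitted here. The input formats and bin edges are hypothetical.

```python
from bisect import bisect_right


def calibrate(scored_pairs, gold_pairs, bin_edges):
    """Per-bin precision of raw scores against a gold standard.

    scored_pairs: iterable of (protein_a, protein_b, raw_score)
    gold_pairs: set of frozenset({a, b}) treated as true associations
    bin_edges: sorted raw-score cut points; there are len(bin_edges) + 1 bins
    Returns the fraction of gold-standard pairs in each bin (None if the bin is empty).
    """
    totals = [0] * (len(bin_edges) + 1)
    positives = [0] * (len(bin_edges) + 1)
    for a, b, score in scored_pairs:
        i = bisect_right(bin_edges, score)
        totals[i] += 1
        if frozenset((a, b)) in gold_pairs:
            positives[i] += 1
    return [p / t if t else None for p, t in zip(positives, totals)]


def calibrated_score(raw_score, bin_edges, table):
    """Look up the calibrated score for a new raw score."""
    return table[bisect_right(bin_edges, raw_score)]
```

Calibrating each evidence channel this way puts scores from different data sources on a comparable probability-like scale before they are combined.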
- Talk by Natasa Przulj
- http://www.nature.com/articles/srep04547 [Yaveroğlu, Ö. N., Malod-Dognin, N., Davis, D., Levnajic, Z., Janjic, V., Karapandza, R., ... & Pržulj, N. (2014). Revealing the hidden language of complex networks. Scientific reports, 4.]
- small motifs (graphlets): the "Lego bricks" of networks
- Nataša Pržulj, Noël Malod-Dognin (2016) Network analytics in the age of big data Science 08 Jul 2016:Vol. 353, Issue 6295, pp. 123-124 DOI: 10.1126/science.aah3449
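Graphlet-based methods count small induced subgraphs; Pržulj's methods work with larger graphlets and node orbits. As a minimal sketch, this counts only the two connected 3-node graphlets (open path and triangle) in an undirected network stored as a dict of neighbor sets, which is a hypothetical format.

```python
from itertools import combinations


def three_node_graphlets(adj):
    """Count the two connected 3-node graphlets (open path, triangle) in an undirected graph.

    adj: dict node -> set of neighbors (symmetric, no self-loops).
    """
    wedges = 0  # 2-paths u - v - w, counted once per center node v
    closed = 0  # wedges whose end nodes are also adjacent
    for v, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):
            wedges += 1
            if w in adj[u]:
                closed += 1
    triangles = closed // 3        # each triangle is closed at all 3 of its centers
    open_paths = wedges - closed   # induced paths: the two ends are not adjacent
    return {"path": open_paths, "triangle": triangles}
```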
- Oxford Journals: Biology Methods and Protocols
- talked to Jennifer Boyd at the booth and suggested the ability to update published protocols with new versions.
Posters
- N13 - Resampling-Based Read-Level Normalization of RNA-Seq for Differential Expression Analysis by Gregory Grant Lab, University of Pennsylvania
- E05 - Reliable differential expression calls across labs, by use of a simple reference sample by Paweł P. Łabaj, Chair of Bioinformatics, Boku University Vienna, Austria and David P. Kreil, Chair of Bioinformatics, Boku University Vienna, Austria
- Short abstract, copied from the ISMB 2016 site:
- Genome-scale expression profiling has become a key tool of functional genomics, critically supporting progress in molecular biology and biomedical research in the post-genomic era. The deduction of gene function remains a major bottleneck in improving our understanding of living systems at the molecular level. Typical applications include the acceleration of unbiased genome-wide screens for candidate genes that are implicated in phenotypes and processes of interest by differential expression calling. The rapid improvement of next generation sequencing (NGS) platforms has triggered a wave of new findings based on whole transcriptome sequencing (RNA-Seq). NGS technology, however, has been shown to suffer from different sources of unwanted variation affecting interpretation of the results. In the controlled setup of the SEQC benchmark study, we have recently shown that unwanted variation is largely due to library preparation. Appropriate tools for factor analysis like PEER or SVASeq can identify and remove confounding factors. With such corrections for site effects we could improve specificity without any loss of sensitivity. Going beyond comparisons in the original SEQC study, we here present results for a range of realistic effect strengths. Moreover, we demonstrate the benefits that can be gained by analysing novel results in the context of other experiments. In particular, use of a standardized reference sample much improves reliability across labs.
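A toy illustration of why a shared reference sample can help across labs. It assumes each lab distorts every gene's measurement by a multiplicative, gene-specific factor that affects the sample and the reference equally. Under that assumption, expressing a sample as a log ratio to the reference measured in the same lab cancels the factor. The bias values below are made up for illustration. The poster's actual analysis, including factor-analysis corrections such as PEER or SVASeq, is not reproduced here.

```python
import math


def measure(true_expr, lab_bias):
    """Hypothetical lab measurement: true expression times a gene-specific lab bias."""
    return {g: true_expr[g] * lab_bias[g] for g in true_expr}


def log_ratio_to_reference(sample, reference):
    """log2(sample / reference) per gene; a bias shared by both cancels out."""
    return {g: math.log2(sample[g] / reference[g]) for g in sample}


# Made-up true expression levels and per-lab, per-gene biases.
true_sample = {"g1": 100.0, "g2": 50.0}
true_reference = {"g1": 50.0, "g2": 50.0}
lab_biases = {"lab1": {"g1": 2.0, "g2": 0.5}, "lab2": {"g1": 0.8, "g2": 1.5}}

for lab, bias in lab_biases.items():
    raw = measure(true_sample, bias)
    ratio = log_ratio_to_reference(raw, measure(true_reference, bias))
    print(lab, raw, ratio)
```

The raw values differ between the two labs, while the log ratios to the reference agree.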
- O05 - Learning biological networks from gene knockdown data by Yuriy Sverchkov, University of Wisconsin--Madison, United States of America
- O71 - Assessing the Differential Significance of Transcription Factors by Leslie D. Seitz, Fairview High School
- O69 - Differentially expressed genes are not uniformly distributed by Debra Goldberg, University of Colorado
- O83 - LoTo: A Method for the Comparison of Local Topology between Gene Regulatory Networks by Tomas Perez-Acle, Computational Biology Lab, Fundación Ciencia para la Vida and Centro Interdisciplinario de Neurociencia de Valparaiso, Chile
Books
- Gandrud, C. (2013). Chapman & Hall/CRC The R Series : Reproducible Research with R and R Studio. Boca Raton, US: CRC Press. Retrieved from http://electra.lmu.edu:2110
- the library has the e-book, but it appears to be the first edition rather than the second, and access is limited to a certain number of pages
- Korpelainen, E., Tuimala, J., & Somervuo, P. (2014). Chapman & Hall/CRC Mathematical and Computational Biology : RNA-seq Data Analysis : A Practical Approach. Boca Raton, US: Chapman and Hall/CRC.
- library has e-book, limited to a certain number of pages.
ISMB Day 2 2016-07-11
Workshop on Education in Bioinformatics (WEB)
- Phil Bourne: The NIH Commons: A Cloud-based Training Environment
- Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., ... & Bouwman, J. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific data, 3.
- NIH Data Commons
- Instead of funding individual investigators' computational infrastructure, give them credits to use the NIH cloud; this would include training.
- Nirav Merchant: How to Scale Science and People Using the Cloud (slides available after conference on ISMB app)
- University of Arizona
- "computational thinking had become central to our course work"
- local, in-house computational expert: "victim of their own success (denial of service attacks on their availability)"
- users want (a chair); send them to Home Depot
- Democratizing Innovation, Eric von Hippel
- need managed cloud: CyVerse Atmosphere, iPlant Collaborative, Jetstream
- Matthew Vaughn: Packaging computational biology tools for broad distribution and ease-of-reuse (need to get slides)
- Panel includes these three plus Annette McGrath of the Australian Bioinformatics Network
Talks
- Keynote: Sandrine Dudoit
- SCONE [R]: SCONE (Single-Cell Overview of Normalized Expression), a package for single-cell RNA-seq data quality control (QC) and normalization. This data-driven framework uses summaries of expression data to assess the efficacy of normalization workflows.
- SCONE one page description
- GitHub repository
- David Gibbs: Solving the influence maximization problem on biological networks; a case study involving the cell cycle regulatory network in Saccharomyces cerevisiae
- Scott Simpkins: Scalable tools for quantitative analysis from sequencing-based chemical-genetic interaction screens
- BEAN-counter for processing barcode sequencing data from multiplexed experiments. Originally designed for chemical genomics experiments performed in the Myers/Boone Labs, it is applicable to any experiment in which pools of genetically barcoded cells are grown under different conditions, with the resulting barcode DNA isolated from those cells combined into one 2nd-gen sequencing run via the use of indexed PCR primers.
- not open source, must obtain license from http://license.umn.edu/technologies/20170001_bean-counter-quantitative-scoring-of-chemical-genetic-interactions
- map index tag to condition
- parse fastq file
- interaction z score
- batch effect correction (unsupervised or ?)
- visualization
- look out for "variance as a function of gene counts"
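BEAN-counter itself is licensed rather than open source, so this is not its code. It is a minimal sketch of the first steps listed above, under hypothetical assumptions. Each read is assumed to start with a 4-base index tag followed by a 6-base strain barcode, and the lookup tables are plain dicts. The "interaction score" here is a z-score of each strain's log2 fold change (pseudocount 1) relative to a control condition. Centering the fold changes across strains absorbs a global difference in sequencing depth between conditions. Batch correction and visualization are left out.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev

INDEX_LEN = 4    # hypothetical: length of the index tag at the start of each read
BARCODE_LEN = 6  # hypothetical: length of the strain barcode that follows it


def read_fastq(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    it = iter(lines)
    for _header, seq, _plus, _qual in zip(it, it, it, it):
        yield seq.strip()


def count_barcodes(lines, index_to_condition, barcode_to_strain):
    """Map each read's index tag to a condition and its barcode to a strain; count reads."""
    counts = defaultdict(Counter)  # condition -> strain -> read count
    for seq in read_fastq(lines):
        condition = index_to_condition.get(seq[:INDEX_LEN])
        strain = barcode_to_strain.get(seq[INDEX_LEN:INDEX_LEN + BARCODE_LEN])
        if condition is not None and strain is not None:  # skip unrecognized tags
            counts[condition][strain] += 1
    return counts


def interaction_z_scores(counts, condition, control, strains):
    """z-score across strains of log2((treated + 1) / (control + 1))."""
    lfc = {s: math.log2((counts[condition][s] + 1) / (counts[control][s] + 1))
           for s in strains}
    mu, sd = mean(lfc.values()), pstdev(lfc.values())
    return {s: (v - mu) / sd if sd else 0.0 for s, v in lfc.items()}
```

Because `read_fastq` accepts any iterable of lines, an open file handle can be passed to `count_barcodes` directly. Strongly negative z-scores mark strains depleted under the treatment.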
Posters
- O30 - Explicit Modeling of Differential RNA Stability Improves Inference of Transcription Regulation Networks by Konstantine Tchourine, NYU - Center for Genomics and Systems Biology
- when using YEASTRACT data, require 1 direct and 2 direct evidence
- Kemmeren KO dataset
- Sameith, K., Amini, S., Koerkamp, M. J. G., van Leenen, D., Brok, M., Brabers, N., ... & Apweiler, E. (2015). A high-resolution gene expression atlas of epistasis between gene-specific transcription factors exposes potential mechanisms for genetic interactions. BMC biology, 13(1), 1.
- Ma, S., Kemmeren, P., Gresham, D., & Statnikov, A. (2014). De-novo learning of genome-scale regulatory networks in S. cerevisiae. Plos one, 9(9), e106479.
- Zheng, J., Benschop, J. J., Shales, M., Kemmeren, P., Greenblatt, J., Cagney, G., ... & Krogan, N. J. (2010). Epistatic relationships reveal the functional organization of yeast transcription factors. Molecular systems biology, 6(1), 420.
- Neymotin, B., Athanasiadou, R., & Gresham, D. (2014). Determination of in vivo RNA kinetics using RATE-seq. RNA, 20(10), 1645-1652. --key paper
- DTA, 4-thiouracil
- Shalem et al. (2008) Molecular Systems Biology--I already have this paper
- Munchel, S. E., Shultzaberger, R. K., Takizawa, N., & Weis, K. (2011). Dynamic profiling of mRNA turnover reveals gene-specific and system-wide regulation of mRNA decay. Molecular biology of the cell, 22(15), 2787-2795.
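For my own reference, a minimal sketch of turning a decay time course into a rate. It assumes simple first-order decay, x(t) = x0 e^(-kt), so that ln x is linear in t. The code fits k by least squares on the log scale and converts it to a half-life, ln 2 / k. This is not the RATE-seq or DTA analysis, and the function names are hypothetical. A rate estimated this way is the kind of parameter a degradation term in an ODE model of a gene regulatory network would use.

```python
import math


def fit_decay_rate(times, amounts):
    """Least-squares fit of ln(amount) = ln(x0) - k * t; returns the decay rate k."""
    logs = [math.log(a) for a in amounts]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx


def half_life(k):
    """Half-life implied by first-order decay rate k (same time units as the fit)."""
    return math.log(2) / k
```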
- O54 - AuPairWise: biologically focused RNA-seq quality control using co-expression by Sara Ballouz, Cold Spring Harbor Laboratory
- published in Ballouz, S., & Gillis, J. (2016). AuPairWise: a method to estimate RNA-seq replicability through co-expression. PLoS Comput Biol, 12(4), e1004868.
- mentions SEQC/MAQC-III Consortium. (2014). A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nature biotechnology, 32(9), 903-914.
- recommendations to discard low-expressed genes (bottom 1/3 of the measured dynamic range)
- discard genes with small fold changes between conditions (|log2FC| < 1 to 2)
- want self-correlation of replicates across conditions, but when replicates are not available, use co-expressed gene pairs
- has list of housekeeping genes for humans
- in discussion with another delegate, recommended a minimum of 7 replicates, randomized to reduce batch effects
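A minimal sketch of the filtering recommendations noted above, plus a crude co-expression check. It makes three assumptions. First, "bottom 1/3 of the dynamic range" is read as mean log2 expression at or below min + (max - min)/3. Second, the fold-change cutoff defaults to |log2FC| >= 1. Third, replicability is summarized as the mean Pearson correlation of known co-expressed gene pairs across samples. This is not the AuPairWise algorithm itself, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_genes(log_expr, log2fc, min_abs_lfc=1.0, low_fraction=1 / 3):
    """Keep genes above the bottom `low_fraction` of the dynamic range
    that also have |log2 fold change| >= min_abs_lfc.

    log_expr: gene -> mean log2 expression; log2fc: gene -> log2 fold change.
    """
    lo, hi = min(log_expr.values()), max(log_expr.values())
    cutoff = lo + low_fraction * (hi - lo)
    return {g for g, e in log_expr.items()
            if e > cutoff and abs(log2fc[g]) >= min_abs_lfc}


def pair_coexpression(profiles, pairs):
    """Mean Pearson correlation across samples for known co-expressed gene pairs."""
    rs = [pearson(profiles[a], profiles[b]) for a, b in pairs]
    return sum(rs) / len(rs)
```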
- O60 - FAIRDOM: Data and Model Management for all by Natalie Stanford, University of Manchester, United Kingdom of Great Britain and Northern Ireland
- http://researchobjects.org/
- http://fair-dom.org/
- FAIR: Findable, Accessible, Interoperable, Reusable
- N14 - Benchmark Analysis of RNA-Seq Aligners by Gregory Grant, University of Pennsylvania
- Sent me PDF of poster via e-mail
- bottom line: TopHat performs poorly, but it is the most popular alignment program
- B20 - The impact of amplification on differential expression analyses by RNA-seq by Swati Parekh, Ludwig-Maximilians University Munich
- E14 - Cross-platform normalization of microarray and RNA-seq data for machine learning applications by Jeffrey Thompson, Dartmouth College
- [Jeffrey A. Thompson et al. (2015). Training Distribution Matching (TDM) R Package. Zenodo. doi:10.5281/zenodo.32852 http://zenodo.org/record/32852#.V4TwGaJFqug]
- note use of research/data sharing site that gives doi
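TDM's own transformation isn't reproduced here. As a simpler, related idea, this sketch shows plain quantile matching, a different technique swapped in for illustration. Each value in one platform's sample is replaced by the reference distribution's value at the same rank. It assumes equal-length vectors and breaks ties by sort order.

```python
def quantile_match(values, reference):
    """Give each value the reference value of the same rank (equal lengths assumed)."""
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    ref_sorted = sorted(reference)
    # Indices of `values` from smallest to largest; ties keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    matched = [0.0] * len(values)
    for rank, i in enumerate(order):
        matched[i] = ref_sorted[rank]
    return matched
```

After matching, the two samples share the same value distribution while each keeps its own gene ordering.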
- Spoke with Dominik Otto at my poster
- He referred me to papers by his mentor:
- Fiedler, B., Mochizuki, A., Kurosawa, G., & Saito, D. (2013). Dynamics and control at feedback vertex sets. I: Informative and determining nodes in regulatory networks. Journal of Dynamics and Differential Equations, 25(3), 563-604.
- Mochizuki, A., Fiedler, B., Kurosawa, G., & Saito, D. (2013). Dynamics and control at feedback vertex sets. II: A faithful monitor to determine the diversity of molecular activities in regulatory networks. Journal of theoretical biology, 335, 130-146.
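A small sketch for checking the central object in these papers. A feedback vertex set is a set of nodes whose removal leaves the directed network with no cycles; self-loops count as cycles. The code uses Kahn's topological-sort algorithm to test acyclicity and brute force to find a smallest such set, which is only practical for small networks. The adjacency-dict format is a hypothetical choice, and every target node is assumed to appear as a key.

```python
from itertools import combinations


def is_acyclic(adj, removed=frozenset()):
    """Kahn's algorithm on the directed graph with `removed` nodes deleted.

    adj: dict node -> list of target nodes. A self-loop keeps its node's
    in-degree above zero, so a graph with a remaining self-loop is reported cyclic.
    """
    nodes = [n for n in adj if n not in removed]
    indeg = {n: 0 for n in nodes}
    for u in nodes:
        for v in adj[u]:
            if v not in removed:
                indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while queue:
        u = queue.pop()
        visited += 1
        for v in adj[u]:
            if v not in removed:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return visited == len(nodes)


def minimum_feedback_vertex_set(adj):
    """Smallest node set whose removal breaks every cycle (brute force; small graphs only)."""
    nodes = list(adj)
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            if is_acyclic(adj, frozenset(subset)):
                return set(subset)
```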
Other notes
- Goblet: Global Organisation for Bioinformatics Learning, Education & Training
- Android 6 Marshmallow lets you deny individual app permissions (like location, etc.)
- CatterPlots!!!!!
- Conesa, A., Madrigal, P., Tarazona, S., Gomez-Cabrero, D., Cervera, A., McPherson, A., ... & Mortazavi, A. (2016). A survey of best practices for RNA-seq data analysis. Genome biology, 17(1), 1.
- List of articles in WikiProject Computational Biology
- Biology Methods & Protocols
- new journal from Oxford University Press
- ScienceGateways.org
- online community for science and engineering research and education
- web resource for accessing data, software, computing services, and equipment specific to the needs of a science or engineering discipline