User:Robert M. MacCallum/WTFGSB Reportback

From OpenWetWare


Wellcome Trust Functional Genomics and Systems Biology Workshop

30 November to 1 December 2009

More details and programme

A few talks have no notes, usually because they were too specific.

Day one

Edison Liu: ‘Integrative Study of Estrogen Receptor Biology in Human Cancer’

Estrogen receptor (ER) binding site analysis (ChIP and bioinformatics) - "Cosmic" score, correlation with RNA Pol II binding and H3K4meX marks.

Some functional binding sites are 1 Mb away from the gene!! Only 9% are within a 5 kb "promoter".
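To make a figure like "9% within 5 kb" concrete, here is a minimal Python sketch that finds each binding site's distance to the nearest TSS and counts how many fall inside a window. It assumes a single chromosome, made-up coordinates and a ±5 kb "promoter" window; the talk's exact definition wasn't in my notes.

```python
from bisect import bisect_left

def nearest_tss_distance(site, tss_sorted):
    """Distance in bp from a binding site to the closest TSS (same chromosome assumed)."""
    i = bisect_left(tss_sorted, site)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - site)      # closest TSS at or downstream of the site
    if i > 0:
        candidates.append(site - tss_sorted[i - 1])  # closest TSS upstream of the site
    return min(candidates)

def fraction_within(sites, tss_positions, window=5_000):
    """Fraction of binding sites whose nearest TSS lies within `window` bp."""
    tss_sorted = sorted(tss_positions)
    hits = sum(1 for s in sites if nearest_tss_distance(s, tss_sorted) <= window)
    return hits / len(sites)

# Made-up coordinates on one chromosome
tss = [10_000, 250_000, 1_300_000]
sites = [12_000, 14_500, 600_000, 2_300_000]
print(fraction_within(sites, tss))  # 0.5; the last site is 1 Mb from its nearest TSS
```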

Cool ChIA-PET (ChiA-seq) method to determine chromosomal loops.

Looping for efficient transcription, grouping of coregulated genes ("looped out" genes don't respond to ER)

Johan Rung: ‘A multi-stage genome-wide association study detects a novel risk locus near IRS1 for type 2 diabetes, insulin resistance, and hyperinsulinemia’

GWAS for type 2 diabetes

F Pradezynski: ‘Systems Level Approach of Hepatitis C Virus Infection’

Y2H between various virus proteomes and human proteins.

Many human pathways interfered with, in particular the ones you'd expect (interferon response)

Seems to be a remarkable number of targets (100s) from so few viral proteins.

Chris Bakal: ‘Describing the Systems Architecture of Cell Morphogenesis’

Wounding, cell morphology, image analysis -> 100+ feature profile of each cell's morphology.

"canalised" morphology space (jumps between states)

Keith Baggerly: ‘The Importance Of Reproducibility In High-Throughput Biology: A Case Study’

Reproducibility in hi-thru biology

This was a fascinating story of a genuine attempt to reproduce a diagnostic/predictive approach using microarray data (for sensitivity to cancer drugs).

The data was in GEO but when analysed again, the gene lists, heat maps etc were completely different.

Eventually an "off by one" error was found, caused in equal measure by pasting data from Excel and by the lack of documentation for the software (an R package).

Later papers from the offending authors had further errors (mislabeled drugs, repeated figures from earlier work). Letters to the editor were answered with "we did it again and got the same results" (can you believe it!).

In the end, the study had led to clinical trials and so Baggerly and colleagues published a proper paper exposing the problems in a statistical journal. Soon after that the medical journals were on the case and the trials were stopped.
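A generic illustration (not a reconstruction of the actual error in that study, whose details I didn't note) of why a one-row shift is so dangerous: pairing values with samples purely by position silently mislabels everything, whereas joining on an explicit sample ID fails loudly. Sample IDs and values are hypothetical.

```python
def pair_by_position(ids, values):
    """Pairs IDs with values purely by order: silently wrong if either list is shifted."""
    return dict(zip(ids, values))

def pair_by_id(rows, expected_ids):
    """Uses the ID carried on each row and fails loudly if any expected sample is absent."""
    table = dict(rows)
    missing = set(expected_ids) - set(table)
    if missing:
        raise ValueError(f"samples missing from data: {sorted(missing)}")
    return {i: table[i] for i in expected_ids}

ids = ["S1", "S2", "S3", "S4"]
# Hypothetical export where a paste lost the S1 row, so everything moved up by one
rows = [("S2", 0.8), ("S3", 0.1), ("S4", 0.9)]

print(pair_by_position(ids, [v for _, v in rows]))
# {'S1': 0.8, 'S2': 0.1, 'S3': 0.9}  <- every value attached to the wrong sample

try:
    pair_by_id(rows, ids)
except ValueError as err:
    print(err)  # samples missing from data: ['S1']
```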

Nick Luscombe: ‘Nucleoporins, chromosomal organisation and gene regulation.’

Nuclear lamins known to tether transcriptionally inactive DNA

Nucleoporins now shown to be assoc with active gene expression.

Also, ChIP shows that some proteins bind to enable X chromosome dosage compensation.

Mark Gerstein: ‘Understanding Protein Function on a Genome-scale using Networks’

A review of several years' network work, including some analysis of Venter's ocean sample sequences (map to pathways, correlate with environmental factors using some canonical ..... method (is this like bi-clustering?))

Yoram Louzoun: 'Immunomic analysis of viruses CD8+ T cell epitope repertoire'

Not in programme.

Mentioned an epitope prediction approach called SIR (Size of Immune Repertoire score) which models MHC peptide binding.

Early viral proteins have fewer epitopes than late proteins.

Some scheduled speakers didn't speak in this session.

Day two, session one

Seth Grant: ‘System Biology of The Synapse and Behaviour’

Complexity of the post-synaptic molecular machinery (several thousand proteins). Conserved in invertebrates (50% of proteins) and single-celled organisms (25%). Evolution of the machinery (including plasticity) preceded the evolution of synapses.

Very slow evolution.

Many diseases.

Caleb Webber: ‘Identifying CNV genes that contribute to developmental delay and autism’

CNV in mouse

What's special about pathological CNVs? (vs. benign)

Human CNVs are used to look up mouse phenotypes (somehow!)


(Didn't follow this very frenetic talk 100%)

Florian Markowetz: ‘Mapping Dynamic Histone Acetylation Patterns to Gene Expression in Nanog-depleted Murine Embryonic Stem Cells’

ES cell histone modifications

days 1, 3 and 5 of ES development - 4 analyses:

  • Protein MS
  • ChIP-chip (histone)
  • RNA Pol II
  • Microarrays

day 0 nanog TF downreg -> network of TFs

clustering of smoothed histone profiles (around TSS)

when mRNA is up-regulated, small local acetylation around the TSS; when mRNA is down-regulated, wider deacetylation around the TSS.

increased correlation between histone acetylation and gene expression through time (more at day 5 than day 1), genome-wide

predict gene expr from histone acetylation using LOTS of ML methods (in R)

Grant Belgard: ‘Transcriptome-Wide Functional Anatomy of Mouse Cortical Layering Revealed Through Deep Sequencing’

brain transcriptomics

by sequencing

6 layers of neocortex

many cell types spanning several layers

paired end 50bp reads

(you get some intronic reads)

some intergenic regions detected (a few percent of reads)

layer specific genes, various layers show various GO enrichments.

John Hogenesch: ‘A journey through the clock network’

Circadian clock genes through hi-thru func genomics. nice robot video.

siRNA screen (seems to be tunable to desired knockdown level)

clock pathway is robust - surprising lack of lethal knockouts

Day two, session two

Peter Hoen: ‘Functional Genomics as a Readout In Therapy Development’

(standing in for Gert-Jan van Ommen)

Duchenne muscular dystrophy

antisense therapy

Andrew Teschendorff: ‘Pathway-Centric Classification of Breast Cancer’

classification of breast cancer

Dan Geschwind: ‘Human-Specific Transcriptional Regulation of Cns Development Genes By Foxp2’

transcriptional regulation of CNS development genes by FoxP2

looked at human vs chimp regulation of genes (microarray) in a cell line.

many genes respond differently (up and down)

But why? The 2 amino acid differences are not in the known DNA-binding domain

6 genes regulated via proximal promoter (luciferase reporter)

validated in vivo

haNCS human accelerated non coding sequences (look this up)

Horvath's weighted gene co-expression network analysis (WGCNA).
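WGCNA turns a correlation matrix into a weighted network by raising absolute correlations to a soft-threshold power β. The sketch below covers only that first step (unsigned adjacency plus per-gene connectivity) in plain Python; β = 6 is a commonly used choice, and the gene names and values are made up. The R package goes on to topological overlap and module detection, which are not shown here.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta (no self-edges)."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            adj[g][h] = adj[h][g] = a
    return adj

def connectivity(adj):
    """Whole-network connectivity k_i: the sum of a gene's edge weights."""
    return {g: sum(row.values()) for g, row in adj.items()}

# Toy expression across 5 samples (made-up values)
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],   # perfectly correlated with geneA
    "geneC": [5, 3, 4, 1, 2],    # correlation -0.8 with both
}
adj = soft_threshold_adjacency(expr)
print({g: round(k, 3) for g, k in connectivity(adj).items()})
# {'geneA': 1.262, 'geneB': 1.262, 'geneC': 0.524}
```

Note how a correlation of 0.8 shrinks to about 0.26 while a perfect correlation stays at 1: weak links are down-weighted without a hard cut-off.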

Recent paper showing two mitochondrial network types in neurons (synaptic and cell body)

Compare human vs chimp networks

Douglas Kell: ‘The cellular uptake of pharmaceutical drugs: a problem not of biophysics but of systems biology’

Suit and tie alert!

networks described in an unambiguous fashion: SBML, ChEBI, SMILES etc. for small molecules.

uptake of drugs, via transporters (proteins).

Day two, session three

Genevieve Konopka: ‘Comparative Gene Expression in Primate Brain Using Nextgen Sequencing’

Language genes

Can't do multi-species (human, chimp, macaque) on a human affy chip.

Next gen sequencing! Four brain regions.

"Sequencing wins"

Networks from WGCNA

Tom Freeman: ‘Identification of Expression Networks in Immunity’

Networks in immunity

focus: macrophage

mentioned proteasome (did I see that on VB expression map wrt immunity?)

graphical markup for pathways

some kind of flow simulations through them

BioLayout Express software - looks good (has enrichment analysis built in)

Day two, session four

Frank Holstege: ‘Understanding regulatory circuitry through expression-profile phenotypes’


1200 regulatory components (TFs, kinases, chromatin remodelers, RNA processing) -> mutations and expression microarrays

GASSCO dye correction algorithm (two colour!)

done so far: deletome

some kinases have no diff expr, is it because they are inactive in standard conditions or is it because of redundancy?

They used some synthetic genetic interaction prediction to choose pairs

find signals!

some kinases redundant with phosphatase! it's cross talk between two pathways (somehow).

different types of redundancy:

  1. complete
  2. quantitative (double has more effect than single(s))
  3. incongruent (effects in single are not in double)
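One way to operationalise the three redundancy types, assuming each mutant is summarised as the set of genes it differentially expresses. These set rules are my own hypothetical reading of the categories, not the speaker's criteria (for example, "quantitative" could instead be judged on effect size rather than on gene sets).

```python
def classify_redundancy(de_a, de_b, de_ab):
    """Hypothetical rules mapping single/double mutant DE gene sets to a redundancy type.

    de_a, de_b: genes changed in each single mutant; de_ab: genes changed in the double.
    """
    singles = de_a | de_b
    if not singles and de_ab:
        return "complete"        # neither single shows anything, the double does
    if singles - de_ab:
        return "incongruent"     # some single-mutant effects vanish in the double
    if de_ab > singles:
        return "quantitative"    # double shows everything the singles do, and more
    return "no redundancy detected"

print(classify_redundancy(set(), set(), {"g7"}))         # complete
print(classify_redundancy({"g1"}, set(), {"g1", "g2"}))  # quantitative
print(classify_redundancy({"g1", "g3"}, set(), {"g1"}))  # incongruent
```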

also used the data for protein complex prediction

Stefan Weimann: ‘Modeling and Experimental Testing of Cell Cycle Regulation by the Erbb- Protein and Mirna Network in Breast Cancer’

new targets for drug resistant breast cancer

ErbB signalling network

the drug is an ErbB2 antibody

Louis Serrano: ‘Systems Biology of a Small Bacterium’

Mycoplasma pneumoniae

689 ORFs + 44 RNAs

free living

maybe only 10-11 TFs (E. coli 100 or so)

full complement of chromatin remodelling

plan was to do loads of -omics + electron microscopy

transcriptomics: arrays 62 conditions, tiling array

detailed look at transcripts (reverse strand ncRNA, no idea of mechanism) multiple TSSs

where you have operons encoding 4 genes, you don't just see mRNA of all four, you get different levels of each gene, somehow...

same SOS response as B. subtilis, but without the TFs! very interesting

plenty of regulatory complexity

metabolome: KEGG didn't work out, had to do lots of manual work to build metabolic map. defined minimal medium.

know reactions are there, but 10-12 enzymes are not known

200 molecules per protein per cell

so small that you're "living in a stochastic world" - each reaction is like rolling a die; how it survives is an interesting question.
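The "rolling a die" picture is what the Gillespie stochastic simulation algorithm formalises. This is not from the talk: a minimal birth-death sketch for a single protein species, with hypothetical rates chosen so that the steady-state mean (k_syn / k_deg) is 200 molecules, matching the figure above.

```python
import random

def gillespie_birth_death(k_syn, k_deg, n0, t_end, seed=0):
    """Exact stochastic simulation (Gillespie SSA) of one protein species.

    Reactions: 0 -> P at rate k_syn; P -> 0 at rate k_deg * n.
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_syn = k_syn
        a_deg = k_deg * n
        a_total = a_syn + a_deg
        t += rng.expovariate(a_total)       # exponential waiting time to the next reaction
        if t > t_end:
            break
        if rng.random() * a_total < a_syn:  # choose which reaction fires, weighted by propensity
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

# Hypothetical rates: steady-state mean = k_syn / k_deg = 200 molecules
times, counts = gillespie_birth_death(k_syn=20.0, k_deg=0.1, n0=200, t_end=500.0)
print(len(counts), min(counts), max(counts))
```

For this simple birth-death process the stationary distribution is Poisson, so the standard deviation is about √200 ≈ 14 molecules, roughly 7% noise from chance alone.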

Day three, session one

Jurg Bahler: ‘Differential marking of intronic and exonic DNA regions with respect to RNA polymerase II occupancy, histone density, and H3K36me3 MODIFICATION patterns’

Pre- and post-splicing levels measured with RNA-seq.

Splicing efficiency regulated

Co-transcriptional splicing. Look for relationship between splicing and chromatin - H3K36me3 lower in introns.

Measured transcript levels after adding a transcription-blocking compound to measure decay; however, such drugs have side effects. Better to measure Pol II occupancy and RNA abundance and estimate decay with a formula.
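A minimal sketch of both routes, under assumptions of my own: first-order decay, and a steady state where synthesis is proportional to Pol II occupancy, so dR/dt = s·P − k·R = 0 gives k = s·P/R. The scale factor s is unknown, so the second route only yields relative decay rates. I don't know which formula the speaker actually used; the function names and toy numbers are hypothetical.

```python
import math

def decay_rate_from_timecourse(times, levels):
    """First-order decay rate k from RNA levels after blocking transcription.

    Fits ln(level) = ln(R0) - k * t by least squares and returns k.
    """
    logs = [math.log(v) for v in levels]
    n = len(times)
    mt, ml = sum(times) / n, sum(logs) / n
    slope = (sum((t - mt) * (l - ml) for t, l in zip(times, logs))
             / sum((t - mt) ** 2 for t in times))
    return -slope

def relative_decay_rate(pol2_occupancy, rna_abundance, s=1.0):
    """Steady state of dR/dt = s*P - k*R gives k = s*P/R (s is an unknown scale)."""
    return s * pol2_occupancy / rna_abundance

def half_life(k):
    """Half-life of a first-order decay with rate constant k."""
    return math.log(2) / k

times = [0, 10, 20, 30]                              # minutes after the block (toy)
levels = [100 * math.exp(-0.05 * t) for t in times]  # noise-free toy data, k = 0.05
k = decay_rate_from_timecourse(times, levels)
print(round(k, 4), round(half_life(k), 2))           # 0.05 13.86
print(relative_decay_rate(pol2_occupancy=2.0, rna_abundance=40.0))  # 0.05
```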

Looked at response to oxidative stress. See patterns of expression with stable transcription.

Jaak Vilo: ‘Network reconstruction and mining of high-throughput data’

Network reconstruction. Analysis tools. GraphWeb NAR db issue.

"MEM" queries for similar expression in multiple datasets. You could put everything together in one analysis (like VB expr maps) but crap data could get included. Instead they do a post-analysis of separate queries (one per dataset) using ranks: a p-value for enrichment of low ranks. "Low" depends on a rank threshold, so they try all thresholds and take the lowest p-value.

"multi-experiment matrix"

web tool may have anopheles affy data (they get it from ArrayExpress) - it does but can't figure out which gene or probe symbols to query with!

really nice annotation cloud mouse-over!

Adler et al Genome Biology 2009 in press
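My reading of the MEM rank-threshold idea as a Python sketch: for a candidate gene, take its similarity rank to the query gene in each dataset, try every observed rank r as the threshold, and compute a binomial tail probability for seeing that many datasets with rank ≤ r when ranks are uniform under the null. This is an assumption about the statistic, not necessarily what the MEM tool computes, and the minimum over thresholds would still need a multiple-testing correction.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def rank_threshold_pvalue(ranks, n_genes):
    """Smallest binomial tail p-value over all rank thresholds.

    ranks: a candidate gene's similarity rank to the query gene in each dataset
    (1 = most similar). Under the null each rank is uniform on 1..n_genes, so
    P(rank <= r) = r / n_genes.
    """
    ranks = sorted(ranks)
    n = len(ranks)
    best = 1.0
    for k, r in enumerate(ranks, start=1):
        # at least k of the n datasets rank this gene at r or better
        best = min(best, binom_tail(k, n, r / n_genes))
    return best

# Toy ranks of two candidate genes across 5 datasets of 20,000 genes each
print(rank_threshold_pvalue([3, 5, 12, 900, 15000], 20_000))             # consistently near the query
print(rank_threshold_pvalue([4000, 9000, 11000, 15000, 19000], 20_000))  # no signal
```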

Annelies Fieuw: ‘Integrative analysis of coding and non-coding gene expression and copy numbers in neuroblastoma’


look for more genes implicated in pathology

looked for correlated m(i)RNA expression and genome copy number

Geoffrey Faulkner: ‘Transposed Elements are Massively Transcribed in Mammalian Cells’


TEs make up 1/2 of the human genome, but only 100 are mobile. Mostly Alu SINEs and L1 LINEs. Something about neurons recently in Nature.

plenty more immobile

most transposon insertions are incomplete and disrupt genes.

CAGE - 25bp 5' tags, somehow find TE promoters through sequencing.

the TEs could provide alternative TSSs: 100,000 possible, 700 validated with ESTs, about half validated with another sequencing run

see correlated expression between TE and nearby gene

likely to be positive regulators

RNAi against TE transcripts - have phenotypes (myoblast morphology)


Eileen Furlong: ’Making global predictions of cis-regulatory activity’

need map of cis-regulatory elements and their inputs

lots of ChIP-chip through development.

usually two antibodies for each TF (for consensus) and strict FDR

>19k peaks!

look for combinatorial binding

they've found 8000 CRMs (for mesoderm development) 2000 target genes

further expts determine +ve or -ve effect of CRMs

recommends Nature Genetics v36 2006, Reinitz group - models of TF networks for eve stripe 2 enhancer

80% of literature CRMs are in the chip set ("atlas")

predict 5 different expression patterns from binding signatures with an SVM; predict expression of ChIP CRMs and (I guess) validate experimentally: 4 classes, 80% validated.

One combinatorial TF binding code does not have one output (from their data)

Day three, session two

Wolfgang Huber: ‘Detecting genetic interactions and multiparametric dynamic phenotypes in RNAi perturbation microscopy imaging assays’

transcriptome characterisation in yeast

3' nucleosome depleted regions

nucleosome depleted regions are shared in bidirectional promoters.

in yeast, antisense transcripts interfere with sense transcription, also with histone modification (somehow), and can also activate transcription

48 (wild?) strains and transcriptomics ((look for association with SNPs))

genes with antisense xscripts have more diff regulation (across strains, env conditions, species)

genes with antisense are more often OFF

anticorrelation of sense/antisense xscripts

is it "opportunistic" transcription? (pol finds open chromatin and transcribes) (Struhl review?)

more TF binding sites for coding than antisense

expression QTLs from the strain data:

  • coding: 75% distal effects
  • antisense: 50/50 local/distal (not sure of interpretation)

transcripts usually extend 100bp into the promoter of the opposite transcript. this is necessary for anti-correlated xscript levels

TATA also contributes to regulated expression

"likely true for higher euks"

no evidence for translation of reverse xscripts so far

Felix Naef: ‘Rhythmic protein-DNA interactomes and circadian transcription regulatory networks’

understand gene expression programs under circadian control (e.g. genes expressed in heart at 10am, same kind of thing in liver)

known "E box" element (driven by TF BMAL1?) driven by circadian system, but it can't drive all downstream effects because of their timing - is there another element?

something special about tandem arrangement of E-box element. CLOCK/BMAL1 heterodimer binding

Bussemaker 2001 cis-reg network algorithm?

Not solved the phase specificity problem yet.

Caroline Brorsson: ‘A Genome-Wide SNPxSNP Search for Epistasis Identifies Gene-Gene Interactions in Type 1 Diabetes’

Type 1 diabetes

12000 cases, 13000 controls -> 40 regions associated with T1D

mostly immune genes

small effects of each variant

looking for epistasis in GWAS

Mark McCarthy: ‘The End of the Beginning: Genetic Success and the Long Road to Functional Inference’


plenty of loci found now (~20?) but no decent prediction: AUC with variants only = 0.6, with BMI + age = 0.78! (the loci explain around 5% of predisposition)
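For reference, AUC is the probability that a randomly chosen case gets a higher risk score than a randomly chosen control (0.5 = chance, 1 = perfect), i.e. the Mann-Whitney U statistic divided by the number of case-control pairs. A small Python sketch with made-up scores:

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 1/2)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Made-up risk scores for 3 cases and 3 controls
print(round(auc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2]), 3))  # 0.778 (7 of 9 pairs ordered correctly)
```

The pairwise loop is O(n·m); for GWAS-sized samples you would sort the scores and use ranks instead, but the value is the same.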

Agilent array for CNVs, none found for T2D or some other diseases.

Current focus is on common variations - maybe we should look for rarer (1%) variations...

Day three, session three

Matthias Uhlen: ‘Human Proteome Atlas’

goal: antibodies against all proteins

very important resource, look it up.

have got an epitope predictor - training data available

working towards a subcellular "index" for all proteins.

first draft proteome by 2014

long term goal, paired antibodies (shouldn't rely on one!)

all Abs available through Sigma.

antibody specificity; proteins 10^7 dynamic range in cells; paired antibodies against different epitopes make it more specific

looking into FRET paired antibodies - seems to work.

don't have paired for the majority

some kind of validation of commercial Abs: 40% work nicely??? Wiki-based, community-based validation of antibodies.

systems level analysis:

how many proteins are tissue specific (one cell type) < 2%

or at a larger level (say, brain) = 10%

some new tissue specific genes found though.

how many prots in a cell? ~65% of all

brain had fewest

levels of proteins do distinguish cell types

antibody "array" using beads, multiplexed 384 abs x 384 samples in one run!

used on blood plasma - personalised medicine and biomarkers

going to do 20 diseases (400 patients per disease) biobanks

George Koumbaris: ‘X-chromosome disorders: Identification of underlying mechanisms’

breakpoint analysis of X-chromosome disorders

Jean-Baptiste Cazier: ‘Methodological Aspects of Metabonome Quantitative Trait Locus Mapping in Organ Extracts using Nuclear Magnetic Resonance Profiling’

metabolome from fat, plasma, urine, etc with NMR spectra (40,000 points)

  1. through time: (no liver)
  2. multiple animals
  3. treatments
  4. species


300 SNPs

found a locus (on chr14) and a metabolite (benzoate, a gut microbial metabolite)

Gavin Sherlock: ‘Molecular Characterization of the Fitness Landscape in Asexually Evolving Populations of Saccharomyces Cerevisiae'

yeast evolution (in the lab)

count and isolate clones in a population.

R, Y and G dyes in population

equal proportions, glucose limitation (for selection)

haploid, no sex, several 100 generations (I think)

observe "clonal interference"

isolate clones (FACS, then split into 7, then fitness measurement, then Solexa sequencing)

see mutants and amplifications (saw hexose transporter amplification)

clones arising later have more mutations

each of 5 lineages was distinct (no shared mutations) even if same colour

some mutations may be hitchhikers (not adaptive) ; some kind of sex under selection somehow lets you figure out which mutations are adaptive

Day three, session four

Steve Oliver: ‘Conservatism and Innovation in the Design and Evolution of a Simple Eukaryote’

more yeast evolution. hemizygous mutants in competition with each other. most genes in one copy give happy yeast, some show haploinsufficiency. some others are haploproficient.

these are "high flux" genes

after whole genome duplication, they are more likely to stay in two copies.

more conserved

Chris Pacheco: ‘Missplicing of Cyclin-G Associated Kinase is a Risk Factor for Developing Parkinson’s Disease’


Alvis Brazma: ‘A global map of major transcriptional states of the human genome’

9000 raw data files from Affy U133A in GEO and ArrayExpress

After QC, 5372 samples remained (206 studies, 163 labs -> 369 conditions)

only about 25% "normal"

18000 genes

PCA: first component blood vs rest, second axis malignancy

cell lines are very different

3rd is tissue of origin

first 3 components explain 37% of the variability
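What "first 3 components explain 37%" means: the sum of the top three eigenvalues of the sample covariance matrix divided by its trace (the total variance). A dependency-free sketch using power iteration with deflation on toy data; for a real 5372 × 18000 matrix you would use an SVD routine from a numerical library instead. Function names and data are illustrative.

```python
import random

def covariance_matrix(rows):
    """Sample covariance of the columns (features) of a samples-by-features list of lists."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]

def top_eigenvalues(cov, k, iters=500, seed=0):
    """Largest k eigenvalues of a symmetric PSD matrix by power iteration plus deflation."""
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]
    eigvals = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the converged unit vector gives the eigenvalue
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        eigvals.append(lam)
        # remove this component so the next iteration finds the next-largest one
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return eigvals

def explained_variance_fraction(rows, k):
    """Fraction of total variance captured by the first k principal components."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(top_eigenvalues(cov, k)) / total

# Toy data: 4 samples x 2 features, variance mostly along the first feature
rows = [[1, 0], [-1, 0], [0, 0.5], [0, -0.5]]
print(round(explained_variance_fraction(rows, 1), 3))  # 0.8
```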

also an MDS (like Tom Freeman's graphs)

hierarchical clustering

6 main classes: brain, muscle, x, y, z, q (beyond this, the signal is weak - maybe lab effects)

take a leukaemia cluster, figure out genes

also introduced the Gene Expression Atlas at the EBI