Julius B. Lucks/Meetings and Notes/SMBE2007/methods comp genomics

= Jeff Chuang (Boston College): Sequences conserved by selection across mouse and human malaria species = Tue Jun 26 07:54:00 EDT 2007

Talk

 * 41% world pop exposed to malaria carrying mosq.
 * 1E6 deaths
 * transmitted through Anopheles
 * lives in red blood cells

Plasmodium Gene Regulation

 * falciparum causes most fatalities
 * < 10 reg sequences characterized
 * simple gene expression program
 * 80% ORFs expressed periodically in red blood cell stage
 * yeast 10% expressed in comparable stage
 * small number of apparent transcription factors

Detecting Functional Sequences

 * look for sequence conservation
 * Genome Research, 15, 205, 2005
 * works for yeast
 * Challenge 1 - phylogenetic distances between P. falciparum and other species non-ideal
 * mouse malaria species best sequenced (3X-8X): P. berghei, P. yoelii, P. chabaudi - total dS approx 0.5
 * comp to P. falciparum dS >> 1
 * Challenge 2 - most malaria spec extremely AT-rich genomes (> 80% AT)
 * AT cons could be due to chance
 * questionability of alignments
 * PhastCons (Siepel and Haussler 2005) - phylogenetic HMM - differential mutation matrix
 * in UCSC genome browser
 * not good for malaria - saturates subst rates
 * Gumby (Prabhakar, 2006)
 * conservation based scoring function using Karlin-Altschul statistics
 * Regulatory Potential (Elnitski, Hardison, 2003)

Empirical Method that Corrects for Base Composition

 * intergenic sequences and orthologs
 * align these (MUSCLE)
 * apply composition- and conservation-dependent scoring function to sliding windows - corrects for base comp
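
A rough sketch of the sliding-window idea, for reference. The notes do not record the actual scoring function from the talk, so this assumes a stand-in score: the fraction of fully conserved columns in a window minus the fraction expected by chance from the window's own base composition. The names `chance_identity` and `window_scores` and the window parameters are hypothetical.

```python
def chance_identity(seqs):
    """Chance that a column is identical across all sequences,
    given each sequence's own base composition (gaps ignored)."""
    total = 0.0
    for base in "ACGT":
        p = 1.0
        for s in seqs:
            residues = [c for c in s if c in "ACGT"]
            p *= (residues.count(base) / len(residues)) if residues else 0.0
        total += p
    return total


def window_scores(aligned, width=10, step=5):
    """Slide a window over equal-length aligned sequences and return
    (start, observed - expected) conservation for each window."""
    length = len(aligned[0])
    assert all(len(s) == length for s in aligned), "sequences must be aligned"
    scores = []
    for start in range(0, length - width + 1, step):
        cols = [s[start:start + width] for s in aligned]
        conserved = sum(
            1 for col in zip(*cols) if col[0] in "ACGT" and len(set(col)) == 1
        )
        observed = conserved / width
        scores.append((start, observed - chance_identity(cols)))
    return scores


# Two identical toy sequences: an all-A stretch followed by a mixed stretch.
aligned = ["AAAAAAAAAAACGTACGTAC", "AAAAAAAAAAACGTACGTAC"]
scores = window_scores(aligned, width=10, step=10)
```

Both windows are 100% identical, but the all-A window gets no credit because identity there is fully expected by chance, which is the point of correcting for AT-richness.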

How much intergenic sequence conserved

 * Mouse-Human 5%
 * Sensu stricto yeast - 50%
 * 3 mouse malarias - 2.4%
 * including falciparum - 0.1%

Questions
= Harold Drabkin : Function annotation using ontologies - the Mouse Genome Informatics system uses the Gene Ontology = Tue Jun 26 07:54:38 EDT 2007
 * Mouse Genome Informatics - Jackson Lab - Maine

Links

 * MGI
 * GO

brief summary of MGI

 * genome database ...
 * sequence to phenotype/disease
 * manually curated experimental literature (w/ controlled vocabs)

how MGI collects and summarizes orthology data

 * HomoloGene, InParanoid, HGNC, TreeFam
 * AA alignment, NA alignment, synteny, conserved map location
 * 19 species with homol to mouse genes (Human, chimp, dog, rat ...)

GO and MGI

 * GO - 3 ontologies in 1
 * molecular function
 * process function participates in
 * where in cell takes place (cellular component)
 * 18,000 genes, 170,000 annotations - from 8,000 papers
 * directed acyclic graph - any one term can have multiple heritage
 * 2 relationships
 * is a
 * part of
 * GO annotation
 * statement that a gene product has a particular molecular function, is involved in a particular process, is located in a particular cellular component
 * as determined by a particular method, described in a particular reference
 * GO_term:evd_code:ref
 * evidence codes
 * experimental
 * IDA - inferred from direct assay
 * IPI - inferred from physical interaction
 * IMP - inferred from mutant phenotype
 * IGI - inferred from genetic interaction
 * predictive
 * ISS - inferred from sequence or structural similarity
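
A small sketch of how the DAG and the annotation triple fit together. The hierarchy below is a toy with made-up term names, not real GO content, and `Annotation`, `ancestors`, and `implied_terms` are hypothetical names. It assumes an annotation to a term also implies all of that term's ancestors through `is_a` and `part_of`.

```python
from typing import NamedTuple

# Toy DAG: term -> list of (relationship, parent). A term may have several parents.
PARENTS = {
    "inner membrane": [("part_of", "organelle X"), ("is_a", "membrane")],
    "organelle X": [("is_a", "organelle")],
    "membrane": [("is_a", "cellular component")],
    "organelle": [("is_a", "cellular component")],
    "cellular component": [],
}

# Evidence codes from the notes, grouped as experimental vs. predictive.
EVIDENCE_CLASS = {
    "IDA": "experimental",
    "IPI": "experimental",
    "IMP": "experimental",
    "IGI": "experimental",
    "ISS": "predictive",
}


class Annotation(NamedTuple):
    gene: str
    term: str
    evidence: str
    reference: str


def ancestors(term, parents=PARENTS):
    """All terms reachable from `term` by following is_a / part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        for _relationship, parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_terms(annotation):
    """The annotated term plus everything above it in the DAG."""
    return {annotation.term} | ancestors(annotation.term)


ann = Annotation("geneX", "inner membrane", "IDA", "ref1")
```

Here `implied_terms(ann)` contains five terms, reached through two different parents, which is what "multiple heritage" buys over a strict tree.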

Reference Genome Initiative

 * 12 model organisms - related to human diseases
 * Graph of Ontology connections between model organisms

Questions

 * defining orthologs - 1to1?
 * 1_to_1, 1_to_many, many_to_1 - up to curator

= Daniel Blankenberg : Making the analyses of multiple-species whole-genome alignments accessible to everyone = Tue Jun 26 07:55:59 EDT 2007
 * Nekrutenko - Penn State

Multiple Spec Alignments

 * genomic alignment - collection of local alignments, where each sub-genomic region that aligns is a block

MAF format

 * format for alignments
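
For reference, a minimal reader for MAF text, as a sketch only. It handles just the `a` (block start) and `s` (sequence) lines, whose fields are source, start, size, strand, source size, and aligned text; other line types are ignored. The function name `parse_maf` is hypothetical, and the coordinates and sizes in the sample are made up.

```python
SAMPLE_MAF = """##maf version=1
a score=100.0
s hg18.chr22   1000 8 + 50000000 ACGT-ACGT
s mm8.chr15     500 8 + 100000000 ACGTTAC-T

a score=50.0
s hg18.chr22   2000 4 + 50000000 GGCC
s canFam2.chr26  70 4 + 40000000 GGCT
"""


def parse_maf(text):
    """Return a list of blocks; each block is a list of row dicts."""
    blocks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank separator or header/comment line
        if fields[0] == "a":
            current = []
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            src, start, size, strand, src_size, aligned = fields[1:7]
            current.append({
                "src": src,
                "start": int(start),      # zero-based start in the source
                "size": int(size),        # aligned bases, not counting gaps
                "strand": strand,
                "src_size": int(src_size),
                "text": aligned,
            })
    return blocks


blocks = parse_maf(SAMPLE_MAF)
```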

Alignment Manipulation

 * extract alignment blocks which fall in a region
 * remove species
 * remove blocks
 * determine coverage statistics
 * convert to FASTA or other - nothing supports MAF
 * block based - multiple block - concatenated
 * interval based - start-stop - gene
 * extract coding seq alignments
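
The manipulations listed above can be sketched on blocks held as plain lists of row dicts. This is an illustration, not Galaxy's implementation: the function names are hypothetical, the coordinates are made up, rows are assumed to be on the plus strand, and source names are assumed to follow the `species.chrom` convention.

```python
blocks = [
    [
        {"src": "hg18.chr22", "start": 1000, "size": 8, "text": "ACGT-ACGT"},
        {"src": "mm8.chr15", "start": 500, "size": 8, "text": "ACGTTAC-T"},
    ],
    [
        {"src": "hg18.chr22", "start": 2000, "size": 4, "text": "GGCC"},
        {"src": "canFam2.chr26", "start": 70, "size": 4, "text": "GGCT"},
    ],
]


def blocks_in_region(blocks, src, start, end):
    """Keep blocks whose `src` row overlaps the half-open interval [start, end)."""
    kept = []
    for block in blocks:
        for row in block:
            if row["src"] == src and row["start"] < end and row["start"] + row["size"] > start:
                kept.append(block)
                break
    return kept


def drop_species(blocks, species):
    """Remove all rows of one species; blocks left with no rows are dropped."""
    kept = []
    for block in blocks:
        rows = [r for r in block if r["src"].split(".")[0] != species]
        if rows:
            kept.append(rows)
    return kept


def concatenate_to_fasta(blocks, species_order):
    """Block-based conversion: stitch blocks per species, gap-padding absent species."""
    parts = {sp: [] for sp in species_order}
    for block in blocks:
        width = len(block[0]["text"])
        by_species = {row["src"].split(".")[0]: row["text"] for row in block}
        for sp in species_order:
            parts[sp].append(by_species.get(sp, "-" * width))
    return "".join(f">{sp}\n{''.join(seq)}\n" for sp, seq in parts.items())


fasta = concatenate_to_fasta(blocks, ["hg18", "mm8"])
```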

Galaxy

 * web-based
 * connection to UCSC and BioMart
 * 130 tools - interval operations, alignment manips, viewers, EMBOSS, HyPhy

Example

 * goal: determine non-canonical mammalian genes on chrom 22
 * workflow
 * obtain genomic coords for genes
 * extract mult species
 * stitch together alignments for coding exons
 * det freq of each tree

Questions
= Yi Zhou : BLASTO - A tool for searching orthologous groups = Tue Jun 26 07:56:35 EDT 2007
 * Landweber - Princeton
 * Dept. Ecol & Evol. Biol.
 * NAR, 2007
 * BLASTO

Talk

 * ortholog - basis for phylo inference, genome evol studies, functional annotation
 * best estimated when complete genomes avail - reciprocal best hits, sim clusters
 * NCBI COG - unicell org, 53 prok, 3 euk
 * NCBI KOG - 7 euk
 * OrthoMCL
 * MultiParanoid
 * TIGR EGO (DNA)
 * Func annotation
 * comp gene tree - require position of query species on ref tree
 * Orthostrapper
 * RIO
 * SIFTER
 * KOGnitor
 * BLASTO
 * all curr well-known multi-species databases
 * no compl genome req or phylo placement of curr species
 * modified blast that treats orthol groups as a unit
 * function prediction, putative phylo relationship inference
 * method
 * sig score of indiv sequences using BLAST
 * retrieve orthog group information
 * compute sig score for each group based on indiv seq scores
 * outputs sig score for orthog group - average likelihood of score within each group
 * reduce noise comp to one-way best hits
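
A sketch of the group-as-unit idea in the method bullets above. The exact BLASTO scoring is not recorded in these notes, so this assumes a simple stand-in: each group's score is the mean of -log10(E-value) over its member sequences that were hit. The function name, sequence IDs, group IDs, and E-values are all made up.

```python
import math
from collections import defaultdict


def group_scores(hits, seq_to_group):
    """hits: {subject_id: e_value}. Returns {group_id: mean -log10(E-value)}."""
    per_group = defaultdict(list)
    for subject, evalue in hits.items():
        group = seq_to_group.get(subject)
        if group is not None:
            # Floor the E-value so that a reported 0.0 does not break the log.
            per_group[group].append(-math.log10(max(evalue, 1e-180)))
    return {group: sum(vals) / len(vals) for group, vals in per_group.items()}


hits = {"seqA": 1e-50, "seqB": 1e-40, "seqC": 1e-60, "seqD": 1e-5}
seq_to_group = {"seqA": "OG1", "seqB": "OG1", "seqC": "OG2", "seqD": "OG2"}

scores = group_scores(hits, seq_to_group)
best_group = max(scores, key=scores.get)
best_single_hit = min(hits, key=hits.get)
```

In this toy case the single best hit (`seqC`) belongs to OG2, but OG1 wins at the group level because both of its members score consistently well, which illustrates the noise reduction relative to one-way best hits.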

Questions
= Ian Schenk : Genomic algebra for the masses - providing flexible operations on genomic interval data with a user-friendly web resource = Tue Jun 26 07:56:57 EDT 2007
 * Nekrutenko Lab Penn State
 * GalaxyOps

Talk

 * genomic data
 * sequences
 * alignments
 * datapoints (scores)
 * intervals
 * chromosome, start and end - 1 dim DNA map
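
Two typical interval operations, sketched on (chromosome, start, end) tuples. This is an illustration of the kind of "algebra" meant, not the GalaxyOps code; the function names are hypothetical and intervals are assumed to be half-open.

```python
def merge(intervals):
    """Merge overlapping or touching intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a, b):
    """Return the pieces of intervals in `a` that are covered by intervals in `b`."""
    pieces = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a == chrom_b:
                start, end = max(start_a, start_b), min(end_a, end_b)
                if start < end:
                    pieces.append((chrom_a, start, end))
    return sorted(pieces)


genes = [("chr22", 100, 200), ("chr22", 150, 300), ("chr1", 10, 20)]
merged = merge(genes)
overlap = intersect(merged, [("chr22", 250, 400), ("chr1", 0, 5)])
```

The nested loop in `intersect` is quadratic and only suitable for small inputs; a sorted sweep or an interval tree would be the usual choice at genome scale.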

Questions
= Gordon Plague : Nice neighbors and safe landings - orientation bias of genes flanking transposable elements in bacteria = Tue Jun 26 07:57:33 EDT 2007

Talk

 * insertion sequences (ISs)
 * < 2500 bp
 * most frequent TEs in bacteria
 * DR-IRL-transposase-IRR-DR
 * prsA
 * intragenic IS elements rare - knock out genes
 * most IS elements intergenic
 * are all intergenic regions equiv for IS insertion?
 * bacterial genes have 4 diff neighboring gene orientations
 * leading-leading
 * leading-lagging
 * lagging-leading
 * lagging-lagging
 * Bact genomic arch
 * most circular chromosomes
 * high coding density (no introns/exons)
 * org into operons
 * leu: leuL-leuA-leuB-leuC-leuD
 * IS el hops into middle of gene orientations - could disrupt 1 reg region (lead-lead, lag-lag could be on same operon)
 * lagging-leading - IS could disrupt 2 promoters (setup needs two promoters)
 * hypoth - least common place for IS el to hop
 * leading-lagging - IS el cannot disrupt promoter sequences
 * hypoth - insert here the most since disrupt the least
 * Y. pestis
 * 1300 - 36% - lead lead
 * 546 - 15% - lead-lag
 * 579 - 16% - lag-lead
 * 1183 - 33% - lag-lag
 * data supports both hypotheses
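
A quick check of the Y. pestis percentages from the counts above (the dictionary keys are just labels for the four orientations):

```python
# Neighboring gene-pair orientation counts as given in the notes.
counts = {"lead-lead": 1300, "lead-lag": 546, "lag-lead": 579, "lag-lag": 1183}

total = sum(counts.values())
percent = {orientation: round(100 * n / total) for orientation, n in counts.items()}

for orientation, n in counts.items():
    print(f"{orientation}: {n} of {total} = {percent[orientation]}%")
```

The four counts sum to 3608, and the rounded percentages match the 36 / 15 / 16 / 33 split in the notes.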