Julius B. Lucks/Meetings and Notes/SMBE2007/methods comp genomics
Jeff Chuang (Boston College): Sequences conserved by selection across mouse and human malaria species
Tue Jun 26 07:54:00 EDT 2007
Words to Look Up
Talk
- 41% world pop exposed to malaria carrying mosq.
- 1E6 deaths
- transmitted through Anopheles
- lives in red blood cells
Plasmodium Gene Regulation
- falciparum causes most fatalities
- < 10 reg sequences characterized
- simple gene expression program
- 80% ORFs expressed periodically in red blood cell stage
- yeast 10% expressed in comparable stage
- small number of apparent transcription factors
Detecting Functional Sequences
- look for sequence conservation
- Genome Research, 15, 205, 2005
- works for yeast
- Challenge 1 - phylogenetic distances between P. falciparum and other species non-ideal
- mouse best sequenced (3X-8X): P. berghei, P. yoelii, P. chabaudi - total dS approx 0.5
- comp to P. falciparum dS >> 1
- Challenge 2 - most malaria spec extremely AT-rich genomes (> 80% AT)
- AT cons could be due to chance
- questionability of alignments
- PhastCons (Siepel and Haussler 2005) - phylogenetic HMM - differential mutation matrix
- in UCSC genome browser
- not good for malaria - saturates subst rates
- Gumby (Prabhakar, 2006)
- conservation based scoring function using Karlin-Altschul statistics
- Regulatory Potential (Elnitski, Hardison, 2003)
Empirical Method that Corrects for Base Composition
- intergenic sequences and orthologs
- align these (MUSCLE)
- apply composition- and conservation-dependent scoring function to sliding windows - corrects for base comp
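The notes do not record the actual scoring function from the talk. As a minimal sketch of the general idea, assuming a pairwise alignment and a simple null model in which two independent sequences match at a column with probability equal to the sum over bases of the product of their window base frequencies, one could score each sliding window as observed identity minus expected identity. The function names and the default window size are illustrative assumptions.

```python
from collections import Counter

def base_freqs(seq):
    """Frequencies of A, C, G, T in a sequence, ignoring gaps and other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}

def window_scores(seq1, seq2, window=20, step=1):
    """Yield (start, observed identity - expected identity) for each window.

    seq1 and seq2 are two rows of an alignment (equal length, '-' for gaps).
    The expected identity is the chance-match rate implied by the base
    composition of the two windows (an assumed null model, not the talk's).
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    for start in range(0, len(seq1) - window + 1, step):
        w1 = seq1[start:start + window].upper()
        w2 = seq2[start:start + window].upper()
        matches = sum(1 for a, b in zip(w1, w2) if a == b and a in "ACGT")
        observed = matches / window
        f1, f2 = base_freqs(w1), base_freqs(w2)
        expected = sum(f1[b] * f2[b] for b in "ACGT")
        yield start, observed - expected

# Two perfectly conserved windows: the AT-only one gets less credit
# than the compositionally balanced one.
scores = list(window_scores("ATATACGT", "ATATACGT", window=4, step=4))
```

With this correction a fully conserved AT-only window scores lower than a fully conserved window of balanced composition, which is the behavior wanted for a genome that is > 80% AT.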
How much intergenic sequence conserved
- Mouse-Human 5%
- Sensu stricto yeast - 50%
- 3 mouse malaria - 2.4%
- including falciparum - 0.1%
Questions
Harold Drabkin : Function annotation using ontologies - the Mouse Genome Informatics system uses the Gene Ontology
Tue Jun 26 07:54:38 EDT 2007
- Mouse Genome Informatics - Jackson Lab - Maine
Links
Words to Look Up
Talk
brief summary of MGI
- genome database ...
- sequence to phenotype/disease
- manually curated experimental literature (w/ controlled vocabs)
how MGI collects and summarizes orthology data
- HomoloGene, InParanoid, HGNC, TreeFam
- AA alignment, NA alignment, synteny, conserved map location
- 19 species with homol to mouse genes (Human, chimp, dog, rat ...)
GO and MGI
- GO - 3 ontologies in 1
- molecular function
- biological process the function participates in
- where in cell takes place (cellular component)
- 18,000 genes, 170,000 annotations - from 8,000 papers
- directed acyclic graph - any one term can have multiple parents
- 2 relationships
- is a
- part of
- GO annotation
- statement that gene product has a particular molec function, is involved in a particular process, is located in a particular cell comp
- as det by particular method, described in particular ref
- GO_term:evd_code:ref
- evidence codes
- experimental
- IDA - inferred from direct assay
- IPI - inferred from physical interaction
- IMP - inferred from mutant phenotype
- IGI - inferred from genetic interaction
- predictive
- ISS - inferred from sequence or structural similarity
- experimental
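A minimal sketch of the two ideas above: a term graph in which a term can have more than one parent through is_a or part_of edges, and an annotation written as GO_term:evd_code:ref. The toy terms, the PARENTS table, and the placeholder reference string are illustrative assumptions and are not taken from MGI or the GO files.

```python
from collections import namedtuple

# Hypothetical toy ontology: child term -> list of (relationship, parent term).
PARENTS = {
    "mitochondrial inner membrane": [("part_of", "mitochondrion"),
                                     ("is_a", "organelle inner membrane")],
    "mitochondrion": [("is_a", "organelle")],
    "organelle inner membrane": [("is_a", "membrane")],
    "organelle": [],
    "membrane": [],
}

Annotation = namedtuple("Annotation", "go_term evidence_code reference")

def parse_annotation(text):
    """Split a 'GO_term:evd_code:ref' string into its three fields.

    Only the first two colons are treated as separators, so the reference
    part may itself contain a colon.
    """
    go_term, evidence_code, reference = text.split(":", 2)
    return Annotation(go_term, evidence_code, reference)

def ancestors(term, parents=PARENTS):
    """Return every term reachable by following is_a / part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Placeholder annotation string; "REF:example" is not a real reference.
example = parse_annotation("mitochondrial inner membrane:IDA:REF:example")
implied_terms = ancestors(example.go_term)
```

Because the graph is a DAG and not a tree, an annotation to one term implies all of its ancestors along every path, which is why the traversal collects a set.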
orthology-directed GO annotation
Reference Genome Initiative
- 12 model organisms - related to human diseases
- Graph of Ontology connections between model organisms
Questions
- defining orthologs - 1to1?
- 1_to_1, 1_to_many, many_to_1 - up to curator
Daniel Blankenberg : Making the analyses of multiple-species whole-genome alignments accessible to everyone
Tue Jun 26 07:55:59 EDT 2007
- Nekrutenko - Penn State
Words to Look Up
Talk
Multiple Spec Alignments
- genomic alignment - collection of local alignments where each sub-genomic region that aligns is a block
MAF format
- format for alignments
Alignment Manipulation
- extract alignment blocks which fall in a region
- remove species
- remove blocks
- determine coverage statistics
- convert to FASTA or other - nothing supports MAF
- block based - multiple block - concatenated
- interval based - start-stop - gene
- extract coding seq alignments
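The manipulations listed above can be sketched on MAF text directly. This is a minimal illustration and not Galaxy's implementation: it assumes each block starts with an "a" line followed by "s" lines of the form `s src start size strand srcSize text`, that src is written as species.chrom, and that the reference row is on the + strand. The example alignment uses made-up sequences.

```python
# Made-up two-block alignment for illustration only.
EXAMPLE_MAF = """##maf version=1
a score=10.0
s hg18.chr22 100 8 + 1000 ACGTACGT
s mm8.chr15 200 7 + 2000 ACGT-CGT
s canFam2.chr10 300 8 + 3000 ACGTACGA

a score=5.0
s hg18.chr22 500 4 + 1000 TTGA
s mm8.chr15 900 4 + 2000 TTGA
"""

def parse_maf(text):
    """Parse MAF text into a list of blocks; each block is a list of row dicts."""
    blocks = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if fields[0] == "a":
            current = []
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            _, src, start, size, strand, src_size, seq = fields
            current.append({"src": src, "start": int(start), "size": int(size),
                            "strand": strand, "src_size": int(src_size),
                            "seq": seq})
    return blocks

def blocks_in_region(blocks, ref_src, region_start, region_end):
    """Keep blocks whose ref_src row overlaps the half-open region [start, end)."""
    kept = []
    for block in blocks:
        for row in block:
            if row["src"] == ref_src and row["strand"] == "+":
                row_end = row["start"] + row["size"]
                if row["start"] < region_end and row_end > region_start:
                    kept.append(block)
                break  # only the first reference row is considered
    return kept

def remove_species(blocks, species):
    """Drop rows belonging to one species; drop blocks left with no rows."""
    result = []
    for block in blocks:
        rows = [r for r in block if r["src"].split(".")[0] != species]
        if rows:
            result.append(rows)
    return result

def to_fasta(block):
    """Write one block as FASTA, one record per aligned row."""
    return "".join(">{}\n{}\n".format(r["src"], r["seq"]) for r in block)

blocks = parse_maf(EXAMPLE_MAF)
hits = blocks_in_region(blocks, "hg18.chr22", 90, 110)
trimmed = remove_species(hits, "mm8")
fasta = to_fasta(trimmed[0])
```

A fuller version would also clip blocks to the region boundaries and remove columns that become all-gap after a species is dropped; both are left out here to keep the sketch short.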
Galaxy
- web-based
- connection to UCSC and BioMart
- 130 tools - interval operations, alignment manips, viewers, EMBOSS, HyPhy
Example
- goal: determine non-canonical mammalian genes on chrom 22
- workflow
- obtain genomic coords for genes
- extract mult species alignments
- stitch together alignments for coding exons
- det freq of each tree
Questions
Yi Zhou : BLASTO - A tool for searching orthologous groups
Tue Jun 26 07:56:35 EDT 2007
- Landweber - Princeton
- Dept. Ecol & Evol. Biol.
- NAR, 2007
- BLASTO
Words to Look Up
Talk
- ortholog - basis for phylo inference, genome evol studies, functional annotation
- best estimated when complete genomes avail - reciprocal best hits, sim clusters
- NCBI COG - unicell org, 53 prok, 3 euk
- NCBI KOG - 7 euk
- OrthoMCL
- MultiParanoid
- TIGR EGO (DNA)
- Func annotation
- comp gene tree - require position of query species on ref tree
- Orthostrapper
- RIO
- SIFTER
- KOGnitor
- BLASTO
- all curr well-known multi-species databases
- no compl genome req or phylo placement of curr species
- modified blast that treats orthol groups as a unit
- function prediction, putative phylo relationship inference
- method
- sig score of indiv sequences using BLAST
- retrieve orthog group information
- comp sig score for each group based on indiv seq scores
- outputs sig score for orthog group - average likelihood of score within each group
- reduce noise comp to one-way best hits
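The notes do not give BLASTO's actual scoring formula. As a rough sketch of the method steps above, assuming the group score is a plain arithmetic mean of per-sequence BLAST scores (a stand-in, not the published statistic), the grouping and ranking could look like this. The function name, sequence IDs, and scores are hypothetical.

```python
from collections import defaultdict

def group_scores(hits, membership):
    """Average per-sequence scores within each ortholog group.

    hits: dict mapping sequence id -> score (higher is better)
    membership: dict mapping sequence id -> ortholog group id
    Returns a list of (group, mean score), best group first.
    """
    per_group = defaultdict(list)
    for seq_id, score in hits.items():
        group = membership.get(seq_id)
        if group is not None:  # ignore hits that belong to no group
            per_group[group].append(score)
    means = {g: sum(s) / len(s) for g, s in per_group.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Hypothetical hits: b1 is the single best hit, but its group is inconsistent.
hits = {"a1": 100, "a2": 80, "b1": 120, "b2": 10, "x": 50}
membership = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
ranked = group_scores(hits, membership)
```

In this toy case the one-way best hit is b1, but group A ranks first because its members score consistently, which illustrates the noise reduction mentioned in the last bullet.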
Questions
Ian Schenk : Genomic algebra for the masses - providing flexible operations on genomic interval data with a user-friendly web resource
Tue Jun 26 07:56:57 EDT 2007
- Nekrutenko Lab Penn State
- GalaxyOps
Words to Look Up
Talk
- genomic data
- sequences
- alignments
- datapoints (scores)
- intervals
- chromosome, start and end - 1 dim DNA map
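A minimal sketch of two typical interval operations on (chromosome, start, end) records, assuming zero-based half-open coordinates. The function names are illustrative and this is not the GalaxyOps code.

```python
def intersect(a_intervals, b_intervals):
    """Return the overlapping pieces between two lists of (chrom, start, end)."""
    result = []
    for chrom_a, start_a, end_a in a_intervals:
        for chrom_b, start_b, end_b in b_intervals:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # non-empty overlap
                result.append((chrom_a, start, end))
    return sorted(result)

def merge(intervals):
    """Collapse overlapping or touching intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

genes = [("chr22", 100, 200), ("chr22", 300, 400)]
features = [("chr22", 150, 350), ("chr21", 100, 200)]
overlaps = intersect(genes, features)
collapsed = merge([("chr1", 1, 5), ("chr1", 4, 10), ("chr1", 20, 30)])
```

The nested loop is quadratic and only meant to show the logic; genome-scale data would need sorted sweeps or an interval tree.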
Questions
Gordon Plague : Nice neighbors and safe landings - orientation bias of genes flanking transposable elements in bacteria
Tue Jun 26 07:57:33 EDT 2007
Words to Look Up
Talk
- insertion sequences (ISs)
- < 2500 bp
- most frequent TEs in bacteria
- DR-IRL-transposase-IRR-DR (DR = direct repeat, IRL/IRR = left/right inverted repeat)
- prsA
- intragenic IS elements rare - knock out genes
- most IS elements intergenic
- are all intergenic regions equiv for IS insertion?
- bacterial genes have 4 diff neighboring gene orientations
- leading-leading
- leading-lagging
- lagging-leading
- lagging-lagging
- Bact genomic arch
- most circular chromosomes
- high coding density (no introns/exons)
- org into operons
- leu: leuL-leuA-leuB-leuC-leuD
- IS el hops into middle of gene orientations - could disrupt 1 reg region (lead-lead, lag-lag could be on same operon)
- lagging-leading - IS could disrupt 2 promoters (setup needs two promoters)
- hypoth - least common place for IS el to hop
- leading-lagging - IS el cannot disrupt promoter sequences
- hypoth - insert here the most since disrupts the least
- Y. pestis
- 1300 - 36% - lead lead
- 546 - 15% - lead-lag
- 579 - 16% - lag-lead
- 1183 - 33% - lag-lag
- data supports both hypotheses
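As a quick arithmetic check on the Y. pestis figures listed above, the percentages can be recomputed from the four counts. The notes do not say exactly what unit is being counted (presumably neighboring-gene arrangements), so the variable names here are assumptions.

```python
# Counts by flanking gene orientation, copied from the notes above.
counts = {"lead-lead": 1300, "lead-lag": 546, "lag-lead": 579, "lag-lag": 1183}

total = sum(counts.values())
percent = {name: round(100 * n / total) for name, n in counts.items()}

# Co-oriented arrangements (possible shared operon) vs. the two opposed ones.
co_oriented = counts["lead-lead"] + counts["lag-lag"]
co_oriented_fraction = co_oriented / total

for name, n in counts.items():
    print("{:10s} {:5d} {:3d}%".format(name, n, percent[name]))
```

The recomputed values match the percentages in the notes, and roughly two thirds of the arrangements are co-oriented.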