Julius B. Lucks/Meetings and Notes/SMBE2007/methods comp genomics

From OpenWetWare



Jeff Chuang (Boston College): Sequences conserved by selection across mouse and human malaria species

Tue Jun 26 07:54:00 EDT 2007

Words to Look Up

Talk

  • 41% of the world population exposed to malaria-carrying mosquitoes
    • ~1,000,000 (1E6) deaths
  • transmitted through Anopheles mosquitoes
    • parasite lives in red blood cells

Plasmodium Gene Regulation

  • P. falciparum causes most fatalities
  • fewer than 10 regulatory sequences characterized
  • simple gene expression program
    • 80% of ORFs expressed periodically in the red blood cell stage
    • vs. yeast: 10% expressed in a comparable stage
  • small number of apparent transcription factors

Detecting Functional Sequences

  • look for sequence conservation
  • Genome Research, 15, 205, 2005
    • works for yeast
  • Challenge 1 - phylogenetic distances between P. falciparum and other species are non-ideal
    • mouse malaria species are best sequenced (3X-8X coverage): P. berghei, P. yoelii, P. chabaudi - total dS approx. 0.5
    • compared to P. falciparum, dS >> 1
  • Challenge 2 - most malaria species have extremely AT-rich genomes (> 80% AT)
    • AT conservation could be due to chance
    • alignments are questionable
  • PhastCons (Siepel and Haussler 2005) - phylogenetic HMM - differential mutation matrix
    • in UCSC genome browser
    • not good for malaria - substitution rates saturate
  • Gumby (Prabhakar, 2006)
    • conservation-based scoring function using Karlin-Altschul statistics
  • Regulatory Potential (Elnitski, Hardison, 2003)
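Challenge 2 can be made concrete with a back-of-envelope calculation: in an AT-rich genome two unrelated bases match more often by chance, so a given level of identity is weaker evidence of selection. The sketch below assumes an i.i.d. base model with strand-symmetric composition (A = T, G = C); that model and the helper names `chance_identity` and `binomial_tail` are illustrative assumptions, not the method presented in the talk.

```python
from math import comb


def chance_identity(at_fraction: float) -> float:
    """Probability that two independent random bases are identical.

    Assumes an i.i.d. model with strand-symmetric composition:
    p(A) = p(T) = at_fraction / 2 and p(G) = p(C) = (1 - at_fraction) / 2.
    """
    p_at = at_fraction / 2
    p_gc = (1 - at_fraction) / 2
    return 2 * p_at ** 2 + 2 * p_gc ** 2


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of >= k identical columns out of n."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


uniform = chance_identity(0.5)  # 0.25
at_rich = chance_identity(0.8)  # ~0.34
# The same 70%-identical 100 bp window is far less surprising in an AT-rich genome.
print(binomial_tail(100, 70, uniform), binomial_tail(100, 70, at_rich))
```

Real alignments add gaps, indels and neighbor-dependent mutation, so this only shows the direction of the bias, not its size.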

Empirical Method that Corrects for Base Composition

  • intergenic sequences and orthologs
  • align these (MUSCLE)
  • apply a composition- and conservation-dependent scoring function to sliding windows - corrects for base composition
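The notes do not give the actual scoring function, so the following is only a minimal sketch of the general idea, assuming a pairwise alignment of two intergenic sequences (e.g. from MUSCLE) given as equal-length gapped strings. It swaps in a plain binomial z-score: observed matches per window versus the matches expected from the pooled base composition. The function names, window width and step are hypothetical.

```python
from collections import Counter
from math import sqrt


def chance_identity_from_composition(seq_a: str, seq_b: str) -> float:
    """Probability that two random bases match, using the pooled base
    composition of both (ungapped) sequences as the null model."""
    counts = Counter(c for c in (seq_a + seq_b).upper() if c in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases to estimate composition from")
    return sum((n / total) ** 2 for n in counts.values())


def window_scores(aln_a: str, aln_b: str, width: int = 50, step: int = 10):
    """Yield (start, identity, z) for each sliding window of a pairwise alignment.

    Columns with a gap in either sequence are skipped. z compares observed
    matches with the number expected by chance given base composition.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    p = chance_identity_from_composition(aln_a.replace("-", ""), aln_b.replace("-", ""))
    if not 0 < p < 1:
        raise ValueError("composition too skewed for a binomial null model")
    for start in range(0, len(aln_a) - width + 1, step):
        cols = [
            (x, y)
            for x, y in zip(aln_a[start:start + width], aln_b[start:start + width])
            if x != "-" and y != "-"
        ]
        n = len(cols)
        if n == 0:
            continue
        matches = sum(x.upper() == y.upper() for x, y in cols)
        z = (matches - n * p) / sqrt(n * p * (1 - p))
        yield start, matches / n, z
```

Because `p` rises with AT content, the same raw identity earns a lower z in an AT-rich alignment, which is the kind of correction the method aims for.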

How much intergenic sequence is conserved

  • Mouse-Human 5%
  • Sensu stricto yeast - 50%
  • 3 mouse malaria species - 2.4%
    • including P. falciparum - 0.1%

Questions

Harold Drabkin : Function annotation using ontologies - the Mouse Genome Informatics system uses the Gene Ontology

Tue Jun 26 07:54:38 EDT 2007

  • Mouse Genome Informatics - Jackson Lab - Maine

Links

Words to Look Up

Talk

brief summary of MGI

  • genome database ...
  • sequence to phenotype/disease
  • manually curated experimental literature (w/ controlled vocabs)

how MGI collects and summarizes orthology data

  • HomoloGene, InParanoid, HGNC, TreeFam
    • amino acid alignment, nucleotide alignment, synteny, conserved map location
  • 19 species with homologs of mouse genes (human, chimp, dog, rat ...)

GO and MGI

  • GO - 3 ontologies in 1
    • molecular function
    • biological process (the process the function participates in)
    • cellular component (where in the cell it takes place)
  • 18,000 genes, 170,000 annotations - from 8,000 papers
  • directed acyclic graph - any one term can have multiple parents
  • 2 relationships
    • is a
    • part of
  • GO annotation
    • statement that a gene product has a particular molecular function, is involved in a particular process, or is located in a certain cellular component
    • as determined by a particular method, described in a particular reference
    • GO_term:evd_code:ref
  • evidence codes
    • experimental
      • IDA - inferred from direct assay
      • IPI - inferred from physical interaction
      • IMP - inferred from mutant phenotype
      • IGI - inferred from genetic interaction
    • predictive
      • ISS - inferred from sequence or structural similarity
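The annotation triple and the DAG structure above can be illustrated with a small sketch. It assumes the notes' compact `GO_term:evd_code:ref` shorthand as a literal string format (it is not a real GO file format) and uses hypothetical GO IDs and references. `ancestors` walks `is_a`/`part_of` edges upward; since a term can have several parents, it traverses a graph rather than a tree. In GO, an annotation to a term also implies the term's ancestors.

```python
from dataclasses import dataclass

# Evidence codes listed in the talk notes, grouped as in the notes.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI"}
PREDICTIVE = {"ISS"}


@dataclass(frozen=True)
class Annotation:
    go_term: str    # e.g. "GO:1234567" (hypothetical ID)
    evidence: str   # evidence code, e.g. "IDA"
    reference: str  # supporting reference, e.g. "PMID:0000000" (hypothetical)

    @property
    def evidence_class(self) -> str:
        if self.evidence in EXPERIMENTAL:
            return "experimental"
        if self.evidence in PREDICTIVE:
            return "predictive"
        return "other"


def parse_annotation(text: str) -> Annotation:
    """Parse the compact 'GO_term:evd_code:ref' shorthand.

    GO IDs carry their own colon ("GO:" + 7 digits) and references may too,
    so split off the ID's two parts and the code, keeping the rest as the reference.
    """
    prefix, digits, evidence, reference = text.split(":", 3)
    return Annotation(f"{prefix}:{digits}", evidence.upper(), reference)


def ancestors(term: str, parents: dict) -> set:
    """All terms reachable upward via is_a / part_of edges.

    `parents` maps a term to a list of (relationship, parent_term) pairs.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


ann = parse_annotation("GO:1234567:IDA:PMID:0000000")
print(ann.go_term, ann.evidence, ann.evidence_class)
```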

orthology-directed GO annotation

Reference Genome Initiative

Questions

  • defining orthologs - 1-to-1?
    • 1_to_1, 1_to_many, many_to_1 - up to curator
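In MGI this call is left to curators. Purely as an illustration of what the three labels mean, here is a hypothetical automatic version: ortholog pairs (mouse gene, other-species gene) are grouped into connected components with union-find, and each component is labeled by how many genes it has on each side. The gene names and the function are invented for the example, and this is not MGI's procedure.

```python
from collections import defaultdict


def classify_orthology(pairs):
    """Group (mouse_gene, other_gene) ortholog pairs into connected components
    and label each 1_to_1, 1_to_many, many_to_1 or many_to_many (mouse side first)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    pairs = list(pairs)
    for mouse, other in pairs:
        # Tag nodes by side so identical names in the two species never collide.
        parent[find(("mouse", mouse))] = find(("other", other))

    members = defaultdict(lambda: (set(), set()))
    for mouse, other in pairs:
        mouse_set, other_set = members[find(("mouse", mouse))]
        mouse_set.add(mouse)
        other_set.add(other)

    labels = {}
    for mouse_set, other_set in members.values():
        left = "1" if len(mouse_set) == 1 else "many"
        right = "1" if len(other_set) == 1 else "many"
        labels[(frozenset(mouse_set), frozenset(other_set))] = f"{left}_to_{right}"
    return labels


# Hypothetical pairs: m2 has two human partners, m3 and m4 share one.
print(classify_orthology([("m1", "h1"), ("m2", "h2"), ("m2", "h3"), ("m3", "h4"), ("m4", "h4")]))
```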

Daniel Blankenberg : Making the analyses of multiple-species whole-genome alignments accessible to everyone

Tue Jun 26 07:55:59 EDT 2007

  • Nekrutenko - Penn State

Words to Look Up

Talk

Multiple Spec Alignments

  • a genomic alignment is a collection of local alignments; each aligned sub-genomic region is a block

MAF format

  • Multiple Alignment Format - block-based format for storing these alignments

Alignment Manipulation

  • extract alignment blocks which fall in a region
  • remove species
  • remove blocks
  • determine coverage statistics
  • convert to FASTA or other formats - most tools do not support MAF
    • block-based - multiple blocks concatenated
    • interval-based - start-stop, e.g. a gene
  • extract coding sequence alignments
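The manipulations listed above can be sketched on a minimal MAF reader. This is not Galaxy's implementation: the function names are hypothetical, only `a` (block) and `s` (sequence) lines are read, the `src` field is assumed to follow the UCSC `assembly.chrom` convention, and region queries assume the reference row is on the `+` strand with zero-based, half-open coordinates. The FASTA conversion is the block-based variant: blocks are concatenated and a species missing from a block is padded with gaps.

```python
def parse_maf(text):
    """Yield alignment blocks from MAF text as lists of row dicts.

    Only 'a' (block start) and 's' (sequence) lines are read; 'i', 'e', 'q'
    lines and comments are ignored in this sketch.
    """
    block = None
    for line in text.splitlines():
        if line.startswith("a"):
            if block:
                yield block
            block = []
        elif line.startswith("s") and block is not None:
            _, src, start, size, strand, src_size, seq = line.split()
            block.append({"src": src, "start": int(start), "size": int(size),
                          "strand": strand, "src_size": int(src_size), "text": seq})
    if block:
        yield block


def species_of(row):
    # src is assumed to look like "assembly.chrom", e.g. "hg18.chr22".
    return row["src"].split(".")[0]


def blocks_in_region(blocks, ref_src, start, end):
    """Blocks whose + strand row for ref_src overlaps the half-open [start, end)."""
    for block in blocks:
        for row in block:
            if (row["src"] == ref_src and row["strand"] == "+"
                    and row["start"] < end and start < row["start"] + row["size"]):
                yield block
                break


def remove_species(block, unwanted):
    return [row for row in block if species_of(row) not in unwanted]


def concatenate_to_fasta(blocks, species):
    """Block-based FASTA: concatenate blocks, padding absent species with gaps."""
    parts = {sp: [] for sp in species}
    for block in blocks:
        if not block:
            continue
        width = len(block[0]["text"])
        present = {species_of(row): row["text"] for row in block}
        for sp in species:
            parts[sp].append(present.get(sp, "-" * width))
    return "".join(f">{sp}\n{''.join(seqs)}\n" for sp, seqs in parts.items())
```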

Galaxy

  • web-based
  • connection to UCSC and BioMart
  • 130 tools - interval operations, alignment manipulations, viewers, EMBOSS, hi-fi

Example

  • goal: determine non-canonical mammalian genes on chromosome 22
  • workflow
    • obtain genomic coordinates for genes
    • extract multiple-species alignments
    • stitch together alignments for coding exons
    • determine frequency of each tree

Questions

Yi Zhou : BLASTO - A tool for searching orthologous groups

Tue Jun 26 07:56:35 EDT 2007

  • Landweber - Princeton
  • Dept. Ecol & Evol. Biol.
  • NAR, 2007
  • BLASTO

Words to Look Up

Talk

  • orthologs - basis for phylogenetic inference, genome evolution studies, functional annotation
  • best estimated when complete genomes are available - reciprocal best hits, similarity clusters
  • NCBI COG - unicellular organisms, 53 prokaryotes, 3 eukaryotes
  • NCBI KOG - 7 eukaryotes
  • OrthoMCL
  • MultiParanoid
  • TIGR EGO (DNA)
  • Functional annotation
    • gene-tree methods - require position of query species on reference tree
    • Orthostrapper
    • RIO
    • SIFTER
    • KOGnitor
  • BLASTO
    • covers all currently well-known multi-species databases
    • no complete genome or phylogenetic placement of the query species required
    • modified BLAST that treats ortholog groups as a unit
    • function prediction, putative phylogenetic relationship inference
  • method
    • significance score of individual sequences using BLAST
    • retrieve ortholog group information
    • compute significance score for each group based on individual sequence scores
    • output significance score for each ortholog group - average likelihood of scores within the group
  • reduces noise compared to one-way best hits
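Two ideas from this talk can be sketched side by side: reciprocal best hits between two complete genomes, and ranking ortholog groups instead of individual sequences. The group score here is a plain mean of the raw per-sequence scores of the group members that were hit; that is an assumed stand-in, since the notes do not give BLASTO's exact statistic. Inputs are hypothetical dicts of precomputed BLAST scores, not real BLAST output parsing.

```python
from collections import defaultdict
from statistics import mean


def best_hits(hits):
    """hits: {query: {subject: score}} -> {query: best-scoring subject} (ties: first seen)."""
    return {query: max(subjects, key=subjects.get) for query, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_to_b, b_to_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    best_ab = best_hits(a_to_b)
    best_ba = best_hits(b_to_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def group_scores(hit_scores, membership):
    """Rank ortholog groups by the mean score of their members that were hit.

    hit_scores: {sequence_id: score} from a single query search.
    membership: {sequence_id: group_id}; hits outside any group are ignored.
    """
    per_group = defaultdict(list)
    for seq_id, score in hit_scores.items():
        group = membership.get(seq_id)
        if group is not None:
            per_group[group].append(score)
    return sorted(((group, mean(scores)) for group, scores in per_group.items()),
                  key=lambda item: item[1], reverse=True)
```

Averaging over a group means one strong but spurious hit to a single sequence weighs less than consistent hits across a group, which is one plausible reading of the noise reduction claimed relative to one-way best hits.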

Questions

Ian Schenk : Genomic algebra for the masses - providing flexible operations on genomic interval data with a user-friendly web resource

Tue Jun 26 07:56:57 EDT 2007

Words to Look Up

Talk

  • genomic data
    • sequences
    • alignments
    • datapoints (scores)
    • intervals
      • chromosome, start and end - 1 dim DNA map
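Interval data as described here (chromosome, start, end) supports set-like "genomic algebra". A minimal sketch of three common operations, union (merge), intersection and subtraction, follows. It assumes zero-based, half-open coordinates; the notes do not state the convention, and the function names are illustrative rather than the API of the web resource presented.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), zero-based, half-open


def merge(intervals: List[Interval]) -> List[Interval]:
    """Union: sort and merge overlapping or touching intervals per chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Bases covered by both sets (simple O(n*m) scan; fine for a sketch)."""
    hits = [
        (chrom, max(s1, s2), min(e1, e2))
        for chrom, s1, e1 in merge(a)
        for c2, s2, e2 in merge(b)
        if chrom == c2 and s1 < e2 and s2 < e1
    ]
    return merge(hits)


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Bases covered by `a` but not by `b`."""
    b = merge(b)
    result: List[Interval] = []
    for chrom, start, end in merge(a):
        cursor = start
        for c2, s2, e2 in b:
            if c2 != chrom or e2 <= cursor or s2 >= end:
                continue
            if s2 > cursor:
                result.append((chrom, cursor, s2))
            cursor = max(cursor, e2)
        if cursor < end:
            result.append((chrom, cursor, end))
    return result
```

A production tool would sort once and sweep both lists together instead of the nested scans used here for readability.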

Questions

Gordon Plague : Nice neighbors and safe landings - orientation bias of genes flanking transposable elements in bacteria

Tue Jun 26 07:57:33 EDT 2007

Words to Look Up

Talk

  • insertion sequences (ISs)
    • < 2500 bp
    • most frequent TEs in bacteria
    • QR-IRL-transposase-IRR-QR
    • prsA
  • intragenic IS elements rare - they knock out genes
  • most IS elements intergenic
  • are all intergenic regions equivalent for IS insertion?
  • bacterial genes have 4 different neighboring gene orientations
    • leading-leading
    • leading-lagging
    • lagging-leading
    • lagging-lagging
  • Bacterial genome architecture
    • mostly circular chromosomes
    • high coding density (no introns)
    • organized into operons
      • e.g. leu: leuL-leuA-leuB-leuC-leuD
  • IS element hopping in between co-oriented genes could disrupt 1 regulatory region (leading-leading or lagging-lagging pairs could be in the same operon)
    • lagging-leading - IS could disrupt 2 promoters (this arrangement needs two promoters)
      • hypothesis - least common place for IS elements to hop
    • leading-lagging - IS element cannot disrupt promoter sequences
      • hypothesis - IS elements insert here the most, since insertion here disrupts the least
  • Y. pestis
    • leading-leading: 1300 (36%)
    • leading-lagging: 546 (15%)
    • lagging-leading: 579 (16%)
    • lagging-lagging: 1183 (33%)
  • data support both hypotheses
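Counting neighboring-gene orientations is a simple pass over adjacent gene pairs. The sketch below assumes "leading" means a gene on the + strand and "lagging" a gene on the - strand, read left to right along the chromosome. Under that reading, leading-lagging pairs are convergent (no promoters between them) and lagging-leading pairs are divergent (two promoters), which matches the promoter argument above. The mapping and function names are assumptions for illustration. The usage line reproduces the percentages from the Y. pestis counts reported in the talk.

```python
from collections import Counter

# Assumed mapping: "leading" = gene on the + strand, "lagging" = gene on the - strand,
# read left to right along chromosome coordinates.
ORIENTATION = {
    ("+", "+"): "leading-leading",
    ("+", "-"): "leading-lagging",   # convergent: no promoters in between
    ("-", "+"): "lagging-leading",   # divergent: two promoters in between
    ("-", "-"): "lagging-lagging",
}


def neighbor_orientations(strands, circular=True):
    """Count orientation classes of adjacent gene pairs, given gene strands in order."""
    pairs = list(zip(strands, strands[1:]))
    if circular and len(strands) > 1:
        pairs.append((strands[-1], strands[0]))  # close the circular chromosome
    return Counter(ORIENTATION[pair] for pair in pairs)


def percentages(counts):
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Counts reported for Y. pestis in the talk:
y_pestis = {"leading-leading": 1300, "leading-lagging": 546,
            "lagging-leading": 579, "lagging-lagging": 1183}
print({label: round(pct) for label, pct in percentages(y_pestis).items()})
# -> 36, 15, 16 and 33 percent, matching the notes
```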

Interval Genomic Data

Questions
