AlexLabNotebook/ChrIIIRebuild/9/24/05-9/30/05


9/24/05 - 9/30/05

Last week: 9/19/05-9/23/05

Next week: 10/1/05-10/7/05

Goals this week:

  • Figure out 3' ends of all genes, based on the 3'-end prediction website -- sent gene sequences [generated by extractgeneseqs.py] to Joel Graber.
  • Figure out flanking 5' and 3' sequences required for Ty transposon function -- done.
  • Figure out flanking 5' and 3' sequences required for ncRNA function

Reannotation

  • There are discrepancies between the positions of the start and stop codons of chr III genes as supplied by SGD and the positions based on the UCSC genome browser and nibFrag. Had to write some code to update the start/stop positions in my MySQL instance -- reannotategenes.py
  • Also had to manually update start/stop positions for three genes:
    • YCL012C: this actually isn't annotated as a gene in the UCSC genome browser. Set start to 101783, end to 101316.
    • YCL008C
    • YCR006C

List of overlapping regions

Generated by findoverlaps.py

With 0 bp flanking sequence [ie only true overlaps]:

Total overlap length: 55079 bases.


With 500 bp upstream of the start codon and 500 bp downstream of the stop codon of each gene, and all other feature types bounded by their SGD start/end annotations:

Total overlap length: 278316 bases.
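
A minimal sketch of how totals like these could be computed. This is not findoverlaps.py itself: the feature tuple layout, the function names, and the choice to count every base covered by two or more features are all assumptions made for illustration. Flanking sequence is added to genes only, as described above; since 500 bp is added on both sides, strand does not change the padded interval.

```python
# Hypothetical feature layout: (name, feature_type, start, end).
# Coordinates are 1-based inclusive; start > end for Crick-strand features.

def padded_interval(feature, flank=0):
    """Return (lo, hi) for a feature, adding `flank` bp on both sides of genes only."""
    name, ftype, start, end = feature
    lo, hi = min(start, end), max(start, end)
    if ftype == "gene":
        # Clamp at the chromosome start; the chromosome end is not checked here.
        lo, hi = max(1, lo - flank), hi + flank
    return lo, hi

def total_overlap_length(features, flank=0):
    """Count bases covered by two or more (padded) features with a sweep line."""
    events = []
    for feature in features:
        lo, hi = padded_interval(feature, flank)
        events.append((lo, 1))       # interval opens at lo
        events.append((hi + 1, -1))  # interval closes just after hi
    events.sort()
    depth = 0      # number of features covering the current position
    total = 0
    prev = None
    for pos, delta in events:
        if prev is not None and depth >= 2:
            total += pos - prev
        depth += delta
        prev = pos
    return total

# Made-up coordinates, for illustration only.
example = [
    ("geneA", "gene", 100, 200),
    ("geneB", "gene", 300, 150),
    ("tRNA1", "tRNA", 400, 450),
]
print(total_overlap_length(example, flank=0))
print(total_overlap_length(example, flank=100))
```

Counting each base once, however many features cover it, is one possible definition of "total overlap length"; summing pairwise overlaps instead would give larger numbers wherever three or more features coincide.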

Ty elements

  • No functional Ty5 elements have been found in S. cerevisiae [per Lesage & Todeschini].
  • No additional 5' or 3' flanking sequence outside the full element [ie 5' LTR + coding region + 3' LTR] is required for transcription of the genes in Ty2. Transcription starts 240 bp into the 5' LTR and ends 285 bp into the 3' LTR, per Farabaugh et al, '89.
  • Insertion of either full Ty elements or just their LTRs can impact gene expression, both increasing and decreasing it [Lesage & Todeschini]. Will need to look at each of the Ty insertions on chr III and make an educated guess about whether and how they could affect gene expression of neighboring genes.

Figuring out [5'] flanking sequence

Via conservation

Look at conservation across evolutionary distance and use that to determine how much of the flanking sequence is "important".

Pseudo-code:

initial block = alignment block that 5' end of gene falls into
if (initial block is high-scoring)
label_1:
  if (block extends all the way to nearest upstream gene)
     flanking sequence = all of intergenic sequence
     exit
  else
     take up to end of high-scoring block
     if (another high-scoring block nearby)
        goto label_1
     else
        exit
else
  if (high-scoring block nearby)
     goto label_1
  else
     take 500 bp upstream of start codon
     exit


Need to determine what constitutes a "high-scoring" block and what being "nearby" means.
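
The pseudo-code above could be sketched in Python roughly as follows, with the goto replaced by a loop. Everything here is an assumption for illustration: the `Block` layout (positions expressed as distance upstream of the start codon), the function names, and the `threshold` and `max_gap` parameters, which stand in for the still-undecided definitions of "high-scoring" and "nearby".

```python
from typing import NamedTuple

class Block(NamedTuple):
    near: int     # distance upstream of the start codon where the block begins
    far: int      # distance upstream where the block ends (far > near)
    score: float  # conservation score of the alignment block

def next_high_block(blocks, edge, threshold, max_gap):
    """Closest high-scoring block that reaches beyond `edge` and starts within `max_gap` of it."""
    candidates = [b for b in blocks
                  if b.score >= threshold and b.far > edge and b.near - edge <= max_gap]
    return min(candidates, key=lambda b: b.near) if candidates else None

def flanking_length(blocks, intergenic_len, threshold, max_gap, default_flank=500):
    """Return how many bp upstream of the start codon to keep."""
    # Initial block = the block that the 5' end of the gene falls into.
    initial = next((b for b in blocks if b.near <= 0 <= b.far), None)
    if initial is not None and initial.score >= threshold:
        current = initial
    else:
        current = next_high_block(blocks, 0, threshold, max_gap)
        if current is None:
            return default_flank       # no conservation signal: take 500 bp
    while True:                        # plays the role of label_1
        if current.far >= intergenic_len:
            return intergenic_len      # block reaches the upstream gene
        edge = current.far             # take up to end of high-scoring block
        current = next_high_block(blocks, edge, threshold, max_gap)
        if current is None:
            return edge

# Made-up blocks, for illustration only.
blocks = [Block(-50, 120, 0.9), Block(130, 300, 0.8), Block(600, 700, 0.95)]
print(flanking_length(blocks, intergenic_len=1000, threshold=0.7, max_gap=20))
```

Measuring positions as distance upstream keeps the sketch strand-independent; converting real alignment coordinates into that frame would be a separate step.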

Via "canonical" promoters

Identify "canonical" promoters eg cell-cycle regulated, GCN4-regulated etc, assign a canonical promoter to each gene according to what's known about the gene and then synthesize the canonical promoters in front of the genes, rather than synthesizing the WT sequence. Using these canonical promoters would go further towards the goal of building a custom, understood chromosome than using the WT sequence and might be a better "engineering" choice. However, it would then be harder to make meaningful comparisons to WT yeast, so it'd be a worse "science" choice.

  • Starting point: Segal E et al, Nat Genetics '03: "Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data."
  • Data is available from here.
  • Their list of genes assigned to modules places 55 chromosome III genes in a module, leaving 127 unassigned [per getgenemodules.py]. This could be because they only looked at ~2300 genes to begin with; could possibly re-run their analysis with expression data sets that include the rest of the genes on chr III to map more chr III genes to modules.

Visualization

Need to be able to visualize both existing chromosome and new chromosome. Some possibilities:

  • See whether Ben Fry has anything I could use
  • Add custom tracks to the UCSC Genome Browser, as described here
  • Use VectorNTI
  • See whether I can [re]use the visualization stuff developed by David Gifford's group.
  • BioBricks Registry
  • Write my own visualization tool
  • This paper lists some existing genome visualization tools, but none of them seems to have the functionality to show control elements ie promoters etc.
  • This site also lists some visualization tools.
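
For the UCSC custom-track option, a minimal sketch of writing features out as a BED track. The function name, the track name and description, and the (name, start, end) feature layout are hypothetical. The chromosome label is also an assumption: it has to match whatever name the browser assembly uses for chromosome III. BED uses 0-based starts and exclusive ends, so 1-based inclusive coordinates need their start shifted by one.

```python
def bed_track(features, name, description, chrom="chrIII"):
    """Format (name, start, end) features as a BED custom track with six columns.

    Coordinates are assumed 1-based inclusive, with start > end for
    Crick-strand features. `chrom` must match the assembly's chromosome name.
    """
    lines = [f'track name="{name}" description="{description}" visibility=2']
    for feat_name, start, end in features:
        lo, hi = min(start, end), max(start, end)
        strand = "+" if start <= end else "-"
        # Shift the start to 0-based; the end stays as is because it is exclusive.
        lines.append("\t".join([chrom, str(lo - 1), str(hi), feat_name, "0", strand]))
    return "\n".join(lines) + "\n"

# YCL012C coordinates as set in the Reannotation notes above.
track_text = bed_track([("YCL012C", 101783, 101316)],
                       name="chrIII_rebuild",
                       description="Reannotated chr III features")
print(track_text)
```

The resulting text could be pasted into the browser's custom track form or saved to a file and uploaded. Promoters and other control elements could go in as additional features or as a second track.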

