9/24/05 - 9/30/05

Last week: 9/19/05-9/23/05

Next week: 10/1/05-10/7/05

Goals this week:

Figure out 3' ends of all genes, based on 3'-end prediction website -- sent gene sequences [generated by extractgeneseqs.py] to Joel Graber.
Figure out flanking 5' and 3' sequences required for Ty transposon function -- done.
Figure out flanking 5' and 3' sequences required for ncRNA function

Reannotation

There are discrepancies between the positions of the start and stop codons of chr III genes as supplied by SGD and the positions based on the UCSC genome browser and nibFrag. Had to write some code to update the start/stop positions in my MySQL instance -- reannotategenes.py
Also had to manually update start/stop positions for three genes:
- YCL012C: this actually isn't annotated as a gene in the UCSC genome browser. Set start to 101783, end to 101316.
- YCL008C
- YCR006C
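
reannotategenes.py itself isn't reproduced here. As a minimal sketch of the comparison step only, assuming both annotation sets have been loaded into dicts keyed by gene name (the function name, dict layout and coordinates below are hypothetical, not the real SGD/UCSC values):

```python
def find_discrepancies(sgd, ucsc):
    """Return {gene: (sgd_coords, ucsc_coords)} for genes whose coordinates differ.

    Coordinates are (start, end) tuples; start > end for Crick-strand genes.
    ucsc_coords is None when the gene is not annotated in the UCSC set at all.
    """
    diffs = {}
    for gene, coords in sgd.items():
        other = ucsc.get(gene)
        if other != coords:
            diffs[gene] = (coords, other)
    return diffs


# Hypothetical coordinates, for illustration only
sgd = {"GENE_A": (100, 400), "GENE_B": (900, 600), "GENE_C": (1500, 1800)}
ucsc = {"GENE_A": (100, 400), "GENE_B": (903, 600)}

discrepancies = find_discrepancies(sgd, ucsc)
for gene, (sgd_coords, ucsc_coords) in sorted(discrepancies.items()):
    print(gene, "SGD:", sgd_coords, "UCSC:", ucsc_coords)
```

Genes that come back with None on the UCSC side are the ones that need a manual fix, as with YCL012C above.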

List of overlapping regions

Generated by findoverlaps.py

With 0 bp flanking sequence [ie only true overlaps]:

Bases 1-4322, length: 4321 bases
Bases 13282-14119, length: 837 bases
Bases 15214-16880, length: 1666 bases
Bases 23523-23981, length: 458 bases
Bases 44623-46963, length: 2340 bases
Bases 48653-52340, length: 3687 bases
Bases 78948-82274, length: 3326 bases
Bases 84810-90768, length: 5958 bases
Bases 106970-107413, length: 443 bases
Bases 108017-110666, length: 2649 bases
Bases 114318-114936, length: 618 bases
Bases 131982-133118, length: 1136 bases
Bases 137740-139043, length: 1303 bases
Bases 142698-143077, length: 379 bases
Bases 151518-151856, length: 338 bases
Bases 193289-200170, length: 6881 bases
Bases 200434-205389, length: 4955 bases
Bases 208127-209602, length: 1475 bases
Bases 210710-211537, length: 827 bases
Bases 211863-213764, length: 1901 bases
Bases 228087-228783, length: 696 bases
Bases 254364-258647, length: 4283 bases
Bases 263969-264484, length: 515 bases
Bases 272308-274080, length: 1772 bases
Bases 294400-295326, length: 926 bases
Bases 300825-302214, length: 1389 bases

Total overlap length: 55079 bases.

With 500 bp upstream of start codon and 500 bp downstream of stop codon of a gene and all other feature types bounded by their SGD start/end annotations:

Bases 1-4322, length: 4321 bases
Bases 9206-14849, length: 5643 bases
Bases 15214-29436, length: 14222 bases
Bases 30949-41224, length: 10275 bases
Bases 41665-78418, length: 36753 bases
Bases 78448-82774, length: 4326 bases
Bases 83054-84625, length: 1571 bases
Bases 84810-90768, length: 5958 bases
Bases 90823-114936, length: 24113 bases
Bases 116874-123500, length: 6626 bases
Bases 127964-142664, length: 14700 bases
Bases 142698-143077, length: 379 bases
Bases 143128-149397, length: 6269 bases
Bases 151102-168491, length: 17389 bases
Bases 170378-176930, length: 6552 bases
Bases 176992-178793, length: 1801 bases
Bases 179012-290290, length: 111278 bases
Bases 292384-295326, length: 2942 bases
Bases 300325-303523, length: 3198 bases

Total overlap length: 278316 bases.
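
findoverlaps.py isn't shown in these notes. The sketch below is one plausible reading of what it computes, under these assumptions: each feature is a (name, kind, start, end) tuple, only features of kind "gene" get the flanking padding, and a "region" is the merged span of two or more features that overlap one another. Lengths are reported as end minus start, which matches the listings above [eg 1-4322 gives 4321]. The function names and example features are hypothetical.

```python
def expand(start, end, flank):
    """Return (low, high) for a feature padded by `flank` bp on both sides.

    Crick-strand features have start > end, so normalise the order first.
    """
    low, high = min(start, end), max(start, end)
    return max(1, low - flank), high + flank


def overlap_regions(features, gene_flank=0):
    """Merge overlapping features and return the (low, high) spans that
    contain at least two features. Only genes are padded by `gene_flank`."""
    spans = sorted(
        expand(s, e, gene_flank if kind == "gene" else 0)
        for _, kind, s, e in features
    )
    regions = []
    cur_low, cur_high, count = None, None, 0
    for low, high in spans:
        if cur_low is not None and low <= cur_high:
            # Overlaps the current cluster: extend it
            cur_high = max(cur_high, high)
            count += 1
        else:
            # Close the previous cluster; keep it only if it held >= 2 features
            if count >= 2:
                regions.append((cur_low, cur_high))
            cur_low, cur_high, count = low, high, 1
    if count >= 2:
        regions.append((cur_low, cur_high))
    return regions


def total_length(regions):
    """Sum of end - start over all regions."""
    return sum(high - low for low, high in regions)


# Hypothetical features, for illustration only
features = [
    ("g1", "gene", 100, 400),
    ("g2", "gene", 900, 600),   # Crick strand: start > end
    ("t1", "tRNA", 350, 450),
    ("g3", "gene", 1200, 1500),
]
true_overlaps = overlap_regions(features, gene_flank=0)
padded_overlaps = overlap_regions(features, gene_flank=500)
```

In this toy example the padded run collapses everything into a single region, the same effect that produces the very long 179012-290290 region in the 500 bp listing.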

Ty elements

No functional Ty5 elements have been found in S.cerevisiae [per Lesage & Todeschini].
No additional 5' or 3' flanking sequence outside the full element [ie 5' LTR + coding region + 3' LTR] is required for transcription of the genes in Ty2. Transcription starts 240 bp into the 5' LTR and ends 285 bp into the 3' LTR, per Farabaugh et al, '89.
Insertion of either full Ty elements or just their LTRs can impact gene expression, both increasing and decreasing it [Lesage & Todeschini]. Will need to look at each of the Ty insertions on chr III and make an educated guess about whether and how they could affect gene expression of neighboring genes.
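
The Ty2 offsets quoted from Farabaugh et al can be turned into transcript coordinates for a given insertion. A minimal sketch, assuming a plus-strand element and assuming "N bp into the LTR" means N bases past the first base of that LTR [the exact off-by-one convention is an assumption here, and the LTR positions are hypothetical]:

```python
def ty2_transcript_bounds(ltr5_start, ltr3_start, start_offset=240, end_offset=285):
    """Return (transcript_start, transcript_end) for a plus-strand Ty2 element.

    ltr5_start / ltr3_start: first base of the 5' and 3' LTRs.
    Offsets default to the 240 bp and 285 bp figures quoted above.
    """
    return ltr5_start + start_offset, ltr3_start + end_offset


# Hypothetical LTR positions, for illustration only
bounds = ty2_transcript_bounds(ltr5_start=1000, ltr3_start=6600)
print(bounds)
```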

Figuring out [5'] flanking sequence

Via conservation

Look at conservation across evolutionary distance and use that to determine how much of the flanking sequence is "important".

Pseudo-code:

initial block = alignment block that 5' end of gene falls into

if (initial block is high-scoring)
label_1:
  if (block extends all the way to nearest upstream gene)
     flanking sequence = all of intergenic sequence
     exit
  else
     take up to end of high-scoring block
     if (another high-scoring block nearby)
        goto label_1
     else
        exit
else
  if (high-scoring block nearby)
     goto label_1
  else
     take 500 bp upstream of start codon
     exit

Need to determine what constitutes a "high-scoring" block and what being "nearby" means.
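
The pseudo-code above can be written without the goto as a single walk upstream from the start codon. This is only a sketch under stated assumptions: plus-strand gene, alignment blocks given as (start, end, score) tuples, "high-scoring" meaning score >= min_score, and "nearby" meaning a gap of at most max_gap bp. Both thresholds are placeholders for the values still to be determined, and the function name is hypothetical.

```python
def flanking_start(blocks, gene_start, upstream_end, min_score, max_gap, default=500):
    """Return the upstream-most coordinate of 5' flanking sequence to keep.

    blocks: (start, end, score) alignment blocks, start <= end, plus strand.
    gene_start: position of the start codon.
    upstream_end: last base of the nearest upstream gene.
    """
    # High-scoring blocks touching the intergenic region, nearest the gene first
    good = sorted(
        (b for b in blocks
         if b[2] >= min_score and b[0] < gene_start and b[1] > upstream_end),
        key=lambda b: b[1],
        reverse=True,
    )
    boundary = gene_start  # current 5' edge of the sequence being kept
    extended = False
    for start, end, _ in good:
        if boundary - end > max_gap:
            break  # the next high-scoring block is not "nearby"
        boundary = min(boundary, start)
        extended = True
        if boundary <= upstream_end + 1:
            # Block reaches the upstream gene: keep all intergenic sequence
            return upstream_end + 1
    if not extended:
        # No usable conservation: fall back to a fixed window
        return max(1, gene_start - default)
    return boundary


# Hypothetical blocks, for illustration only
blocks = [(950, 1010, 0.9), (900, 940, 0.8), (700, 800, 0.95), (960, 990, 0.1)]
edge = flanking_start(blocks, gene_start=1000, upstream_end=500,
                      min_score=0.5, max_gap=20)
```

Here the walk takes the first two blocks and stops before the third, because the 100 bp gap exceeds max_gap.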

Via "canonical" promoters

Identify "canonical" promoters [eg cell-cycle regulated, GCN4-regulated], assign one to each gene according to what's known about the gene, and then synthesize the canonical promoter in front of each gene rather than the WT sequence. Using canonical promoters would go further towards the goal of building a custom, understood chromosome than using the WT sequence, so it might be the better "engineering" choice. However, it would then be harder to make meaningful comparisons to WT yeast, so it'd be the worse "science" choice.

Starting point: Segal E et al, Nat Genetics '03: "Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data."
Data is available from here.
Their list of genes assigned to modules places 55 genes on chromosome III in a module, leaving 127 unassigned [per getgenemodules.py]. This could be because they only looked at ~2300 genes to begin with; could possibly re-run their analysis with expression data sets that include the rest of the genes on chr III to map more chr III genes to modules.
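
getgenemodules.py isn't reproduced here; the counting step it performs could look roughly like the sketch below. It assumes the module assignments have already been parsed into a dict of gene name to module ID. The function name, gene names and module IDs are hypothetical placeholders, not values from the Segal et al data.

```python
def split_by_module(chr3_genes, gene_to_module):
    """Split chr III genes into those with a module assignment and those without.

    Returns (assigned, unassigned): a {gene: module} dict and a list of genes.
    """
    assigned = {g: gene_to_module[g] for g in chr3_genes if g in gene_to_module}
    unassigned = [g for g in chr3_genes if g not in gene_to_module]
    return assigned, unassigned


# Placeholder data, for illustration only
chr3_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
gene_to_module = {"GENE_A": 7, "GENE_C": 12, "GENE_X": 3}  # GENE_X is not on chr III

assigned, unassigned = split_by_module(chr3_genes, gene_to_module)
print(len(assigned), "assigned,", len(unassigned), "unassigned")
```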

Visualization

Need to be able to visualize both existing chromosome and new chromosome. Some possibilities:

See whether Ben Fry has anything I could use
Add custom tracks to the UCSC Genome Browser, as described here
Use VectorNTI
See whether I can [re]use the visualization stuff developed by David Gifford's group.
BioBricks Registry
Write my own visualization tool
This paper lists some existing genome visualization tools, but none of them seems to have the functionality to show control elements ie promoters etc.
This site also lists some visualization tools.
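
For the UCSC custom track option, control elements could be exported as a BED track. A minimal sketch, assuming features are held as (label, start, end) with 1-based inclusive coordinates and start > end for Crick-strand features. BED uses a 0-based start and an exclusive end, hence the low - 1. The chromosome name "chr3", the track name and the example features are assumptions and need checking against the assembly actually loaded in the browser.

```python
def bed_track(features, chrom="chr3", name="chrIII_rebuild",
              description="Redesigned chr III features"):
    """Return the text of a BED custom track for the given features."""
    lines = ['track name="%s" description="%s"' % (name, description)]
    for label, start, end in features:
        low, high = min(start, end), max(start, end)
        strand = "+" if start <= end else "-"
        # BED columns: chrom, chromStart (0-based), chromEnd, name, score, strand
        lines.append("\t".join([chrom, str(low - 1), str(high), label, "0", strand]))
    return "\n".join(lines)


# Hypothetical features, for illustration only
track_text = bed_track([("promoter_A", 100, 400), ("gene_B", 900, 600)])
print(track_text)
```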

AlexLabNotebook/ChrIIIRebuild/9/24/05-9/30/05