Julius B. Lucks/Bibliography/Zeldovich-PLoSCompBio-2007
From OpenWetWare
Notes on [1]
- T_opt = optimal growth temperature, available for a variety of organisms
- for every possible set of amino acids, labeled by a 20-element membership vector of 1's and 0's, replace each 1 with the actual frequency of that amino acid in the genome; do this for each of the 83 genomes
- sum the elements of each frequency-filled vector to get an F-value for that amino acid combination (i.e., the combined frequency of those amino acids in the genome)
- regress the F-values against T_opt across genomes and calculate r^2
- search over all possible amino acid combination vectors (2^20 - 1, roughly 10^6, non-empty sets) and find the combination that gives the best r^2
- this combination is IVYWREL
- control to verify that this result is not due to biased nucleotide content:
- shuffle the nucleotides of the genomes and recompute the translated protein sequences
- observe that the set of amino acids that predicts T_opt changes in the shuffled genomes
- therefore reason that amino acid usage is independent of nucleotide patterns (if the sets were the same, nucleotide patterns could be said to 'explain' the amino acid patterns)
- something funny about this test
- find that GC content is not related to T_opt or to the fraction of IVYWREL in the genome (F6)
- there is, however, a correlation between A+G content and T_opt
- if protein sequences are reverse translated with no codon bias, the resulting A+G content correlates with T_opt (F7) almost as strongly as that of the observed DNA sequences (r = 0.48 vs. r = 0.60)
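A minimal Python sketch of the exhaustive amino acid set search described in the notes, assuming each genome is summarized as a dict of amino acid frequencies summing to 1. The function names (`pearson_r`, `f_value`, `best_subset`) and the toy data are hypothetical illustrations, not code from the paper, whose exact scoring details may differ.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is (numerically) constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx < 1e-18 or syy < 1e-18:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def f_value(freqs, subset):
    """F for one genome: the summed frequencies of the amino acids in `subset`
    (the 0/1 membership vector dotted with the genome's frequency vector)."""
    return sum(freqs.get(aa, 0.0) for aa in subset)


def best_subset(genome_freqs, t_opts, alphabet):
    """Score every non-empty proper subset of `alphabet` by the r^2 between its
    F-values and T_opt across genomes; return (best subset, its r^2)."""
    best, best_r2 = None, -1.0
    for size in range(1, len(alphabet)):  # full set skipped: its F is 1 in every genome
        for subset in combinations(alphabet, size):
            fs = [f_value(freqs, subset) for freqs in genome_freqs]
            r2 = pearson_r(fs, t_opts) ** 2
            if r2 > best_r2 + 1e-12:  # on near-ties, keep the first (smaller) subset
                best, best_r2 = "".join(subset), r2
    return best, best_r2


# Hypothetical toy data: 4 "genomes" over a 4-letter alphabet, where the
# frequency of "A" was constructed to rise linearly with T_opt.
t_opts = [20.0, 40.0, 60.0, 80.0]
genome_freqs = [
    {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.3},
    {"A": 0.2, "B": 0.1, "C": 0.4, "D": 0.3},
    {"A": 0.3, "B": 0.4, "C": 0.1, "D": 0.2},
    {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.2},
]
subset, r2 = best_subset(genome_freqs, t_opts, "ABCD")
print(subset, round(r2, 3))  # A 1.0
```

For real data the alphabet would be the 20 amino acids, e.g. `"ACDEFGHIKLMNPQRSTVWY"`. Because frequencies sum to 1, a set and its complement have F-values summing to 1 and hence the same r^2; the small tolerance keeps whichever is found first. With 20 letters the loop visits about 10^6 sets, which is slow in pure Python, so vectorizing over genomes (e.g. with NumPy) would be the practical choice.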
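A sketch of the nucleotide-shuffling control from the notes, assuming shuffling is done within each coding sequence (which preserves its base composition) and assuming the standard genetic code. How the paper handled in-frame stop codons created by shuffling is not stated here; this sketch simply skips them. The gene sequences and helper names are hypothetical.

```python
import random
from collections import Counter

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_TABLE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate codon by codon, skipping stop codons and codons with non-ACGT bases."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper())
        if aa is not None and aa != "*":
            protein.append(aa)
    return "".join(protein)


def shuffle_and_translate(cds, rng):
    """Shuffle one coding sequence's nucleotides (keeping its base composition) and translate."""
    bases = list(cds)
    rng.shuffle(bases)
    return translate("".join(bases))


def aa_frequencies(proteins):
    """Amino acid frequencies pooled over a list of protein sequences."""
    counts = Counter("".join(proteins))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


# Hypothetical coding sequences standing in for one genome's genes.
rng = random.Random(0)
genes = ["ATGAAAGGCTTTGAACTG", "ATGCGTGAAATCGTATGG"]
shuffled_proteins = [shuffle_and_translate(g, rng) for g in genes]
print(aa_frequencies(shuffled_proteins))
```

Frequencies computed this way for every shuffled genome could be run through the same amino acid set search as the real genomes, and the winning sets compared.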
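A sketch of reverse translation with no codon bias, assuming the standard genetic code and reading "no codon bias" as every synonymous codon being equally likely. Instead of sampling random codons, it computes the expected A+G fraction directly; drawing `random.choice(SYNONYMS[aa])` for each residue would converge to the same value on long sequences. The names are hypothetical.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AA_TABLE[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codons for each amino acid (stop codons excluded).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def purine_fraction(codon):
    """Fraction of A or G bases in a codon."""
    return sum(base in "AG" for base in codon) / len(codon)


# Expected A+G fraction contributed by each amino acid when all of its
# synonymous codons are equally likely (i.e. no codon bias).
EXPECTED_AG = {
    aa: sum(purine_fraction(c) for c in codons) / len(codons)
    for aa, codons in SYNONYMS.items()
}


def expected_ag_content(proteins):
    """Expected A+G fraction of the DNA obtained by reverse translating
    `proteins` with uniformly chosen synonymous codons."""
    residues = "".join(proteins)
    return sum(EXPECTED_AG[aa] for aa in residues) / len(residues)


print(expected_ag_content(["MKF", "EW"]))
```

Every codon has three bases, so averaging the per-residue expectation gives the expected A+G fraction of the whole reverse-translated sequence. One value per genome could then be correlated with T_opt, for example with `statistics.correlation` (Python 3.10+).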