Julius B. Lucks/Bibliography/Zeldovich-PLoSCompBio-2007

From OpenWetWare

Notes on [1]

  • Given: T_opt, the optimal growth temperature, for a variety of organisms
  • for every possible set of amino acids, represented as a 20-element membership vector of 1's and 0's, replace each 1 with the actual frequency of that amino acid in the genome; do this for each of the 83 genomes
    • sum the elements of each frequency-filled vector to get an F-value for that amino acid combination (one F-value per genome)
    • regress the F-values against the T_opt values and calculate r²
    • search over all possible amino acid combination vectors to find the combination that gives the best r²
  • this combination is IVYWREL
  • control to verify this is not due to biased nucleotide content
    • shuffle the nucleotides of the genomes, and recompute translated protein sequences
    • observe that the pattern of amino acids that predicts T_opt changes in the shuffled genomes
      • therefore reason that amino acid usage is independent of nucleotide patterns (if the predictive amino acids were the same, then nucleotide patterns could be said to 'explain' the amino acid patterns)
    • something funny about this test
  • find that GC content is not related to T_opt or to the fraction of IVYWREL in the genome (F6)
  • there is a correlation between A+G content and T_opt
    • if you take the protein sequences and reverse translate them with no codon bias, you get an A+G content that correlates with T_opt (F7) almost as strongly as that of the observed DNA sequences (r = 0.48 vs. r = 0.60)
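
The exhaustive search over amino acid combinations described in the notes above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `best_subset`, its parameters, and the chunked NumPy layout are all assumptions made here. It takes a genomes-by-amino-acids frequency table and a vector of T_opt values, sums the frequencies in each subset to get F, and keeps the subset with the highest squared Pearson correlation (equal to the r² of a one-variable linear regression).

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def best_subset(freqs, t_opt, letters=AMINO_ACIDS, chunk_bits=16):
    """Score every non-empty subset of `letters` (hypothetical helper).

    freqs: (n_genomes, n_letters) array of per-genome amino acid frequencies.
    t_opt: (n_genomes,) array of optimal growth temperatures.
    Returns (subset_string, r2) for the best-scoring subset.
    """
    freqs = np.asarray(freqs, dtype=float)
    t = np.asarray(t_opt, dtype=float)
    n_letters = len(letters)
    assert freqs.shape == (t.size, n_letters)

    tc = t - t.mean()                # centered temperatures
    fc = freqs - freqs.mean(axis=0)  # centering columns centers every F
    tt = tc @ tc
    shifts = np.arange(n_letters)
    n_subsets = 1 << n_letters
    chunk = 1 << min(chunk_bits, n_letters)

    best_r2, best_code = -1.0, 0
    for start in range(0, n_subsets, chunk):
        # Each integer code is a membership vector: bit i set means letter i is in.
        codes = np.arange(start, min(start + chunk, n_subsets))
        masks = ((codes[:, None] >> shifts) & 1).astype(float)
        f_centered = masks @ fc.T    # centered F, shape (chunk, n_genomes)
        cov = f_centered @ tc
        var = (f_centered ** 2).sum(axis=1)
        # Skip subsets whose F is constant across genomes (empty set, full set).
        r2 = np.full(codes.size, -1.0)
        ok = var > 1e-12
        r2[ok] = cov[ok] ** 2 / (var[ok] * tt)
        i = int(np.argmax(r2))
        if r2[i] > best_r2:
            best_r2, best_code = float(r2[i]), int(codes[i])

    chosen = "".join(l for i, l in enumerate(letters) if (best_code >> i) & 1)
    return chosen, best_r2
```

With a real table this would be called as `best_subset(freqs, t_opt)`, where `freqs` has one row per genome and 20 columns in the order of `AMINO_ACIDS`. One caveat of this sketch: when each row of frequencies sums to 1, a subset and its complement give F and 1 − F, so they have identical r² and the function simply returns whichever it encounters first.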

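The A+G comparison in the notes (observed DNA versus protein sequences reverse translated with no codon bias) can be illustrated with a short sketch. It assumes the standard genetic code and interprets "no codon bias" as every synonymous codon of an amino acid being equally likely; the function names are made up for this example and are not from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE)}

# Mean purine (A or G) fraction over the synonymous codons of each amino acid.
PURINE_FRACTION = {}
for aa in set(AA_TABLE) - {"*"}:
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    PURINE_FRACTION[aa] = sum(
        (c.count("A") + c.count("G")) / 3 for c in codons
    ) / len(codons)


def ag_fraction(dna):
    """Observed A+G fraction of a DNA sequence."""
    dna = dna.upper()
    return (dna.count("A") + dna.count("G")) / len(dna)


def expected_ag_fraction(protein):
    """Expected A+G fraction of the coding DNA for `protein` when each
    amino acid is reverse translated with uniform synonymous codon usage.
    Raises KeyError on non-standard residue letters."""
    return sum(PURINE_FRACTION[aa] for aa in protein.upper()) / len(protein)
```

Computing `expected_ag_fraction` over each proteome and `ag_fraction` over the corresponding coding DNA, then correlating each with T_opt, would give the two r values being compared in the notes.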

  1. Zeldovich KB, Berezovsky IN, and Shakhnovich EI. Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol. 2007 Jan 12;3(1):e5. DOI:10.1371/journal.pcbi.0030005 | PubMed ID:17222055