Codon usage optimization
The relative frequency of codon use varies widely depending on the organism and organelle. Many design programs for synthetic protein coding sequences allow the choice of organism. The codon usage database has codon usage statistics for many common, sequenced organisms. Often, however, expression in more than one organism is desirable, typically E. coli and a target organism, or S. cerevisiae and a target organism.
For these applications, a compromise codon usage table is required. The codon usage database lists the relative frequency of each possible codon for a particular amino acid. For two organisms, multiplying a codon's relative frequency in one organism by its relative frequency in the other and taking the square root gives the geometric mean of the two frequencies, which serves as the compromise value. A codon that is rare in either organism therefore receives a low compromise value. The resulting numbers are then normalized so that the relative frequencies for each amino acid sum to 1.0, by dividing each geometric mean by the sum of the geometric means of all codons for the same amino acid.
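The calculation above can be sketched in Python as follows. This is a minimal illustration, not the code of any particular design program: the function name compromise_usage is hypothetical, and the lysine frequencies are made-up placeholder numbers rather than values from the codon usage database. Each input maps the synonymous codons of a single amino acid to their relative frequencies in one organism.

```python
from math import sqrt


def compromise_usage(freqs_a, freqs_b):
    """Combine two codon frequency tables for ONE amino acid.

    freqs_a, freqs_b: dicts mapping each synonymous codon to its relative
    frequency in organism A and organism B (each dict sums to about 1.0).
    Returns a dict of normalized geometric-mean frequencies.
    """
    if freqs_a.keys() != freqs_b.keys():
        raise ValueError("both tables must list the same codons")

    # Geometric mean of the two frequencies for each codon.
    geo = {codon: sqrt(freqs_a[codon] * freqs_b[codon]) for codon in freqs_a}

    # Normalize so the compromise frequencies sum to 1.0 for this amino acid.
    total = sum(geo.values())
    if total == 0:
        raise ValueError("no codon is used in both organisms")
    return {codon: value / total for codon, value in geo.items()}


# Illustrative placeholder numbers only (NOT real database values).
lys_organism_1 = {"AAA": 0.75, "AAG": 0.25}
lys_organism_2 = {"AAA": 0.25, "AAG": 0.75}

lys_compromise = compromise_usage(lys_organism_1, lys_organism_2)
for codon, freq in sorted(lys_compromise.items()):
    print(codon, round(freq, 3))
```

To build a full compromise table under these assumptions, the function would be applied once per amino acid, since normalization is done within each set of synonymous codons.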
However, simply using the most common host codons has been shown not to correlate with high gene expression. The host codon distribution is instead the result of the evolutionary history of the genome and has nothing to do with overexpression of genes. See the open-access paper by Welch et al. for more information.
For reliable expression of heterologous genes, it is strongly preferred to use codons that correspond to tRNAs that remain charged during starvation. Genes designed with this subset of codons consistently express 10- to 100-fold higher than genes designed using the 'most common' codon bias. See this PLoS ONE article for details.
Codon usage tables can be saved and used as input to many of the protein coding region design programs, such as Gene Designer.