T7.1/Reannotation

Overview
We used past experiments and observations to define specific boundaries of functional genetic elements on the bacteriophage T7 genome. We followed the standard naming conventions developed by Studier and Dunn (Dunn & Studier, 1983; Studier & Dunn, 1983). Our annotation can be found here.

Protein Coding Domains
The definition of a protein coding domain that we used here is a contiguous stretch of DNA that, when transcribed, produces an mRNA that specifies the amino acid sequence of a protein. The T7 protein coding domains were first characterized by the isolation and analysis of randomly generated amber mutants. Nineteen genes were identified by mapping mutants that disrupt T7 DNA synthesis, particle maturation, and lysis (Studier, 1969; Hausmann & Gomez, 1967; Hausmann & LaRue, 1969). Two additional genes, encoding T7 DNA ligase and protein kinase, were identified through loss-of-function and deletion mutants, respectively (Masamune et al, 1971); the genetic analysis of ligase and kinase mutants was carried out using mutant host strains that do not support the growth of ligase- or kinase-defective phage (Studier, 1969). Up to thirty T7 proteins were observed by pulse-labeling phage-infected cells with radioactive amino acids (Studier & Maizel, 1969; Studier, 1973). Further experiments, such as measuring electrophoretic mobility shifts in the proteins of amber mutants, provided evidence for up to 38 T7 proteins (Studier, 1981). Sequencing of the genome confirmed the previously constructed genetic maps (Dunn & Studier, 1983). However, analysis of the complete genome sequence also revealed that the set of protein coding domains found through mutagenesis, screening, and mapping was not exhaustive: previously unidentified open reading frames occupied most of the remainder of the genome. Some of these open reading frames can be labeled as putative protein coding domains based on the inferred strengths of their upstream ribosome binding sites. In all, up to 57 genes encoding 60 potential proteins have been found or postulated (Molineux, 2005). In the few cases where multiple possible start codons exist, we used the most upstream start codon to define the beginning of the protein coding domain.
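As a minimal sketch of the start-codon rule above, the Python function below walks backwards, in frame, from a known stop codon and keeps the most upstream start codon it meets before reaching an upstream in-frame stop. The function name, the 0-based forward-strand coordinates, and the start-codon set (ATG, GTG, TTG, as commonly used in E. coli) are assumptions for illustration; the sketch ignores ribosome binding sites entirely.

```python
from typing import Iterable, Optional

# Assumed codon sets: standard stop codons and common E. coli start codons.
START_CODONS = frozenset({"ATG", "GTG", "TTG"})
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def most_upstream_start(seq: str, stop_index: int,
                        starts: Iterable[str] = START_CODONS) -> Optional[int]:
    """Return the 0-based index of the most upstream in-frame start codon
    for the ORF that ends with the stop codon at `stop_index`.

    Hypothetical helper: steps back one codon at a time until it reaches an
    upstream in-frame stop codon or the beginning of `seq`.
    """
    seq = seq.upper()
    starts = set(starts)
    best = None
    i = stop_index - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break  # the ORF cannot extend past an upstream in-frame stop
        if codon in starts:
            best = i  # keep overwriting: the last hit found is the most upstream
        i -= 3
    return best


# Toy ORF: in-frame GTG at index 6, ATG at index 12, stop TAA at index 18.
orf = "TAACCCGTGAAAATGGGCTAA"
print(most_upstream_start(orf, 18))                  # 6
print(most_upstream_start(orf, 18, starts={"ATG"}))  # 12
```

Restricting `starts` to ATG shows how the choice of start-codon set changes which codon counts as "most upstream".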

Ribosome Binding Sites
The definition of a ribosome binding site (RBS) that we used here is a contiguous stretch of DNA that, when transcribed, produces a region of RNA that interacts with the ribosome and allows for the initiation of protein synthesis. The T7 ribosome binding sites were first postulated by analyzing the sequence upstream of protein coding domain start codons; a stretch complementary to the 3’ end of E. coli 16S rRNA suggested a functioning RBS. Direct observation of proteins during T7 infection provides additional support for the function of a subset of RBSs (Studier & Maizel, 1969).
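The complementarity argument can be illustrated with a short Python sketch that reports the longest piece of the Shine-Dalgarno core AGGAGG (complementary to the CCUCCU motif near the 3’ end of E. coli 16S rRNA) found just upstream of a start codon. The 20-nucleotide window and the function names are assumptions for illustration; real RBS strength estimates also weigh spacing, pairing energy, and mRNA structure.

```python
# Assumed Shine-Dalgarno core: pairs with ...CCUCCU... near the 16S rRNA 3' end.
SD_CORE = "AGGAGG"


def longest_sd_match(upstream: str, core: str = SD_CORE) -> str:
    """Return the longest substring of `core` present in `upstream`.

    Accepts DNA or RNA (U is read as T). Ties go to the leftmost piece of `core`.
    """
    upstream = upstream.upper().replace("U", "T")
    for length in range(len(core), 0, -1):  # try the longest pieces first
        for start in range(len(core) - length + 1):
            piece = core[start:start + length]
            if piece in upstream:
                return piece
    return ""


def rbs_window(seq: str, start_index: int, width: int = 20) -> str:
    """Hypothetical helper: the `width` nt immediately 5' of a start codon."""
    return seq[max(0, start_index - width):start_index]


mrna = "CCCTTAAGGAGATATACATATG"
start = len(mrna) - 3  # the toy sequence ends with its ATG start codon
print(longest_sd_match(rbs_window(mrna, start)))  # AGGAG
```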

RNA Polymerase Promoters
The definition of a promoter that we used here is a contiguous stretch of DNA that interacts with an RNA polymerase molecule and allows for the initiation of mRNA synthesis. At least 22 RNA polymerase promoters help to coordinate transcription dynamics during T7 infection.

E. coli
The E. coli RNA polymerase promoters on the T7 genome (A0, A1-3, B, C, and E) were first mapped by in vitro transcription studies (Davis & Hyman, 1970; Minkley & Pribnow, 1973; Golomb & Chamberlin, 1974; Niles & Condit, 1975; McAllister & McCarron, 1977; Stahl & Chamberlin, 1977; Kassavetis & Chamberlin, 1979; Panayotatos & Wells, 1979) and subsequently confirmed by sequencing (Oakley & Coleman, 1977; Boothroyd & Hayward, 1979; Rosa, 1979; Osterman & Coleman, 1981; Carter & McAllister, 1981; Dunn & Studier, 1983). Results of in vitro transcription reactions using T7 genomic DNA as template agreed with the available in vivo transcription data (Studier, 1973; Summer et al, 1973; McAllister & Wu, 1978; McAllister et al, 1981). However, cloning random fragments of the T7 genome into a plasmid that selects for transcription from the inserted fragment identified other possible promoters (Studier & Rosenberg, 1981). Sequence analysis of the cloned fragments identified ~10 regions with homology to known promoters; footprinting assays identified two additional promoters (Dunn & Studier, 1983). While we annotated these additional promoters, we did not incorporate them as functional genetic elements of T7.1. Here, we defined each of the major and minor E. coli promoters (A0, A1, A2, A3, B, C, and E) as a region of at least 60 base pairs spanning positions –50 to +10 relative to the transcription start site. Also, a boxA recognition site located between A3 and gene 0.3 is thought to be involved in antitermination of RNA polymerases that initiate from the three strong early promoters, A1, A2, and A3 (Olson et al, 1982).
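Because promoter positions are numbered relative to the transcription start site (+1) with no position 0, the –50 to +10 definition covers exactly 60 base pairs, and its genome coordinates depend on the strand. A minimal Python sketch of that arithmetic, assuming 1-based inclusive genome coordinates (the function name and conventions are ours, not part of the annotation):

```python
from typing import Tuple


def promoter_interval(tss: int, strand: str, upstream: int = 50,
                      downstream: int = 10) -> Tuple[int, int]:
    """Return 1-based inclusive (left, right) genome coordinates covering
    promoter positions -upstream..+downstream around a start site.

    `tss` is the genome coordinate of position +1. Promoter numbering
    skips 0, so the interval spans upstream + downstream bases.
    """
    if strand == "+":
        # Position -1 is tss - 1, so -upstream is tss - upstream;
        # +downstream is tss + downstream - 1.
        return tss - upstream, tss + downstream - 1
    if strand == "-":
        # Transcription runs leftward: upstream positions lie to the right.
        return tss - downstream + 1, tss + upstream
    raise ValueError("strand must be '+' or '-'")


left, right = promoter_interval(1000, "+")
print(left, right, right - left + 1)  # 950 1009 60
print(promoter_interval(1000, "-"))   # (991, 1050)
```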

T7
The T7 RNA polymerase promoter was defined by sequencing, which revealed a 23 base pair region common to the late T7 promoters (Boothroyd & Hayward, 1979). Here, we used a 35 base pair region to define T7 promoters; we chose this broader definition so that each T7 promoter element would include conserved regions beyond the core 23 base pairs (Dunn & Studier, 1983).
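For illustration, the sketch below scans the forward strand for near-matches to the widely cited 23 bp T7 promoter consensus, TAATACGACTCACTATAGGGAGA (positions –17 to +6). The mismatch cutoff and function names are assumptions; this is a crude Hamming-distance screen, not the procedure used to define the 35 base pair elements, and it does not check the reverse strand.

```python
from typing import List, Tuple

# Widely cited 23 bp consensus, positions -17 to +6; +1 is at index 17.
T7_CONSENSUS = "TAATACGACTCACTATAGGGAGA"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_t7_promoters(seq: str, max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (index, mismatches) for every forward-strand window of `seq`
    within `max_mismatches` of the consensus. The predicted transcription
    start (+1) for a hit at `index` is at index + 17 (0-based)."""
    seq = seq.upper()
    k = len(T7_CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        d = hamming(seq[i:i + k], T7_CONSENSUS)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


test_seq = "CCCC" + T7_CONSENSUS + "CCCC"
print(find_t7_promoters(test_seq))  # [(4, 0)]
```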

RNA Polymerase Terminators
The definition of a terminator that we used here is a contiguous stretch of DNA that, when transcribed, produces a region of RNA that stops transcription (at some efficiency). The first T7 transcription termination site was identified by mapping the endpoints of mRNAs initiated from E. coli promoters (Studier, 1972). Later, termination was shown to occur at the same place in vivo and in vitro (Dunn & Studier, 1973). The termination site was subsequently mapped precisely, sequenced, and named ‘TE’ (Studier et al, 1979; Dunn & Studier, 1980). A second terminator, specific to T7 RNA polymerase, was suggested by in vitro transcription studies on digested T7 DNA (Golomb & Chamberlin, 1974; Niles & Condit, 1975). This terminator, named ‘Tø,’ was shown to function in situ (Dunn & Studier, 1980) and on plasmids (McAllister et al, 1981). Both TE and Tø have stem loop structures that are thought to set termination efficiency (Dunn & Studier, 1973). We defined each terminator element as the stem loop together with its flanking sequence, which includes a poly-uridine tract. Other terminators have been postulated, but their precise locations and functions, if any, during wild-type infection remain uncertain (Dunn & Studier, 1983); we therefore did not include them in our annotation.
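A minimal Python sketch of that element definition: it flags a perfect inverted repeat (the stem) separated by a short loop and followed closely by a run of T residues, which become the poly-uridine tract in the transcript. All thresholds (6 bp stem, 3 to 8 nt loop, at least four Ts starting within two nt of the stem) and all names are illustrative assumptions; the screen ignores G·U pairs, mismatches, and folding energy, so it is far cruder than the analyses used to characterize TE and Tø.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def find_hairpin_u_tracts(seq: str, stem: int = 6, loop_min: int = 3,
                          loop_max: int = 8, min_t: int = 4,
                          max_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (stem_start, tail_start) pairs, 0-based on the sense strand,
    where a perfect inverted repeat of length `stem` with a loop of
    loop_min..loop_max nt is followed within `max_gap` nt by >= `min_t` Ts."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])  # 3' arm must pair with 5' arm
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break  # longer loops only run further off the end
            if seq[j:j + stem] == target:
                tail_start = j + stem
                tail = seq[tail_start:tail_start + max_gap + min_t]
                if "T" * min_t in tail:
                    hits.append((i, tail_start))
    return hits


candidate = "CCCC" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTT" + "CCCC"
print(find_hairpin_u_tracts(candidate))  # [(4, 20)]
```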

RNaseIII Recognition Sites
The definition of an RNaseIII recognition site that we used here is a contiguous stretch of DNA that, when transcribed, produces a region of RNA that is recognized and cleaved (at some efficiency) by RNaseIII. Specific cleavage of T7 RNA by RNaseIII was first shown in vitro and then correlated with in vivo data (Dunn & Studier, 1973). In time, ten RNaseIII sites were mapped and their cleavage sites identified (Dunn & Studier, 1983). The sites are thought to stabilize the 3’ ends of T7 transcripts by providing a stem loop that prevents the binding of scanning, single-strand-specific RNA degradation enzymes. Because a downstream gene often immediately follows an RNaseIII site, a longer element would risk overlapping that gene. Thus, we kept the RNaseIII recognition site elements as short as possible, with boundaries set by the probable stem loop structures (Dunn & Studier, 1983).
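Setting the element boundary by the stem loop amounts to trimming the sequence to the outermost paired bases of its secondary structure. A short Python sketch, assuming the structure is supplied in dot-bracket notation (for example, from an RNA folding tool or a published structure); the helper names and the toy hairpin are hypothetical:

```python
from typing import List, Optional, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Parse a dot-bracket string into 0-based (i, j) base pairs."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at {stack[-1]}")
    return pairs


def trim_to_stem_loop(seq: str, structure: str) -> Optional[str]:
    """Return the shortest slice of `seq` that contains every paired base,
    or None if the structure has no pairs."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    pairs = base_pairs(structure)
    if not pairs:
        return None
    first = min(i for i, _ in pairs)
    last = max(j for _, j in pairs)
    return seq[first:last + 1]


rna = "AAGGGCUUCGGCCCAA"
db = "..((((....)))).."
print(trim_to_stem_loop(rna, db))  # GGGCUUCGGCCC
```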

DNA Replication Origins
The definition of a DNA replication origin that we used here is a stretch of DNA that is used to initiate the copying of phage DNA during T7 infection. The primary replication origin was mapped to the dual promoter region downstream of ø1.1A and ø1.1B by analysis of replication bubbles in electron micrographs (Dressler et al, 1972; Wolfson et al, 1972) and was subsequently sequenced (Saito et al, 1980). The secondary origin at øOL was identified using mutants that lacked the primary origin (Studier & Rosenberg, 1981; Tamanoi et al, 1980). Finally, plasmids containing cloned fragments of T7 DNA were used to screen for regions that act as replication origins during T7 infection; these experiments revealed that øOR and ø13 have origin activity (Dunn & Studier, 1983). While the precise boundaries of the replication origins are unknown, each appears to be linked to a functioning RNA polymerase promoter (Zhang & Studier, 2004). Here, we annotate and define an element only for the primary origin. Although we do not include the other replication origins as elements, we do preserve the RNA polymerase promoters associated with these secondary origins as elements, and thus may preserve the secondary origins as well.

Terminal and Short Repeats
The definition of a terminal repeat that we used here is a contiguous stretch of DNA present at both ends of the T7 genome, and a short repeat is a series of direct repeats of DNA near the end of the genome. Both the left and right ends of the T7 genome contain exact 160 base pair direct repeats (Ritchie et al, 1967). Also, adjacent to the direct repeats at both ends of the genome are regions of DNA that contain 12 regularly arranged and highly conserved seven base pair sequences, termed the short repeats left (SRL) and right (SRR) (Dunn & Studier, 1981). The terminal repeats and SRL/R are thought to be involved in concatemer formation, DNA packaging, and particle maturation (Kelly & Thomas, 1969). However, the mechanisms by which the direct repeats and SRL/R act are unclear. Thus, we treated each end’s direct repeat and SRL/R as a monolithic element (the design of T7.1 does not make any changes to the DNA sequence of these elements).
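A terminal direct repeat can be measured as the longest sequence that is both a prefix and a suffix of the genome. The Python sketch below does this by brute force; the function name and the optional `max_len` cap are assumptions for illustration. On a complete, correctly oriented T7 genome sequence the result should correspond to the 160 base pair repeat described above, but the example here uses a made-up toy sequence.

```python
from typing import Optional


def terminal_direct_repeat_length(genome: str, max_len: Optional[int] = None) -> int:
    """Return the largest k (< len(genome)) such that the first k bases equal
    the last k bases, i.e. the length of an exact terminal direct repeat.

    `max_len` optionally caps the search to keep long genomes fast.
    """
    genome = genome.upper()
    upper = len(genome) - 1
    if max_len is not None:
        upper = min(upper, max_len)
    for k in range(upper, 0, -1):  # longest candidate first
        if genome[:k] == genome[-k:]:
            return k
    return 0


toy = "ACGTTG" + "CCCCAAAA" + "ACGTTG"
print(terminal_direct_repeat_length(toy))  # 6
```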