Talk:Synthetic Biology:Vectors/Barcode

Detection of frameshift mutations

 * Austin 21:27, 18 April 2006 (EDT): Error correction (ECC) for ordinary point mutations is relatively easy. Frameshifts, however, are difficult. I contacted Robert Gallager about this problem, and this is what he responded with (I haven't had the time or energy to figure any more of this out):
 * You have picked a difficult problem for yourself. Almost all of coding theory is done using the assumption of perfect timing. I worked on this problem back in 1961 and wrote a short technical note on it.  You can find a very poor copy of it on my website (http://web.mit.edu/gallager/www/). It is the 3rd entry under Internal memoranda in my publication list.   You will have to find out something about convolutional codes and sequential decoding to make sense of it, so it might not be worth your effort.  I don't think that having an alphabet size of 4 rather than 2 is a major difference. I haven't seen anything else in the literature dealing with this problem, although I haven't been looking. It is sensible to restrict yourself to single anomalies, but the decoder has no block structure and thus no way to define a single anomaly.  That is why I looked at convolutional codes back in 61, since the block structure problem was avoided. Gallager, R. G., "Sequential Decoding for Binary Channels with Noise and Synchronization Errors", Lincoln Group Report, 2502, Summer 1961.
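To illustrate why a frameshift is harder to handle than a point mutation, here is a minimal Python sketch. It assumes the barcode is read in fixed 3-base words (the word size of the triplet code proposed under Early discussions); the example sequence and the function names are made up for illustration.

```python
def words(seq, size=3):
    """Split a sequence into consecutive fixed-size words, dropping an incomplete tail."""
    usable = len(seq) - len(seq) % size
    return [seq[i:i + size] for i in range(0, usable, size)]


def changed_words(original, mutated):
    """Count word positions that differ, compared over the shorter of the two reads."""
    return sum(a != b for a, b in zip(words(original), words(mutated)))


# Arbitrary 15-base example (5 words); not a real barcode.
original = "AAGACTCAGGATTCA"
substituted = original[:4] + "G" + original[5:]  # point mutation at index 4
deleted = original[:4] + original[5:]            # single-base deletion at index 4

print(changed_words(original, substituted))  # 1: only the word containing the change
print(changed_words(original, deleted))      # 3: every complete word after the deletion
```

A substitution damages one word, which a conventional block code can be designed to correct. A single deletion changes every word downstream of it, which is the loss of timing that Gallager describes.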

Lookup table approach instead
The current barcode approach assumes that a text string is adequate to represent the plasmid. For this to be true, you may need a universal naming approach, since a name like pSB1AC3 is not universally understood.

An alternative approach is to use a lookup table on a known server (e.g. openwetware or the Registry). The advantages of this approach are that you can put much more information on the server and that you can update the information after the part is in use. In fact, the code "pSB1AC3" only works because there is a lookup table available at the Registry and openwetware. If someone in Iran created their own plasmid "pIRgH5x", how would you find out what it was?

Once we agree on the lookup table, we could just have numbers encoded, or even have the DNA sequence itself generated uniquely. This would make error correction easier.

An advantage of this approach is that it would produce a much shorter barcode, so sequencing reads would go farther into the part's sequence. - RR 8 February 2006


 * RS 11:42, 8 March 2006 (EST): Austin's approach to the barcode is pretty universal. You can encode any bit string. Such a scheme allows the encoding of URLs directly into DNA. I think this should achieve the same result as your suggested lookup table, right?
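As a rough illustration of encoding an arbitrary bit string, the Python sketch below packs two bits into each base. The mapping 00=A, 01=C, 10=G, 11=T and the function names are assumptions for this example. This is not necessarily the scheme Austin proposes, and it includes no error correction.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(dna: str) -> bytes:
    """Invert bytes_to_dna; the length must be a multiple of four bases."""
    if len(dna) % 4:
        raise ValueError("DNA length is not a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


url = b"http://openwetware.org"  # example URL, for illustration only
dna = bytes_to_dna(url)
print(len(dna))                  # 4 bases per byte
print(dna_to_bytes(dna) == url)  # True: the round trip recovers the URL
```

At four bases per character, a 22-character URL takes 88 bases, which is the length cost that the lookup-table suggestion is trying to avoid.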

Comments
What is the point of the vector barcode? Is it the same as (or a replacement for) the CDS barcode? Endy 13:06, 2 February 2006 (EST)


 * The vector barcode is designed to allow the user to "read" the name of the vector whenever they sequence a barcode-containing plasmid with a verification primer (VF2 or VR). Currently, our vectors are indistinguishable from one another during sequencing. Some people have complained about this. I figured that the easiest way to create a plasmid barcode was to directly encode the name of the plasmid in the DNA. I am trying to make this code sufficiently general that any plasmid name could be encoded, and that even system names could be encoded if we choose in the future. Thus, the plasmid barcode serves a slightly different purpose from the CDS barcode, in that it is not necessarily designed to be an easy way to diagnose whether a piece of DNA has BioBricks CDSs in it. I welcome suggestions about a better way to do this. --RS
 * Reshma, thanks for this explanation. It seems like there are three different functions we could (should?) use barcodes for.  First, for detection.  Second, for identification.  Third, for authentication.  The original BioBricks barcode project was for detection.  This project is (currently) for identification.  I remember some discussions about moving the barcodes away from BioBricks and to the vectors themselves.  Or, having barcodes as parts themselves.  It would be nice to coordinate/get this right once and for all.  If we can figure out what to do then we can propose something as a community-wide standard at SB2.0. Endy 18:28, 4 February 2006 (EST)

Is it useful to have a start and stop sequence to make sure that you have read all of the code? Since most of codon space is being used by the alphanumeric table, you could add on an extra letter or number by accident if you neither knew the length of the barcode nor saw a defined stop codon. Not a problem in the short term, but it might crop up in the long run. --BC 16:25, 3 February 2006 (EST)
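A minimal Python sketch of the start/stop idea follows. The marker sequences START and STOP are hypothetical placeholders, not proposed values, and the payload is assumed to be written in 3-base words. The decoder walks the read word by word after the start marker, so it only accepts a stop marker that is in frame.

```python
START = "TCTTCT"  # hypothetical start marker (placeholder, not a proposed value)
STOP = "TTCTTC"   # hypothetical stop marker (placeholder, not a proposed value)


def frame(payload):
    """Wrap an encoded payload in the start and stop markers."""
    return START + payload + STOP


def extract(read):
    """Return the payload between the markers, or None if the read is incomplete."""
    begin = read.find(START)
    if begin < 0:
        return None
    first = begin + len(START)
    pos = first
    # Step in whole 3-base words so the stop marker is only recognised in frame.
    while pos + len(STOP) <= len(read):
        if read[pos:pos + len(STOP)] == STOP:
            return read[first:pos]
        pos += 3
    return None


payload = "CTAAGTAAC"  # arbitrary 3-word example payload
full_read = "GGCC" + frame(payload) + "AACC"
truncated_read = "GGCC" + START + payload  # read ended before the stop marker

print(extract(full_read))       # CTAAGTAAC
print(extract(truncated_read))  # None
```

A truncated read is reported as incomplete instead of being decoded as a shorter name. The sketch does not guard against the start marker occurring by chance in the flanking sequence.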

Early discussions
Is there a plan for the barcode?
 * Should the barcode only be readable by sequencing, or is it sufficient to just look for an amplified band in a PCR reaction?
 * If PCR is sufficient we could build in a unique sequence just before the BB prefix and then design a reverse primer to that sequence to use along with VF.
 * It seems like the most likely short-mid term problem is that a researcher would be uncertain as to which BioBrick vector they had, rather than the doomsday question of trying to work out if there is a BioBrick vector somewhere in the drink that turned Drew's hair pink.
 * Given this assumption, could we choose restriction sites, each of which is found uniquely in one of our BioBrick vectors? A researcher could just prep, digest and run on a gel to tell which vector they had. --BC
 * It might be useful to be able to tell the plasmid (and resistance) by colony PCR rather than a prep. A PCR requires less starting material. -Jkm
 * There is no current plan for the barcode. The intention was just to make the identity of the plasmid obvious from a sequencing reaction, but this goal is compatible with making the plasmid identifiable via a colony PCR as well. Choosing a unique restriction site for each vector would be more difficult because it would place additional requirements on the BioBricks standard: parts could contain neither the BioBrick enzyme sites nor the sites for a further list of restriction enzymes used as vector identifiers. This doesn't seem practical to me. -- RS
 * I'm not in favor of inserting restriction sites, but you can probably get away without using any new enzymes under certain assumptions. First, let's assume one always inserts into a new plasmid (3-way ligation, either with or without 3-antibiotic selection). Then you can just insert various combinations of BioBrick enzyme sites at specific locations in the plasmids and look at the pattern of bands when you cut with them. The benefit is this: say you cut a part with ES and run it on a gel. Based on the band pattern from the plasmid, you know immediately which plasmid it's in, and if it's correct, you isolate the part band and can proceed with the assembly. You have the same problem as below if one of the plasmid pieces is the same length as the part, but now you may have more potentially conflicting bands. 3-antibiotic assembly without purification shouldn't really be impacted by a couple more pieces of plasmid floating around. You can also extend this idea by defining one additional enzyme to be used for this purpose, and then tell plasmids apart by the different fragment lengths generated after digest. So you definitely don't need one enzyme per plasmid.
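The band-pattern idea can be sketched in Python as follows. The plasmid length and cut positions are invented numbers, not real BioBrick vectors, and the function name is hypothetical. The sketch only computes the fragment lengths expected from a complete digest of a circular plasmid.

```python
def digest_fragments(plasmid_length, cut_positions):
    """Fragment lengths from a complete digest of a circular plasmid, largest first."""
    cuts = sorted(set(p % plasmid_length for p in cut_positions))
    if not cuts:
        return [plasmid_length]  # no site present: the circle stays uncut
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment wraps around the origin of the circular map.
    fragments.append(plasmid_length - cuts[-1] + cuts[0])
    return sorted(fragments, reverse=True)


# Two made-up 3000 bp vectors that differ by one extra site at position 1500.
vector_a = digest_fragments(3000, [100, 900])
vector_b = digest_fragments(3000, [100, 900, 1500])

print(vector_a)  # [2200, 800]
print(vector_b)  # [1600, 800, 600]
```

Both vectors share the 800 bp band, but the remaining bands differ, so the two patterns are distinguishable on a gel as long as no band coincides with the length of the part itself.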

One plan that I am currently considering is actually encoding the name of the plasmid in DNA. For instance,

AAA = 0; AAC = 1; AAG = 2; AAT = 3; ... AGC = 9; AGG = A; AGT = B; ... GAT = Z;

That way, you could literally write out pSB5AC4-P1010.I50020 in DNA. Of course, we may want to make this slightly more intelligent to space out characters, include start and stop strings, and avoid key codons like ATG, TAA and TGA. Any comments? --RS
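A minimal Python sketch of this table follows. It assumes the entries elided by "..." continue in the same alphabetical triplet order (AAA, AAC, AAG, AAT, ACA, ...), which is consistent with the listed values for 9, A, B and Z. The names ENCODE, DECODE, encode_name and decode_name are hypothetical. Only the characters 0-9 and A-Z are covered, because the table above does not say how lowercase letters, "-" or "." would be written.

```python
from itertools import product

BASES = "ACGT"
SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# All 64 triplets in alphabetical order: AAA, AAC, AAG, AAT, ACA, ...
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

# Assumption: the 36 symbols take the first 36 triplets in that order.
ENCODE = dict(zip(SYMBOLS, TRIPLETS))
DECODE = {triplet: symbol for symbol, triplet in ENCODE.items()}


def encode_name(name):
    """Write a name as DNA, one triplet per character."""
    try:
        return "".join(ENCODE[ch] for ch in name)
    except KeyError as err:
        raise ValueError(f"no triplet assigned to {err.args[0]!r}") from None


def decode_name(dna):
    """Read a DNA string back into characters, three bases at a time."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


barcode = encode_name("SB1AC3")
print(barcode)               # CTAAGTAACAGGATAAAT
print(decode_name(barcode))  # SB1AC3
print(ENCODE["E"])           # ATG
```

Under this ordering the letter E lands on ATG, which shows why a plain alphabetical assignment would need adjusting to avoid key codons. TAA and TGA fall outside the first 36 triplets and so are not used as single words here, although they could still appear across word boundaries.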