PICA Draft Annotation Guidelines
Ralph Santos PICA Annotation Guide, Draft Version 2008-03-03
Annotation Data Overview
The minimal annotation dataset consists of the following:
(1) A list of declarations regarding nonstandard language features (for now undefined and not included).
(2) Two sets of yes/no questions describing the composition and transcription of a part:
(2a) At least three questions to describe sequence features present on the two strands of a part
(2b) Two questions to detail how the part is transcribed
(3) A list describing the molecular species comprising the regulatory signals and responses of the part as well as any biochemical species that are affected by those responses.
(4) A list of declarations characterizing part behavior, including at least the following:
(4a) A list of declarations citing pairs of molecular species (each one per ) summarizing in rough terms how regulatory signals and responses are related to each other in terms of relations between terminal species.
(4b) A list of declarations citing pairs of molecular species (each one per ) summarizing in rough terms how biochemical reactions not directly related to regulation are related to the defined terminal species.
The two strand annotations are qualified by the SO feature attribute ('forward'/SO:0001030 or 'reverse'/SO:0001031) indicating that the qualities described by the strand attribute apply to the given strand.
The part composition assertions each consist of a region name paired with a yes/no value (Y or N), indicating whether the named region may be found at least once upon the given strand in the given part. All declarations must describe at least SO:regulatory_region, SO:CDS and SO:terminator (or hyponyms thereof). Generally, any hyponym of SO:sequence_feature is acceptable. For example, a basic promoter part like J23100 could be described as:
(forward (promoter_region Y) (CDS N) (terminator N) (trans-upstream-start N) (trans-downstream-end Y))
(reverse (promoter_region N) (CDS N) (terminator N) (trans-upstream-start N) (trans-downstream-end N))
composition_assertion = strand_assertion, strand_assertion*
strand_assertion = DirectionSymbol, simple_assertion, simple_assertion*
simple_assertion = RegionSymbol, Boolean
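The grammar above can be exercised with a small reader. The following Python sketch is only an illustration, not part of the PICA draft: the function names, the Y/N spelling of Boolean, and the one-entry HYPONYM_OF table (taken from the J23100 example, where promoter_region stands in for SO:regulatory_region) are all assumptions.

```python
import re

# Assumption for illustration: a tiny hyponym table drawn from the J23100
# example. A real tool would consult the Sequence Ontology instead.
HYPONYM_OF = {"promoter_region": "regulatory_region"}
REQUIRED_REGIONS = {"regulatory_region", "CDS", "terminator"}


def parse_sexpr(text):
    """Parse parenthesized text into nested lists of string atoms."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # a plain atom
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing parenthesis
        return items

    forms = []
    while pos < len(tokens):
        forms.append(read())
    return forms


def read_composition(text):
    """Return {direction: {region: bool}} for a composition assertion."""
    composition = {}
    for form in parse_sexpr(text):
        if not isinstance(form, list) or len(form) < 2:
            raise ValueError("a strand assertion needs a direction and a region")
        direction, assertions = form[0], form[1:]
        if direction not in ("forward", "reverse"):
            raise ValueError(f"unknown direction: {direction!r}")
        regions = {}
        for item in assertions:
            well_formed = (
                isinstance(item, list)
                and len(item) == 2
                and all(isinstance(part, str) for part in item)
                and item[1] in ("Y", "N")
            )
            if not well_formed:
                raise ValueError(f"malformed simple assertion: {item!r}")
            regions[item[0]] = item[1] == "Y"
        composition[direction] = regions
    if not composition:
        raise ValueError("a composition assertion needs at least one strand")
    return composition


def missing_required(regions):
    """Return the required regions that a strand assertion fails to describe."""
    described = {HYPONYM_OF.get(name, name) for name in regions}
    return REQUIRED_REGIONS - described


EXAMPLE = """
(forward (promoter_region Y) (CDS N) (terminator N) (trans-upstream-start N) (trans-downstream-end Y))
(reverse (promoter_region N) (CDS N) (terminator N) (trans-upstream-start N) (trans-downstream-end N))
"""
composition = read_composition(EXAMPLE)
```

Here `composition["forward"]["promoter_region"]` is True, and `missing_required` returns an empty set for either strand because promoter_region is counted as a regulatory region under the assumed table.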
The part terminals declaration is a simple listing of all chemical species that interact with the part. For the purposes of annotation, the part includes the nucleic acid directly encoding the part, any mRNAs or proteins transcribed or translated from the encoding nucleic acid, and any species directly participating in any chemical reactions involving the aforementioned.
As there is no barrier creating an isolated interior for a part, species for internal reactions must be included even if they are not directly connected with the primary output of the part. Thus for J45200 the ATF1 gene must be cited as a terminal gene even though it is not considered a primary output of the part (as isoamyl acetate clearly is).
The terminal list must include exactly one citation per chemical species, regardless of how many times that species appears in the brick (so long as it appears at least once).
part_terminal_list = TermSymbol, TermSymbol*
TermSymbol = ( CompoundID | GeneID | SO:CDS )
TermSymbol = CompoundID (PubChem/KEGG/Biocyc, for registered chemical reg. signal)
TermSymbol = GeneID (for transcribed products or transcription factors recognized by gene identifier where no compound ID exists)
TermSymbol = SO:CDS (referring to any transcription product not entirely described by GeneID, including unspecified products defined by coding sequences external to the part)
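The "exactly one citation per chemical species" rule can be sketched as an order-preserving de-duplication. This is a minimal illustration only: the function name is hypothetical, and the strings used for the J45200 terminals are placeholders, not real database identifiers.

```python
def build_terminal_list(citations):
    """Collapse repeated citations so each species appears exactly once.

    The order of first appearance is kept. An empty result is rejected,
    since part_terminal_list requires at least one TermSymbol.
    """
    terminals = list(dict.fromkeys(citations))
    if not terminals:
        raise ValueError("a part terminal list needs at least one TermSymbol")
    return terminals


# Placeholder symbols (assumed for illustration, not real CompoundID/GeneID values).
raw_citations = ["gene:ATF1", "compound:isoamyl_acetate", "gene:ATF1"]
terminals = build_terminal_list(raw_citations)
```

With the input above, the ATF1 placeholder is cited once in the result even though it was collected twice.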
The interaction declaration is split into two parts. The first part, called 'reg-relations', is for behaviors mediated by gene regulation mechanisms. The second part, called 'events', is for biochemical reactions not directly mediated by gene regulation.
The overall form of an interaction declaration looks like:
(interaction (reg-relations ...) (events ...) )
Part Regulatory Relationships
Regulatory relationship assertions are made when some regulatory element exists in the part. Declaring relationship assertions when there is no regulatory feature declared in the composition section should be regarded as an error.
A regulatory relationship assertion consists of two parts. The first part indicates what terminals are involved and the second indicates what sort of relationship exists between them.
Note that the second part is optional. This allows the annotator to assert the existence of a relationship without stating anything about the nature of that relationship. Such an assertion may be useful in cases where behavior is still being characterized, is defined elsewhere, or, in commercial scenarios, is held proprietary.
RegRelationAssertion = (termsinvolved,rel_attribute)
termsinvolved = (TermSymbol,TermSymbol) | ((TermSymbol,TermSymbol),PrecededBy)
(TermSymbol,TermSymbol) = arbitrary relation between first and second terminals, no order implied
((TermSymbol,TermSymbol),OBO:preceded_by) = actuation relation exists between first and second TermSymbol
rel_attribute = SO:transcriptionally_regulated or hyponyms
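One way to hold these assertions in memory is sketched below in Python. It is an assumption-laden illustration rather than a prescribed data model: the class, field, and function names are hypothetical, the terminal names are placeholders, and the sketch does not take a position on which terminal precedes which when OBO:preceded_by is present. It covers three points from the text: the relationship attribute is optional, a plain pair implies no order, and relations declared without a regulatory feature in the composition section are treated as an error.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RegRelation:
    terms: Tuple[str, str]              # the two TermSymbols involved
    preceded_by: bool = False           # True when OBO:preceded_by qualifies the pair
    rel_attribute: Optional[str] = None  # e.g. "SO:transcriptionally_regulated"; may be omitted

    def key(self):
        """Identity of the assertion; an unqualified pair ignores term order."""
        first, second = self.terms
        if self.preceded_by:
            return (first, second, self.rel_attribute)
        return (frozenset((first, second)), self.rel_attribute)


def check_reg_relations(relations, has_regulatory_feature):
    """Reject relations declared for a part with no regulatory feature."""
    relations = list(relations)
    if relations and not has_regulatory_feature:
        raise ValueError(
            "regulatory relations declared, but the composition section "
            "declares no regulatory feature"
        )
    return relations


# Placeholder terminal names, assumed for illustration only.
bare = RegRelation(("signal_A", "product_B"))
swapped = RegRelation(("product_B", "signal_A"))
ordered = RegRelation(("signal_A", "product_B"), preceded_by=True,
                      rel_attribute="SO:transcriptionally_regulated")
```

In this sketch `bare.key() == swapped.key()` holds because no order is implied, while `ordered` keeps its term order and carries its attribute.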
Event declarations cover chemical and physical activities that are not directly related to regulation but are still part of the action performed by the part. This includes any species directly participating in reactions involving mRNA transcripts derived from coding sequences, or in biochemical reactions involving any proteins translated from those mRNAs. Examples include actuations and chemical reactions catalyzed by proteins generated by the part.
EventAssertion = GenericActuation | RegisteredReaction
GenericActuation = ((actuator target) (attribute value))
RegisteredReaction = (((actuator target) DbRxnID) [KEGG, Biocyc, etc.] (attribute value))
For generic actuations the attribute should be at minimum the SBO term 'participant' or one of its hyponyms, and the value should cite the terminal providing the control or influence.
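The two event forms differ only in whether the (actuator target) pair is wrapped together with a database reaction identifier. The Python sketch below shows one way to tell them apart once an event has been read into nested lists. The function names, the dictionary keys, and every symbol in the examples (including the reaction identifier) are placeholders assumed for illustration.

```python
def is_atom_pair(value):
    """True for a two-element list whose members are both plain strings."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    )


def classify_event(form):
    """Classify a nested-list event as GenericActuation or RegisteredReaction."""
    if not (isinstance(form, list) and len(form) == 2 and is_atom_pair(form[1])):
        raise ValueError(f"malformed event assertion: {form!r}")
    head, (attribute, value) = form
    if is_atom_pair(head):
        # ((actuator target) (attribute value))
        actuator, target = head
        return {"kind": "GenericActuation", "actuator": actuator,
                "target": target, "attribute": attribute, "value": value}
    if (isinstance(head, list) and len(head) == 2
            and is_atom_pair(head[0]) and isinstance(head[1], str)):
        # (((actuator target) DbRxnID) (attribute value))
        (actuator, target), reaction_id = head
        return {"kind": "RegisteredReaction", "actuator": actuator,
                "target": target, "reaction_id": reaction_id,
                "attribute": attribute, "value": value}
    raise ValueError(f"malformed event assertion: {form!r}")


# Placeholder symbols only; none of these are real database identifiers.
generic = classify_event(
    [["enzyme_X", "compound_Y"], ["participant", "enzyme_X"]])
registered = classify_event(
    [[["enzyme_X", "compound_Y"], "RXN-PLACEHOLDER"], ["participant", "enzyme_X"]])
```

Both examples use the SBO term 'participant' as the attribute and cite the actuator as the value, following the minimum stated for generic actuations.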
- NOTE: submit proposal to consider having BBF petition SBO maintainers to add 'light' as a physical parameter term (maybe adding 'fluorescence' and 'phosphorescence' as hyponyms)