IC Bioinfs/Project1/DAS

From OpenWetWare


  • Very useful explanations of DAS:
    • "The distributed annotation system (DAS) is a client-server system in which a single client integrates information from multiple servers. It allows a single machine to gather up genome annotation information from multiple distant web sites, collate the information, and display it to the user in a single view. Little coordination is needed among the various information providers." - Wikipedia
    • "DAS is a specification of a protocol for requesting and returning annotation data for genomic regions. DAS allows sequence annotation to be stored in a decentralised manner, by multiple third-party annotators, and integrated on an as-needed basis by client-side software." - Ensembl
    • "The Distributed Annotation System (DAS) [2] was originally conceived as a mechanism to aggregate and display genome sequence annotations such as transcript predictions. It is built upon the principle that data should remain spread across multiple sites, rather than aggregated into centralised databases. Thus data providers retain control over data access, releases can be more dynamic and changes to file formats or database structures are transparent. DAS has a "dumb server, clever client" architecture, which holds a number of advantages. For example, the minimal resources and time required of data providers to expose their data means more sources can be integrated and more readily. Conversely, one of the main reasons for this ease of implementation is a lack of enforced semantics, which limits applications primarily to visual display. In addition, DAS has been lacking a central registry of available data sources." - Paper: Integrating Biological Data: The DAS. Available at: http://www.biomedcentral.com/1471-2105/9/S8/S3
    • "DAS provides a convention how to encode a DNA or protein sequence and its annotated features into simple XML documents that are exchanged over the Internet. Protein sequences or regions on a chromosome are considered to be the reference objects. DAS servers that provide this sequence information are called reference servers. Only few such servers are available. In contrast to this a large number of annotation servers provide the actual annotation data that is available for the reference objects.

The registration server collects the location of all DAS servers. A DAS client, like Ensembl, can retrieve this listing. It then contacts a reference server to obtain the sequence and all the available annotation servers, in order to provide an integrated view of the results." - Sanger Trust

    • Entry points: DAS data sources are organised around entry points, whose nature varies with the project. They can be entire chromosomes or a sequence of contigs.
    • Could an entry point here be a BioBrick type? E.g. one entry point for promoters, one for RBSs, and so on for each type of part?
    • An entry point can have sub-entry points. Consider an entry point such as a BioBrick composed of two previously known parts: each of those parts would then be a sub-entry point. From what I understood of the example below, this is a matter of referring to the features of our annotation, for example how to find the coordinates of any sub-elements in it. DAS does this recursively (like peeling layers from an onion): we can specify annotations relative to superlinks (the "container" components), their links (smaller components), or any start/stop positions or chromosome coordinates, and DAS automatically converts between the scales of the annotation. I don't know how we would apply this to the BioBricks. Perhaps as the position of the gene in the BioBrick relative to where it originally came from?
      • Example: To give a concrete example, the C. elegans reference map consists of six chromosome-length entry points. Each chromosome is formed from several contigs called "superlinks", and each superlink contains one or more smaller contigs called "links". Links in turn are composed of one or more fully-sequenced clones. One could refer to an annotation by specifying its start or stop positions in clone, link, superlink, or chromosome coordinates. The distributed annotation system automatically converts any coordinate system into any other. Because coordinates within clones are more stable to revisions than coordinates within links or chromosomes, it is recommended that annotation coordinates be stored relative to the smallest sequencing unit.
    • A single DAS server is designated as either an annotation server or as the reference server.
    • Annotation servers- Specialised for returning lists of annotations across certain regions of the genome.
    • The reference server provides essential structural information about the genome: the physical map which relates one entry point to another, the DNA sequence for each entry point, and some standard authorship information.
    • The reference server can return:
      • The raw DNA of the sequence;
      • The annotations of the "component" of a category (E.g., a contig is the component of a chromosome; thus, reference servers can return the annotation for a contig);
      • The annotations of "supercomponents" of a category (E.g., a chromosome is the supercomponent of a contig.).
    • Annotation servers- Using either a free-standing application or a web site (such as Ensembl) that acts like a DAS client, researchers can interrogate one or more annotation servers to retrieve features in a region of interest. The servers return the results using a standard data format, allowing the sequence browser to integrate the annotations and display them in graphical or tabular form. (From Ensembl's website.)
    • Annotations:
      • TYPE: selected from a list of biologically significant terms that correspond to EMBL/GenBank feature table tags, e.g. intron, exon, CDS, splice3. (The registry already has tags, e.g. primers.)
      • METHOD: how the annotated feature was discovered, normally a reference to a software program. In our case it might be the lab? I'm not sure.
      • CATEGORY: a broad functional grouping, used to filter, group and sort annotations. "homology", "variation" and "transcribed" are examples. New annotation types.
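As a rough illustration of how a client might hold annotations from several servers and collate them for one region, here is a minimal Python sketch. The `Annotation` class, the `collate` function, the server names and all the feature values are hypothetical and only mirror the TYPE / METHOD / CATEGORY fields described above; this is not the DAS wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    ref: str       # reference object (entry point) the coordinates refer to
    start: int     # 1-based, inclusive
    stop: int      # 1-based, inclusive
    type: str      # e.g. "exon"
    method: str    # how the feature was discovered
    category: str  # broad grouping, e.g. "transcribed"


def collate(sources, ref, start, stop, category=None):
    """Merge annotations from several servers that overlap ref:start-stop."""
    hits = []
    for server_name, annotations in sources.items():
        for a in annotations:
            if a.ref != ref:
                continue
            if a.stop < start or a.start > stop:
                continue  # no overlap with the requested region
            if category is not None and a.category != category:
                continue
            hits.append((server_name, a))
    # Sort by position so features from different servers interleave.
    return sorted(hits, key=lambda pair: (pair[1].start, pair[1].stop))


# Made-up data standing in for two annotation servers.
sources = {
    "server_a": [Annotation("chrI", 100, 200, "exon", "prediction", "transcribed")],
    "server_b": [
        Annotation("chrI", 150, 160, "SNP", "resequencing", "variation"),
        Annotation("chrI", 900, 950, "exon", "prediction", "transcribed"),
    ],
}
for server, a in collate(sources, "chrI", 1, 500):
    print(server, a.type, a.start, a.stop)
```

Passing `category="variation"` would restrict the merged view to that category, which is the kind of filtering the CATEGORY field is meant to support.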

A single server can provide both reference sequence information and annotation information. The main functional difference is that the reference sequence server is required to serve the sequence map and the raw DNA, while annotation servers have no such requirement.
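The coordinate conversion described in the C. elegans example (clone, link, superlink, chromosome) can be pictured as walking up a chain of offsets. The sketch below is only an illustration under simplifying assumptions: the unit names and offsets are invented, every unit sits on the forward strand of its parent, and the real map would be served by the reference server rather than hard-coded.

```python
# Hypothetical layout: unit name -> (parent name or None, 1-based start of the
# unit within its parent). Orientation is ignored for simplicity.
LAYOUT = {
    "CHROMOSOME_X": (None, 1),
    "superlink_A": ("CHROMOSOME_X", 5001),
    "link_B": ("superlink_A", 201),
    "clone_C": ("link_B", 51),
}


def to_ancestor(unit, pos, target, layout=LAYOUT):
    """Convert a 1-based position in `unit` into the coordinates of `target`,
    where `target` is the unit itself or one of its containers."""
    while unit != target:
        parent, start_in_parent = layout[unit]
        if parent is None:
            raise ValueError(f"{target!r} does not contain the starting unit")
        pos = start_in_parent + pos - 1  # shift into the parent's coordinates
        unit = parent
    return pos


# Position 10 of the clone, expressed in link and chromosome coordinates.
print(to_ancestor("clone_C", 10, "link_B"))
print(to_ancestor("clone_C", 10, "CHROMOSOME_X"))
```

Storing annotations against the smallest unit, as recommended above, means only the offsets in the layout change when a larger assembly is revised.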

    • All DAS requests take the form of a URL. Each URL has a site-specific prefix, followed by a standardized path and query string. The standardized path begins with the string /das. This is followed by URL components containing the data source name and a command.
 For example, a request URL breaks down as:
 <site-specific prefix>/das/<data source>/<command>?<arguments>
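A small Python sketch of assembling a request URL from those pieces. The function name `build_das_url`, the prefix `http://example.org/db` and the data source name `mysource` are made up for illustration; the `features` command and `segment` argument come from the DAS specification rather than from these notes.

```python
from urllib.parse import urlencode


def build_das_url(prefix, data_source, command, **arguments):
    """Join site prefix, /das, data source name and command into a URL."""
    url = f"{prefix.rstrip('/')}/das/{data_source}/{command}"
    if arguments:
        # Keep ':' and ',' readable, since segment ranges use them.
        url += "?" + urlencode(arguments, safe=":,")
    return url


# Hypothetical server and data source; no request is actually sent.
print(build_das_url("http://example.org/db", "mysource", "features",
                    segment="chr1:1000,2000"))
```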

Format of the data source listing (DASDSN document):

 <?xml version="1.0" standalone="no"?>
 <!DOCTYPE DASDSN SYSTEM "http://www.biodas.org/dtd/dasdsn.dtd">
 <DASDSN>
   <SOURCE id="id1" version="version">source name 1</SOURCE>
   <DESCRIPTION>descriptive text 1</DESCRIPTION>
   <SOURCE id="id2" version="version">source name 2</SOURCE>
   <DESCRIPTION href="url">descriptive text 2</DESCRIPTION>
 </DASDSN>
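A minimal Python sketch of reading such a listing with the standard library. It parses the sample exactly as recorded in these notes (the XML declaration and DOCTYPE are left out of the test string for brevity); the function name `parse_sources` and the returned dictionary keys are my own choices, and a real server response may nest these elements differently.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<DASDSN>
  <SOURCE id="id1" version="version">source name 1</SOURCE>
  <DESCRIPTION>descriptive text 1</DESCRIPTION>
  <SOURCE id="id2" version="version">source name 2</SOURCE>
  <DESCRIPTION href="url">descriptive text 2</DESCRIPTION>
</DASDSN>"""


def parse_sources(xml_text):
    """Return one dict per SOURCE, paired with the DESCRIPTION that follows it."""
    root = ET.fromstring(xml_text)
    sources = []
    # iter() walks in document order, so each DESCRIPTION is attached to the
    # most recently seen SOURCE regardless of any wrapper elements.
    for elem in root.iter():
        if elem.tag == "SOURCE":
            sources.append({
                "id": elem.get("id"),
                "version": elem.get("version"),
                "name": (elem.text or "").strip(),
                "description": None,
                "href": None,
            })
        elif elem.tag == "DESCRIPTION" and sources:
            sources[-1]["description"] = (elem.text or "").strip()
            sources[-1]["href"] = elem.get("href")
    return sources


for source in parse_sources(SAMPLE):
    print(source["id"], "-", source["name"], "-", source["description"])
```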


  • Ensembl makes use of the DAS in two ways:
    • External data may be integrated into the website.
    • Ensembl data may be integrated into other applications via the Ensembl DAS server.
  • Ensembl allows attachment and configuration of external DAS sources to several Ensembl genome browser displays:
    • Region overview (positional annotations)
    • Region in detail (positional annotations)
    • Gene -> External data (non-positional annotations)
    • Transcript -> External data (non-positional annotations)
    • DAS requests in ENSEMBL
    • DAS request URLs have a specific format:




List of servers using DAS: http://www.dasregistry.org/

Useful paper, worth reading: http://www.biomedcentral.com/1471-2105/9/S8/S3

From Parts registry

  • Registry API
  • The information in the Registry is being made available to software tool developers through a series of APIs (Application Programming Interfaces). These interfaces will change with time.
  • FASTA Formatted Sequences
  • We will provide a daily update of part sequences, types, subparts, status, and short description for each part and for all parts. Go to http://partsregistry.org/fasta/parts/BBa_C0040 (substitute our desired part name for BBa_C0040) and you will receive a FASTA formatted file with the part's sequence. The header line has this format:
  • '>'[Part name] [First character of status] [Part Id Number] [Part type] [Short description]
  • Note: the short description has unusual characters converted to their two-digit hex value.
  • You can also get all of the parts in a single download (about 30 megabytes) as http://partsregistry.org/fasta/parts/All_Parts.
  • We are not yet updating these files on a daily basis.
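A short Python sketch of working with the FASTA interface described above: building the per-part URL and splitting a header line into its five fields. The function names and the example header are hypothetical (the values are invented, not taken from the Registry), and the sketch assumes that the first four fields contain no spaces. The hex-encoded characters in the short description are left untouched, since their exact encoding is not spelled out here.

```python
def part_fasta_url(part_name):
    """URL of the FASTA file for one part, following the pattern quoted above."""
    return f"http://partsregistry.org/fasta/parts/{part_name}"


def parse_part_header(line):
    """Split '>[name] [status] [id] [type] [description]' into a dict."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Split on whitespace at most four times so the description keeps its spaces.
    fields = line[1:].strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("header has fewer than four fields")
    name, status, part_id, part_type = fields[:4]
    description = fields[4] if len(fields) > 4 else ""
    return {
        "name": name,
        "status": status,
        "part_id": part_id,
        "type": part_type,
        "description": description,
    }


print(part_fasta_url("BBa_C0040"))
# Invented header, for illustration only.
print(parse_part_header(">BBa_X0000 A 123 Coding example short description"))
```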