User:Ilya/Registry: Difference between revisions

From OpenWetWare
 
(42 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{Synthetic biology top}}
==To do==
<div style="padding: 10px; color: #000000; background-color: #ccccff; width:730px" >
*Map parts database schema to RDF/OWL (D2R Map/Server)
**[http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/SenseLab Use RDF/OWL to describe neuronal data available in SenseLab] - similar project
*[http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/ HCLSIG BioRDF subgroup tasks] - interesting projects
*Use LSID for parts identification
**setup LSID resolution service
*How to represent sequence features (do they belong to sequence or part)?
**Part has features and has a sequence (piece of DNA with molecular function combined by BB assembly)
**Sequence has features but a part already has sequence
*Tools to create and edit ontology and RDF instances?
**Protege from Stanford?
**IsaViz from W3C?
*existing RDBMS <-> RDF <-> objects (e.g., Javascript)
*Do we need "Device"?
*I want to build a NOR gate vs. I have a NOR gate
*Find a way to use MediaWiki software to work with the Semantic Web ontology of biological parts: create a UI from the description of a part in the ontology that would check the entered information for correctness according to the part definition in the ontology.


==Data or Metadata==
==To read==
(from [http://www.ibm.com/developerworks/opensource/library/os-lsidbp/ LSID best practices])
*[[doi:10.1371/journal.pone.0000339|Deductive Biocomputing]]
Data is defined as a sequence of unchanging bytes. Examples of data are microscope images, a protein sequence, a text file, etc. Metadata is usually information that describes the data either literally (date created, MD5 check sum, size) or contains information describing the relationship between the data and other objects.
:As biologists increasingly rely upon computational tools, it is imperative that they be able to appropriately apply these tools and clearly understand the methods the tools employ. Such tools must have access to all the relevant data and knowledge and, in some sense, “understand” biology so that they can serve biologists' goals appropriately and “explain” in biological terms how results are computed.
If you cannot determine what should be data and what should be metadata from your data model, follow this rule of thumb: Large byte sequences are easier to manipulate as data, while short byte sequences can be included as data, metadata, or made available in both forms.
*[http://dig.csail.mit.edu/2007/01/camp/ Semantic Web Boot Camp 2007 IAP]
*[http://www.semantictools.ru/ SemanticTools.ru]
*[http://www.lisperati.com/tellstuff/ How To Tell Stuff To A Computer - The Enigmatic Art of Knowledge Representation]
*[http://esw.w3.org/topic/HCLS/Banff2007Demo HCLS Demo given in Banff at WWW2007]
*[http://dataportability.org/ DataPortability project] - share and remix data using open standards
*[http://theinfo.org/ (theinfo)] - for people with large data sets
*[http://ibm-slrp.sourceforge.net/ IBM Semantic Layered Research Platform]
*[http://esw.w3.org/topic/LinkedData Linked Data] is to spreadsheets and databases what the Web of hypertext documents is to word processor files
*[http://www.w3.org/2001/sw/sweo/ Semantic Web Education and Outreach (SWEO) Interest Group]
**[http://www.w3.org/TR/cooluris/ Cool URIs for the Semantic Web] - document explaining the effective use of URIs to enable the growth of the Semantic Web
*[http://www.freebase.com/ Freebase] is an open, shared database of the world's knowledge
*Using RDF on the Web: [http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html A Survey], [http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_vision.html A Vision]
*[http://www.w3.org/2005/Incubator/rdb2rdf/ W3C RDB2RDF Incubator Group]
*[http://www.biositemap.org/ Biositemap] allows scientists, engineers, centers and institutions engaged in modeling, software tool development and analysis of biomedical and informatics data to broadcast and disseminate to the world the information about their latest computational biology resources (data, software tools and web-services) - [[Wikipedia:Biositemap|from Wikipedia]].
*[http://bioontology.org/projects/ontologies/SoftwareOntology/ Software Ontology]
*[http://intranet.cs.man.ac.uk/bhig/  Bio-Health Informatics Group] at the University of Manchester
*[http://img.cs.man.ac.uk/ The Information Management Group] at the University of Manchester
*[[Wikipedia:Semantic_search|Semantic search]]
*[http://www.topazproject.org/ Topaz] is a powerful object to RDF persistence and query service ([http://www.plos.org/cms/node/260 used by PLoS]).


==Abstraction Hierarchy==
==BBF standards==
*Part - simple biological function encoded in DNA
*[[The_BioBricks_Foundation:Standards/Technical|Technical standards]]
*Device - simple logical function; collection of parts
**[[The_BioBricks_Foundation:Standards/Technical/Exchange|Data exchange]]
*System - collection of devices
**[[PICA_Framework_Draft_Proposal_Documents|Part Interaction and Composition Assertion Framework Draft Proposal Documents]]
*Device is_a part in context of the system but also device has_a part.
**[http://brickit.wiki.sourceforge.net/Data+model BrickIt data model] aims to create a portable web-based registry that helps synthetic biologists to plan, organize and track their local biobrick samples ([http://brickit.wiki.sourceforge.net/ wiki])
*Device is_a subclass of Part, System is_a subclass of Device
**[http://biohack.sourceforge.net/wiki/index.php/Biobricks Biobricks at Biohack wiki]
*How to represent barriers and interfaces between levels of abstraction?
*Genetic, protein and cell devices
* :RBS :subclassOf :BasicPart OR :RBS :typeOf :BasicPart (instance)
*Basic parts: detailed specs and sequence data
*Composite parts: basic parts plus assembly (composite parts are the same if they have the same basic parts)
*Device: not necessarily on a single piece of DNA
*Separate spaces: set of hierarchies
**Physical (DNA sequence assembly)
**Design
**Standards (Performance)
*Class of Standards: assembly standards, performance standards?


==Current Design==
==External projects==
Biobricks come in three flavors:
*[http://research.nokia.com/projects/connectingme ConnectingMe] project will develop a new application architecture that uses a semantic web information repository and data integration engine along with a user customizable presentation engine
*Parts/basic parts/subparts encode basic biological functions (RBS, CDS)
*[http://projects.csail.mit.edu/jourknow/ Jourknow] - semantic web personal info organizer ([http://projects.csail.mit.edu/jourknow/study/ FAQ])
*Devices/composite parts are made from a collection of parts and encode some human-defined functions, such as logic gates in electronic circuits (inverter)
*Systems perform tasks, such as counting (oscillator)


*No need to specify deep_components vs component_list
==Meetings==
*Right now: composite parts have only components listed; deep components produced from that list
*[http://esw.w3.org/topic/CambridgeSemanticWebGatherings/Gatherings/ Cambridge Semantic Web Gatherings]
**Demos:
***[http://e-culture.multimedian.nl/ MultimediaN N9C Eculture project homepage]
***[http://cmch.tv/research/semanticSearch.asp CMCH Database of Literature smart search]
***[http://demo.openlinksw.com/isparql/ OpenLink iSPARQL]


Types: what type are Plasmid, Cell and T7?
==Data or metadata==
(from [http://www.ibm.com/developerworks/opensource/library/os-lsidbp/ LSID best practices])
Data is defined as a sequence of unchanging bytes. Examples of data are microscope images, a protein sequence, a text file, etc. Metadata is usually information that describes the data either literally (date created, MD5 check sum, size) or contains information describing the relationship between the data and other objects.
If you cannot determine what should be data and what should be metadata from your data model, follow this rule of thumb: Large byte sequences are easier to manipulate as data, while short byte sequences can be included as data, metadata, or made available in both forms.
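
A minimal sketch of the distinction, assuming a part's sequence is held as a byte string: the bytes are the data, and the literal descriptors named above (size, MD5 checksum, creation date) are metadata derived from them. The function name <code>describe</code> and the sample sequence are hypothetical and not taken from the LSID article.

```python
import hashlib
from datetime import datetime, timezone


def describe(data: bytes) -> dict:
    """Return literal metadata for an unchanging byte sequence."""
    return {
        "size": len(data),                         # number of bytes
        "md5": hashlib.md5(data).hexdigest(),      # checksum of the bytes
        "created": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical data: a short DNA sequence stored as bytes.
sequence = b"ATGGCTAGC"
metadata = describe(sequence)
print(metadata["size"], metadata["md5"])
```

Because the sequence here is short, it could also be embedded in the metadata itself, which is the "both forms" case in the rule of thumb.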


[http://parts2.mit.edu/r/parts/partsdb/index.cgi Registry Parts Index]
==From XML to RDF==
*A part is not allowed to contain both its own sequence and other parts
[[doi:10.1038/nbt1139|From XML to RDF]]: how semantic web technologies will change the design of 'omic' standards
*Subparts - ordered set
* ?
 
Naming convention
*Part name/number - unique ID
*BB a _ X nnnnnn
*BB: BioBricks
*a: alpha stage of development
*X: part type
*nnnnnn: 4-6 digit part number
*Normally, the part name contains the letter associated with the part's type. Confusion is possible when a part fits into multiple categories.
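
The convention above can be checked mechanically. The sketch below assumes the name is exactly <code>BB</code>, one lowercase stage letter, an underscore, one uppercase type letter, and 4-6 digits; the regular expression and the names <code>PartName</code> and <code>parse_part_name</code> are illustrative, not part of the Registry software.

```python
import re
from typing import NamedTuple, Optional

# BB + stage letter + "_" + type letter + 4-6 digit part number
PART_NAME = re.compile(r"BB(?P<stage>[a-z])_(?P<type>[A-Z])(?P<number>[0-9]{4,6})")


class PartName(NamedTuple):
    stage: str        # e.g. "a" for alpha stage of development
    type_letter: str  # letter associated with the part's type
    number: str       # kept as a string to preserve leading zeros


def parse_part_name(name: str) -> Optional[PartName]:
    """Split a part name into its fields, or return None if it does not fit."""
    match = PART_NAME.fullmatch(name)
    if match is None:
        return None
    return PartName(match.group("stage"), match.group("type"), match.group("number"))


print(parse_part_name("BBa_R0063"))
print(parse_part_name("R0063"))
```

The parser returns only the type letter; it does not resolve the ambiguity noted above when a part fits into multiple categories.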
 
Part properties ([http://parts2.mit.edu/r/parts/partsdb/view.cgi?part_id=153 example]) (* marks properties that belong to composite, possible value(s) are in parenthesis)
*name
*short_description (Promoter (lacI regulated, lambda pL hybrid))
*description
*type (Regulatory)
*status/availability (Available)
*results/usefulness (None|Fails|Works)
*component_list (NULL | BBa_B0032 BBa_C0051 BBa_B0010 BBa_B0012 BBa_R0063 BBa_B0030)*
*base_components (0 | 9)*
*deep_components (NULL | 149 156 603 145 193 147 161 603 145)*
*deep_components_2 (own part_id | _149_156_603_145_193_147_161_603_145_)* ?
*deep_component_count (1 | 9)*
*device_name (NULL | inverter)*
*sequence (why is a sequence available for composite parts?)
*feature(s)
**type
**start
**stop
**label
*usage
**lastmod_user
**lastmod_date
*biology (Very weak promoter)
*functional parameters
**efficiency 0.6
*design
**author (name(s) or id)
**owner (number: owner_id)
**creation_date
**container_id
**version
**source (Bacteriophage 434 right operator)
**notes
**reference?
**owning groups
*physical DNA (instances?)
**plasmid
**plasmid_length
**part_and_plasmid_length
**VF2-VR
*location(s) - This part may be found in these wells/tubes
**library
**well
**plate
**plasmid - is this the same plasmid as in the physical DNA section above?
**cell
*files
*references
*licenses
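
The relationship between <code>component_list</code> and <code>deep_components</code> in the properties above can be sketched as a recursive expansion: a composite part lists only its direct components, and the deep list is derived by expanding each component in order until only basic parts remain. This is an illustration, not the Registry's actual code; the part names in the example are hypothetical, and the sketch assumes the component graph has no cycles.

```python
from typing import Dict, List


def deep_components(name: str, component_lists: Dict[str, List[str]]) -> List[str]:
    """Expand a part into its ordered list of basic parts.

    A part with no component_list (NULL in the database) is treated as a
    basic part and expands to itself, so its deep component count is 1.
    """
    children = component_lists.get(name)
    if not children:
        return [name]
    expanded: List[str] = []
    for child in children:
        # Order matters: subparts form an ordered set.
        expanded.extend(deep_components(child, component_lists))
    return expanded


# Hypothetical registry: only composite parts have an entry.
COMPONENT_LISTS = {
    "comp_outer": ["basic_a", "comp_inner", "basic_d"],
    "comp_inner": ["basic_b", "basic_c"],
}

print(deep_components("comp_outer", COMPONENT_LISTS))
```

Since the deep list can always be recomputed this way, storing it alongside <code>component_list</code> is a caching decision rather than a modelling one.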
 
==New Design==
*User
**Ontology-based knowledge sharing
**Ontology-based presentation platform
**Ontology-based search engine
*Backend
**Inference and query engine
**Persistent storage for ontologies and metadata
**Extraction tools for metadata
*Architecture based on open standards: RDF, OWL, HTTP, etc


==Microformats==
*[http://microformats.org/ microformats.org]
*[http://gmpg.org/xfn/ XFN] - Xhtml Friends Network


==Miscellaneous==
Line 116: Line 75:
*Data is represented by a graph of triples (statements about resources)
*Syntax doesn't matter: there are many ways to serialize the data (XML, N3, etc).
*[http://www.biowisdom.com/ontology/faq_q1.htm Ontology vs taxonomy vs thesaurus vs list]
*[[Wikipedia:Ontology|Ontology]] vs [[Wikipedia:Taxonomy|Taxonomy]] vs [[Wikipedia:Folksonomy|Folksonomy]] vs [[Wikipedia:Collabulary|Collabulary]]
**Taxonomy - concepts and relationships but no attributes.
**Controlled vocabulary - only concepts.
*[http://microformats.org/ Microformats]
**"lowercase semantic web"
**humans first, machines second
*HCLS task forces:
**[http://www.w3.org/2001/sw/hcls/task_forces/BIORDF.doc BIORDF] (Structured data to RDF) - Susie Stephen, Joanne Luciano co-leads
Line 142: Line 94:
**[http://www.mnot.net/blog/2004/04/14/rest_in_wsdl REST in WSDL]
**[http://esw.w3.org/topic/WebDescriptionProposals Web Description Proposals]
 
*[[Wikipedia:Logic|Logic]] studies the laws of valid [[Wikipedia:Inference|inference]] (the act or process of deriving a conclusion based solely on what one already knows).
==To Do==
*[[Wikipedia:Closed_world_assumption|Closed world assumption]] is the presumption that what is not currently known to be true is false.
*Map parts database schema to RDF/OWL (D2R, other?)
*[[Wikipedia:Open_World_Assumption|Open World Assumption]] assumes that its knowledge of the world is incomplete. If something cannot be proved to be true, then it doesn't automatically become false.
**[http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/SenseLab Use RDF/OWL to describe neuronal data available in SenseLab] - similar project
*[http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/ HCLSIG BioRDF subgroup tasks] - interesting projects
*Use LSID for parts identification
**setup LSID resolution service
*How to represent sequence features (do they belong to sequence or part)?
**Part has features and has a sequence (piece of DNA with molecular function combined by BB assembly)
**Sequence has features but a part already has sequence
*Tools to create and edit ontology and RDF instances?
**Protege from Stanford?
**IsaViz from W3C?
*existing RDBMS <-> RDF <-> objects (e.g., Javascript)
*Do we need "Device"?
*I want to build a NOR gate vs. I have a NOR gate
 
==From XML to RDF==
(from [http://dx.doi.org/10.1038/nbt1139])
* ?
 
==Knowledge Management==
(knowledge representation, extraction, etc.)
*Thinking XML: Basic XML and RDF techniques for knowledge management
**[http://www.ibm.com/developerworks/xml/library/x-think4/index.html Part 1]: Generate RDF using XSLT
**[http://www.ibm.com/developerworks/xml/library/x-think5/index.html Part 2]: Combining files into an RDF model, and basic RDF querying
**[http://www.ibm.com/developerworks/xml/library/x-think6.html Part 3]: Knowledge from semantics
**[http://www.ibm.com/developerworks/xml/library/x-think8.html Part 4]: Issue tracker schema
**[http://www.ibm.com/developerworks/xml/library/x-think9.html Part 5]: Defining RDF and DAML+OIL
**[http://www.ibm.com/developerworks/xml/library/x-think10/index.html Part 6]: RDF Query using Versa
**[http://www.ibm.com/developerworks/xml/library/x-think12.html Part 7]: Review and relevance of the techniques discussed
*[http://www.research.ibm.com/journal/sj40-4.html IBM Systems Journal - Knowledge Management]
*[http://kr.org/ Principles of Knowledge Representation and Reasoning, Incorporated (KR, Inc.)] is a Scientific Foundation incorporated in the state of Massachusetts of the United States of America concerned with fostering research and communication on knowledge representation and reasoning.
*Personal
**[http://mindraider.sourceforge.net/ MindRaider] is a Semantic Web outliner
**[http://www.gnowsis.org/ gnowsis] the Semantic Desktop environment
**[http://usefulinc.com/articles/2004/desktop-metadata 2004: Metadata for the desktop]
*Text Mining
**[[doi:10.1038/4401090a|Machine readability]] - Nature article
**[http://biotext.berkeley.edu/ BioText] - infrastructure to support the development and deployment of statistical approaches to natural language processing, which will identify entities and relations between them in bioscience texts
**[http://blogs.nature.com/wp/nascent/2006/04/open_text_mining_interface_1.html OpenText Mining Interface (OTMI)]
**[http://arrowsmith.psych.uic.edu/arrowsmith_uic/index.html Arrowsmith] explores the causes of disease
**[http://www.ebi.ac.uk/Rebholz-srv/ebimed/index.jsp EBIMed] is a web application that combines Information Retrieval and Extraction from Medline
**[http://bionlp.org/ BIONLP] - natural language processing of biology text - Bob Futrelle's page
*[http://aima.cs.berkeley.edu/ Artificial Intelligence: A Modern Approach] - this book describes the nature of knowledge, its representation, inference based on knowledge, and many other topics (sample chapters available online)
*[[Wikipedia:Concept_maps|Concept maps]]
*[[Wikipedia:Mind_mapping|Mind maps]]
*[[Wikipedia:Topic_map|Topic maps]]
**[http://www.ontopia.net/topicmaps/materials/tmrdfoildaml.html Topic maps, RDF, DAML, OIL]
*[[Wikipedia:Semantic_network|Semantic network]] - a directed graph consisting of vertices which represent concepts and edges which represent semantic relations between the concepts
 
===Description Logics===
*[[Wikipedia:Description_logic|Description Logic]] - a cornerstone of the Semantic Web for its use in the design of ontologies
**[http://www.inf.unibz.it/~franconi/dl/course/ Tutorial course]
**[http://www.cs.man.ac.uk/~ezolin/logic/complexity.html Description Logic Complexity Navigator]
*Logic: a well formalized part of agent knowledge and reasoning.
*Reasoning: logical inference, "processing knowledge" (implicit knowledge has to be made explicit)
*Expressive Power of representation language - able to represent the problem
*Correctness of entailment procedure - no false conclusions are drawn
*Completeness of entailment procedure - all correct conclusions are drawn
*Decidability of entailment problem - there exists a (terminating) algorithm to compute entailment
*Complexity - resources needed for computing the solution
*Logics differ in terms of their representation power and computational complexity of inference. The more restricted the representational power, the faster the inference in general.
*First-order logic: we can now talk about objects and relations between them, and we can quantify over objects. Good for representing most interesting domains, but inference is not only expensive, but may not terminate.
*DL vs OWL (from [[Wikipedia:Description_logic|Description Logic]] @ Wikipedia):
**A concept in DL jargon is referred to as a class in OWL
**A role in DL jargon is a property in OWL.
*DL vs ER (from http://www.inf.unibz.it/~franconi/dl/course/slides/db/db.pdf):
**An ER conceptual schema can be expressed in a suitable description logic theory.
**The models of the DL theory correspond with legal database states of the ER schemas.
**Mapping ER schema in DL theory:
***Reasoning services such as satisfiability of a schema or logical implication can be performed by the corresponding DL theory.
***A description logic allows for a greater expressivity than the original ER framework, in terms of full disjunction and negation, and entity definitions by means of both necessary and sufficient conditions.
 
===Knowledge Bases===
(from http://www.inf.unibz.it/~franconi/dl/course/slides/kbs/kbs-modelling.pdf)
*Distinctions:
**Primitive vs. Defined.
**Definitional vs. Incidental.
**Concept vs. Individual.
**Concept vs. Role.
*Steps to design:
**Enumerate Objects. As a bare list of elements of the KB; they will become individuals, concepts, or roles.
**Distinguish Concepts from Roles. Make a first decision about which objects must be considered roles; remember that some could have a "natural" concept associated. The remaining objects will be concepts (or maybe individuals).  Also, try to distinguish roles from attributes.
**Develop Concept Taxonomy. Try to decide on a classification of all the concepts, imagining their extensions. This taxonomy will be used as a first reference and could be revised when definitions are given. It will also be used to check whether the definitions meet our expectations (sometimes interesting, unforeseen (re)classifications are found).
**Devise partitions. Try to make explicit all the disjointness and covering constraints among classes, and reclassify the concepts.
**Individuals. Try to list as many generally useful individuals as possible. Some may already have been listed in step 1. Try to describe them (classify).
**Properties and Parts. Begin to define the internal structure of concepts (this process will continue in the next steps). For each concept list:
***intrinsic properties, that are part of the very nature of the concept;
***extrinsic properties, that are contingent or external properties of the object; they can sometimes change over time;
***parts, in the case of structured or collective objects. They can be physical (e.g., "the components of a car", "the casks of a winery", "the students of a class", "the members of a group", "the grape of a wine") or abstract (e.g., "the courses of a meal", "the lessons of a course", "the topics of a lesson").
***In some cases some relationships between individuals of classes can be considered too accidental to be listed above (e.g., "the employees of a winery"; but the matter could change if we consider Winery as a subconcept of Firm).
***In general, the above distinctions depend on the level of detail adopted.
***Some of the listed roles will later be considered definitional, and some incidental.
***After this and the next steps, a check/revision of the taxonomy could be necessary.
**Cardinality Restrictions. For the relevant roles for each concept.
**Value Restriction. As above. Also, choose the right restriction.
**Propagate Value Restrictions. If some value restrictions stated in the previous step do not correspond to already existing concepts, those concepts must be defined.
**Inter-role Relationship. Even if hardly definable in DL, they can be useful during the populating and debugging phases.
**Definitional and Incidental. It is important to distinguish between definitional and incidental properties with respect to the particular application.
**Primitive and Defined. As above.
 
==Links==
*[http://www.mozilla.org/rdf/doc/ RDF in Mozilla]
 
</div>
{{Synthetic biology bottom}}

Latest revision as of 14:31, 23 June 2008

To do

  • Map parts database schema to RDF/OWL (D2R Map/Server)
  • HCLSIG BioRDF subgroup tasks - interesting projects
  • Use LSID for parts identification
    • setup LSID resolution service
  • How to represent sequence features (do they belong to sequence or part)?
    • Part has features and has a sequence (piece of DNA with molecular function combined by BB assembly)
    • Sequence has features but a part already has sequence
  • Tools to create and edit ontology and RDF instances?
    • Protege from Stanford?
    • IsaViz from W3C?
  • existing RDBMS <-> RDF <-> objects (e.g., Javascript)
  • Do we need "Device"?
  • I want to build a NOR gate vs. I have a NOR gate
  • Find a way to use MediaWiki software to work with the Semantic Web ontology of biological parts: create a UI from the description of a part in the ontology that would check the entered information for correctness according to the part definition in the ontology.

To read

As biologists increasingly rely upon computational tools, it is imperative that they be able to appropriately apply these tools and clearly understand the methods the tools employ. Such tools must have access to all the relevant data and knowledge and, in some sense, “understand” biology so that they can serve biologists' goals appropriately and “explain” in biological terms how results are computed.

BBF standards

External projects

  • ConnectingMe project will develop a new application architecture that uses a semantic web information repository and data integration engine along with a user customizable presentation engine
  • Jourknow - semantic web personal info organizer (FAQ)

Meetings

Data or metadata

(from LSID best practices) Data is defined as a sequence of unchanging bytes. Examples of data are microscope images, a protein sequence, a text file, etc. Metadata is usually information that describes the data either literally (date created, MD5 check sum, size) or contains information describing the relationship between the data and other objects. If you cannot determine what should be data and what should be metadata from your data model, follow this rule of thumb: Large byte sequences are easier to manipulate as data, while short byte sequences can be included as data, metadata, or made available in both forms.

From XML to RDF

From XML to RDF: how semantic web technologies will change the design of 'omic' standards

  • ?

Microformats

Miscellaneous

  • Semantics - the meaning that is implied by words and sentences.
  • A software agent could search distributed registries using an ontology. This is impossible right now because the storage schema is unknown.
  • Data is represented by a graph of triples (statements about resources)
  • Syntax doesn't matter: there are many ways to serialize the data (XML, N3, etc).
  • HCLS task forces:
    • BIORDF (Structured data to RDF) - Susie Stephen, Joanne Luciano co-leads
    • T2S (Text to Structured RDF) - Robert Futrelle, Matthew Cockerill
  • Architecture of the World Wide Web @ W3C
  • Reification @ Wikipedia
  • Metadata
    • A semantic mapper is a tool or service that aids in the transformation of data elements from one namespace into another namespace.
    • A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled manner.
  • XSLT vs XQuery
  • A Semantic Web Primer for Object-Oriented Software Developers - OO vs SW, links to software, etc
  • rdfs:label vs rdfs:comment
    • used to describe a resource with human readable text in addition to "pure" RDF properties (may have multiple values for internationalization needs)
    • rdfs:label is used to give a human-readable name of a resource
    • rdfs:comment is used to give a longer description
  • rdf:about and rdf:ID in RDF/XML
  • Resource manipulation and description: URIQA, REST, WebDAV, WSDL, etc
  • Logic studies the laws of valid inference (the act or process of deriving a conclusion based solely on what one already knows).
  • Closed world assumption is the presumption that what is not currently known to be true is false.
  • Open World Assumption assumes that its knowledge of the world is incomplete. If something cannot be proved to be true, then it doesn't automatically become false.
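
The difference between the two assumptions can be illustrated with a toy set of triples. This is only a sketch: the identifiers with the <code>ex:</code> prefix are hypothetical, and the open-world function is a three-valued lookup, not a reasoner. Under the closed world assumption a statement that is absent is false; under the open world assumption it is merely unknown.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical knowledge: one statement about one part.
KNOWN: Set[Triple] = {
    ("ex:part1", "rdf:type", "ex:Promoter"),
}


def holds_closed_world(statement: Triple) -> bool:
    """Closed world: what is not known to be true is false."""
    return statement in KNOWN


def holds_open_world(statement: Triple) -> Optional[bool]:
    """Open world: what is not known is unknown (None), not false."""
    return True if statement in KNOWN else None


query = ("ex:part1", "rdf:type", "ex:Terminator")
print(holds_closed_world(query))  # False
print(holds_open_world(query))    # None
```

A relational database query behaves like the first function, while RDF and OWL tools follow the second reading, which matters when mapping an existing parts database to RDF.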