Mking44 Week 10
Purpose
The purpose of this week's assignment was to critically analyze the MetaCyc metabolism database and determine whether it is a good database overall in terms of user-friendliness, relevance, and quality of scientific work.
Choosing a Database
- Candidate databases were found in the Nucleic Acids Research Database Issue Table of Contents 2020
- MetaCyc was chosen as the database to examine
- Ron Caspi, Richard Billington, Ingrid M Keseler, Anamika Kothari, Markus Krummenacker, Peter E Midford, Wai Kit Ong, Suzanne Paley, Pallavi Subhraveti, Peter D Karp, The MetaCyc database of metabolic pathways and enzymes - a 2019 update, Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D445–D453, https://doi.org/10.1093/nar/gkz862
Database Evaluation
- Questions were obtained from Week 10 Assignment
General information about the database
- What is the name of the database? (link to the home page)
- MetaCyc (https://metacyc.org/)
- What type (or types) of database is it?
- Metabolic pathway database, including enzymes (https://metacyc.org/)
- What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?])
- From their homepage: "pathways involved in both primary and secondary metabolism, as well as associated metabolites, reactions, enzymes, and genes" (https://metacyc.org/)
- What type of data source does it have?
- primary versus secondary ("meta")?
- secondary data, from their homepage: "curated database of experimentally elucidated...pathways" (https://metacyc.org/)
- curated versus non-curated? electronic versus human curation? if human, in-house staff versus community?
- curated (see above). Human curation, with a committee of scientists (https://ecocyc.org/advisors.shtml)
- What individual or organization maintains the database?
- SRI International maintains the database. (https://metacyc.org/)
- public versus private
- public
- large national or multinational entity or small lab group
- "nonprofit scientific research institute" (https://en.wikipedia.org/wiki/SRI_International)
- What is their funding source(s)?
- According to their homepage, they are funded by the NIH National Institute of General Medical Sciences. (https://metacyc.org/)
Scientific quality of the database
- Does the content appear to completely cover its content domain?
- How many records does the database contain?
- Their homepage states it contains 2,766 pathways from 3,067 different organisms as well as over 1,000 enzymes and other objects in the database. (https://metacyc.org/)
- What claims do the database owners make about coverage in the corresponding paper?
- In the article, the database owners state that MetaCyc, which is curated from over 60,000 publications, is the largest curated collection of metabolic pathways. They also state that it has many more records than its competitors and that records are being added at a higher rate. (https://academic.oup.com/nar/article/48/D1/D445/5581728)
- What species are covered in the database? (If it is a very long list, summarize.)
- Performing a pathway search and filtering by organism shows that all domains of life are covered. (https://metacyc.org/pwy-search.shtml)
- Is the database content useful? I.e., what biological questions can it be used to answer?
- Yes, the database content is very useful. It can be used to compare pathways between organisms, to support metabolite research, and to investigate diseases related to metabolism. There is also a feature on the website that predicts metabolic pathways from sequenced genomic data, which can help identify certain organisms. (https://metacyc.org/)
- Is the database content timely?
- Is there a need in the scientific community for such a database at this time?
- Yes. Metabolism is relevant to many areas of medicine, such as cancer and neurological and metabolic diseases, and to general tasks such as drug target identification.
- Is the content covered by other databases already?
- There are a few other metabolism databases, including its main competitor KEGG, which is mentioned in the Nucleic Acids Research article. However, some of them are specific to one organism (Human Metabolome Database) or have different databases or features within them. On their website, they compare MetaCyc to the few other common metabolic databases. (https://metacyc.org/MetaCycUserGuide.shtml#TAG:__tex2page_sec_10)
- How current is the database?
- When did the database first go online?
- How often is the database updated?
- About every 3-4 months, judging by the dates of the release notes (https://metacyc.org/release-notes.shtml)
- When was the last update?
- The most recent release notes were published on Dec 18, 2019 (https://metacyc.org/release-notes.shtml)
General utility of the database to the scientific community
- Are there links to other databases? Which ones?
- Since MetaCyc is part of the BioCyc (pathway/genome database) collection, EcoCyc (E. coli database), HumanCyc, and BsubCyc (Bacillus subtilis database) are all linked. It also includes links to other types of databases such as UniProt, PubChem, NCI Open Database, and many more, listed here (https://metacyc.org/MetaCycUserGuide.shtml#TAG:__tex2page_sec_8). Users can also perform BLAST searches against some MetaCyc proteins from the Search menu.
- Is it convenient to browse the data?
- Yes, it is convenient to browse the data. For a basic search, the user types a gene, protein, metabolite, or pathway into the search bar. The user can also choose to run analyses across different species, genes, proteins, etc. (https://metacyc.org/)
- Is it convenient to download the data?
- It is convenient to download the data. Genes with their names and IDs can be downloaded as a txt file, and pathways can be downloaded in BioPAX, a standard exchange format for pathway data, or as a PDF file. (https://biocyc.org/download.shtml)
- However, some of the tools require BioCyc subscriptions.
- In what file formats are the data provided?
- BioPAX format, attribute-value format, tabular format, SBML format, FASTA (http://bioinformatics.ai.sri.com/ptools/flatfile-format.html)
- What type of files, indicated by the file extension (e.g., .txt, .xml., etc.)?
- .col, .dat, .owl, .fsa (http://bioinformatics.ai.sri.com/ptools/flatfile-format.html)
- Are they standard or non-standard formats? (i.e., are they following an approved standard for that type of data)?
- Yes, .owl (BioPAX) and .fsa (FASTA) are standard, but .dat and .col are the attribute-value and tabular formats used by the Pathway Tools software.
- Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information?
- Is the website well-organized?
- Yes it is well organized. It has a main search bar, then different headings such as Sites, Search, Genome, Metabolism, Analysis and Help.
- Does it have a help section or tutorial?
- Yes, there is a user guide on the homepage. (https://metacyc.org/MetaCycUserGuide.shtml)
- Are the search options sensible?
- Yes, the search options are sensible. You can search by different filters such as organism, protein, gene, and pathway.
- Run a sample query. Do the results make sense?
- I searched for lactose. The results made sense and were organized by pathways, proteins, gene ontology terms, reactions, and EC numbers. (https://metacyc.org/META/substring-search?type=NIL&object=lactose&quickSearch=Quick+Search)
- Access: Is there a license agreement or any restrictions on access to the database?
- There is no limitation to accessing the database. However, some of the analysis tools require some subscriptions to the BioCyc site.
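The sample query above can also be reproduced from a script. The following is a minimal sketch in Python that only rebuilds the quick-search URL recorded in this journal entry. The function name quick_search_url is hypothetical, and the parameter names (type, object, quickSearch) are copied from that URL rather than from any documented MetaCyc API, so they are assumptions about how the site's search form works.

```python
from urllib.parse import urlencode

# Base address taken from the lactose search URL recorded in this entry.
BASE_URL = "https://metacyc.org/META/substring-search"


def quick_search_url(term: str) -> str:
    """Build a MetaCyc quick-search URL for a search term.

    The query parameters mirror the URL observed in the browser;
    they are not taken from official API documentation.
    """
    params = {"type": "NIL", "object": term, "quickSearch": "Quick Search"}
    # urlencode escapes spaces as "+", matching the observed URL.
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(quick_search_url("lactose"))
```

Opening the printed URL in a browser should show the same results page as typing the term into the search bar, as long as the site keeps this URL scheme.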
Summary judgement
- Would you direct a colleague unfamiliar with the field to use it?
- Yes, I would direct a colleague unfamiliar with the field to use it if they needed to study metabolic pathways or find enzymes or proteins involved in metabolism.
- Is this a professional or "hobby" database? The "hobby" analogy means that it was that person's hobby to make the database. It could mean that it is limited in scope, done by one or a few persons, and seems amateur.
- It is definitely a professional database. Drawing on over 60,000 publications, the creators of MetaCyc have made a universal database covering all types of organisms and thousands of enzymes, proteins, genes, and pathways, with links to other databases as well.
Scientific Conclusion
MetaCyc is a metabolism database that has been around for over two decades and contains metabolic pathways along with their genes, enzymes, proteins, etc. Its data are pulled from over 60,000 publications, which makes it the largest curated collection of metabolic pathways. It does not require a subscription (except for some analysis tools), it is very user-friendly, and a guide to the site is found on the homepage. It is not specific to one or several species; the site spans all three domains of life. This database is useful for studying metabolic pathways in general, for studying metabolic and neurological disorders, and for comparing the metabolism of different organisms. It is also efficient to use, because a search for a metabolic pathway returns links to other databases for ID numbers or additional information. In conclusion, MetaCyc is a good database for universal knowledge of metabolism.
Acknowledgments
- I copied and modified the protocol from Week 10 for this assignment.
- Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
Mking44 (talk) 20:08, 27 March 2020 (PDT)
References
- Caspi et al 2018, "The MetaCyc database of metabolic pathways and enzymes", Nucleic Acids Research 46(D1):D633-D639
- Download BioCyc Databases and Pathway Tools Software. (2019). Retrieved March 30, 2020, from https://biocyc.org/download.shtml
- EcoCyc/BioCyc Steering Committee. (2019). Retrieved March 30, 2020, from https://ecocyc.org/advisors.shtml
- OpenWetWare. (2020). BIOL368/S20:Week 10. Retrieved March 27, 2020, from https://openwetware.org/wiki/BIOL368/S20:Week_10
- Pathway Tools Data-File Formats. (2017). Retrieved March 30, 2020, from http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
- Ron Caspi, Richard Billington, Ingrid M Keseler, Anamika Kothari, Markus Krummenacker, Peter E Midford, Wai Kit Ong, Suzanne Paley, Pallavi Subhraveti, Peter D Karp, The MetaCyc database of metabolic pathways and enzymes - a 2019 update, Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D445–D453, https://doi.org/10.1093/nar/gkz862
- SRI International. (2020, March 19). Retrieved March 30, 2020, from https://en.wikipedia.org/wiki/SRI_International