The purpose of this week's assignment was to critically analyze the MetaCyc metabolic pathway database and determine whether it is a good database overall in terms of user-friendliness, relevance, and scientific quality.

Choosing a Database

  • Candidate databases were taken from the table of contents of the Nucleic Acids Research 2020 Database Issue
  • MetaCyc was chosen as the database to examine
    • Ron Caspi, Richard Billington, Ingrid M Keseler, Anamika Kothari, Markus Krummenacker, Peter E Midford, Wai Kit Ong, Suzanne Paley, Pallavi Subhraveti, Peter D Karp, The MetaCyc database of metabolic pathways and enzymes - a 2019 update, Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D445–D453.

Database Evaluation

General information about the database

  1. What is the name of the database? (link to the home page)
    • MetaCyc
  2. What type (or types) of database is it?
  3. What biological information (type of data) does it contain? (sequence, structure, model organism, or specialty [what?])
    • From their homepage: "pathways involved in both primary and secondary metabolism, as well as associated metabolites, reactions, enzymes, and genes"
  4. What type of data source does it have?
    • primary versus secondary ("meta")?
      • Secondary data. From their homepage: "curated database of experimentally elucidated...pathways"
    • curated versus non-curated? electronic versus human curation? if human, in-house staff versus community?
  5. What individual or organization maintains the database?
  6. What is their funding source(s)?
    • According to their homepage, MetaCyc is funded by the NIH National Institute of General Medical Sciences (NIGMS).

Scientific quality of the database

  1. Does the content appear to completely cover its content domain?
    • How many records does the database contain?
      • Their homepage states that it contains 2,766 pathways from 3,067 different organisms, as well as over 1,000 enzymes and other objects.
    • What claims do the database owners make about coverage in the corresponding paper?
      • In the article, the database owners state that because MetaCyc draws on over 60,000 publications, it is the largest curated collection of metabolic pathways. They also state that it contains substantially more records than its competitors and that new records are being added at a higher rate.
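  2. What species are covered in the database? (If it is a very long list, summarize.)
    • MetaCyc is not limited to particular species: it contains pathways from 3,067 different organisms spanning all three domains of life.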
  3. Is the database content useful? I.e., what biological questions can it be used to answer?
    • Yes, the database content is very useful. It can be used to compare pathways between organisms, to support metabolite research, and to study diseases related to metabolism. The website also has a feature that predicts metabolic pathways from sequenced genomes, which can help in studying particular organisms.
  4. Is the database content timely?
    • Is there a need in the scientific community for such a database at this time?
      • Yes. Such a database supports medical research on cancer and on neurological and metabolic diseases, as well as broader tasks such as drug target identification.
    • Is the content covered by other databases already?
      • There are a few other metabolism databases, including MetaCyc's main competitor, KEGG, which is discussed in the Nucleic Acids Research article. However, some of them are specific to one organism (e.g., the Human Metabolome Database) or bundle several databases or features together. The MetaCyc website also compares MetaCyc with the other common metabolic databases, of which there are only a few.
  5. How current is the database?

General utility of the database to the scientific community

  1. Are there links to other databases? Which ones?
    • Because MetaCyc is part of the BioCyc collection of pathway/genome databases, it links to other BioCyc databases such as EcoCyc (E. coli), HumanCyc, and BsubCyc (Bacillus subtilis). It also links to many other types of databases, such as UniProt, PubChem, and the NCI Open Database. Users can also run BLAST searches against MetaCyc protein sequences from the Search menu.
  2. Is it convenient to browse the data?
    • Yes. For a basic search, the user simply enters a gene, protein, metabolite, or pathway in the search bar. The user can also run comparative analyses across species, genes, proteins, and other objects.
  3. Is it convenient to download the data?
    • Yes. Genes, with their names and IDs, can be downloaded as a text (.txt) file. Pathways can be downloaded either in BioPAX, a standard exchange format for pathway data, or as PDF diagrams.
    • However, some of the tools require a BioCyc subscription.
  4. In what file formats are the data provided?
    • Text (.txt) files for gene lists, BioPAX for pathways, and PDF for pathway diagrams.
  5. Evaluate the “user-friendliness” of the database: can a naive user quickly navigate the website and gather useful information?
    • Is the website well-organized?
      • Yes, it is well organized. It has a main search bar and menus labeled Sites, Search, Genome, Metabolism, Analysis, and Help.
    • Does it have a help section or tutorial?
      • Yes. There is a Help menu, and a guide to the site is found on the homepage.
    • Are the search options sensible?
      • Yes, the search options are sensible. You can search by different filters such as organism, protein, gene, and pathway.
    • Run a sample query. Do the results make sense?
  6. Access: Is there a license agreement or any restrictions on access to the database?
    • There are no restrictions on accessing the database itself. However, some of the analysis tools require a BioCyc subscription.

Summary judgement

  1. Would you direct a colleague unfamiliar with the field to use it?
    • Yes, I would direct a colleague unfamiliar with the field to use it, particularly if they needed to study metabolic pathways or find the enzymes and other proteins involved in metabolism.
  2. Is this a professional or "hobby" database? The "hobby" analogy means that it was that person's hobby to make the database. It could mean that it is limited in scope, done by one or a few persons, and seems amateur.
    • It is definitely a professional database. Drawing on over 60,000 publications, the creators of MetaCyc have built a database that spans all domains of life and covers thousands of enzymes, proteins, genes, and pathways, with links to many other databases.

Scientific Conclusion

MetaCyc is a database that has existed for over two decades and covers metabolism, including the associated genes, enzymes, and proteins. Its data are drawn from over 60,000 publications, which makes it the largest curated collection of metabolic pathways. It does not require a subscription (except for some analysis tools), it is very user-friendly, and a guide to the site is found on the homepage. It is not limited to one or a few species; instead, it spans all three domains of life. The database is useful for studying metabolic pathways in general, for research on metabolic and neurological disorders, and for comparing the metabolism of different organisms. It is also efficient to use, because the results for a metabolic pathway search include links to other databases for ID numbers and additional information. In conclusion, MetaCyc is a good, broadly applicable resource for knowledge about metabolism.


