This DataONE OpenWetWare site contains informal notes for several research projects funded through DataONE. DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.


Project Description/Abstract

Questions to Consider

  1. How do data citation practices vary across disciplines (biology, ecology, earth sciences, etc.)?
  2. What are barriers to citation of raw data hosted online? Are there ways of getting around these barriers?
  3. How much of the data cited is open access?
  4. Are the datasets represented by DOIs, URIs or other persistent identifiers?
  5. Are the datasets integrated with citation software or formats such as BibTeX and RIS?
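
As a rough illustration of question 5, the sketch below renders one dataset record in both BibTeX and RIS. Everything in the record is a hypothetical placeholder (the authors, title, publisher, the DOI 10.1234/example, and the citation key), and the function names are made up for this example. It assumes `@misc` as the BibTeX entry type, since classic BibTeX has no dedicated dataset type, and `DATA` as the RIS reference type commonly used for datasets.

```python
# Hypothetical dataset record; none of these values refer to a real dataset.
EXAMPLE = {
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "Example Stream Temperature Dataset",
    "year": 2010,
    "publisher": "Example Data Archive",
    "doi": "10.1234/example",  # placeholder DOI, not registered
}


def to_bibtex(key, record):
    """Render a dataset record as a BibTeX @misc entry (an assumed convention)."""
    fields = [
        ("author", " and ".join(record["authors"])),
        ("title", record["title"]),
        ("year", str(record["year"])),
        ("publisher", record["publisher"]),
        ("doi", record["doi"]),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


def to_ris(record):
    """Render the same record as an RIS entry of type DATA."""
    lines = ["TY  - DATA"]
    # RIS repeats the AU tag once per author.
    lines += [f"AU  - {author}" for author in record["authors"]]
    lines += [
        f"TI  - {record['title']}",
        f"PY  - {record['year']}",
        f"PB  - {record['publisher']}",
        f"DO  - {record['doi']}",
        "ER  - ",  # end-of-record marker
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_bibtex("doe2010example", EXAMPLE))
    print()
    print(to_ris(EXAMPLE))
```

A repository that exposes either form alongside each dataset would let authors import the citation directly into a reference manager instead of retyping it.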

Instructions for Adding Resources

  1. Email with the full citation (including DOI or URI if available). Or:
  2. Join OpenWetWare
  3. Edit the page by adding the citation using the <biblio> or <cite> tags as per this page:
  4. You can reference DOIs using OpenWetWare as per the following: Referencing a DOI within OWW
  5. Or if there are no reference numbers, just copy and paste the link/citation.
  6. Make sure that the citation is in the appropriate category: Projects for project websites related to data citation; Discussion for whitepapers or presentations related to data citation; Editorials for opinion pieces about data citation; Academic Papers for articles and papers related to data citation; and Interesting Findings for any neat find that could be relevant for future reference or for the discussion sections of pending manuscripts.

Thank you.



WebLab: Liu et al. WebLab: a data-centric, knowledge-sharing bioinformatic platform Nucleic Acids Research Advance Access published on July 1, 2009, DOI 10.1093/nar/gkp428. Nucl. Acids Res. 37: W33-W39. (article more or less advertising this tool)

Data Citation Example Policies

Bruce Wilson's "ORNL DAAC Experience With Digital Object Identifiers (DOIs)" presentation Media:DataONE DOI for DC Mgrs 2010-02-22.pptx

Bob Cook, Citations to Published Data Sets, Fluxletter: FLUXNET Newsletter [7]

Ecological Society of America work on barriers and incentives to sharing data:

Fry, J and Lockyer, S and Oppenheim, C and Houghton, J and Rasmussen, B (2009) Identifying benefits arising from the curation and open sharing of research data produced by UK Higher Education and research institutes. Project Report.

Toby Green's OECD white paper 'We Need Publishing Standards for Datasets and Data Tables' doi:10.1787/603233448430

Australian National Data Service overview of citing data PDF

eBank UK overview of citing data

Hogenaar, Arjan. "Enhancing Scientific Communication through Aggregated Publications Environments." Ariadne Issue 61 October 2009. [8]

John Kunze, "Practical Citation in a World of Evolving Data" 2007 International Workshop on Database Preservation presentation

Kunze, John, Robert Cook, Patricia Cruse, Carol Tenopir, Todd Vision, William K. Michener. "Defining the Data Citation Problem in the DataNet Context" File:Datanet citation agu jak.ppt File:Agu citation abstract 20090903.pdf

Page, Roderic. Blog Entry Semantic Publishing: towards real integration by linking

Pepler, S. and O'Neill, K. Preservation intent and collection identifiers. CLADDIER Project report II (2008) [9]

Ruusalepp, Raivo. “A Comparative Study of International Approaches to Enabling the Sharing of Research Data.” EBA Consultancy 30 November 2008. [10]

Schneider, Jeri. "Opportunities and Challenges: Implementing Data Citation Standards" 2006 IASSIST conference presentation

Singh, Deepak. Blog Entry Searching Scientific Literature in the 21st Century

Thorisson, Gudmundur A. "Accreditation and attribution in data sharing." Nature Biotechnology 27, 984 - 985 (2009) doi:10.1038/nbt1109-984b

Van de Sompel, H., Lagoze, C., Nelson, M.L., “Adding eScience Assets to the Data Web”, paper presented at the Linked Data on the Web Workshop, 20 April 2009, Madrid [11]


"Data Producers Deserve Credit", Nature Genetics 41, 1045 (2009) doi:10.1038/ng1009-1045 fulltext

"Compete, collaborate, compel", Nature Genetics 39, 931 (2007) doi:10.1038/ng0807-931 fulltext

"Got data?" Nature Neuroscience 10, 931 (2007) doi:10.1038/nn0807-931 fulltext

Costello, M. 2009. Motivating Online Publication of Data. BioScience 59, no. 5 (May 1): 418-427. (accessed June 7, 2010). doi:10.1525/bio.2009.59.5.9

Smith, V. "Data publication: towards a database of everything." BMC Research Notes 2009, 2:113 doi: 10.1186/1756-0500-2-113

Whitlock MC, McPeek MA, Rausher MD, Rieseberg L, Moore AJ (2010). Data archiving. Am Nat 175: 145–146. doi:10.1086/650340

NCAS British Atmospheric Data Centre "Data Publication" (2008)

Academic Papers

Altman, Micah and Gary King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data." D-Lib Magazine, Vol. 13, No. 3/4 (March/April).

Anwar, Nadia and Ela Hunt. "Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment." BMC Evolutionary Biology 2009, 9:93 doi: 10.1186/1471-2148-9-93

Huang, Kenneth Guang-Lih. "Innovation in the life sciences: the impact of intellectual property rights on scientific knowledge diffusion, accumulation and utilization" DSpace@MIT, 2006. (regarding gene patenting and property rights as a potential barrier/negative impact on knowledge dissemination)

Borgman, Christine, Jillian Wallis, and Noel Enyedy. 2007. "Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries." International Journal on Digital Libraries 7, no. 1/2: 17-30. Library, Information Science & Technology Abstracts, EBSCOhost (accessed July 3, 2010). doi: 10.1007/s00799-007-0022-9

Kansa, Eric C. and Bissell, Ahrash. Web Syndication Approaches for Sharing Primary Data in "Small Science" Domains. Data Science Journal.

Lee, Jae W.; Zhang, Jianting; Zimmerman, Ann S.; Lucia, Angelo (2009). "DataNet: An emerging cyberinfrastructure for sharing, reusing and preserving digital data for scientific discovery and learning." AIChE Journal 55(11): 2757-2764.

Scherle, R., Carrier, S., Greenberg, J., Lapp, H., Thompson, A., Vision, T., & White, H. (2008). Building support for a discipline-based data repository. In Proceedings of the 2008 International Conference on Open Repositories [12]

Smith, Vincent S., Simon D Rycroft, Kehan T Harman, Ben Scott and David Roberts. "Scratchpads: a data-publishing framework to build, share and manage information on the diversity of life." BMC Bioinformatics 2009, 10(Suppl 14):S6 doi:10.1186/1471-2105-10-S14-S6

Stanley, Barbara and Michael Stanley. "Data Sharing: The Primary Researcher's Perspective." Law and Human Behavior, Vol. 12, No. 2 (Jun., 1988), pp. 173-180

Langille MGI, Eisen JA. "BioTorrents: A File Sharing Service for Scientific Data." PLoS ONE 5(4) (2010): e10071. doi:10.1371/journal.pone.0010071

Marshall, D. C. (2009, November). Cryptic failure of partitioned Bayesian phylogenetic analyses: Lost in the land of long trees. Syst Biol 59 (1), 108-117. doi:10.1093/sysbio/syp080

Mazzetti, P.; Nativi, S.; Caron, J. "RESTful implementation of geospatial services for Earth and Space Science applications." International Journal of Digital Earth, Mar 2009 Supplement 1, Vol. 2, p40-61, 22p, 5 Diagrams, 2 Charts; doi: 10.1080/17538940902866153; (AN 37579821)

Noor MAF, Zimmerman KJ, Teeter KC (2006) Data Sharing: How Much Doesn't Get Submitted to GenBank? PLoS Biol 4(7): e228. doi:10.1371/journal.pbio.0040228

Page, R. (2007, May). Tbmap: a taxonomic perspective on the phylogenetic database treebase. BMC Bioinformatics 8 (1), 158+. doi:10.1186/1471-2105-8-158

Piwowar, HA, RS Day and DB Fridsma. "Sharing Detailed Research Data Is Associated with Increased Citation Rate." PLoS ONE 2(3) (2007): e308. doi:10.1371/journal.pone.0000308

Piwowar, HA and WW Chapman. Identifying data sharing in biomedical literature. AMIA Annu Symp Proc. 2008 Nov 6:596-600. PubMed PMID: 18998887; PubMed Central PMCID: PMC2655927 (not about data citations per se, but rather author descriptions of data sharing)

Penev, Lyubomir et al. "Data publication and dissemination of interactive keys under the open access model." Zookeys 21 (2009): 1-17, doi: 10.3897/zookeys.21.274

Schiff, L. R., Van House, N. A., & Butler, M. H. (1997). Understanding complex information environments: A social analysis of watershed planning:

Schwartz F, Fang Y. Citation data analysis on hydrogeology. Journal of the American Society for Information Science & Technology [serial online]. February 15, 2007;58(4):518-525. Available from: Library, Information Science & Technology Abstracts with Full Text, Ipswich, MA. Accessed June 7, 2010. doi:10.1002/asi.20526; (AN 24169223)

Sieber JE, Trumbo BE. (Not) giving credit where credit is due: Citation of data sets. Science and Engineering Ethics. 1995. 1(1) 11-20. doi:10.1007/BF02628694

Zimmerman, Ann. "Data Sharing and Secondary Use of Scientific Data: Experiences of Ecologists"

Interesting Findings

Anecdotal comment on difficulty of data reuse in treebase: Marshall, David. Cryptic Failure of Partitioned Bayesian Phylogenetic Analyses: Lost in the Land of Long Trees. SysBio. 2010. 59(1):108-117. doi:10.1093/sysbio/syp080

  • Direct Quote: "I conducted an informal survey of Internet-accessible files using Google Scholar and a keyword set consisting of “partition,” “MrBayes,” “mitochondrial,” “codon,” “phylogeography,” and “TreeBASE.” The latter 2 terms were intended to bias the sample toward studies with larger numbers of taxa and easily accessed data sets. I examined the first 24 studies that fit these criteria; many proved unusable here because the data sets were not accessible, the character sets were not clearly specified in the TreeBASE files, or the partitioning details (e.g., parameter linking) were not entirely specified."
  • Valerie Enriquez 15:48, 25 June 2010 (EDT): In search for articles citing data from ORNL DAAC, found this article about data citation policy mentioning ORNL DAAC and their policy.
Fisher, Joshua B. and Louise Fortmann. "Governing the data commons: Policy, practice, and the advancement of science." Information & Management Volume 47, Issue 4, May 2010, Pages 237-245 doi:10.1016/
  • Direct quote: "NASA's Distributed Active Archive Center (DAAC) managed the archival and distribution of NASA data through the Oak Ridge National Laboratory (ORNL). Their data citation policy requested that authors include a bibliographic citation to ORNL DAAC rather than to the individual scientists who provided the data.2 The DAAC also included a policy for data producers on quality control for data dissemination (e.g., metadata, defined parameters, and consistency). The constraints on data producers were more detailed than for data users. NASA expected peer review rules to govern the actions of the researchers and accepted negotiated collective-choice arrangements."
  • Valerie Enriquez 11:19, 28 June 2010 (EDT): As per email from Margaret Henty, I have added links to the ANDS (Australian National Data Service) project and DataCite.
  • Valerie Enriquez 10:12, 2 July 2010 (EDT): As per email from Monica Duke, I have added more resources, including articles and blog entries about the Semantic Web and a white paper on citation, location, preservation and identification recommendations. The pieces on the Semantic Web should prove insightful for my own project, since there needs to be a stronger link between the datasets saved in repositories and the articles that cite them.