User:Sarah Judson/Notebook/DataOne DataCitationPractices:Notebook:10June2010

{| width="800"
 * style="background-color: #cdde95;" align="center"|
 * style="background-color: #cdde95;" align="center"|




 * align="center" style="background-color: #e5edc8;" |



 * colspan="2" style="background-color: #F2F2F2;" align="right"|
 * colspan="2"|
 * colspan="2"|
 * colspan="2"|

Data citation practice inventory within journals (articles)
Owner: [Sarah]

What are the various practices for data citation within academic papers? How prevalent is each variety? How do these practices vary across discipline, journal, data type, and data source? How have these practices varied over time?

Good broad questions for now. I'm refining more specific questions and how they fit into the broader picture.

Scope and Plan

which journals?
Starting with AmNat, SysBio, MolecularEco. Probably will then move to some of the ESA-affiliated journals and a GIS/earth journal (need suggestion). This will give broad coverage of subject types (in the order mentioned: behavioral/model, systematics/phylogeny, genetics, ecology, earth/GIS). Then maybe Evolution, Nature, Science because they are big names in biology, but these have broader coverage that includes the subjects of the previously mentioned journals.

We have some survey results on scientist attitudes and behaviours that might sync up nicely with these results if we choose journals that reflect the scientists' fields. When asked "Which of the following best describes your primary field of concentration within evolutionary biology?" the top results were:
* Behavior/Neurobiology: 23%
* Development/Morphology: 21%
* Ecology: 17%
* Genetics/Genomics: 14%
* Molecular evolution: 8%
* Paleontology: 8%

Great! Thanks for this list. Is there more data from this survey? I'm interested.

I don't know which journals best sync up with these fields?
See above. I may need a more specific paleontology journal.

which time periods?
Starting with 2010, moving back. Probably annually through 2000 and then every five years before that. For now, the first issue(s) or 25-50 articles published that year. Should move to random sampling to eliminate the effects of special-topic issues. The 2010 preliminary dataset, though not random, is important for investigating the extracted data and trends within a single journal issue.

what data will you extract?
Still determining fields. Right now: keywords, article topic, dataset citation (Y/N), how the data is cited, whether the data is readily accessible, and whether the author reciprocally posts their own dataset (Y/N, with the same nested questions as for dataset citation). I have an ever-expanding spreadsheet and am planning on a more refined database or Google Doc form soon.

how many datapoints do you expect?
Many! Lots of articles. Planning on 100+ per journal, assuming we pick focal journals. Especially if I can dedicate more of my time to this, since Valerie has taken depositories, which seemed like it was originally under my domain, and because Nic should be able to answer my journal-based questions with the data he is collecting.

what stats will you run? what is your statistical power?
Still need to think on this. Baseline = % of articles in a journal that cite a dataset, % that do it properly, % that also post their data, etc. Beyond that, mostly correlations between data citation (or lack thereof) and journal, time, topic (field of concentration), open access, etc. These are relatively simple but may suffice. I'm interested in a more sophisticated method, but am not familiar with traditional statistics in the social sciences. Perhaps some multivariate clustering to establish which parameters determine whether data is cited or not. Open to suggestions, especially on common methods in the social sciences and specifically in data citation (if there are any)! Statistical power should be good because of the large sample size (many more articles than journals), though there are some issues with "unequal sampling" because some journals have fewer publications per year/issue.

what do you plan to have complete by June 30th?
1. Establish WHAT information is collected from each article.
2. Establish HOW information is collected (expedite manual searching, possibly text searching and database automation).
3. Get through the 2010 articles of SysBio, AmNat, MolecularEco.
4. Evaluate continued article sampling (random, time-scale, by topic).

plans for integration with other intern work?
I made brief comments below about collaboration, which I hope to update soon. I think a central database would standardize data collection (i.e. fields, character states). This would also allow for ease of analysis, because an article (or journal or repository) could be evaluated for journal and repository trends/metadata as well (and vice versa for each of our focal areas).
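
A rough sketch of how the plan above could be prototyped while the fields are still being decided. This is only an illustration under assumptions, not the actual spreadsheet or database: the class and function names (ArticleRecord, sampling_years, random_sample, baseline_by_journal), the field names, the earliest sampled year, and the fixed random seed are all hypothetical. It covers three pieces of the plan: one record per article, the year schedule (annual back to 2000, then every five years), and the baseline percentages per journal.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ArticleRecord:
    """One article's extracted data (field names are illustrative only)."""
    journal: str
    year: int
    topic: str
    keywords: list = field(default_factory=list)
    cites_dataset: bool = False       # dataset citation (Y/N)
    citation_style: str = ""          # how the data is cited
    data_accessible: bool = False     # is the cited data readily accessible
    author_posted_data: bool = False  # author reciprocally posts own dataset


def sampling_years(latest=2010, annual_until=2000, earliest=1980, step=5):
    """Years to sample: annually from `latest` back to `annual_until`,
    then every `step` years down to `earliest` (an assumed cutoff)."""
    annual = list(range(latest, annual_until - 1, -1))
    sparse = list(range(annual_until - step, earliest - 1, -step))
    return annual + sparse


def random_sample(articles, n, seed=0):
    """Draw up to n articles at random, instead of taking the first issue,
    to avoid the effect of special-topic issues."""
    articles = list(articles)
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(articles, min(n, len(articles)))


def baseline_by_journal(records):
    """Per journal: article count, % citing a dataset, % posting own data."""
    totals = {}
    for r in records:
        t = totals.setdefault(r.journal, {"n": 0, "cite": 0, "post": 0})
        t["n"] += 1
        t["cite"] += int(r.cites_dataset)
        t["post"] += int(r.author_posted_data)
    return {
        journal: {
            "n": t["n"],
            "pct_citing": 100.0 * t["cite"] / t["n"],
            "pct_posting": 100.0 * t["post"] / t["n"],
        }
        for journal, t in totals.items()
    }
```

Reporting the article count next to each percentage keeps the "unequal sampling" issue visible, since a journal with few articles per issue gives a noisier percentage. The correlation and clustering ideas are left out here because the method has not been chosen yet.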