DataONE:Notebook/Reuse of repository data/2010/06/22

Reuse of Repository Data

Notes for June 22, 2010

Valerie Enriquez 13:45, 22 June 2010 (EDT): Instead of searching this morning, have been consolidating data from different spreadsheets and generating .csv files from ISI Web of Science to import into a new main spreadsheet, formatted to match Sarah's spreadsheet, for upload.
Today's conference call should provide further direction.
Valerie Enriquez 17:45, 22 June 2010 (EDT): As per the phone conference and a conversation with Heather, my report and preliminary presentation at the July meeting will now focus on the degree of difficulty in finding citations for reused data from TreeBASE, Pangaea and ORNL DAAC.
Valerie Enriquez 19:25, 22 June 2010 (EDT): Was unable to upload the new spreadsheet directly to Google Docs, as it exceeded the 1 MB limit. May try hosting the document openly on Dropbox if the upload continues to fail.
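The consolidation step in the 13:45 entry (merging several .csv exports into one main spreadsheet) can be sketched in Python. This is a minimal sketch under stated assumptions, not the workflow actually used: it assumes every export is a comma-separated file with a header row, takes the union of all column names, and drops exact duplicate rows (which appear when the same article turns up in more than one search). The function name `consolidate_csv` is hypothetical, and the column layout of Sarah's spreadsheet is not reproduced here.

```python
import csv
import io
from typing import Iterable, TextIO


def consolidate_csv(sources: Iterable[TextIO], out: TextIO) -> int:
    """Merge several CSV exports (each with a header row) into one CSV.

    Hypothetical helper: columns are the union of all headers in first-seen
    order, cells missing from a source are left blank, and exact duplicate
    rows are written only once. Returns the number of data rows written.
    """
    rows: list[dict] = []
    columns: list[str] = []
    for src in sources:
        reader = csv.DictReader(src)
        # Record any column names not seen in earlier exports.
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)

    # restval fills columns a source lacked; extrasaction drops stray fields.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    seen: set[tuple[str, ...]] = set()
    written = 0
    for row in rows:
        # Normalise missing cells to "" so duplicates compare equal.
        key = tuple(row.get(c) or "" for c in columns)
        if key in seen:
            continue
        seen.add(key)
        writer.writerow(row)
        written += 1
    return written
```

To run it on real files, open each export with `open(path, newline="")` and pass the file objects as `sources`, writing to another file opened the same way.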

Abstract Brainstorming

Valerie Enriquez 20:08, 22 June 2010 (EDT): While it is becoming easier to track citations of published articles thanks to unique identifiers such as DOIs and tracking tools such as Scopus and the ISI Web of Science Cited Reference Search, it is still difficult to find citations of data. At the conference call today, Joan Starr mentioned that in Europe, the metadata tag "publisher" can also refer to a data repository. Why is this not the case in the US?

Findings so far: While repositories assign unique identifiers, such as TreeBASE's study accession numbers, researchers rarely cite data using them. However, in the case of Pangaea, one can find articles that cite the data by including the dataset's DOI. Also, while the repository (TreeBASE, Pangaea or ORNL DAAC) may sometimes be referenced by name in the text of an article, a citation is still much more likely to be found by searching for the author's name or for the citation of the article or study that the dataset supplements. While mentions of these databases can be rare, mentions of data reused from them are even rarer: many search results are citations of articles whose investigators deposited their own findings in a repository such as TreeBASE, Pangaea or ORNL DAAC, rather than articles by researchers who found and reused those data.

What must be done: I will need to go back through my journal and original searches to find the exact numbers of true hits and false drops, record them in a new spreadsheet, and calculate their percentages.
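The tally described above is simple arithmetic: for each search, the true-hit percentage is 100 × true hits / (true hits + false drops), and the false-drop percentage is the complement. A minimal Python sketch, assuming each search has already been hand-coded into those two counts; the `SearchTally` class, its field names, and the `report` helper are hypothetical, and no real counts are included.

```python
from dataclasses import dataclass


@dataclass
class SearchTally:
    """Hand-coded outcome of one search (hypothetical structure)."""
    repository: str   # e.g. "TreeBASE", "Pangaea", "ORNL DAAC"
    query: str        # the search string as run
    true_hits: int    # results that actually reuse data from the repository
    false_drops: int  # results that do not (e.g. articles by the original depositors)

    @property
    def total(self) -> int:
        return self.true_hits + self.false_drops

    def percentages(self) -> tuple[float, float]:
        """Return (true-hit %, false-drop %); (0.0, 0.0) for an empty search."""
        if self.total == 0:
            return (0.0, 0.0)
        return (100 * self.true_hits / self.total,
                100 * self.false_drops / self.total)


def report(tallies: list[SearchTally]) -> list[str]:
    """One tab-separated line per search: repository, query, total, hit %, drop %."""
    lines = []
    for t in tallies:
        hit_pct, drop_pct = t.percentages()
        lines.append(f"{t.repository}\t{t.query}\t{t.total}\t{hit_pct:.1f}%\t{drop_pct:.1f}%")
    return lines
```

Because the `report` lines are tab-separated, they should paste into a spreadsheet as separate columns next to the existing search records.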

Publication Brainstorming

Valerie Enriquez 19:39, 22 June 2010 (EDT): As per the conversation with Heather, am brainstorming some potential journals to which I could submit my article. I have conducted an initial search through LISTA (Library, Information Science & Technology Abstracts with Full Text) for articles pertaining to data citation or data reuse in the library/information science literature. Based on this (and a follow-up search for "citation analysis"), I have found the following candidates:
- Collection Management: This journal recently ran an article titled "The Use of Web of Knowledge to Study Publishing and Citation Use for Local Researchers at the Campus Level" (doi:10.1080/01462671003597959), in which the authors used ISI Web of Science to find and identify periodical literature citing local researchers.
- Information Services & Use (Author Guidelines): An international journal focusing on information technology, particularly its applications in business and scientific fields.
- Informing Science: From their About page: "The academically peer refereed journal Informing Science endeavors to provide an understanding of the complexities in informing clientele. Fields from information systems, library science, journalism in all its forms to education all contribute to this science. These fields, which developed independently and have been researched in separate disciplines, are evolving to form a new transdiscipline, Informing Science. Informing Science publishes articles that provide insights into the nature, function and design of systems that inform clients. Authors may use epistemologies from engineering, computer science, education, psychology, business, anthropology, and such. The ideal paper will serve to inform fellow researchers, perhaps from other fields, of contributions to this area."
- Journal of the American Society for Information Science & Technology (JASIST): From their page: "The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (Advances in Information Science) and reviews of print and other media." This scope fits my interests.
- Journal of Information Science: From their About page: "The Journal of Information Science is an international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field."
- Library Technology Reports: As a publication of the American Library Association, this could reach a wide audience of librarians interested in born-digital holdings or technological changes in scientific research.
- Scientometrics: Even if my study turns out not to be highly quantitative, this may be a useful venue for Nic and Sarah to consider.

Transcript of Conversation with Heather

[16:51] Heather Piwowar: Hi Valerie!

[16:51] Valerie Enriquez: Hi

[16:51] Valerie Enriquez: what's going on?

[16:51] Heather Piwowar: Had a thought about next steps for your research....

[16:51] Heather Piwowar: left a blurb on your talk page

[16:51] Valerie Enriquez: yeah, I could use a bit of pointing in the analysis department

[16:51] Valerie Enriquez: thanks

[16:51] Heather Piwowar: want to chat to flesh it out and see what you think?

[16:52] Heather Piwowar: I've got 10 minutes now, or we could chat tomorrow morning early?

[16:52] Valerie Enriquez: (if I randomly sign off, that means my computer has once more overheated despite it not being that hot here)

[16:52] Valerie Enriquez: sure

[16:52] Valerie Enriquez: I could talk now or tomorrow

[16:52] Heather Piwowar: no prob. my wireless randomly drops, I understand

[16:52] Heather Piwowar: ok let's do a bit now and then more tomorrow

[16:52] Valerie Enriquez: ok, sounds good

[16:52] Valerie Enriquez: right before the meeting tomorrow maybe?

[16:53] Heather Piwowar: oh, hrm, do we have a meeting tomorrow?

[16:53] Valerie Enriquez: oh wait

[16:53] Heather Piwowar: we did last week that is true

[16:53] Heather Piwowar: I wasn't thinking we would tomorrow though.

[16:53] Valerie Enriquez: did we agree that we're not having a meeting?

[16:53] Valerie Enriquez: oh yeah, sorry

[16:54] Heather Piwowar: I'll send an email to confirm: no meeting.

[16:54] Valerie Enriquez: either way, a bit now, a bit tomorrow

[16:54] Valerie Enriquez: cool

[16:54] Heather Piwowar: ok, here's the blurb:

[16:54] Heather Piwowar: Valerie, just talked to Todd and we had a thought. It seems like generating effective searches for reuse is really difficult. What makes a search effective varies by repository due to repository names, support for dois, etc. Evidence of the difficulties would be very useful/motivating for initiatives like datacite, and interesting to all people who submit data. As such, describing the difficulties in formulating effective searches for reuse, using three repositories as examples, would make a great publication in and of itself. Maybe a research article, or a perspectives piece, or ??? And insights from the write-up could inform how to proceed for the last few weeks of your internship. See what you think, consider some potential publishing venues that might be appropriate for a case study like this, and let's chat sometime on Wednesday about it?

[16:54] Valerie Enriquez: I like that idea

[16:54] Heather Piwowar: cool.

[16:54] Valerie Enriquez: mainly because what I've found has been spotty

[16:54] Heather Piwowar: want to say it back to me to make sure we are thinking the same thing?

[16:54] Heather Piwowar: :)

[16:55] Valerie Enriquez: ok, basically, a writeup about the various search methods I've used and a commentary on how difficult it is to find citations for reused data

[16:55] Heather Piwowar: yup. that.

[16:55] Heather Piwowar: not sure how quantitative.

[16:55] Valerie Enriquez: that sounds more up to my speed

[16:55] Heather Piwowar: probably at least a little bit quantitative

[16:55] Valerie Enriquez: ah, yeah, I had questions about how I'd measure that

[16:55] Heather Piwowar: along the lines of what you've found

[16:55] Valerie Enriquez: would I use R?

[16:55] Heather Piwowar: nah, not necessarily.

[16:56] Valerie Enriquez: ok

[16:56] Heather Piwowar: so I'd imagine that as a perspective piece it would mostly be case study approach....

[16:56] Heather Piwowar: so saying "if I do this search here, 60% of my hits are for data creation"

[16:56] Heather Piwowar: etc

[16:56] Valerie Enriquez: ah

[16:57] Heather Piwowar: whereas if I try that search that worked with treebase for pangaea, all I get is hits for supercontinents.

[16:57] Valerie Enriquez: my main worry was that my searches turned out to be too specific to provide a useful sample size of articles

[16:57] Valerie Enriquez: ah, yeah

[16:57] Heather Piwowar: or "dois work well for xxxx, but aren't supported in searching for reuse out of repositories A, B, C"

[16:57] Heather Piwowar: yeah, agreed.

[16:58] Valerie Enriquez: I have 75 articles so far

[16:58] Heather Piwowar: so instead if we decide the weakness is a strength, and figure out how to tell the story of how difficult it is, and what gotchas you run into...

[16:58] Valerie Enriquez: cool

[16:58] Heather Piwowar: yeah, not many.

[16:58] Heather Piwowar: so maybe we make the story about the searching, instead.

[16:59] Heather Piwowar: with some numbers, to quantify the relative difficulty of getting hits, but mostly not.

[16:59] Heather Piwowar: in the scope of journals that you read, is there a natural home for this sort of article?

[17:00] Valerie Enriquez: well, the ISI searches went across a whole bunch of different journals

[17:00] Valerie Enriquez: I'm going through again to check whether or not they're open source

[17:00] Heather Piwowar: .... hmm, let me take back the previous "mostly not." The article should be quantitative enough to make its point and provide useful evidence.

[17:00] Valerie Enriquez: ok

[17:01] Valerie Enriquez: like mentioning that one search brought up 1000 results

[17:01] Heather Piwowar: can you think of places you'd want to publish an article like this?

[17:01] Heather Piwowar: (since thinking about the first-choice submission place always helps to focus the tone/length/etc)

[17:02] Valerie Enriquez: hm. it might be useful in a library publication relating to reference (particularly scientific reference), but I'd have to look further into that

[17:02] Valerie Enriquez: ok

[17:02] Heather Piwowar: yes. I'll brainstorm too.

[17:02] Heather Piwowar: maybe put a place on the wiki where you are collecting ideas, and I'll add to it and then others can too.

[17:02] Valerie Enriquez: ok, I could put a comment box on the main page

[17:03] Valerie Enriquez: (er, the main page of my notebook)

[17:03] Heather Piwowar: so the idea would be that you'd maybe wrap up exploring search alternatives (especially DOIs) for DAAC

[17:03] Heather Piwowar: then outline and write this piece, roughly, maybe in time for the July meeting.

[17:03] Heather Piwowar: what do you think, would that be doable?

[17:03] Valerie Enriquez: Definitely

[17:03] Heather Piwowar: then discussion about that would lead to where to focus for the last few weeks of your internship.

[17:04] Valerie Enriquez: excellent

[17:04] Heather Piwowar: on the 75 articles, or for example on following the reuse patterns of a few select datasets, or ???

[17:04] Heather Piwowar: cool.

[17:04] Heather Piwowar: well I don't know your usual manuscript-drafting approach,

[17:04] Heather Piwowar: but if you usually write an abstract first, or outline first, or something

[17:05] Valerie Enriquez: I've been brainstorming abstract ideas in my past couple of journal entries

[17:05] Heather Piwowar: then we can give you early feedback.

[17:05] Valerie Enriquez: but thanks to today's meeting, I have more of a direction now

[17:05] Heather Piwowar: yes, I saw that. good stuff.

[17:05] Heather Piwowar: good.

[17:06] Heather Piwowar: ok, super. then I'll keep an eye on your pages, and will give a shout-out to Todd and others when you have a direction gelling...

[17:06] Heather Piwowar: jelling? I have no idea.

[17:06] Heather Piwowar: anyway.

[17:06] Valerie Enriquez: excellent, thanks!

[17:06] Heather Piwowar: good.

[17:07] Heather Piwowar: let me know if/when you get stuck or want someone to bounce ideas off of for direction/focus/venue/whatever.

[17:07] Valerie Enriquez: ok, I will

[17:07] Heather Piwowar: I'll be online a bit less at the end of the week, but still available even if via email.

[17:07] Heather Piwowar: bye!

[17:07] Valerie Enriquez: bye!
