DataONE:meeting notes:10 June 2010 email

project plan updates

11 messages

Heather Piwowar 	 Thu, Jun 10, 2010 at 6:55 AM Reply-To: hpiwowar@gmail.com To: Nicholas Weber , Sarah Walker Judson , valerie.enriquez@simmons.edu Cc: Todd Vision  Hi guys,

1. I see your updates on the research plan page, good stuff. I'm experimenting with commenting on the associated talk page, here: http://www.openwetware.org/wiki/Talk:DataONE/Summer_2010/Research_questions

Feel free to respond there, and we'll see how it works?

2. If you've got lists and spreadsheets and data and stuff, I think the best approach might be to put them in Google docs, make the doc public and publicly-editable (I think publicly editable... maybe not.... hrm), and include a link to the google doc on the wiki

3. I've been posting tips about how to use the wiki easily on our main project page. Keep an eye on it, and add your tips too... or questions about things you don't know how to do yet so we can all figure it out... http://www.openwetware.org/wiki/DataONE

4. Sarah, I know you are busy this morning, but Nic I'm around this morning if you want to chat a bit in real time about your research plan? I should appear as "Available" in google chat, so feel free to invite me to chat whenever that is true.

5. Let's set up some standing meetings on Monday mornings. Meetings = google group chat? Maybe 9am pacific time, would that work for everyone? I'm guessing the projects overlap enough that it makes sense to talk all together, but we could also have individual chats about the projects if that would help.

ok, that's it for now, off to chat with Valerie and get her up to speed and then I'll make some more comments on the research plans......

Heather

Sarah Walker Judson 	 Thu, Jun 10, 2010 at 8:11 AM To: hpiwowar@gmail.com Cc: Nicholas Weber , valerie.enriquez@simmons.edu, Todd Vision  Before I get into my responses to Heather's points, I'd like to ask, is everyone ok if I post original emails/chats on the OWW? I think it would be a useful storage place, especially to log emails/chats that may have just been between two people. I've done this for my chat and the last bulleted email Heather sent. See: http://openwetware.org/wiki/DataONE:Notebook/Summer_2010#Correspondence. I'm ok with my stuff up there and know Heather is as well, but I'd like formal consent from the rest. I also made "rules" on commenting on the transcripts, let me know what you think.

1. How do you use talk on OWW? I'm new to wikis. I can "edit" the talk page, but I like how with Heather it logged which user was commenting. I see her tip for leaving a timestamp and name on the wiki tips. The alternative (or additional) thing we could do is to put our initials by any comment we make on the page itself (the timestamp is probably better but may clutter the page), that way it is right by the question/comment it is related to. I did this with my first name on the brief comments I made in Valerie's and Nic's sections to indicate it was me.

In reading through Heather's comments, I'm wondering if anyone has better database experience than me and if anyone knows about an open-source online database that we could all edit. I currently use Access and Filemaker. I have Instant Web Publishing with Filemaker, but it requires that the database be continually open for it to work and I don't have a server computer for that. Access requires translation to SQL for web serving which I'm clueless about and that would also require it to be a more final version before it is posted, rather than openly editable by all parties. I'm a big advocate of databases and don't think it would be that hard to whip up a simple one for our needs. I'm willing to do it and could have a prototype out quickly, but am wondering if any of our more IT inclined folks have more experience or knowledge of an open source option.

2. I'll get my spreadsheet up soon. The internet at my hotel STINKS and bails every 1/2 hour or more, so I've had to operate off my desktop. I found it useful to look at Nic's from his notebook page (which brings me to an unrelated point...I thought Nic was looking at JOURNAL level metadata, not the funding body of each article. Wouldn't it be better for me to make note of the funding body since I'm operating on the article level? I hadn't considered this in my searches, but could add it to my list of fields easily and that way it could be more directly correlated with the same articles I am using. Please let me know ASAP as I plan to work on a lot of articles today and don't want Nic to have to keep doing this if it is repetitive.)

3. Thanks for the wiki help. Timestamp tip especially.

4. I'm available today until 9am PT. I see Heather on, but busy. I'll put myself as available until I have to go.

5. Monday 9am is a good time for me, but so is just about any morning. My only other suggestion would be midweek so we can evaluate our progress in the week but still have time to implement suggestions in the remainder of the week....at least I have trouble thinking about that on Monday with the weekend erasing my memory of my failures/questions the previous week. But I can adapt and make better notes on Friday so I can still benefit from a Monday meeting.

Sincerely, Sarah Walker Judson

Heather Piwowar 	 Thu, Jun 10, 2010 at 8:40 AM Reply-To: hpiwowar@gmail.com To: Sarah Walker Judson  Cc: Nicholas Weber , valerie.enriquez@simmons.edu, Todd Vision  Sarah, on the ASAP front, yes please do extract funder data while you are in the articles right now, that would be helpful for Nic....

Heather

Sarah Walker Judson 	 Thu, Jun 10, 2010 at 8:46 AM To: hpiwowar@gmail.com Cc: Nicholas Weber , valerie.enriquez@simmons.edu, Todd Vision  Will do. I also think that it shouldn't just be the specific funding source like Nic had..that's good, but it should also be a more general category like NSF, private, or internal (university), etc. so that we can do correlations about type of funding which isn't possible with the very specific funding information alone.

Sincerely, Sarah Walker Judson

Nicholas Weber 	 Thu, Jun 10, 2010 at 8:51 AM To: Sarah Walker Judson  Agreed, good point Sarah
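A minimal sketch of the two-level funder field Sarah describes: the specific funder text is kept as extracted, and a general category is stored beside it so correlations by funding type become possible. The keyword rules, the category labels beyond Sarah's "NSF, private, or internal (university)", and the function name `categorize_funder` are all illustrative assumptions, not an agreed coding scheme.

```python
# Hypothetical keyword rules; order matters, so the most specific rules come first.
FUNDER_RULES = [
    ("national science foundation", "NSF"),
    ("nsf", "NSF"),
    ("foundation", "private"),
    ("university", "internal"),
]


def categorize_funder(funder_text):
    """Return a general funding category for a specific funder string."""
    text = funder_text.lower()
    for keyword, category in FUNDER_RULES:
        if keyword in text:
            return category
    # Anything the rules do not recognize is left for manual coding.
    return "uncoded"


def funder_record(funder_text):
    """Keep the original text next to its coded category."""
    return {"funder_text": funder_text, "funder_category": categorize_funder(funder_text)}


record = funder_record("National Science Foundation")
print(record["funder_text"], "->", record["funder_category"])
```

Keeping both fields means the coding can be revised later without going back to the articles.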

Heather Piwowar <hpiwowar@gmail.com>	 Thu, Jun 10, 2010 at 9:08 AM Reply-To: hpiwowar@gmail.com To: Sarah Walker Judson <walker.sarah.3@gmail.com> Cc: Nicholas Weber <nmweber@illinois.edu>, valerie.enriquez@simmons.edu, Todd Vision <tjv@bio.unc.edu> Yes. Guessing there will be other related things to extract too as part of Nic's project, related to data sharing as well as reuse....

Feeling like lots of overlap in these projects. I think once we get our project plans out there and spreadsheets shared with each other we can have a look at how to collect all of our stuff most efficiently.

ok, will go reread everybody's comments. Great stuff, all this work and discussion this morning!

Heather

Heather Piwowar <hpiwowar@gmail.com>	 Thu, Jun 10, 2010 at 10:34 AM Reply-To: hpiwowar@gmail.com To: Sarah Walker Judson <walker.sarah.3@gmail.com> Cc: Nicholas Weber <nmweber@illinois.edu>, valerie.enriquez@simmons.edu, Todd Vision <tjv@bio.unc.edu> Sarah, very useful points. I'll expand on a few of them here.

Nic and Valerie, do you give permission for us to post our chat and email correspondence on the open web? You can always request to be "off the record" for any given conversation or part of conversation, but in general do you agree that we can make our conversations public? Useful to have explicit permission :)

I'm new to wikis too. I think editing the "talk" page is one good way to communicate. I think the way Todd used his name and timestamp on this page is a helpful model too, for comments in context: http://www.openwetware.org/wiki/DataONE/Summer_2010/Research_questions

Good question about the open access/cloud computing database options. I don't know of one offhand. I'm a big advocate of databases too, in general, and will poke around for a solution.

Sarah, what do you envision using a database for, in this project? Or to ask a different way, what will you want a database to do that a spreadsheet wouldn't, for this project? Data normalization? Data summarization? ?? I ask because it will help us know what to work towards.

One easy open idea is to have multiple tabs in a Google spreadsheet, one table per tab. This would facilitate easy data entry and viewing. There are also easy ways to query google spreadsheets one table at a time to extract a subset of rows and do data summarization (http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/)
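A rough sketch of what querying one tab at a time could look like with the Google Visualization API query language. It only builds the request URL and makes no network call. The spreadsheet key, tab name, and column letters are placeholders, and the endpoint pattern shown is an assumption based on the present-day Google Sheets URL layout, which differs from the 2009-era URLs in the linked blog post.

```python
from urllib.parse import urlencode


def build_query_url(spreadsheet_key, sheet_name, query):
    """Build a URL that asks one tab of a Google spreadsheet for a subset of rows."""
    # Assumed endpoint pattern; check it against the current Google documentation.
    base = f"https://docs.google.com/spreadsheets/d/{spreadsheet_key}/gviz/tq"
    params = {
        "tq": query,         # query-language statement, e.g. select ... where ...
        "sheet": sheet_name, # which tab (table) to query
        "tqx": "out:csv",    # request CSV output
    }
    return f"{base}?{urlencode(params)}"


# Placeholder key and columns: A = article id, C = funder category.
url = build_query_url("HYPOTHETICAL_KEY", "articles", "select A, C where C = 'NSF'")
print(url)
```

The spreadsheet would need to be shared so that the request can read it.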

Merging and querying across tables (the nuts and bolts of databases, I know) is harder. There are several alternatives for that. I'm guessing we might not have just one solution, since you might prefer an Access interface whereas another student might only be familiar with Excel.... so we could make an Access framework that pulled data from the google spreadsheets dynamically, and also create an unnormalized, huge, merged table of everything for people to view in Excel (and use the kick-butt Pivot tables there) (does anyone not know about Pivot tables? If you don't, tell me, and I'll show you how helpful they can be).

Anyway... my suggestion at this point is to do data input in Google Spreadsheets, one tab (or one spreadsheet) per table, and we can always import from there to some other solution for merging and visualizing when we know what our data and needs are better.
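To make the "merge later" step concrete, here is a small sketch of joining two tabs once they have been exported, using Python's built-in `sqlite3` as a stand-in for Access or any other database. The table names, column names, and rows are invented for illustration and are not project data.

```python
import sqlite3

# In-memory database standing in for two exported spreadsheet tabs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE journals (journal TEXT PRIMARY KEY, has_data_policy INTEGER);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, journal TEXT, funder_category TEXT);
    """
)
conn.executemany(
    "INSERT INTO journals VALUES (?, ?)",
    [("Journal A", 1), ("Journal B", 0)],
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Journal A", "federal"), (2, "Journal A", "private"), (3, "Journal B", "federal")],
)

# The unnormalized "merged table of everything": one row per article plus journal-level fields.
merged = conn.execute(
    """
    SELECT a.article_id, a.journal, a.funder_category, j.has_data_policy
    FROM articles AS a JOIN journals AS j ON a.journal = j.journal
    ORDER BY a.article_id
    """
).fetchall()

# A pivot-table style summary: article counts by journal policy status.
summary = conn.execute(
    """
    SELECT j.has_data_policy, COUNT(*)
    FROM articles AS a JOIN journals AS j ON a.journal = j.journal
    GROUP BY j.has_data_policy
    ORDER BY j.has_data_policy
    """
).fetchall()

print(merged)
print(summary)
```

The merged rows could be written out as a single sheet for Excel, while the per-tab tables stay the place where data is entered.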

Finally, good point about the benefits of mid-week meetings. How about for NEXT week we have a chat on BOTH Monday at 9am pacific AND Wednesday at 9am pacific because we have lots to talk about, and maybe after that we just do Wednesdays.

Does that work for everyone?

Heather

valerie.enriquez@simmons.edu <valerie.enriquez@simmons.edu>	 Thu, Jun 10, 2010 at 10:57 AM To: hpiwowar@gmail.com Cc: Sarah Walker Judson <walker.sarah.3@gmail.com>, Nicholas Weber <nmweber@illinois.edu>, valerie.enriquez@simmons.edu, Todd Vision <tjv@bio.unc.edu> Hi Heather,

I give my permission to post our chat/email correspondence on the open web.

Also, thanks for the timestamp tip.

I can make both Monday and Wednesday meetings at 9am PST (noon my time, unless I'm counting wrong).

Talk to you all soon.

Valerie

Nicholas Weber <nmweber@illinois.edu>	 Thu, Jun 10, 2010 at 11:02 AM To: hpiwowar@gmail.com Cc: Sarah Walker Judson <walker.sarah.3@gmail.com>, valerie.enriquez@simmons.edu, Todd Vision <tjv@bio.unc.edu> I think posting emails and conversations to the OWW is extremely useful and I have no problem with anything being put into that space. I was a little unsure how to nest a page for including conversations from gchat. Also, is there any specific place we should start posting those as opposed to email correspondence? Maybe we should start a new page called "Conversations" or "Communication" ?

Sarah, I poked around for free OA databases a bit this morning but I didn't have much luck finding anything that seemed reliable or exportable.

I think mid-week meetings seem like a very good idea, and next week having a Monday chat would be nice as well. This Wednesday (and this Wednesday only) I'll be tied up teaching most of the day, but I have a window of time from 12-1:15 free....so if we could push the chat back one hour I could make it?!

Sarah, I am going to post some of the fields for journal metadata in a googledoc spreadsheet to OWW in the next 1/2 hour -- I'll email you as well to make sure we're capturing everything we think might be useful.

Nic

Sarah Walker Judson <walker.sarah.3@gmail.com>	 Fri, Jun 11, 2010 at 7:20 AM To: hpiwowar@gmail.com Cc: Nicholas Weber <nmweber@illinois.edu>, valerie.enriquez@simmons.edu, Todd Vision <tjv@bio.unc.edu> I'm good for both meeting times.

And, in terms of database, Access is nice b/c it easily integrates with Excel. I think it's fine if we collect the data in a google doc, but then we use Access (or otherwise) to connect it for analysis. One of our main topics on Mon/Wed should be necessary fields (extracted data) to make sure we're covering everything, at what level we collect it (detailed or coded...I think both for retracing purposes), and if/how we host a database or just pass it around through email (which is doable, just clunky).

Sincerely, Sarah Walker Judson

Sarah Walker Judson <walker.sarah.3@gmail.com>	 Fri, Jun 11, 2010 at 7:25 AM To: Nicholas Weber <nmweber@illinois.edu> Cc: hpiwowar@gmail.com, valerie.enriquez@simmons.edu, Todd Vision <tjv@bio.unc.edu> Looked at your spreadsheets briefly this morning. I have some suggestions, but will make a specific list of fields by the time we "meet" next. Generally speaking, I think it's good that we collect the original text (i.e. in your policy sheet which has large chunks of text), but I think each of those fields should be accompanied by a coded categorical or quantitative field, otherwise they aren't useful for statistics. There are different pros and cons to coding the data during or post data collection...we should discuss this on Monday. I think the main pro for coding it right away as you collect the data is that you're already reading through it. But on the other hand, sometimes it's better to code at the end b/c then you know the full range of variation. I'm still trying to decide the best approach myself.

Sincerely, Sarah Walker Judson

I'll take care of posting our email correspondence. I've figured out an easy way to post it and I'll probably update it every few days.

I've been keeping it here: http://www.openwetware.org/wiki/DataONE:Notebook/Summer_2010#Correspondence. If we'd like it more accessible, let me know.

Sincerely, Sarah Walker Judson