Difference between revisions of "User:Andy Maloney/Open Science"

From OpenWetWare
Revision as of 12:46, 22 March 2011


This page describes my experience with open science.

This is the fourth chapter in my completely open notebook science dissertation. If you would like to post questions, comments, or concerns, please join the wiki and post comments to the talk page. If you do not want to join the wiki and would still like to comment, feel free to email me by using the provided link below.


Figure 1: Open access logo.

This chapter describes my experience with the growing movement that is pushing science into the open. I will not discuss the pros and cons of open science in this chapter, as everyone will have their own opinion on the subject. I will only briefly outline some of the success stories and some of the pitfalls that I have encountered while using open science.

What is open science?

I cannot think of a better way to describe it than what David Gezelter, Associate Professor at the University of Notre Dame, said about it in a blog post (http://www.openscience.org/blog/?p=269). He defined it as four simple ideas.

  1. Transparency in experimental methodology, observation, and collection of data.
  2. Public availability and reusability of scientific data.
  3. Public accessibility and transparency of scientific communication.
  4. Using web-based tools to facilitate scientific collaboration.

Transparency in methodology, observation, and collection of data using web-based tools is easily accomplished using services similar to OpenWetWare, which is a provider of open notebooks. Open notebooks are web-based notebooks that are completely open to the public for viewing. They are just like paper notebooks in that you write down what you have done in the lab. The only difference is that in a web-based open notebook, you can embed videos, pictures, and links very easily. You can embed images and links in a paper notebook by taping them to the page, but you would have to make a flip book in order to embed a movie. I will discuss my experience using an open notebook below.

Public availability and accessibility of scientific data is a bit more complicated because there is neither a standard for the dissemination of data nor a repository for collecting it. I will discuss some of the advances made here at the University of New Mexico in an attempt to create a repository, as well as other web-based services that have taken the initiative to start disseminating scientific data.

Some researchers argue that putting data in the open will get you "scooped". I have heard many of these arguments, but I have yet to hear one that can counter this example given by Dr. Rob Olendorf, who is a collaborator and a library scientist here at UNM. Rob likens open science to the red-winged blackbirds that he studied at artificial ponds (Olendorf 2004). The birds set up a community with mates and territories. If a bird cheats on its mate with a neighbor, it oddly enough irritates the entire community and keeps the other male neighbors from cooperating in nest defense. The same thing can be said about the open science community. If an open scientist discovers that openly published data is being misused, it is sure to cause a scene in the community. The scene will more than likely arise from how open scientists communicate, which is through online forums. A perfect example of this is the response generated when the online community discovered that some journals are willing to charge you extra for "expedited" review. The discussion feed led to an openly editable protest letter against the "fast-track" fees.

Open science experience

I feel that open notebooks, or even electronic notebooks in general, are preferable to paper notebooks in that they are accessible from anywhere there is internet access. Private wiki-based notebooks that maintain a level of security for projects are available if an open notebook is not an option. See the project by Galois for an example. This ability to access lab information from anywhere is very beneficial and has aided my research quite a bit. Since my notebook is open, a Google search with simple search terms reliably returns information that I put in my notebook.

The use of an open notebook does come at its own cost, however. Since this is web-based technology, servers can crash, someone can forget to pay the bill, or data can be inadvertently lost if it is not redundantly backed up. Thankfully I have not experienced a server crash, but I have experienced other infuriating issues with this technology. One case is when the browser crashes. If one does not continuously save pages written in the wiki, they can be lost due to a browser or system crash. This is not a problem when using a paper notebook.

I cannot discuss open notebooks without discussing the myriad of other web-based technologies used in conjunction with them. These include BenchFly, YouTube, Google, Scribd, Instructables, and many others. The "many others" is actually problematic, as there is no standard for how scientific information is disseminated using these types of services. Some services, such as Flavors, attempt to bring all of the online content that a user makes into one single space. Unfortunately no such service exists for scientists doing open notebook science, which means one is left trying to coerce the available web-based applications into doing what the scientist needs. This is indicative of how young the area of web-based open science is, and I hope it changes in the future.

The storage and user readability of data from experiments can be a very complicated subject. Every experimenter uses shorthand and abbreviations in their experiments. Those notes may be easy for the experimenter to read, but they are basically gibberish to someone looking at the data for the first time. Dr. Koch and I have been collaborating with Rob Olendorf to see if we can use the library system here to store scientific data at an institutional level. Rob is programming an automated XML tagging system that will allow the raw data I take (mostly images) to be tagged with user readability in mind. The tagging is similar to the metadata used for mp3s. We are basically pushing the limits of storage and usefulness of scientific data at UNM. Since one of my experiments can produce 1 TB of image data, storing and transferring the data become a big obstacle.
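To give a feel for what mp3-style tagging of an image could look like, here is a minimal sketch in Python. It is not Rob's system, which is not described in detail here. The function names, tag names, file name, and field values are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_metadata_record(image_name, fields):
    """Return an XML string describing one raw image file.

    `fields` maps hypothetical tag names (e.g. "experimenter") to plain-text
    values, much like the title/artist tags stored with an mp3.
    """
    root = ET.Element("image", attrib={"file": image_name})
    for tag, value in fields.items():
        child = ET.SubElement(root, tag)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_metadata_record(xml_text):
    """Parse a record built above back into (file name, dict of fields)."""
    root = ET.fromstring(xml_text)
    return root.get("file"), {child.tag: child.text for child in root}


# Hypothetical example values; none of these come from the real data set.
record = build_metadata_record(
    "frame_0001.tif",
    {
        "experimenter": "A. Maloney",
        "assay": "gliding motility assay",
        "description": "Written in full words, without lab shorthand",
    },
)
print(record)
```

The point of the sketch is only that each image carries a small, human-readable description next to it, so that someone seeing the data for the first time does not need to decode the experimenter's abbreviations.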

Outside the institutional level, there are other web-based applications that can disseminate data. I have spoken to Alan B. Marnett, PhD, founder of BenchFly, about the possibility of using his service to host my image data. Hosting image data on BenchFly is a very natural evolution of the site's purpose, and Alan was excited about the possibility. Unfortunately the cost of uploading more than 1 TB of data has been an issue, and we are currently working with Alan to find a solution for uploading.

I will not discuss the advantages or disadvantages of using an institutional data hosting service compared to a cloud-based one such as BenchFly. I believe that both are essential to the dissemination of data because one is designed to be archival (the libraries), while the other is designed to be easily navigable by users.

Before speaking to Alan from BenchFly, I uploaded videos of data to YouTube. Doing a simple search on Google using the key terms "gliding motility assay" will bring up several movies showing data I took. Those movies led Dr. William Saxton and Dr. Josh Deutsch, both professors at UCSC, to ask Dr. Koch if they could obtain the data shown in them. Of course I was extremely happy to give the data to them. The gliding motility assay data I took was designed for a specific purpose: trackability. I needed to track microtubules in order to take the speed measurements discussed in Chapter 2, so microtubules that moved in circles were not tracked for my purposes. The data did, however, contain some microtubules that exhibited this circular motion. It turns out that the circular motion is what Dr. Saxton and Dr. Deutsch were after. Stuck microtubules have been shown to mix the insides of fly eggs (Serbus 2005), and Dr. Deutsch, his student M. Brunner, and Dr. Saxton (Deutsch 2011) came up with a spectacular model describing this motion. The data I took served as an in vitro check of the model. This use of my data is well beyond anything Dr. Koch or I could have imagined, since before our interactions with Dr. Deutsch and Dr. Saxton we had no idea this area of research existed. Because the data was open, the group at UCSC was able to use it in their paper.

Not only did they use it in their paper, but Chapter 1 of this thesis was used to get Dr. Saxton's group started with gliding motility assays. Gliding motility assays are not easy, and I have seen them fail for inexperienced researchers almost every time they attempted one. Dr. Saxton's student, Corey Monteith, read my rough (and I mean really rough) draft of Chapter 1 and was able to get the assay working in their group. Having a common language helped as well, and I was very glad to speak to Corey using the same language outlined in Chapter 1. This is a major advancement for their group because what took me nearly a year to perfect, Corey was able to do in two weeks.


In science we talk a lot about impact. Unfortunately the way we define impact now is based on ideas that I will not discuss here. For a discussion about impact factor by a major player in the open science community, Dr. Cameron Neylon, see this blog entry.

The impact of open science is not measured by the same metrics as more traditional science, and its own metrics are not truly defined yet. For instance, I personally have zero impact in science using the old metrics. I have spent years as a scientist and, by the old metrics, I have nothing to show for it. If the traditional definition of impact allowed negative or imaginary numbers, I would probably have one associated with my name.

But I have contributed to open science. Some of that contribution comes in the form of simple 3D renderings of optomechanics that I made using SketchUp, a freely available program from Google. This allowed me to build models of very complex optical systems very easily, as can be seen in Movie 1.


http://www.youtube.com/v/EZvATD7VZHU

Movie 1: Movie of the Fluorescent Schlieren Microscope I built in Jim Thomas's lab.

The SketchUp community, with no influence of my own, decided to make an optomechanics section in the 3D Warehouse with the models that I uploaded. This may seem trite to a traditional scientist, but a simple search for optomechanics in the warehouse produces a lot of models that can be downloaded freely and used to design optomechanical systems. Unfortunately, this has zero impact.

Laboratory research necessitates that the researcher be a maker. A maker is someone who makes something, be it a pie or optical tweezers. A vast community of makers exists that most scientists do not know about. I have recently shared things with the maker community via posts to Instructables. Those posts include how to build an objective heater, a laser shutter, and a hot plate/stirrer, and how to assemble a diode laser. Google searches for objective heater or laser shutter consistently place my posts in the top 10 results. Again, unfortunately, this has zero impact in science even though a researcher can look at those posts and reproduce the builds.


The last item from Dr. Gezelter's statements about open science that I have not discussed is communication. In my opinion, the only way to push science forward, closed or open, is to communicate. The form of communication can vary: it may mean talking to a person who is not on the project (but who has security clearance), or posting scientific musings and questions to online forums. Open scientists may have it easier than closed scientists, even those whose work does not require clearance, since forums exist in which to communicate openly.

Open science has its problems when it comes to where and how to post data and information. This pitfall will ultimately work itself out if there are enough scientists like Rob Olendorf and Alan Marnett who are interested in making scientific data archival and available. Impact is also a problem, since we do not have a metric for open science. Even with its current problems, open science is evolving and will not disappear, especially with success stories such as the ones outlined above.