User:Andy Maloney/Open Science

Purpose
This page describes my experience with open science.

This is the fourth chapter in my completely open notebook science dissertation. If you would like to post questions, comments, or concerns, please join the wiki and post comments to the talk page. If you do not want to join the wiki and would still like to comment, feel free to email me by using the provided link below.


 * Click here to join the wiki.
 * Click here to email me.
 * Click here to post general comments about the open dissertation.
 * Click here to post comments to this chapter's talk page.

A PDF version of the Introduction can be downloaded here. This is the final snapshot of the dissertation that has been approved by my committee. This file will inevitably become out of sync with the wiki pages.

A zip folder containing the LaTeX code can be downloaded here.

Acknowledgements
I would like to thank Dr. Haiqing Liu (while in the lab of Dr. Gabriel A. Montano) for supplying kinesin to our lab. I would also like to thank Dr. Susan Atlas and acknowledge support from the DTRA CB Basic Research Program under Grant No. HDTRA1-09-1-008 and from the UNM IGERT on Integrating Nanotechnology with Cell Biology and Neuroscience, NSF Grant DGE-0549500. Finally, I'd like to thank Dr. Erik Schaeffer for his discussions on temperature stabilization.

Introduction
This chapter describes my experience with the growing movement that is pushing science into the open. I will not discuss the pros and cons of open science in this chapter, as everyone will have their own opinion on the subject. Instead, I will briefly outline some success stories and some pitfalls I have encountered while practicing open science.

What is open science?
I cannot think of a better way to describe open science than what Dr. David Gezelter, Associate Professor at the University of Notre Dame, said about it in a blog post. He defined it through four simple ideas:
 * 1) Transparency in experimental methodology, observation, and collection of data.
 * 2) Public availability and reusability of scientific data.
 * 3) Public accessibility and transparency of scientific communication.
 * 4) Using web-based tools to facilitate scientific collaboration.

Transparency in methodology, observation, and data collection using web-based tools is easily achieved with services such as OpenWetWare, which hosts open notebooks. Open notebooks are web-based notebooks that are completely open to the public for viewing. Like paper notebooks, they are where you record what you have done in the lab; unlike paper notebooks, they let you embed videos, pictures, and links very easily. You can add images and links to a paper notebook by taping them to the page, but you would have to make a flip book in order to embed a movie.

There are many formats for doing open science that facilitate scientific collaboration, and one of them is open notebook science. Dr. Jean-Claude Bradley from Drexel University coined the term open notebook science and, with Dr. Andy Lang from Oral Roberts University, set up a challenge for scientists to measure the solubilities of chemicals in organic solvents. Many scientists have taken up this broad-based challenge, and their findings can be found on the ONSchallenge page.

Another form of open science that facilitates collaboration and transparency in scientific communication is discussion through online forums. Friendfeed hosts one such forum, where many open scientists go to talk about discoveries or scientific musings. It is through this forum that I have had the honor of meeting many different scientists, including Dr. Bradley, Dr. Bill Hooker, Graham Steel, and too many others to name.

Public availability and accessibility of scientific data are complicated because no standard exists for disseminating data, nor is there a repository where data can be collected and viewed publicly. In this chapter I will discuss some of the advances made here at the University of New Mexico toward creating a data repository, as well as other web-based services that have taken the initiative to start hosting scientific data.

Getting scooped
Some researchers argue that putting data in the open will get you scooped. I have heard many of these arguments, but I have yet to hear one that counters this example given by Dr. Rob Olendorf, a collaborator and library scientist here at UNM. Dr. Olendorf likens open science to the red-winged blackbirds he studied at artificial ponds (Olendorf 2004a, Olendorf 2004b). The birds set up a community at these ponds, where they define territories and select mates. If a bird cheats on its mate with a neighbor, it irritates the entire community of birds around the pond, and the other neighboring males stop cooperating with the cheater in nest defense. This analogy extends to the open science community. If an open scientist discovers that openly published data is being misused, it is sure to cause a scene in the community, largely because of how open scientists communicate: through online forums. A perfect example is the response generated when the online community discovered that some journals were willing to charge authors extra for expedited review. The resulting online discussion led to an openly editable protest letter against the fast-track fees.

Open notebooks
I feel that open notebooks, and electronic notebooks in general, are preferable to paper notebooks because they are accessible from anywhere with internet access. If an open notebook is not an option, private wiki-based notebooks are available that keep projects secure; see the project by Galois for an example. Being able to access lab records from anywhere is very beneficial and has aided my research quite a bit. Because my notebook is open, a Google search with simple terms reliably returns information I have put in it.

Using an open notebook does come at a cost, however. Because it is web-based technology, servers can crash, someone can forget to pay the bill, and data can be inadvertently lost if it is not redundantly backed up. I have not experienced a server crash, but I have experienced other infuriating issues with this technology. One is a browser crash: if pages being written in the wiki are not saved frequently, the unsaved work can be lost to a browser or system crash. This is not a problem with a paper notebook.

Online technology
I cannot discuss open notebooks without discussing the myriad other web-based technologies used in conjunction with them. These include BenchFly, YouTube, Google, Scribd, Instructables, and many others. That "many others" is actually problematic, as there is no standard for how scientific information is disseminated. Some services, such as Flavors, attempt to bring all the online content a user creates into a single webpage that is easy to access and navigate. This type of online content aggregation is called a life stream. Unfortunately, no such service exists for scientists doing open notebook science, so the scientist is left to coerce the available web-based applications into doing what is needed to publish data openly. This indicates just how young web-based open science is, and I hope it changes in the future.

The storage and user readability of experimental data can be a very complicated subject. Every experimenter uses shorthand and abbreviations in their notes. Those notes may be easy for the experimenter to read, but they are basically gibberish to someone looking at the raw data from an experiment for the first time. Dr. Koch and I have been collaborating with Dr. Olendorf to see whether the library system here can store scientific data at an institutional level. Dr. Olendorf is programming an automated XML tagging system that will tag the raw data I take (mostly images) with user readability in mind. The tagging is similar to the metadata used for MP3 files. With this project we are pushing the limits of storing useful scientific data at UNM, since just one of my experiments can produce 1 TB of image data. Storing and transferring data become big obstacles to openly disseminating data at the terabyte scale, and we are working on solutions to host the data efficiently.
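To make the tagging idea concrete, here is a minimal sketch of how shorthand notes attached to a raw image file could be stored in XML alongside readable labels. This is not Dr. Olendorf's system: the element names (image, tag), the GLOSSARY entries, and the function tag_image are hypothetical, chosen only for illustration. It uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical glossary: an experimenter's shorthand -> a readable label.
GLOSSARY = {
    "exp": "experiment type",
    "dt": "time between frames (s)",
    "MT": "microtubule",
}


def tag_image(filename, notes):
    """Return an XML metadata record (as a string) for one raw image file.

    Each shorthand key in `notes` is stored together with its readable
    label from GLOSSARY; keys missing from the glossary fall back to
    the key itself, so no note is dropped.
    """
    record = ET.Element("image", attrib={"file": filename})
    for key, value in notes.items():
        label = GLOSSARY.get(key, key)
        tag = ET.SubElement(record, "tag", attrib={"key": key, "label": label})
        tag.text = str(value)
    return ET.tostring(record, encoding="unicode")


print(tag_image("run01_0001.tif", {"exp": "gliding motility assay", "dt": 1.0}))
```

Keeping the original shorthand key next to its expanded label means the experimenter's own notation is preserved while a first-time reader still gets a readable description, much as an MP3 player shows a song title instead of a file name.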

Web-based applications that disseminate data outside the institutional level are also helpful. I have spoken to Dr. Alan B. Marnett, founder of BenchFly, about the possibility of using his service to host image data. Hosting image data on BenchFly is a very natural evolution of the site's purpose, and Dr. Marnett was excited about the possibility. Unfortunately, the cost of uploading more than 1 TB of data has been an issue.

I will not discuss the advantages or disadvantages of an institutional data hosting service compared with a cloud-based one such as BenchFly. I believe both are essential to the dissemination of data, because one is designed to be archival (the libraries), while the other is designed to be easily navigable by users.

Open data sharing
Before speaking to Dr. Marnett from BenchFly, I uploaded videos of my data to YouTube. A simple Google search using the key terms gliding motility assay brings up several movies showing data I took. That data led Dr. William Saxton and Dr. Josh Deutsch, both professors at UCSC, to ask Dr. Koch whether they could obtain the data in the movies. Of course, I was extremely happy to give the data to them. The gliding motility assay data I took was designed for a specific purpose: trackability, so that I could take the speed measurements discussed in Chapter 3. Microtubules that moved in circles were not tracked for my purposes; however, some microtubules in the data did exhibit this circular motion, and it turns out that the circular motion is what Dr. Saxton and Dr. Deutsch were after. Stuck microtubules have been shown to mix the insides of fly eggs (Serbus 2005), and Dr. Deutsch, his student M. Brunner, and Dr. Saxton (Deutsch 2011) came up with a spectacular model describing this motion. The use of my data in their study went well beyond anything Dr. Koch or I could have imagined; before our interactions with them, we had no idea this area of research existed.

Not only did they use the data in their paper, but Chapter 2 of this thesis was also used to get Dr. Saxton's group started with gliding motility assays. Gliding motility assays are not easy, and I have seen them fail for inexperienced researchers almost every time they were attempted. Dr. Saxton's student, Corey Monteith, read my rough (and I mean really rough) draft of Chapter 1 and was able to get the assay working in their group. Having a common language, outlined in Chapter 2, expedited my suggestions to Corey for debugging the assay. This was a major advance for their group: what took me nearly a year to perfect, Corey was able to do in a month.

Impact
In science we talk a lot about impact. Unfortunately, the way we define impact rests on ideas I will not discuss here, but in short, impact is based on the "prediction" of impact. For a discussion of impact factor by a major player in the open science community, Dr. Cameron Neylon, see his blog entry on the subject (Neylon).

The impact of open science is not measured by the same metrics as traditional science; in fact, metrics for open science impact have not yet been agreed upon. For instance, by the old metrics I personally have zero impact in science: I have spent years as a scientist and have nothing to show for it. If the traditional definition of impact allowed imaginary numbers, I would probably have a negative or imaginary number associated with my name. But I have contributed to open science. Some of that contribution comes in the form of simple 3D renderings of optomechanics I made using SketchUp, a freely available program from Google. SketchUp allowed me to build models of very complex optical systems easily, as can be seen in Figure 2.

The SketchUp community, without any prompting from me, decided to make an optomechanics section in the 3D warehouse with the models I uploaded. This may seem trite to a traditional scientist, but a simple search for optomechanics in the warehouse returns many models that can be downloaded freely to design optomechanical systems.

Laboratory research requires the researcher to be a maker. A maker is someone who makes things, be it a carrot cake (see Figure \ref{fig:CarrotCake}) or an optical tweezers. A vast community of makers exists that most scientists do not know about. I have recently started contributing to the maker community through posts on a site called Instructables. Those posts include how to build an objective heater, a laser shutter, and a hot plate/stirrer, and how to assemble a diode laser. Google searches using the terms objective heater or laser shutter consistently place my posts in the top 10 results. These contributions have no traditional impact, yet they are useful. To make the impact of open science tangible, we need some sort of metric in place.

Conclusion
The last of Dr. Gezelter's points that I have not yet discussed is communication. In my opinion, whether science is closed or open, the only way to push it forward is to communicate. Communication can take many forms: talking to a person who is not on the project (but who has security clearance), or posting scientific musings and questions to online forums. Open scientists may have it easier than closed scientists, since forums exist for communicating openly about science.

Open science does have problems with where and how to post data and information, and with how to determine its impact. The pitfalls of hosting data will ultimately work themselves out if there are enough scientists like Dr. Rob Olendorf and Dr. Alan Marnett who are interested in making scientific data archival and available. Impact will be defined once a metric is established. Even with its current problems, open science is evolving and will not disappear.
