Open Source Research

From OpenWetWare
Revision as of 23:33, 3 July 2013 by Matthew Todd (talk | contribs) (First go at rewording the page with a new hierarchy)

Open Source Research (OSR) adopts the following basic rules (first written down here):

  1. First Law: All data are open and all ideas are shared
  2. Second Law: Anyone can take part at any level
  3. Third Law: There will be no patents
  4. Fourth Law: Suggestions are the best form of criticism
  5. Fifth Law: Public discussion is much more valuable than private email
  6. Sixth Law: An open project is bigger than, and is not owned by, any given lab

This wiki gathers resources for open research aimed at finding new medicines for diseases. The wiki is intended for project status and notes - the actual collaboration occurs on other pages. In the case of the malaria project, for example, these may be found here.

Open Source Software Development

Open source in software development implies that the project is open to anyone and that the final product emerges from a distributed team of participants. There may have been a funded kernel of work initially, but the subsequent development by the community is not explicitly funded. There are many examples of high-quality, robust and widely used applications that were developed using an open source model, such as the Firefox and Chrome web browsers, the Linux operating system and the Apache web server. There are thriving open source software development communities on the web at, for example, SourceForge and GitHub. Central to the operation of these sites and projects is the sharing of data and ideas in near-real time.

Open data

Many valuable initiatives advocating open data have emerged in which large datasets are deposited to assist groups of researchers (e.g., PubChem, ChEMBL and Sage Bionetworks); the release of malaria data in 2010 falls into this class. These very important ventures employ the internet as an information resource, rather than as a means for active collaboration. For people to work together on the web, data must be freely available. Yet the posting of open data is a necessary but not sufficient condition for open science. Open data may be used without a requirement to work with anyone. The GSK malaria data, for example, may be browsed and used by people engaged in closed, proprietary research projects - there is no obligation to engage in an open research project.

An important feature of open data is that it maximises re-use (or should be released in a way that permits re-use). Essentially, the generator of data should avoid making assumptions about what the data are good for. The data acquired by the Hubble Space Telescope, for example, have led to more publications by teams reanalysing the data than by the teams that originally acquired them.

The Panton Principles describe important recommendations for releasing data into the open.

Open Innovation and Prize-based Incentives

In an effort to stimulate innovation, several pharma companies have adopted an "open innovation" model. This is a somewhat nebulous term that means companies must try to bring in the best external ideas to complement in-house research (NRDD article). The mechanisms for bringing in new ideas are:

  • Prizes for solutions to problems (e.g., InnoCentive). A competition means that teams work in isolation and do not pool ideas. Such a mechanism does not change the nature of the research, only the motivation to participate. The pharmaceutical industry itself essentially already operates on this model.
  • Licensing agreements with academic groups/start-ups (e.g., Eli Lilly’s PD2 program). In such arrangements, companies may purchase the rights to promising ideas. Vigilance over intellectual property may, of course, shut down any open collaboration at a promising stage. It has therefore been proposed to limit open innovation to “pre-competitive” areas (e.g., toxicology), but to date the industry has been unable to define what “pre-competitive” means beyond the avoidance of duplicated effort and the requirement for public-domain information resources (NRDD article).

For more on this distinction see Will Spooner's article.

Crowdsourcing


The use of a widely distributed set of participants to accelerate a project is a strategy that has been employed in many areas. The writing of the Oxford English Dictionary made use of volunteers to identify the first uses, or best examples of the use, of words. Pioneering work on distributing the computing power required by science projects (where the science itself was not necessarily an open activity) was achieved with the SETI@home and Folding@home projects.

With the rise of the web, several highly successful crowdsourcing experiments have emerged in which tasks are distributed to thousands of human participants, such as the Foldit and Galaxy Zoo projects. What is notable about such cases is the speed with which the science progresses through the harnessing of what has been termed the “cognitive surplus”.

Open science

Open science is the application of open source methods to science. Thus data must be released as they are acquired, and it must be possible for any reader of the data to have an impact on the project. As few groups as possible should work on parts of the project in isolation, releasing data only periodically - ideally, complete data release and collaboration happen in real time, to prevent duplication of effort and to maximise useful interaction between participants.

Though there is no formal line to distinguish crowdsourced projects from open science projects, it could be argued that open science projects are mutable at every level. For example, while anyone could participate in the original Galaxy Zoo project, the software and the basic project methodology were not open to change by those who participated. In the Polymath project, on the other hand, while there was a question to answer at the outset, the direction the project took could be influenced by anyone, depending on how the project went. In the Synaptic Leap discovery of a chemical synthesis of a drug, the eventual solution was influenced by project participants as the project proceeded.

Open Source Drug Discovery

Drug discovery is a complex process involving many different stages. Compounds are identified as having some biological activity, and these are then improved through iterative chemical synthesis and biological evaluation. Compounds that appear to be promising are assessed for their behaviour and toxicity in biological systems. The move to evaluation in humans is the clinical trial phase, and there are regulatory phases after that, as well as the need to create the relevant molecule on a large scale.

Since no drug has ever been discovered using an open source approach, it is difficult to be certain about how OSDD would work. However, it seems likely that the biggest impact of the open approach would be in the early phases, before clinical trials have commenced. Open methods could also have an impact on the process chemistry phase, in creating an efficient chemical synthesis on a large scale.

Open work cannot be patented, since there can be no delays to the release of data and no partial buy-ins. If a group opts out of the project to pursue a "fork", they leave the project. Open source drug discovery must operate without patents. The hypothesis is that by working in an open mode, research and development costs are reduced and research is accelerated. This offsets the lack of capital support for the project. Costs of clinical trials and product registration would have to be sourced from governments and NGOs. Whether this is possible is one of the central questions of OSDD.

What Can I Do?

Open projects rely on participation by interested strangers. To participate, find the coordination pages of the projects, read them and ask questions. Follow feeds from projects. If things aren't clear, try to contact someone involved with the project. Some sites, like this one, are wikis, so you can edit their pages directly. Some are blogs, so you can leave comments.

This wiki hosts current projects


An appropriate default licence for open research is CC-BY-3.0: any results are both academically and commercially exploitable by whoever wishes to do so, provided the project is cited. This allows for full commercial benefit from open research while maintaining well-worn standards of giving credit where credit is due.