OpenSourceMalaria:Story so far
This is a human-readable summary of the first open source drug discovery for malaria project. A less readable collection of all the relevant data can be found here.
Note: This page is currently (May 2 2012) being built, so some links are missing.
The story so far in the open source drug discovery for malaria project
Last year my lab received seed funding for a pilot project in open source drug discovery from the Medicines for Malaria Venture. Our project champion at the outset was Tim Wells, who had clearly been thinking along similar lines to me - rather than debate the idea of open source drug discovery any more, let's just try it. Jeremy Burrows quickly came on board and led the suggestion to go after a few of the actives that had been placed in the public domain in 2010 by GlaxoSmithKline. We got started in the lab in August 2011. We were successful in securing further funding from an ARC Linkage grant - a scheme where funds from an external agency are matched by the Australian Government. This has given us funding from May 2012 for three years - so we have enough money to drive this project for the moment to see if it works.
Naturally we're all really excited about this. The scientific idea behind the project is familiar medicinal chemistry territory - we need to find a small molecule that is effective for the treatment of malaria, and we will do that by making molecules (my lab's primary responsibility) and evaluating them (with collaborators). Based on those results, we make analogs, or ditch the series and pick another. We started with the arylpyrrole series that was one of the most attractive sets in the original GSK dataset, but there are plenty of others that are also very attractive from a medchem perspective.
The difference with this project though (as we previously described in the 6 Laws) is that everything is open, meaning all the experiments go on the web (including the ones that did not turn out well). All the data are available. Anyone can do anything they wish with the compounds, with the proviso that we are cited - the licence for the project is CC-BY-3.0, though this is not yet clear on all the various websites we use. The main difference is that anyone can take part - people may make molecules, offer guidance, or contribute in other ways that change the direction of the project as it is happening. That is, rather than releasing all our data at the end of the project, we release them as the project is happening so that people can really become involved in the research. Thus the iterative cycle of analog synthesis in response to biological data, which is normally guided by a kind of medchem intuition, is now guided by the intuition of the collective. Similarly, since the biological data are all open too, it should be easier to form an objective assessment of a molecule's performance divorced from the judgement of those closest to the compounds. In the same way that in software development "with enough eyeballs all bugs are shallow", we hope that the open nature of the research makes the science better and faster, as it did with our previous synthetic project with praziquantel.
Early Stages of the Project
Paul Ylioja started by resynthesising the two known active compounds from the GSK set, plus a few simple derivatives, and confirming that they were active. The biological evaluation was carried out by three separate labs to ensure we were on a solid footing. Our MMV project champion Paul Willis recommended a few "near neighbor" compounds that also looked interesting, and we made a number of these too, which were again evaluated. Sanjay Batra came on board the project and his student Soumya made some analogs that will be included in our first paper, though their activity is low. Sanjay works at the CDRI in Lucknow, India, where Saman Habib also works - Saman is leading the Indian OSDDm project which will shortly get started. The outcome was that the original hits remained interesting (because of their reasonable potency and logP values) but that we were clearly also generating highly potent novel antimalarials in this class. One compound was coming out picomolar. This is quite impressive given the small number of compounds made to date, and is perhaps testament to the quality of the hits contained in the GSK set.
Several compounds have been sent for metabolism assays to Sue Charman's lab at Monash and we are waiting for the results. The two original GSK compounds as well as the highly potent near neighbor X are currently being evaluated in mice. Compounds X and Y were recently subjected to the hERG assay and passed, suggesting a reduced risk of hERG-related cardiac side effects. Several of the compounds have also been evaluated in a gametocyte assay with interesting results that may suggest they have activity in blocking the transmission of the parasite.
What Are the Compounds Doing?
What might these compounds be doing? We're not sure. The original screens were whole-cell assays, so while we know they're effective, we don't know what they're doing. Iain Wallace from ChEMBL has done a very neat prediction of these compounds (as well as predictions for the whole malaria box, which is a set of compounds MMV are providing to people for antimalarial screening and which are the focus of a current round of Gates requests for proposals). Iain's able to cluster compounds as a similarity map, which is a neat way of visualizing the correlation between structure and predicted activity. Are these predictions right? We're seeking to examine that by sending a subset of compounds to Corey Nislow at the University of Toronto for examination in a yeast-based assay he's developed.
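To make the idea of grouping compounds by structural similarity concrete, here is a minimal sketch of one common approach: score each pair of compounds with the Tanimoto coefficient over fingerprint bits, then group compounds whose score passes a threshold. This is not necessarily the method Iain used. The compound names, fingerprint bits and threshold are all made up for illustration, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
from itertools import combinations

# Hypothetical fingerprints: each compound is a set of "on" bit positions.
# These values are invented for illustration, not real project data.
EXAMPLE_FINGERPRINTS = {
    "compound_A": frozenset({1, 2, 3, 4}),
    "compound_B": frozenset({1, 2, 3, 5}),
    "compound_C": frozenset({7, 8, 9}),
}


def tanimoto(a, b):
    """Tanimoto coefficient: shared bits divided by total distinct bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_pairs(fingerprints):
    """Tanimoto score for every unordered pair of named compounds."""
    return {
        (x, y): tanimoto(fingerprints[x], fingerprints[y])
        for x, y in combinations(sorted(fingerprints), 2)
    }


def cluster(fingerprints, threshold):
    """Single-linkage grouping: link any pair scoring at or above threshold."""
    parent = {name: name for name in fingerprints}

    def find(name):
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for (x, y), score in similarity_pairs(fingerprints).items():
        if score >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for pair, score in similarity_pairs(EXAMPLE_FINGERPRINTS).items():
        print(pair, round(score, 2))
    print(cluster(EXAMPLE_FINGERPRINTS, threshold=0.5))
```

With these invented fingerprints, compound_A and compound_B share three of five distinct bits and end up in one group, while compound_C stands alone. A real similarity map would lay such groups out visually and overlay the predicted or measured activity.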
How Do We Obtain Other Compounds?
The original compounds from the GSK assay were commercially available. Rather than make compounds that might be sourced by other means, what about obtaining commercially available compounds of interest from the suppliers, either through purchase or donation? Iain again came to help us with this by running an eMolecules search for commercially available compounds that look similar to those we are interested in (actually for a related series, see below). Now we have to see whether we can actually obtain these compounds. It's an interesting conundrum - we know that there are compounds sitting in fridges that are related to our most active antimalarials, and all we need are a few milligrams. How do we get hold of these most efficiently? And what about compounds that are not commercially available? Are there compounds in academic labs that we could evaluate, such as those from the Roberts lab at Scripps (whom we've contacted)? What about compounds that are unpublished, but may be excellent candidates for screening? It would be useful to have a needs-driven marketplace for molecules, like a Molecular Craigslist, but it doesn't really exist.
In parallel an Honours student in the lab, Jimmy Cronshaw, has started the synthesis of two other hits from the GSK set, again to confirm activity. He's nearly finished one, and is about to do the difficult part of the other series. Again, these are very attractive hits to be pursuing.
Open Source Drug Discovery More Generally
We held an interesting one-day meeting on open source drug discovery for malaria where we discussed general issues surrounding open drug discovery, followed by more specific malaria-related ideas. These talks are gradually going up on YouTube, and they frame many of the issues very well, for example the landscape of drug discovery in neglected diseases, and whether patents are necessary in drug discovery. More coming as and when we can do the annotation properly.
How We Run the Project
The technical background to the project is also interesting, though as with everything in this project we're open to suggestions. We're using LabTrove for recording the raw data in electronic lab notebooks. We're using this site to coordinate and discuss ideas. We're increasingly using Google+ for discussion of small points, and Twitter as a broadcast mechanism for updates. We have not used LinkedIn as much as we did for the praziquantel project. We employ a wiki to host the current description of where the project is at. These sites are all quite intuitive and simple, but all have their limitations.
Why Take Part?
What of motivations? Why would people contribute? Partly to solve a problem. Partly to be involved with quality science that is open, and hence subject to the most brutal form of ongoing peer review. Partly for academic credentials (since we'll soon be publishing a paper). Partly to demonstrate competence. Perhaps a mixture of all these things.
We're playing with the idea of a competition, though - or rather a prize. While we have certainly led the way to a very promising lead compound, we are acutely aware that there is a long road towards having a compound look sufficiently promising that it moves towards clinical trials. There's a lot of tweaking, and perhaps even the move to another series. Who knows. It's likely we will need a lot more input than we have currently been getting, and so we're playing with whether to launch a prize to stimulate input. It would be a teamless prize, awarded based on performance of individuals within a group where everything is shared. Difficult to judge, difficult to award, and hence worth doing.
A final point - the project is open. We don't own it. It exists in itself and those people most active in the project lead it. If you wish to contribute, in any capacity, please do so. There is no need to "clear" anything with me by email first. It's often the case that I will receive questions/suggestions by email. In the development of Linux, the need for Linus to approve everything caused problems and prompted the observation that "Linus doesn't scale". Well I don't scale either, so it's more efficient if all our discussions are held publicly. I know that a lot of people don't like this. In science we don't tend to get the idea of "beta testing" something. When data are released in science there is an expectation that the data are correct, and that they are usually accompanied by an explanation. I don't understand this view. I'm comfortable with release of data immediately, and then I'm happy to apply a caution filter that makes me skeptical of things until repeated, or makes me unsurprised when a repeat fails.