OpenSourceMalaria:Story so far

From OpenWetWare
Revision as of 23:19, 20 August 2013 by Matthew Todd (talk | contribs) (Series 1 - The Arylpyrroles: added note about hybrid arylpyrroles that might be of interest)

This is a human-readable summary of the open source malaria project to date. A less readable collection of all the relevant data from the various compound series can be found here.

The story so far in the open source malaria project

The OSM project is primarily concerned with taking public domain compounds that have shown good activity in killing the malaria parasite in cells and improving the properties of those molecules in order to discover a compound that can enter Phase I clinical trials. This is a phenotypic drug discovery project focussed on the hit-to-lead phase.


In 2011 the Todd lab at The University of Sydney received funding for a pilot project in open source drug discovery from the Medicines for Malaria Venture (MMV). The project champion at the outset was Tim Wells. Jeremy Burrows came on board and led the suggestion to go after a few of the actives that had been placed in the public domain in 2010 by GlaxoSmithKline and others. Work got underway in the lab in August 2011. The team subsequently secured an Australian Research Council Linkage grant, which provides funding for three years from May 2012.

OSM Project Structure

The scientific idea behind the project is familiar medicinal chemistry methodology - the aim is to find a small molecule that is effective for the treatment of malaria, and that involves generating new molecules and evaluating them. Based on the biological results new analogs might be made or the series might be abandoned and another selected. Decisions to abandon series may be taken early because there are plenty of other series that are very attractive from a medchem perspective.

The crucial difference with this project (as described in the 6 Laws) is that everything is open: all the experiments go on the web, including the ones that did not turn out well, and all the data are available. Anyone can do anything they wish with the compounds, with the proviso that the project is cited (see CC-BY licence conditions below). Anyone can take part - people may make molecules, or offer guidance and input in other ways that change the direction of the project as it is happening. Rather than all data being released at the end of the project, data are released as the work is done so that people can become genuinely involved in the research. Thus the iterative cycle of analog synthesis in response to biological data, which is normally guided by luck and medchem intuition, is now guided by the intuition of the collective. Similarly, since the biological data are all open too, it should be easier to form an objective assessment of a molecule's performance, divorced from the judgement of those closest to the compounds. In the same way that in software development "with enough eyeballs all bugs are shallow", the open nature of the research makes the science faster. This was found to be the case in a previous open source synthetic chemistry project involving the drug praziquantel.

Experimental Progress to Date


The first series to be tried was based on an arylpyrrole that was one of the most attractive hits in the original GSK dataset. Ultimately this series was parked because the ester, which was metabolically unstable, could not be replaced with another functional group. A related set of compounds, the "Near Neighbours", displayed high potency but were found to be too insoluble. Series 2, the triazoloureas, appeared promising but there was a suggestion that another team was working on this series, so it was parked. Series 3, the aminothienopyrimidines, are under current investigation, though the analogs examined to date have been inactive and also suffer from low solubility.

Compounds evaluated to date (assigned OSM codes): picture list, .sdf, old spreadsheet

Series 1 - The Arylpyrroles

Paul Ylioja started Series 1 by resynthesising the two known active compounds from the GSK set (OSM-S-5 and OSM-S-6 - structures below), plus a few simple derivatives, and confirming that they were active. The biological evaluation was carried out by three separate labs (Vicky Avery, Stuart Ralph and the original GSK Tres Cantos Lab led by Javier Gamo) to ensure a solid footing of repeatability. The original compounds contained an ester which was thought likely to hydrolyse in vivo, so various versions of the "lower half" of these leads were also evaluated to check whether the original hits were prodrugs, but all these compounds were found to be inactive. The project champion from MMV, Paul Willis, recommended a few "near neighbour" compounds that also looked interesting, and a number of these were made too and evaluated in this first round. One compound, OSM-S-9, was found to be more active than the original GSK hits. Sanjay Batra came on board the project and his student Soumya made (and is making) some analogs varying in the position of the fluorine atom, though the activity of those tested to date is low. (Sanjay works at the CDRI in Lucknow, India, where Saman Habib also works - Saman is leading the Indian OSDDm project which will hopefully get started soon.) The outcome was that the original hits remained interesting (because of their reasonable potency and logP values) but that highly potent novel antimalarials were also being generated in this class. Thus a second set of compounds was synthesised and evaluated, giving rise to several new highly potent compounds, one of which (OSM-S-39) displayed a picomolar IC50 value. This is impressive given the small number of compounds made to date, and is testament to the quality of the hits contained in the original GSK set.

Original and Representative Potent Compounds in the Arylpyrrole Series

It was decided to move the most promising compounds on to advanced biological evaluation, to assess the promise of this class early rather than continue to increase potency through analog synthesis:

  • Metabolic and solubility assays: The two original GSK compounds (OSM-S-5 and OSM-S-6) and six other compounds made in this project were evaluated by Sue Charman's lab at Monash for their stability in phosphate buffer. The raw data are here and can be discussed here. The GSK originals displayed good solubility but moderate degradation rates. The other compounds were degraded more slowly but at a cost of low solubility. Subsequently the original GSK compound OSM-S-5 was evaluated for stability in human and mouse plasma. The compound was stable in human plasma but susceptible to hydrolysis in mouse plasma. Esterase activity is known to be higher in rodents than in other species, and this was confirmed using a control compound in this assay (p-nitrophenyl acetate), so the results are not too surprising. Lab book page here. A glutathione trapping experiment was carried out on OSM-S-35 as representative of the Near Neighbour series, giving some identification of possible trapped metabolites, but the levels observed were not large.
  • hERG: One of the original GSK compounds (OSM-S-5) plus one of the most potent novel compounds identified to date (OSM-S-35) were subjected to the hERG assay and passed, perhaps implying that this class of compounds should not exhibit undesirable cardiac side effects. Discussion page here.
  • Late Stage Gametocyte Assay: Four of the compounds have also been evaluated in a late stage gametocyte assay with very interesting results indicating unusually high activity in blocking the transmission of the parasite. The original GSK compound OSM-S-5 was inactive. Discussion page here.
  • In vivo: The two original GSK compounds as well as one of the most promising near neighbour compounds (OSM-S-35) were evaluated in mice and found to possess no oral efficacy (Results available here). Subsequent analysis of the plasma samples from the trial with one of the GSK compounds (OSM-S-5) showed that the compound was indeed orally available, but levels in the blood were not being maintained. Raw data here.

Other discussions of the biological results described above can be found here. The choice became whether to change the focus of the OSM project to another compound series or whether to continue to alter the structures of the most promising compounds to overcome the in vivo roadblock described above.

A decision was taken to carry out a third round of analog synthesis and evaluation on the arylpyrrole series, with an emphasis on analogs a) with low logP and b) that lack the thiazolidinone heterocycle. It was decided for the moment to park the near neighbour thiazolidinones because despite potency and activity in the late stage gametocyte assay, the series suffered from low solubility. Should this set of compounds be re-started, an automatic prediction of isosteric replacements has already been done (data).

A consultation was held asking for suggestions for the ten most appropriate compounds to make, and the ten most interesting for commercial procurement. Assistance came from automatic searching of databases of commercial compounds coupled with similarity searching based on the hit compounds or other sources. The final stage of the consultation took place live online. The lists of compounds were finalised and confirmation was secured from GSK that none of the proposed structures had been included in the original GSK screen; commercial compounds were ordered and synthesis commenced. In early November 2012, the team received results from the biological evaluation of the commercial compounds (having codes OSM-S-81 through 91) and the first synthetic compounds that had been completed. This set of compounds was found to possess low to negligible levels of activity. Though surprising, the data provided some interesting insights. For example, a forked series, the pyrazoles, looked attractive but so far all examples tested (e.g., OSM-S-92) were found to be inactive (pictorial comparison). It was clear that alteration of the portion of the molecule attached to the pyrrole tended to eliminate activity (picture of this). While not completely forbidden, replacement of the ester with amine or amide functionality was generally deleterious (picture of this).

The next batch of third round compounds was synthesised and evaluated in December 2012. As with the first batch, all of the compounds were essentially inactive, although OSM-S-103 showed mild activity.

Representative Compounds from the Third Round

At the end of 2012 it appeared that the series was producing diminishing returns but there remained a small number of compounds which needed to be made to complete the campaign - specifically isosteres of the troublesome ester in the original GSK hit. Patrick Thompson, a collaborator from the University of Edinburgh, built on Matin Dean's work on the sulfonamide whilst Murray Robertson and Alice Williamson focused on synthesising further analogues of the near-neighbours, trying to find potent molecules with increased solubility. The near-neighbours were evaluated and a number of potent compounds were discovered, some with lower LogP than the first set of near-neighbour compounds.

The OSM project has to date had no luck in securing donations of compounds from commercial suppliers. On considering some structurally 'similar' active compounds from the TCAMS set, the team have also decided to synthesise a few extra compounds plus hybrids, in order to assess their biological activity.

What are the Compounds in Series 1 Doing?

What might this series of compounds be doing to the parasite to kill it so effectively? It's not clear. The original screens were whole-cell assays, so while it is known that the compounds are effective, it's not known what they are doing in any detail. Iain Wallace from ChEMBL has performed a prediction of the biological role of these compounds (as well as predictions for the whole "Malaria Box", which is a set of compounds MMV are providing to people for antimalarial screening, and for all the antimalarials in ChEMBL). Iain clustered the compounds as similarity maps, allowing visualization of the correlation between structure and predicted activity. Discussed also here. One of the predictions was that the compounds in Series 1 should hit an enzyme known as DHODH, and GSK are at the time of writing screening some of the project's compounds against this enzyme (done - data need to be added). These predictions are made using informatics - a comparison of the structures of our compounds with other known compounds that have known activities. The argument is based on extrapolation. To evaluate whether the prediction is correct, a subset of compounds has been sent to Corey Nislow at the University of Toronto for screening in a yeast-based assay that does not identify for sure what the compounds are doing (which is very difficult) but provides harder biological evidence for a role. There are occasional other clues about activity arising from studies of these, or related compounds, in the literature; in such cases it is not clear whether the activity is relevant to their antimalarial potency, or whether the compounds are highlighted in assays because they are frequently members of commercial libraries.
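The informatics argument above (inferring a likely target by comparing a compound with known compounds of known activity) can be illustrated with a minimal sketch. This is not the method actually used for the ChEMBL predictions, which is not described on this page. It assumes each compound has already been reduced to a set of integer structural features (a stand-in for a real fingerprint), and the function names, the choice of k and the target labels are all hypothetical.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_targets(query, references, k=3):
    """Rank targets by summed similarity over the k most similar references.

    references is a list of (feature_set, target_name) pairs, i.e. known
    compounds with a known activity. All names here are illustrative.
    """
    scored = sorted(
        ((tanimoto(query, feats), target) for feats, target in references),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, target in scored[:k]:
        votes[target] += similarity
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder feature sets and target labels, not real project data.
    query = frozenset({1, 2, 3, 4})
    references = [
        (frozenset({1, 2, 3, 4, 5}), "target_A"),
        (frozenset({1, 2, 3}), "target_A"),
        (frozenset({3, 4, 9, 10}), "target_B"),
        (frozenset({20, 21}), "target_C"),
    ]
    for target, score in predict_targets(query, references):
        print(f"{target}\t{score:.2f}")
```

A ranking of this kind is only an extrapolation from structural resemblance, which is why the text above stresses the need for experimental follow-up.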

Remaining Possible Lines of Enquiry

Though the series has been parked, anyone is free to re-investigate. Of interest might be:

  • Synthesise these compounds that might arise from the cyclisation of the arylpyrrole side chain
  • Evaluation of the "other half" of the original GSK hit containing the antipyrine
  • A new search of the TCAMS data set revealed other compounds similar to the original hits that could be used for generating "hybrid" compounds that might represent new targets for the series

Series 2 - Triazoloureas


Series 3 - Aminothienopyrimidines


Old text that needs adapting then deleting

Two other interesting-looking starting points from the GSK set, based on a triazolourea and a thienopyrimidine, were the focus of James Cronshaw's honours thesis at Sydney University. The triazolourea (OSM-S-56) was found to be even more active than the original GSK hit (TCMDC 134395). James has synthesised the thienopyrimidine series lead, and we are awaiting confirmation of the compound's antimalarial activity. If the lead compound is confirmed as active then it will be necessary to decide which, if either, of the series is to be taken on further. In general a strategy for the project is to decide as early as possible which series are looking attractive biologically, because there are still a lot of leads to pursue. If any of these structures look synthetically attractive to you based on your previous experience, please consider joining the project. Some of these molecules are quite straightforward to make and would be suitable for undergraduate lab classes.

/end of old text

General Strategies

How to Obtain Other Compounds

Many of the original compounds from the 2010 GSK screen arose from libraries purchased from smaller specialised companies. How best to obtain analogs of hits in the OSM project? Novel compounds clearly need to be synthesised. Other compounds may be available by other means, however.

1) Identification of Relevant Commercial Compounds: What if some relevant compounds are already commercially available? How can these be found? Iain Wallace was able to do a search of databases such as eMolecules for relevant compounds above a certain threshold of similarity and filter compounds by supplier, generating a "hitlist" quickly and with no manual human input. These can be converted into spreadsheets for quick visualisation - see here for examples in the aminothienopyrimidine series.

2) Obtaining Commercial Compounds: With the compounds identified, the relevant suppliers need to be contacted to ask for donations. That will probably not be trivial since it needs a human interaction. Failing that the compounds can just be bought.

3) Identification of Other/Academic Compounds: What about desirable, known compounds that are not commercially available, e.g. compounds sitting in academic lab fridges? Some of these may be identified using resources like SciFinder, though these require expensive subscriptions. Many useful compounds may not even be in the published literature (an argument for openness in science). There are some reports of activity of compounds in Series 1 (e.g., here), though it is not clear whether the activity is relevant to malaria.

4) Get Other/Academic Compounds: This will be a case of manual contact with interested groups. Here is an example enquiry. Naturally contributions are rewarded by possible authorship on resulting papers. Ultimately we (the scientific community) are not efficient at sharing chemical resources. A user-driven list would be helpful that permits the following open appeal: "I need compound X. I can buy it from you, or you can give it to me, or you can make it for me or with me, but I need the compound in timeframe Y." This is like a Molecular Craigslist, and would reduce some of the supply-demand barriers in chemical research.
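The automated "hitlist" generation described in step 1 (keep catalogue compounds above a similarity threshold, optionally restricted to certain suppliers) can be sketched as follows. This is a minimal illustration, not the search that was actually run against eMolecules: compounds are represented by sets of integer features as a stand-in for a real structural fingerprint, and the class, function, supplier and compound names are all hypothetical, as is the threshold of 0.7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One purchasable compound; every field is an illustrative placeholder."""
    compound_id: str
    supplier: str
    features: frozenset  # stand-in for a structural fingerprint


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_hitlist(query, catalogue, threshold, suppliers=None):
    """Return (similarity, entry) pairs at or above threshold, best first.

    If suppliers is given, entries from other suppliers are skipped.
    """
    hits = []
    for entry in catalogue:
        if suppliers is not None and entry.supplier not in suppliers:
            continue
        score = tanimoto(query, entry.features)
        if score >= threshold:
            hits.append((score, entry))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


if __name__ == "__main__":
    # Placeholder data, not real catalogue compounds.
    hit_features = frozenset({1, 2, 3, 4})
    catalogue = [
        CatalogueEntry("CAT-001", "SupplierA", frozenset({1, 2, 3, 4, 5})),
        CatalogueEntry("CAT-002", "SupplierB", frozenset({1, 2, 3})),
        CatalogueEntry("CAT-003", "SupplierA", frozenset({7, 8, 9})),
    ]
    # Tab-separated rows can be pasted straight into a spreadsheet.
    for score, entry in build_hitlist(hit_features, catalogue, 0.7):
        print(f"{entry.compound_id}\t{entry.supplier}\t{score:.2f}")
```

In practice the feature sets would come from a cheminformatics toolkit and the threshold would be tuned to the series in question; the filtering and ranking logic stays the same.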

How to Comment on What You're Reading

Comments/input/questions can be contributed via Google+ either by "+"-ing the project or in the Open Source Malaria Research Community, directly on the relevant Electronic Lab Notebooks, via Twitter, Facebook or by posting a new Issue to the GitHub To Do list. Much of the older project activity happened on the malaria pages of The Synaptic Leap. The page you're reading also has a "talk" tab for input if you would prefer, though there are no notifications active. Alternatively please write your own input somewhere (e.g., your own blog) and link to the project. Anonymity is perfectly acceptable, just less useful. Please avoid email if you can.

For the full summary of ways to join/interact/contribute, please see the "Join the Team" sheet on the OSM Landing Page.


The project's licence, unless otherwise stated, is CC-BY-3.0, meaning you can use whatever you want for whatever purpose, provided you cite the project.