BIOL398-05/S17:Class Journal Week 11

From OpenWetWare


Lauren M. Kelly Reflection Questions

Lauren M. Kelly

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated?
    • One of the main issues with the data and analysis identified by Baggerly and Coombes was that there was not enough information to replicate the results. In particular, the paper did not describe how to use the software to assess patient susceptibility. As stressed in the lecture, it is crucial that studies can be replicated. Another issue was that the heat maps did not match up. Upon further analysis, it was also revealed that the data had been manipulated so that the labels for patients sensitive to a drug and those resistant to it were switched. According to DataONE, one of the best practices violated was the use of reproducible workflows: the decision-making process behind the data could not be evaluated. They also did not store the data in formats that could still be read in the future.
  2. Which of these did Dr. Baggerly claim were common issues?
    • As Dr. Baggerly stated in his talk, the most common issues are mixing up sample labels, gene labels, or group labels, and incomplete documentation.
  3. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • In order to have reproducible research, Dr. Baggerly argues that the norm should include data, provenance, code, descriptions of nonscriptable steps, and descriptions of planned design. Researchers should also use literate programming, reuse templates and report structures, and write executive summaries, as well as include important information in the appendices. His recommendations are in line with DataONE's, which call for consistent names, formatting, and codes so that the data are accessible and reproducible.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • I am surprised that no one seemed to notice or care that the data could not be reproduced. This flawed data was allowed to drive clinical trials and put human lives at risk. The individuals who reviewed the data and attempted to reproduce it should have fought harder to make this scandal public earlier. It amazes me that so much confusion could surround the data and the trials were still allowed to move forward.
  5. Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • The methods section of the paper I presented for journal club provides plenty of information on the strain they used and the growth conditions. The equation they used to calculate the p-value is provided, as well as the criteria they used to select the batch culture data they compared their data sets to. They also describe the analytical methods in detail and cite the sources of any methods taken from other work. I think that with some investigation into those sources, the data analysis could be reproduced. However, it is still unclear to me how many replicates they used for each growth condition.
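As a small sketch of the provenance-recording idea from question 3 (my own hypothetical illustration, not code from the talk or the paper), a script can log alongside its results exactly which input file was analyzed, a checksum of its contents, and the software version used, so that anyone attempting to reproduce the analysis can confirm they are starting from the same data:

```python
# Minimal provenance record for an analysis run (hypothetical sketch).
# Keeping the data, the code, and a record of exactly what was run is the
# core of Dr. Baggerly's recommendation for reproducible research.
import datetime
import hashlib
import sys


def file_sha256(path):
    """Return the SHA-256 checksum of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def provenance_record(input_path):
    """Build a dictionary describing this analysis run's inputs."""
    return {
        "input": input_path,                     # which file was analyzed
        "sha256": file_sha256(input_path),       # proof it is the same data
        "python": sys.version.split()[0],        # software version used
        "run_at": datetime.datetime.now().isoformat(),
    }
```

Saving such a record (e.g., as JSON next to the output) means a later reader can verify that their copy of the data matches the one the results were computed from.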

Acknowledgments

  • Worked on shared journal entry with Margaret J. O'Neil on Tuesday, April 4th, 2017.
  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Lauren M. Kelly 22:12, 5 April 2017 (EDT)

Margaret J. O'Neil Reflection

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated?
    The main issue discovered by Baggerly and Coombes was that the data and analysis weren't reproducible. Mislabeled and purposefully manipulated data were also discovered when the raw data was examined. DataONE emphasizes the need for thorough documentation throughout experiments and analysis to ensure the work is reproducible, which requires data to be consistent and organized. Because of this lack of documentation, Baggerly and Coombes had problems obtaining the same results and had to develop their own model to reproduce them from the raw data.
  2. Which of these did Dr. Baggerly claim were common issues?
    Baggerly said the common mistakes were complete confounding in the experimental design and easy mistakes made in Excel, such as the mislabeling that showed up in the fraudulent data set.
  3. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    Baggerly encourages thorough documentation of the methods used in research and of the scientific process as a whole. This includes clear labeling and details for all processes and methods used. DataONE recommends that recorded data be organized, which corresponds to the thorough documentation suggested by Baggerly.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    I am still disgusted by the Duke case, even more so than before, because it seems ridiculous that, after all the evidence Baggerly and Coombes provided, the fraud case was still more or less ignored. The fact that Duke was given the information that the data might be fraudulent shows extreme negligence on the part of the university. While the professor who presented the data is clearly in the wrong, equal blame should be placed on the university for its negligence.
  5. Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    No, I don't think there is sufficient information to reproduce the data analysis because dynamic cycle length was never defined by the authors of the paper. While they did a good job of detailing how they arrived at their conclusions, the lack of information on how to compute one of the two graph statistics that the paper hinges on seems rather negligent. While the statistic is mentioned briefly, the actual computation is not explained, which makes it difficult to figure out how to reproduce their results.

Acknowledgments

  • Worked on shared journal entry with Lauren M. Kelly on Tuesday, April 4th, 2017.
  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Nika Vafadari Reflection Questions

Nika Vafadari

  1. The main issue was that Baggerly and Coombes could not reproduce the data and analysis, indicating that Dr. Potti's results could not be replicated. For example, they did not find the same separation when comparing their fits to the test data with Dr. Potti's. They also found an off-by-one error in the p-values for various drugs, meaning that their values were shifted from Dr. Potti's by one position. The cause was that the software came without proper instructions: it requires two inputs to work, and including a header row leads to the off-by-one error they saw. Furthermore, while six of the heat maps matched up, four drugs still could not be explained by the replicated data. Also, while the software produced predictions, it did not properly report the sensitivity of patients but instead gave a rate of accuracy. The study violated several DataONE best practices, including creating data sets that are valid and organized, which includes making sure the formatting and labeling are correct. The data also lacked descriptive column names, which could have helped prevent mislabeling. Most importantly, Dr. Potti's experiment failed to use a reproducible workflow.
  2. Common issues include mixed-up labels for samples, genes, and groups, and confounding in the experimental design. In addition, the documentation was poor and incomplete, so most of these mistakes were not caught.
  3. He recommends that the data, provenance, code, descriptions of non-scriptable steps, and descriptions of planned design (if used in the experiment) be required before clinical trials begin. Similarly, the best practices recommended by DataONE include creating valid data sets that are organized and consistent in names, codes, and format. DataONE also recommends using descriptive column and file names to keep data in order and checking for missing data points, and, like Dr. Baggerly, it recommends maintaining dataset provenance and using a reproducible workflow.
  4. I am surprised by how many attempts it actually took for someone to take the claims of Baggerly and Coombes seriously. They put in a great deal of effort to ensure that Dr. Potti did not end up harming patients with falsified data.
  5. No; while the methods section includes the linear equations used for the model, it does not properly explain how to incorporate each one or give any background on how they were derived. In addition, during the presentation Dr. Fitzpatrick pointed out that the nonlinear equation the model is based on is actually not nonlinear, meaning that the results of the paper are questionable. Therefore, a proper explanation of the derivation of the differential equations and the nonlinear model would be necessary to check the results of the paper.
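A toy sketch of the header-induced off-by-one error from question 1 (hypothetical sample names and drug-response labels of my own, not the actual study data): if a header row is stripped from one input file but not the other, every sample is paired with its neighbor's label.

```python
# One input file still carries its header row; the other has had it
# stripped. Pairing them position-by-position shifts every label by one.
samples = ["header", "S1", "S2", "S3", "S4"]          # header not stripped
labels = ["sensitive", "resistant", "sensitive", "resistant"]  # header stripped

# Naive pairing: the stray header absorbs the first label, so every
# sample after it receives the label meant for the sample before it.
misaligned = list(zip(samples, labels))

# Correct pairing: drop the header first, then zip.
aligned = list(zip(samples[1:], labels))
```

Here `misaligned` pairs S1 with "resistant" even though S1's true label is "sensitive" — exactly the kind of silent shift that swaps which patients appear drug-sensitive versus drug-resistant.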

Acknowledgments

  • I certify that this individual journal entry was completed by me and not copied from another source.
  • Nika Vafadari 18:08, 5 April 2017 (EDT):

Cameron M. Rehmani Seraji Reflection Questions

  1. What were the main issues with the data and analysis identified by Baggerly and Coombes? What best practices enumerated by DataONE were violated?
    • The main issue with the data and analysis identified by Baggerly and Coombes was that the data was not reproducible. DataONE emphasizes thorough documentation and organized entry of data. Dr. Baggerly said he had trouble replicating the data because he and Coombes had to develop their own methods to obtain the same results.
  2. Which of these did Dr. Baggerly claim were common issues?
    • Dr. Baggerly claimed that mixing up sample labels, gene labels, group labels, and incomplete documentation were common issues that came up.
  3. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends for reproducible research that there be good documentation of what was done so there is no guessing if someone wanted to perform the same experiment using your methods. This corresponds with DataONE's recommendations because it suggests that all the data must be organized and consistent.
  4. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • I am still surprised with how Dr. Potti got away with tampering with his data to make people believe that there was significance where there was not.
  5. Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • Going back to the methods section of the paper presented in journal club, I think there is sufficient information to reproduce the data analysis. The paper provided information on the strain they used and the different growth conditions they created. In addition, the equations used to calculate the p-value and the criteria used to compare the data to batch cultures were also provided. Some parts of the methods would need to be further investigated because they were referenced from another source, but I think the data could be reproduced.

Acknowledgments

  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
  • Cameron M. Rehmani Seraji 21:09, 5 April 2017 (EDT):

Conor Keith Reflection Questions

  1. The main problem with the data was that they could not get the same results when replicating Dr. Potti's methods. They found that Dr. Potti manipulated p-values and reported incorrect goodness-of-fit measures. He also provided no detail on the software he used. Dr. Potti did not keep organized documentation as recommended by DataONE.
  2. The most common mistakes are mixing up sample and group labels and poor documentation.
  3. Good documentation of workflow is one of the most important recommendations. Researchers must make it so their experiment can be easily replicated.
  4. I'm surprised Dr. Potti's supervisor did not look into his results more. I would think that he would scrutinize all research that came out of his lab.
  5. I believe the methods section of the paper provides enough information for the data analysis to be reproduced.

Acknowledgments

  • Except for what is noted above, this individual journal entry was completed by me and not copied from another source.

Conor Keith 02:55, 6 April 2017 (EDT)