BIOL368/F14:Class Journal Week 13

From OpenWetWare

Chloe Jones

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issue Baggerly and Coombs identified was that, when they tried to reproduce the study, they could not obtain the same results. They were not able to replicate the data analysis that was provided, which was a major concern. The genes were mislabeled, and the maps showing how drugs affected particular genes were also very disorganized. The DataONE best practice that was violated was keeping data in a consistent, organized format that is easy to follow and reproduce. The common issues Dr. Baggerly mentioned were mislabeled genes and inconsistent terms.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends organization and precision as the keys to producing reproducible data. He also suggests that the data be easily accessible and documented in enough detail that other researchers can follow the steps you have taken without running into any confusion. He also recommended a code-based method that could be used to identify errors and ensure accuracy.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • I’m less upset about the case now, because after the talk it seems less intentional than it did before. However, I am still upset that the doctor and research team put so little effort into organizing, checking, and validating their data. They are highly trained individuals, so verifying their data and procedure should be second nature. It’s just sad to see that people had to suffer because of somebody who should have known better.
  4. Look at the methods and results described in the paper from which you got the data you are working on. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • Based on the methods section, I think I would have a hard time reproducing the analysis. At first glance the methods seemed well detailed, but after further analysis I could see that they omitted some information needed to reproduce the study. For instance, I was not able to find clear agreement between my data and theirs.
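Mislabeled genes are the kind of error a short script can catch before publication. One pattern reported in this case was gene labels shifted by one row relative to the data they belonged to. The sketch below is a minimal illustration under stated assumptions, not Baggerly and Coombs' actual code: it assumes the reported labels and a reference list are plain Python lists (the gene names are hypothetical) and finds the row offset that lines them up best. A nonzero offset suggests an off-by-one shift.

```python
# Hypothetical check: do the gene labels in a results table line up with
# the reference list they were supposedly taken from?

def count_matches(labels, reference, shift=0):
    """Count positions where labels[i] equals reference[i + shift]."""
    matches = 0
    for i, label in enumerate(labels):
        j = i + shift
        if 0 <= j < len(reference) and label == reference[j]:
            matches += 1
    return matches


def best_shift(labels, reference, max_shift=2):
    """Return the row offset in [-max_shift, max_shift] that aligns the most labels.

    Ties are broken in favour of the smallest absolute offset, so a
    correctly aligned table reports 0.
    """
    candidates = range(-max_shift, max_shift + 1)
    return max(candidates,
               key=lambda s: (count_matches(labels, reference, s), -abs(s)))


if __name__ == "__main__":
    # Hypothetical gene names; the reported list is shifted by one row.
    reference = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
    reported = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]
    print("best offset:", best_shift(reported, reference))
    print("matches at offset 0:", count_matches(reported, reference, 0))
```

Run on the hypothetical lists, no labels match at offset 0 but all four match at offset 1, which is the signature of an off-by-one error. Checking a small window of offsets keeps the script simple; a real pipeline would run a check like this every time the report is regenerated.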



Chloe Jones 03:46, 15 October 2014 (EDT)


Isabel Gonzaga

  • What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • Some of the main issues with data and analysis identified by Baggerly and Coombs are that much of the data is not reproducible. Researchers have 'intuition' about what makes sense in their research, which affects their interpretation of their results. People accept research results without questioning them, so wrong conclusions are drawn and applied to clinical problems, which may have extreme ramifications. Another main issue is the inconsistency in publishing and storing data, labeling data, and providing a detailed methodology that shows readers how researchers reached their results. The lack of a uniform and careful system has many potential ramifications for the applications of the research.
    • Some of the best practices enumerated by DataONE are to be consistent in data columns and keep data in one table, use descriptive column and file names, leave blank spaces or enter '9999' for missing data (and tag them with appropriate flags), enter complete lines of data, and store data in a consistent, non-proprietary format so it remains readable and reproducible in the future.
    • Dr. Baggerly claimed that poor documentation is a common issue, one that makes 'forensic bioinformatics' necessary. The most common mistakes are the 'simple ones': confounding the experimental design, mixing up sample labels, gene labels, or group labels, and incomplete documentation. These are largely consistent with DataONE's best practices. DataONE does not, however, mention documentation within the report.
  • What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends setting norms for what papers must provide. Data should have labeled columns, and the code used should be provided. Full descriptions and methodology should be provided. This corresponds with DataONE's suggestions, although DataONE goes into much further depth in describing what 'good data entry' looks like. Dr. Baggerly also showed that his lab now uses literate programming exclusively, through the Sweave software. DataONE recommends that all data and reports be saved in a consistent format that can be read by any application, which I believe is doable with Sweave.
  • Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • It was surprising to see Duke's initial reaction to the case, calling it a "confidential document". The original 60 Minutes video showed Duke as completely surprised by their researchers' actions and willing to admit wrongdoing and take full responsibility. However, in this presentation they appear much more reluctant and are shown in a negative light. Their unwillingness to share their data makes them seem a little shady, in my eyes. Dr. Baggerly's talk also made this problem of irreproducible research seem very common, as he details a number of studies in which he personally found problems; there are certainly many more out there today.
  • Look at the methods and results described in the paper from which you got the data you are working on. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • I do not think there is sufficient information to reproduce their data analysis, because they do not discuss the log calculations or fold inductions used to generate their conclusions. There is no mention of statistical analyses, and the p-values were not provided. The gene samples for each condition are also poorly and inconsistently labeled, which made it difficult for me to find the appropriate samples. While it is possible that I did the calculations wrong, my data and calculations so far show inconsistencies with the 48-gene regulon determined in the report.
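Leaving out the fold-change calculation matters because "fold change" can be computed in more than one way. The sketch below is a hypothetical illustration, not the paper's method: it assumes paired replicate expression values for a single gene (made-up numbers, not data from the paper) and compares two common definitions, the log2 ratio of the group means and the mean of the per-replicate log2 ratios.

```python
import math
from statistics import mean


def log2_fold_change_of_means(treated, control):
    """log2 of the ratio of group means: log2(mean(treated) / mean(control))."""
    return math.log2(mean(treated) / mean(control))


def mean_of_log2_ratios(treated, control):
    """Average of per-replicate log2 ratios, pairing treated[i] with control[i]."""
    if len(treated) != len(control):
        raise ValueError("paired replicates must have equal length")
    return mean(math.log2(t / c) for t, c in zip(treated, control))


if __name__ == "__main__":
    # Hypothetical expression values for one gene, two paired replicates.
    treated = [8.0, 2.0]
    control = [1.0, 1.0]
    print(log2_fold_change_of_means(treated, control))  # about 2.32
    print(mean_of_log2_ratios(treated, control))        # 2.0
```

On the same made-up numbers the two definitions give about 2.32 and 2.0, so a gene could pass or fail a fold-change cutoff depending on which one was used. A methods section has to name the definition for the analysis to be reproducible.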


Isabel Gonzaga 14:15, 2 December 2014 (EST)


Nicole Anguiano

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issue Baggerly and Coombs identified with the data analysis was the inability to reproduce the results. The methods were not detailed enough to explain how everything was done, especially regarding the software. Beyond that, there were also problems in the organization and representation of the data. The most obvious best practice that was violated was keeping a valid, organized data set, as columns were often mislabeled and the data itself was invalid. The most common issue Dr. Baggerly noted was data being offset by one. He also noticed repetitions in the data, improper naming, and reversed values.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly believes that the data behind a paper should be completely open and understandable so that anyone can read it. He also believes that the provenance of the data should be made open and clear, so that readers know exactly where the data come from. He believes that the code should be available and that descriptions of nonscriptable steps should be provided. Lastly, the planned design should be described and made available. DataONE also recommends that data be open, uncompressed, and unencrypted so that anyone can read them. DataONE also emphasizes that data be understandable and well organized.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • This case made me much more frustrated with Duke than before. It was saddening that they kept information confidential from the public, information they likely knew would be incriminating, yet were willing to share it with the National Cancer Institute. Also, the fact that the reviewers could not work out how to reproduce the results from the paper itself, only from the supplemental materials, suggests to me that someone must have known something was wrong. It seems to me that everyone was so desperate to believe the treatment would work that they ignored all the facts, numbers, and logic showing why it didn't.
  4. Look at the methods and results described in the paper from which you got the data you are working on. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • I do not think there is enough information to reproduce the data analysis. Because the raw data are not supplied and the paper does not describe how they were processed, it is difficult, if not impossible, to know what processing was done. The data are also missing some information that would typically be expected, which makes analysis difficult. The false discovery rates are also not provided, making it difficult to know exactly how accurate their results were.
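Without false discovery rates, a reader cannot tell how many of the reported genes might be false positives. One standard way to control the false discovery rate is the Benjamini-Hochberg procedure; whether the paper used it, or any correction at all, is not stated, so the sketch below is an assumption rather than the authors' method. It converts a list of raw p-values (hypothetical values here) into BH-adjusted values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value with rank k (1 = smallest) among m tests, the raw
    adjustment is p * m / k; a running minimum taken from the largest
    p-value downward keeps the adjusted values monotone, and starting
    that minimum at 1.0 caps every value at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    raw = [0.01, 0.04, 0.03, 0.20]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.2f}  ->  BH-adjusted = {q:.4f}")
```

Genes whose adjusted value falls below a chosen threshold (0.05 is a common choice) would be reported as significant at that false discovery rate. Reporting these adjusted values, or at least the procedure used, is what would let a reader judge how reliable the gene list is.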

Nicole Anguiano 02:32, 3 December 2014 (EST)

Nicole Anguiano
BIOL 368, Fall 2014
