BIOL388/S19:Class Journal Week 4
Ava's Reflection
- What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
Austin Dias Reflection
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
One of the first issues Baggerly and Coombs found was an "off-by-one" indexing error when comparing their gene list to the published work, meaning the Duke researchers referenced a set of genes that were not involved. Baggerly and Coombs found that the software used by the authors only worked correctly when the input had no header row. In addition, Baggerly and Coombs suggest that the researchers most likely mixed up the sensitive and resistant patients. This was apparent because their previous study focused on childhood leukemia, which has a well-known cure rate, but according to their data they expected most of their patients to be sensitive. If the labels were swapped when the data were entered, the drug could have been given to patients who would get no benefit from it. Baggerly and Coombs later realized that although the researchers had 122 test samples, the samples were not all distinct, because some had been reused multiple times. They were also able to match the heat maps for 6 of the 7 drugs, but could match the gene lists for only 3 of the 7 drugs used in the published Duke paper, suggesting that two of the best genes reported had not yet been measured. Interestingly, the Duke researchers posted new data for two drugs used in clinical trials in the middle of the investigation. In this data, 43 samples were mislabeled and the other 16 had no indication of which samples they corresponded to.
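The header-row problem described above can be illustrated with a minimal sketch. The gene names and scores below are invented for illustration and are not taken from the Duke data; the function `top_genes` is hypothetical and only shows how a header row left in a label list shifts every label by one position.

```python
# Hypothetical labels and scores, invented for illustration only.
rows = ["probe_id", "gene_A", "gene_B", "gene_C", "gene_D"]  # first entry is a header
scores = [0.9, 0.1, 0.8, 0.2]  # one score per gene, with no header entry


def top_genes(labels, values, n):
    """Return the n labels with the highest values, pairing labels[i] with values[i]."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [labels[i] for i in ranked[:n]]


# Header left in place: every label is shifted by one, so the wrong genes are reported.
wrong = top_genes(rows, scores, 2)

# Header stripped before pairing: labels and scores line up again.
right = top_genes(rows[1:], scores, 2)

print(wrong)
print(right)
```

Nothing in the shifted result looks obviously broken, which is part of why this kind of error is hard to catch without rerunning the analysis from the raw data.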
The best practices listed by DataONE that were violated include maintaining dataset provenance. One instance discussed by Baggerly was accidental duplication. Many genes were duplicated, which should not have been the case. Another practice that should have been used, but was clearly not applied in the Duke scenario, is using reproducible workflows. The researchers were not very transparent with respect to the integration process, and there was a lot of confusion surrounding gene labels.
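Accidental duplication of the kind described above is something a short script can flag before analysis. The following is a minimal sketch, assuming each sample or gene has a string identifier; the identifiers and the helper `find_duplicates` are hypothetical and are not part of any tool used in the case.

```python
from collections import Counter


def find_duplicates(ids):
    """Return a dict mapping each repeated identifier to the number of times it appears."""
    counts = Counter(ids)
    return {item: n for item, n in counts.items() if n > 1}


# Hypothetical sample identifiers, invented for illustration only.
samples = ["S01", "S02", "S03", "S02", "S04", "S01", "S02"]
dupes = find_duplicates(samples)
print(dupes)
```

A check like this only catches repeated identifiers. Samples that were reused under different labels would need a comparison of the underlying values instead.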
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
Dr. Baggerly recommends supplying the raw data from which results were drawn, the code used, and a "description of nonscriptable steps". He states that an independent individual should be able to run the data and get the exact same results. Dr. Baggerly's suggestions closely align with what DataONE recommends. Both sources drive home the importance of reproducibility and of guiding others through every decision you made as a researcher, rather than assuming that others will deduce your reasoning, even if it seems trivial.
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
After viewing Dr. Baggerly's presentation, I realize the importance of checking that results are reproducible. Being able to reproduce results with the methods initially used could give an indication of their legitimacy. I was also surprised when Dr. Baggerly mentioned the amount of pushback against his publications regarding his concerns with the Duke research, specifically when someone said that he was being too negative. However, I believe that avoiding the appearance of pessimism matters less than making sure that the drugs given to patients in clinical trials are appropriate and that the data supporting their efficacy has been reproduced prior to approval for clinical trial.
Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
I do not think there is sufficient information there to reproduce their data analysis because they did not give a step-by-step account of exactly what they did. It was more of a general guide and a list of the software and techniques they used. One example is the section where they explain how they isolated the RNA. They simply state that mRNA was purified using the Oligotex Spin-Column Protocol. The protocols they included were not detailed enough to carry out without reaching out to the authors to clarify certain aspects.
Austindias (talk) 10:50, 21 February 2019 (PST)
Angela's Reflection
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
When reviewing the data, Baggerly and Coombs found that items were mislabeled and that rows and columns were offset, which left gaps in some areas. Overall, the spreadsheets were not kept neat, and in some cases special characters had been entered that affected functions.
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
Dr. Baggerly recommends that all data be labeled, be available to the public, and have clear and consistent formatting. He advocates for complete documentation of all steps taken and the use of a systematic format to keep all data neat and organized.
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
This talk definitely gave me a different attitude toward mistakes in data analysis. In "Deception at Duke", it seemed like the researcher was maliciously manipulating his data. However, this video shows how easy it is for people to make mistakes when analyzing their data, creating unintentional problems. I think data analysis is a skill that needs more focus in education, because if it weren't for this class I don't know where else I would be learning these skills, which are critical to successful research.
Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
I question the information in this paper because the link to the data did not work, and they did not go into depth on how GeneSpring was used for the analysis. Upon first reading the paper, it seemed legitimate; however, the lack of detail and the inaccessibility of the data now seem fishy.
Angela C Abarquez (talk) 10:39, 21 February 2019 (PST)
Sahil Patel Reflection
What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
- The main issues identified were the off-by-one shift across all of the data and the lack of labels, which made the data impossible to interpret. The inability to reproduce the results violates the practices DataONE expects. Dr. Baggerly claimed that these could be common errors because record keeping is a meticulous task, and an off-by-one shift is a mistake that is difficult to catch when the data are first being documented.
What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
- Dr. Baggerly recommends that reproducible research include publicly accessible raw data that is clean and easy to interpret, along with a clear methodology and procedure. These recommendations correspond with the guidelines set by DataONE.
Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
- I was surprised to see that work like this is able to be published and that there are definitely other cases where the results cannot be reproduced. Dr. Baggerly's talk allowed me to see what is required when publishing research in terms of organization and thoroughness.
Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
- No, I do not believe that there is sufficient information there to reproduce their data analysis, mainly because the raw data set linked on the website was unavailable, either because the website domain was lost or simply to mask inaccuracies in the data.
Sahil Patel (talk) 19:49, 27 February 2019 (PST)