BIOL388/S19:Class Journal Week 5

From OpenWetWare

Ava's Reflection

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • At first, Baggerly and Coombs saw many 'common' errors that could have been accidental, such as gene lists being off by one due to the presence of a header row, swapped values, and mismatched labels. A major red flag appeared when genes that were unaccounted for in the gene lists were presented as the reason the study worked or was significant, which is not a common error. The DataONE best practices that were most likely violated include failing to label columns properly and failing to give columns unique, descriptive names. Files may also have contained special characters that caused errors during transfer, or may not have been named descriptively, which allowed these common mistakes to be made carelessly.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends complete documentation of steps and design, as well as avoiding simple mistakes like mixing up gene or sample labels. DataONE likewise describes ways to avoid these common errors. Storing data in consistent formats, so that it can be easily understood in the future by yourself and by others, is very important and corresponds with Dr. Baggerly's emphasis on data availability and accessibility.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • I agree with Dr. Baggerly about how disturbing it is that not only was fraudulent data published, but clinical trials were started even though the sensitive/resistant labels were reversed and the findings were clearly misleading.
  4. Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • No, I do not. Although the dataset was offered at one time, it can no longer be found by clicking the link provided, which is a big issue raised by Dr. Baggerly. I also feel that the methods are not descriptive enough for someone to reproduce the study: the data analysis and statistical analysis are not discussed in any detail whatsoever. Additionally, only two or three trials were conducted for many of the tests, which does not seem like enough data to draw conclusions from, so it would not be surprising if other studies found differing results given how limited the data already was.
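The header-row problem mentioned above can be made concrete with a small sketch. This is a hypothetical illustration (the gene names and values are made up, not from the Duke data): one column still carries a header row, so pairing the two lists by position shifts every label by one.

```python
# Hypothetical sketch of the off-by-one header error: one list still
# carries a header row, so pairing the lists by position shifts every
# gene label by one.

expression_values = [2.1, 0.7, 3.4, 1.2]                   # data rows only
gene_column = ["GeneID", "TP53", "EGFR", "BRCA1", "MYC"]   # header + genes

# Forgetting to drop the header attaches every value to the wrong gene.
wrong = dict(zip(gene_column, expression_values))

# Dropping the header row first restores the correct pairing.
right = dict(zip(gene_column[1:], expression_values))

print(wrong["TP53"])  # 0.7 -- shifted: this is really EGFR's value
print(right["TP53"])  # 2.1 -- correct
```

The mistake is silent: no error is raised, the table still looks plausible, and only a careful comparison against the source data reveals the shift, which is exactly why Baggerly and Coombs had to reverse-engineer it.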

Avalekander (talk) 19:43, 18 February 2019 (PST)

Leanne's Reflection

  1. Dr. Baggerly states that there were three main issues with the Duke study: an off-by-one indexing error, incorrect/switched sensitive/resistant labels, and the inclusion of genes that are unaccounted for in the gene list, which also happen to be the genes the Duke group claims explain why the signature works. Of these issues, two are common, simple errors (the off-by-one indexing error and the incorrect labels) that can be easily caught and resolved with adequate documentation. Using the information from DataONE, the Duke study violated the practices of being consistent with data, using reproducible workflows, and keeping thorough documentation.
  2. Dr. Baggerly suggests that data and quantification tables be labeled, that code be provided, and that descriptions of nonscriptable steps and the planned design be reported. He states that these are recommendations for papers, but absolutely required for clinical trials. DataONE also suggests consistent labeling and transparency in experimental design. Overall: documentation, documentation, DOCUMENTATION!!!
  3. I do not have any further reaction to the case after watching Dr. Baggerly's talk, but I was curious as to how the Duke study was a) published, and b) allowed to begin clinical trials without someone reviewing the data thoroughly beforehand. Specifically, I find it hard to believe that the simple, common errors were not caught, since they should be easy to catch. Perhaps Dr. Potti's documentation was simply that poor.
  4. When it comes to reproducing the actual experiment, it does seem that there is enough information to reproduce the results. However, sufficient detail is not reported to reproduce the data collection and analysis. The normalization of the data was explained decently well, though I am not sure I would be able to do it if asked. The part that most lacks explanation is their use of GeneSpring for hierarchical clustering: they simply state, in the last sentence of the data analysis section, that they would use this software to cluster their data. Overall, more information would be required to reproduce their data analysis; this would probably be good to include as supplementary information.

Leanne Kuwahara (talk) 22:32, 17 February 2019 (PST)


Desiree's Reflection

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issues that Baggerly and Coombs identified when reviewing the other scientists' data were mislabeled data and data that was off by one due to the presence of a title on a column or row. The DataONE best practices that were most likely violated to produce these issues were not properly labeling data columns and rows, leaving gaps in the spreadsheets, and including special characters that could trigger an accidental action in one of the analysis programs used.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends that, in order to have reproducible research, one must make the data publicly available, include directions/methodology in a clear and precise manner, and keep formatting consistent. These correspond to DataONE's recommendations, which emphasize that the formatting of data (especially in spreadsheets like Excel) should be consistent in order to minimize errors when using the data in various types of software. DataONE also emphasizes explaining things clearly and efficiently in order to make the data easy to understand.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • When I first heard about the way Dr. Potti changed his data, I was really disappointed that a scientist would have the audacity to tamper with collected data just to gain fame. I was really angry, since Potti's data led to clinical trials on actual human beings. After viewing Dr. Baggerly's talk, I was even more frustrated with Dr. Potti, since Dr. Baggerly kept repeating the phrase "who knows what they did to get those results." The fact that a biostatistician was unable to figure out how the data was manipulated suggests that Dr. Potti really did know he was making changes to the data; he was working hard to cover up the fact that he was editing it.
  4. Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • No, I do not believe there was enough information to reproduce their data analysis, especially since the section of the paper I presented for journal club had a figure that referred to work done in another experiment by other scientists. To make the data analysis reproducible, clear directions on how the experiment was run, along with the data from the experiment, should have been included in the paper and made publicly accessible so that others could attempt to reproduce it.

Desireegonzalez (talk) 23:19, 20 February 2019 (PST)

Brianna N. Samuels' Reflection

  1. The main issues with the data and analysis identified by Baggerly and Coombs were that the indexing was off by one, the labels were incorrect and switched around, and genes that were unaccounted for were included in the gene list. This violated the DataONE standards because the data could not be reproduced, it lacked consistency, and there was insufficient documentation. The common errors were the incorrect, switched labels and the off-by-one indexing error.
  2. For reproducible research in a clinical trial or paper, Dr. Baggerly recommends labeling all data and tables, adding descriptions of any steps not recorded in the planned design, and providing the code. DataONE agrees on labeling and emphasizes documentation of everything, especially the experimental design.
  3. My reaction was pretty much the same, except that, like Leanne Kuwahara, I was also confused about how this was able to be published if all these recommendations for reproducible research and regulations are set in stone. Science is all about reproducing experiments, so I feel that the common errors (if they are as common as they say) should have been easily detected, because you go in expecting errors in these categories.
  4. I have mixed feelings about this. I believe they did a sufficient job of including enough information to reproduce the experiment, but I can see the lack of detail being an issue. When you read a scientific paper, you want anyone to be able to understand everything you are doing, and the lack of detail can become a problem for someone who isn't familiar with the software and other tools used.

Briannansamuels (talk) 13:44, 20 February 2019 (PST)

Edward's Reflection

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • They found the heatmap signature's indices were off by one due to the software being used (likely from copying and pasting from Excel with a title header). They were able to reproduce the heatmaps for 6 of the 7 drugs, but could match the gene list for only 3. The Duke group also mixed up the sensitive vs. resistant labels (0 or 1), so the predictions were the opposite of the actual results. In the gene list, there were 15 duplicate genes, and 6 were inconsistent (labeled both ways). The studies that Baggerly and Coombs examined could not be reproduced due to the lack of methods provided, as well as simple mistakes, including mixed-up sample and gene labels. These violated the DataONE practices by having inconsistent data, reversed labeling, and nonreproducible methods.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • He suggests the norm should include the raw data, labeled columns, the code, and descriptions of nonscriptable steps. The similarities to DataONE include proper and consistent labeling, as well as descriptive steps for the experimental design.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • Thinking back to the first Deception at Duke video, my negative views of the doctor were somewhat misplaced. I thought he was intentionally manipulating the data to show significance in order to become famous for finding a treatment for cancer. I now have to consider that the doctor may simply have been incompetent at data analysis, mixing up all of these labels and committing several common mistakes.
  4. Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • The link to the data was broken, but even so, I don't think it would be possible to reproduce their data. They did not really delve into how they compared data in GeneSpring beyond a "matrix of standard correlation." I would have to see the data sets and know how to use GeneSpring to give a better answer.
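The sensitive/resistant mix-up described above is easy to sketch. This is a hypothetical illustration (the predictions are made up, not from the Duke data): if the 0/1 codes are read backwards, every prediction becomes the exact opposite of the truth.

```python
# Hypothetical sketch of swapped sensitive/resistant labels: reading
# the 0/1 codes backwards inverts every prediction.

CORRECT = {0: "sensitive", 1: "resistant"}
SWAPPED = {0: "resistant", 1: "sensitive"}  # the reversed reading

predictions = [0, 1, 1, 0]  # made-up model outputs for four samples

as_intended = [CORRECT[p] for p in predictions]
as_reported = [SWAPPED[p] for p in predictions]

# Every sample called sensitive is reported resistant and vice versa,
# so patients could be steered toward exactly the wrong drug.
assert all(a != b for a, b in zip(as_intended, as_reported))
```

Nothing about the data itself flags the swap; the predictions only look wrong when checked against known outcomes, which is how Baggerly and Coombs caught it.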

EdwardRyanTalatala (talk) 22:35, 20 February 2019 (PST)

Alison King's Reflection

  1. What were the main issues with the data and analysis identified by Baggerly and Coombs? What best practices enumerated by DataONE were violated? Which of these did Dr. Baggerly claim were common issues?
    • The main issues they found were that every finding was off by one, and when they compared their datasets to the ones from Duke, they found differences. The datasets matched for only 3 of the genes, but the heatmaps matched for 6 of them, which does not add up. They also found results opposite to those of the previous study the data came from (i.e., the meanings of resistant vs. sensitive were switched). The list of genes was separated into the genes that worked to split the test set, the genes that worked to split the training set, and the genes that explained why it worked, with NO overlap between the three, which seemed suspicious to Baggerly and Coombs. Also, the labels on the graphs did not match, and samples were reused multiple times in the graphs but labeled differently. The Duke group violated the DataONE practices of keeping clean, consistent columns of data and keeping a reproducible workflow. Dr. Baggerly claimed that the findings being off by one and the switching of labels could be considered common errors arising from mistakes in record keeping.
  2. What recommendations does Dr. Baggerly recommend for reproducible research? How do these correspond to what DataONE recommends?
    • Dr. Baggerly recommends labeling columns and providing code to make research more reproducible. It is important to make sure everything is documented in a way that is easy to follow. This corresponds to DataONE's recommendations of keeping consistent spreadsheets of data that are clearly labeled with descriptive column names.
  3. Do you have any further reaction to this case after viewing Dr. Baggerly's talk?
    • My reaction to this case remains the same. I am still shocked that somebody would change the data to intentionally deceive and create fake results.
  4. Go back to the methods section of the paper you presented for journal club. Do you think there is sufficient information there to reproduce their data analysis? Why or why not?
    • No, I do not believe there is sufficient information that would allow us to reproduce their data analysis. They simply say that they performed statistical analysis with GeneSpring to get their results, with no mention of the type of analysis or how they arrived at their results within GeneSpring.

Alison S King (talk) 16:45, 20 February 2019 (PST)

Fatimah Alghanem's Reflection

  1. The main issue with the data and analysis identified by Baggerly and Coombs was that the data was off by one index for each gene. Also, only 6 out of the 7 genes matched the ones from Duke. The DataONE best practices that were violated were the incorrect labeling of rows and columns in the data spreadsheet and not having a consistent format. Dr. Baggerly claimed that the off-by-one index and the mislabeling were common issues.
  2. The recommendations that Dr. Baggerly gives for reproducible research is having consistent data and clear labeling.
  3. My reaction to this case is, first, that I am glad someone critiqued them and pointed out the problems with their data. Also, the manipulation of data upsets me because I can see how it could lead to bigger issues, like medication that is ineffective or harmful being used based on research like this.
  4. I don't think there is sufficient information that would allow me to reproduce their data analysis. I also find it a little confusing because there are not many details.

Falghane (talk) 14:47, 7 May 2019 (PDT)