CANB610:RNA sequencing

From OpenWetWare

Discussion Points:

How should these results be interpreted? RNA infidelity, paralogy/pseudogenes, poor data, or other?
The biggest criticism of the paper is that most of its purported RNA-DNA differences (RDDs) occur in genes with known paralogs. Instead of finding cases where the RNA sequence truly differs from the DNA, the authors may be finding RNA transcripts that belong to paralogous genes and were assigned to the wrong gene.

What are the implications of these interpretations?
It is very important to understand the strengths and weaknesses of your bioinformatic pipeline. Had the authors chosen a different method, the problem of paralogy could have been avoided.

What further data is needed to have confidence in the paper's conclusions?
The ideal situation is an inbred mouse line whose genetics are completely understood, so that issues of paralogy become moot. If the same phenomenon of widespread RDDs is found in this mouse line, that would be a strong piece of evidence.

Most subsequent work has been critical of the study's findings. Does this data have any champions?
The study's authors are pretty much its only champions, but they have been less than convincing in their defense.

Does this information call into question RNAseq data? That is, if transcripts are being measured that correspond to alternative transcript forms/paralogous genes, are measured transcript levels too high?
The bioinformatic method this paper used, aligning sequences to the GENCODE database, would be prone to overestimating expression levels because reads from paralogous genes can map to the same location. Most groups align to the genome instead of to a list of exons like GENCODE, which allows non-unique reads to be filtered out. If a read aligns to multiple regions of the genome due to paralogy, it is not used, which minimizes expression overestimation.
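
The unique-read filter described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline used by the paper or by any particular aligner: the function name `count_unique_reads`, the (read ID, gene) input format, and the toy gene names are all hypothetical.

```python
from collections import Counter

def count_unique_reads(alignments):
    """Count reads per gene, keeping only reads with exactly one alignment.

    `alignments` is a hypothetical input format: an iterable of
    (read_id, gene) pairs, one pair per reported alignment.
    """
    # Group every reported alignment by the read it came from.
    hits = {}
    for read_id, gene in alignments:
        hits.setdefault(read_id, []).append(gene)

    # Discard any read that aligned more than once (e.g. due to paralogy).
    counts = Counter()
    for genes in hits.values():
        if len(genes) == 1:
            counts[genes[0]] += 1
    return counts

# Toy data (made up): read2 aligns to both a gene and its paralog.
alignments = [
    ("read1", "geneA"),
    ("read2", "geneA"),
    ("read2", "geneA_paralog"),
    ("read3", "geneA_paralog"),
]

unique_counts = count_unique_reads(alignments)
# Counting every alignment instead credits the ambiguous read to both genes.
naive_counts = Counter(gene for _, gene in alignments)
```

In this toy case the ambiguous read inflates the naive count for both the gene and its paralog, while the filtered count ignores it. The cost of this design is that genuinely expressed genes with close paralogs lose reads, so their levels may be underestimated rather than overestimated.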