Filter the RefSeq List
- After much trial and error, I was finally able to produce a "cleaned up" RefSeq list.
- The problem with RefSeq is that a gene can have multiple mRNA IDs assigned to the same genomic interval. Because all of those extra mRNA IDs are transcripts of the same gene, I needed to remove the redundant entries.
- An Excel VBA macro called DeleteDuplicateRows was run on the list.
- The macro reads through the selected column (column 2, the start column in this case) and deletes the entire row (chromosome, start, end, mRNA ID, and strand) whenever the value in that column is not unique.
- The end result is a list nearly half the size of the original.
- If you would like a copy of this list, please contact firstname.lastname@example.org
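The duplicate-row filtering described above can be sketched in Python. This is a minimal sketch, not the DeleteDuplicateRows macro itself. It assumes each row is a (chromosome, start, end, mRNA ID, strand) tuple and that the first row seen for each value is kept while later repeats are dropped. The function name mirrors the macro, the `key_columns` parameter is a hypothetical addition for illustration, and the example rows are made up rather than real RefSeq entries.

```python
from typing import Iterable, List, Sequence, Tuple

# One RefSeq row: (chromosome, start, end, mRNA ID, strand)
Row = Tuple[str, int, int, str, str]


def delete_duplicate_rows(rows: Iterable[Row],
                          key_columns: Sequence[int] = (1,)) -> List[Row]:
    """Keep the first row for each distinct key and drop later rows with the same key.

    key_columns holds 0-based column indices. The default (1,) is the start
    column, mirroring the filter run on column 2 in Excel. Keeping the first
    occurrence is an assumption about how the macro behaves.
    """
    seen = set()
    kept: List[Row] = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key in seen:
            continue  # value already seen: drop the entire row
        seen.add(key)
        kept.append(row)
    return kept


if __name__ == "__main__":
    # Hypothetical rows, not real RefSeq coordinates or accessions.
    refseq = [
        ("chr1", 1000, 5000, "mRNA_1", "+"),
        ("chr1", 1000, 5000, "mRNA_2", "+"),  # same interval, extra mRNA ID
        ("chr1", 8000, 9000, "mRNA_3", "-"),
        ("chr2", 1000, 4000, "mRNA_4", "+"),  # same start, different chromosome
    ]
    by_start = delete_duplicate_rows(refseq)
    by_interval = delete_duplicate_rows(refseq, key_columns=(0, 1, 2, 4))
    print(len(by_start), len(by_interval))  # prints "2 3"
```

Keying on the start column alone (the default) reproduces the column-2 filter, but it also treats genes on different chromosomes that happen to share a start coordinate as duplicates, as the chr2 row in the example shows. Passing the chromosome, start, end, and strand columns as the key limits the filter to rows that actually share a genomic interval.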