SEPTEMBER 12, 2013
I hope everyone had a great summer. A couple of new things to tell you about with the start of the academic year. First, we have two new bioinformatics scientists on board. Dr. Duan Ma comes to us from the Dept of Biostatistics at Washington University and from several years in industry, where she worked on a broad variety of bioinformatics problems as well as cloud computing solutions. Dr. Jie Wu did his PhD in computational biology at Cold Spring Harbor, where he focused on developing algorithms for RNAseq. Both Jie and Duan are located in 68-317, so please stop by and say hello.
One new product to tell you about is the MiSeq v3 kits, which just came out. These kits increase read counts to 25m from the current 15m maximum. The kits only come in 150 and 600 nt sizes and are somewhat more expensive than the current reagents. The changes to the MiSeq do not help v2 kits, which cannot reach the 25m threshold. We will be offering both the v2 and v3 kits for a short while, but our expectation is that the v2 kits will be discontinued by Illumina in the not so distant future. New pricing for the v3 kits is up on our pricing site.
Also on the sequencing front, we have some evidence that the GC bias issues we were seeing on the HiSeq may not have been totally resolved. We are in the middle of implementing some significant changes in our control lane chemistry in collaboration with Illumina. These will at least allow us to monitor the situation very aggressively and detect any changes in GC bias by using a more complex mixture of samples, instead of only PhiX. As I mentioned in our April newsletter, if you have reason to believe your data was compromised by this issue, please come talk to us so we can help get replacement kits and get your samples rerun as soon as possible. The new control lane is in early beta but is being included on all HiSeq flowcells now. We do want to thank the Laub, Niles and Walker labs for providing DNA to help us in this effort.
Finally, the Technology Seminar Series will return this year. We begin next Wednesday with Wafergen talking about the SmartChip system. The seminar will be at noon in 68-180 and lunch will be provided. We're hoping to have seminars every month throughout the year, so be sure to look for them on our website.
JUNE 17, 2013
First, we have a number of staffing changes to tell you about. The end of June will see three members of the core moving on. Ryan Abo, one of our informatics scientists, will be heading to the Dana Farber, and Paraj Patel and Pierrick Millet will be returning to Northeastern. Our new co-op students, Ashley Machado and Alexander Soltoff, will be starting July 1st. We are currently undertaking a job search to look for Ryan’s replacement. If you have any concerns about the personnel changes, please feel free to contact me.
In a piece of good news, the root cause of the critical failures we have had with homopolymer samples on our MiSeq for more than a month appears to have been identified and should be fixed today. A recent software upgrade that was supposed to improve the handling of homopolymeric samples apparently failed to install properly, resulting in a mix of pipeline versions that was unable to handle the sequences at all. I do want to thank everyone for their patience as we have struggled with this problem and assure everyone we will move through our backlog on the MiSeq as fast as we possibly can. I also want to thank the techs in the lab, especially Scott Morin, who have been working weekends to try to get as many samples through the MiSeq as possible.
Finally, with the end of the fiscal year, our annual price adjustments are due to take effect on July 1st. A complete list of the new prices is up on our pricing site. The largest change is a reduction in cost for HiSeq sequencing, especially for longer reads. This is associated with a significant decrease in the amount of time we will be holding data on our servers and with the recent switch from fastq + SAM file formats to retaining only BAM files (see the January 2013 notes).
APRIL 20, 2013
We have noticed a number of technical issues with some Illumina runs. We want to make sure you are aware of some changes and newly identified technical issues with the platform, and of what we are doing to correct them where we can. All of these changes are from the Illumina side and none were especially well documented (some not at all). These issues are unlikely to be limited to the BMC, so samples run elsewhere on campus or around the country may also be affected. Please read this, as it may have some impact on your analyses.
To begin, all of these changes are subtle and, in most cases, not obvious directly from the sequencers. It was the rare cases that had dramatic effects that caused us to notice them. If you decide you need to have samples rerun, we will work with you to try to get Illumina to replace the reagents and to get the samples rerun. Unfortunately, there is no way we can possibly do bulk reruns of several months’ worth of studies.
The most concerning issue is a dropout of GC rich regions in clustering. This has been an on-again off-again issue with Illumina that we addressed over a year ago by improvements in amplification cycling conditions and enzyme selection. Several months ago (we do not have a precise window), Illumina appears to have changed the chemistry of one of their clustering components, and that caused a major change in performance on GC rich areas. This can be seen as an absence of reads from very GC rich areas but, because these areas are rare in most genomes, it cannot be seen in the flowcell wide metrics. This issue is found on current HiSeq and MiSeq v2 kits but not on MiSeq v1 kits nor, we suspect, on the GAII. We have been able to address this problem by adding a brief boiling step during NaOH denaturation of the samples and have implemented this as SOP starting about two weeks ago. This dropout of regions can cause significant issues for several kinds of studies, most notably ChIP analyses, when you are comparing data from different chemistries.
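One way to look for this kind of dropout in your own data is to compare read depth across windows of differing GC content. The following is only a minimal sketch: the function names, the 10% bin width and the toy windows are all illustrative assumptions, not part of our pipeline, and real use would need window sequences and read counts extracted from your own alignments.

```python
from collections import defaultdict

def gc_percent(seq):
    """Integer GC percentage of a sequence, ignoring non-ACGT characters such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return (100 * gc) // acgt if acgt else 0

def mean_coverage_by_gc(windows, bin_width=10):
    """Group (sequence, read_count) windows into GC bins and average the counts.

    bin_width is an assumed, illustrative value given in percentage points.
    Returns {bin_start_percent: mean_read_count}.
    """
    totals = defaultdict(lambda: [0, 0])  # bin start -> [sum of counts, number of windows]
    last_bin = (100 // bin_width - 1) * bin_width  # so that 100% GC joins the top bin
    for seq, read_count in windows:
        bin_start = min((gc_percent(seq) // bin_width) * bin_width, last_bin)
        totals[bin_start][0] += read_count
        totals[bin_start][1] += 1
    return {b: s / n for b, (s, n) in sorted(totals.items())}

# Hypothetical toy windows: AT rich windows well covered, GC rich windows nearly empty.
example_windows = [
    ("ATATATATAT", 100),
    ("ATATATATAA", 120),
    ("GCGCGCGCGC", 2),
    ("GCGCGCGCGA", 4),
]
coverage = mean_coverage_by_gc(example_windows)
```

A sharp fall in mean depth in the highest GC bins, relative to a library run on older chemistry, would be consistent with the dropout described above, though it is not proof of it.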
A second concern is one that has been reported in the community but that we have not yet identified on our machines, where samples from a run are being observed in the following run as minor contaminants. This issue is limited to the MiSeq and HiSeq2500 (we do not have the latter), where the tubes that add sample to the flowcell are not changed. This contamination is reported to be <1% and so would not show up on our quality metrics. However, if your MiSeq analyses are based on finding a few reads in a large pool of discarded data or you are doing a number of sequential runs, you may wish to validate your data more carefully using an alternative technique such as qPCR or Sanger sequencing. There is currently no technical fix to this problem.
A third issue has been around for a while, though we had not appreciated the implications. Illumina’s newer versions of basecalling software have become less capable of handling uniform sequence (all A’s for example). In earlier versions, only 5 base pairs of variability were needed and intensities could be determined from the control lane we run on all HiSeq flowcells. Now, it appears that nt 1-25 must have all 4 bases represented at every position, even with a control lane. This has always been an issue on the MiSeq and we have solved it by spiking in 30% PhiX in the lane (as opposed to our normal 0.1% spike in). Similar solutions can be used on the HiSeq. Given this change, we are re-evaluating whether there is value in using the 8th lane as a control. The latest version of MiSeq software (only a couple of days old) supposedly allows us to lower the fraction to 5%, but how successful this is remains to be seen. Base rearrangement with the GAII allows the GAII to avoid this issue.
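If you want to check whether a library is likely to run into this, you can tabulate base composition over the first 25 cycles of a sample of its reads. The sketch below is illustrative only: the function name and the min_fraction parameter are assumptions, and the default simply flags cycles where at least one base is entirely absent, since we do not know the exact threshold the basecaller needs.

```python
from collections import Counter

def low_diversity_cycles(reads, cycles=25, min_fraction=0.0):
    """Return the 1-based cycle numbers where some base is under-represented.

    A cycle is flagged when any of A, C, G or T makes up a fraction of the
    called bases that is <= min_fraction (the assumed default of 0.0 flags
    only bases that are completely missing), or when no read reaches it.
    """
    flagged = []
    for pos in range(cycles):
        counts = Counter(read[pos].upper() for read in reads if len(read) > pos)
        total = sum(counts[base] for base in "ACGT")
        if total == 0 or any(counts[base] / total <= min_fraction for base in "ACGT"):
            flagged.append(pos + 1)
    return flagged

# Hypothetical amplicon-like reads: every read starts with A, later cycles are mixed.
example_reads = ["AACG", "ACGT", "AGTA", "ATAC"]
problem_cycles = low_diversity_cycles(example_reads, cycles=4)
```

Cycles flagged this way are the ones where a higher PhiX spike-in is most likely to be needed.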
Finally, it appears that custom priming on the MiSeq is not the same as custom priming on the HiSeq and GAII. It can still be done, but the Tm requirement is much higher. Primers that work on the HiSeq may fail on the MiSeq. As long as your Tm matches or exceeds the Tm used for Illumina primers, the MiSeq should work, but the MiSeq’s different chemistry (formamide instead of heat denaturation) is less forgiving.
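For a first pass at comparing a custom primer against a reference primer, a simple GC content based estimate of Tm can be used. The sketch below uses a common textbook approximation for oligos longer than 13 nt, not Illumina's method, and it ignores salt, formamide and primer concentration, so treat the numbers as relative rather than absolute. The function names and the sequences are hypothetical placeholders, not real Illumina primers.

```python
def estimate_tm(primer):
    """Rough Tm in degrees C from the GC count and length of the primer.

    Uses the approximation Tm = 64.9 + 41 * (G + C - 16.4) / N, which is
    only intended for oligos longer than 13 nt and is not sequencer specific.
    """
    seq = primer.upper()
    n = len(seq)
    if n < 14:
        raise ValueError("this approximation is meant for oligos longer than 13 nt")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / n

def meets_reference_tm(custom_primer, reference_primer):
    """True when the custom primer's estimated Tm matches or exceeds the reference's."""
    return estimate_tm(custom_primer) >= estimate_tm(reference_primer)

# Hypothetical placeholder sequences, used for illustration only.
reference = "ACGT" * 5   # 20 nt, 50% GC
candidate = "GCGC" * 5   # 20 nt, 100% GC
ok_for_miseq = meets_reference_tm(candidate, reference)
```

A nearest-neighbor Tm calculator will give better estimates; this is only meant to show the comparison being made.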
In summary, we have a number of technical challenges that may (or may not) affect you and we want to make sure you have all the information we can give you. I want to thank the researchers and labs that have been very patient as we have struggled running their samples, which led us to identify these problems. If you believe these issues have affected your data, please do not hesitate to contact me and we can discuss how to move forward.
MARCH 11, 2013
Quick update from BioMicro:
The Wafergen qPCR system is now operational. We have done a couple of pilot experiments so far and it does seem to work, though there are a few more limitations than we anticipated. We are working with Wafergen to see how many of these can be alleviated, but you are more than welcome to try it out and see if it would be useful to you. They have given us quite competitive pricing that is a lot lower than the cost for the Fluidigm BioMark. Please email us if you are interested in training.
JANUARY 9, 2013
Happy New Year to everyone. A couple of new things are happening in BioMicro that we want to make everyone aware of.
First, this month begins a year-long experiment in joining the BioMicro Center Informatics team and the KI Bioinformatics and Computing Core into a single team. Our two teams have been collaborating for several years, sharing computational infrastructure, etc., but this year we will be formalizing and expanding the relationship with the goal of creating a more efficient unified core. Informatics analysis requests should still be sent to Charlie Whittaker or to me as usual, but will be spread across the joint team based on expertise and on availability. You are also, as always, welcome to contact any of the informatics scientists directly. We hope this will allow us to reduce waiting times and to keep costs under control.
During the trial period (and hopefully going forward), pricing for informatics will be available in two flavors. First, for projects needing routine work, the subsidized rate will be $70/h for all CORE members (Biology, BE, KI, CEHS). For more involved projects, we have a second option to purchase a “share” of the informatics team. This is an annual commitment for a fraction of an informaticist and will cost $960/mo for an average of 4h/week of informatics support. The monthly usage levels do not have to be exact and hours can be used in large blocks. The hours in the share can be used with any member of the team and the informaticist can vary from project to project.
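To compare the two options, the arithmetic can be sketched as below. It assumes the full 4h/week is used across 52 weeks of the year, which is our simplification for illustration; the function and variable names are hypothetical.

```python
HOURLY_RATE = 70.0           # subsidized hourly rate, $/h
SHARE_MONTHLY_COST = 960.0   # cost of a share, $/month
SHARE_HOURS_PER_WEEK = 4.0   # average support included in a share

def share_effective_hourly(monthly_cost=SHARE_MONTHLY_COST,
                           hours_per_week=SHARE_HOURS_PER_WEEK,
                           weeks_per_year=52):
    """Effective $/h of a share if every included hour is used (52 weeks assumed)."""
    return (monthly_cost * 12) / (hours_per_week * weeks_per_year)

def breakeven_hours_per_year(monthly_cost=SHARE_MONTHLY_COST,
                             hourly_rate=HOURLY_RATE):
    """Hours per year above which a share costs less than paying hourly."""
    return (monthly_cost * 12) / hourly_rate

effective_rate = share_effective_hourly()     # 11520 / 208
breakeven_hours = breakeven_hours_per_year()  # 11520 / 70
```

Under these assumptions a fully used share works out to roughly $55/h, and the share becomes the cheaper option once a project needs more than about 165 hours in the year.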
Finally, and importantly, we will be changing the way we store Illumina sequencing data long term. In the past, we have saved the fastq, sam and bam files, along with the quality control data, in a zipped file. These zipped files now occupy over 50TB of storage, which is limiting how we are able to handle new sequencing runs. To address this, we will be deleting the fastq and sam files from the archive and storing only the binary bam and quality control files. The fastq and sam files can be regenerated rapidly from the bam files using Picard and SAMtools (though reads may not be in the same order). As always, we strongly encourage you to keep your own copy of the Illumina data and use our version only as a backup. We will begin this conversion next week. If you have any concerns, please do not hesitate to contact me.
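As a rough illustration of the regeneration step, the sketch below builds and runs the SAMtools commands for a single-end bam file. It assumes a recent samtools is on your PATH (older releases named the fastq subcommand bam2fq), and the function names and output file naming are our own placeholders. Paired-end data would need extra handling, and Picard's SamToFastq tool is an alternative.

```python
import os
import subprocess

def regeneration_commands(bam_path):
    """Build samtools commands that recreate sam and fastq files from a bam.

    Output names are derived from the bam name (an assumed convention).
    Returns a dict with the two argument lists and the fastq output path.
    """
    stem, _ = os.path.splitext(bam_path)
    return {
        # -h keeps the header so the sam file is complete
        "sam": ["samtools", "view", "-h", "-o", stem + ".sam", bam_path],
        # samtools fastq writes single-end reads to stdout when no output files are given
        "fastq": ["samtools", "fastq", bam_path],
        "fastq_out": stem + ".fastq",
    }

def regenerate(bam_path):
    """Run the commands; requires samtools to be installed."""
    cmds = regeneration_commands(bam_path)
    subprocess.run(cmds["sam"], check=True)
    with open(cmds["fastq_out"], "w") as handle:
        subprocess.run(cmds["fastq"], stdout=handle, check=True)

# "run1.bam" is a hypothetical file name used only to show the generated commands.
example = regeneration_commands("run1.bam")
```

Calling regenerate("run1.bam") would then write run1.sam and run1.fastq next to the bam file.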