Schumer lab: Download data to Sherlock from UW (PacBio)
Download data with Globus
1) UW will send an email containing a link to access the folder. Click the link to open Globus.
2) Change the destination endpoint to SRCC OAK, then change the path to /oak/stanford/groups/schumer/data/PacBio_data/
3) Make a new folder with a naming convention similar to "UW_pacbio_date", and navigate into it
4) Transfer all files/directories from the UW Globus endpoint to this directory
5) The important files are *hifi_reads.bc*.bam and *hifi_reads.bc*.bam.pbi
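Once the transfer finishes, it can be worth confirming that every HiFi BAM arrived with its index. A minimal sketch, assuming the files follow the *hifi_reads.bc*.bam pattern above; check_pbi_pairs is a hypothetical helper name, not an existing lab script:

```shell
# Sketch: report any HiFi BAM that lacks its .pbi index after the transfer.
# check_pbi_pairs is a hypothetical helper, not part of the lab scripts.
check_pbi_pairs() {
    dir="$1"
    missing=0
    for bam in "$dir"/*hifi_reads.bc*.bam; do
        # If the glob matched nothing, the pattern comes back literally.
        [ -e "$bam" ] || { echo "no HiFi BAM files found in $dir"; return 1; }
        if [ ! -e "$bam.pbi" ]; then
            echo "missing index: $bam.pbi"
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the path is a placeholder.
# check_pbi_pairs /path/to/parent/directory/hifi_reads
```

The function prints nothing and returns 0 when every BAM has a matching .pbi file.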
Delete unnecessary files
1) Delete the fail_reads/ directory and the unassigned.bam files. First navigate to the parent directory, which contains the following directories: hifi_reads, fail_reads, metadata, pb_formats, statistics
cd /path/to/parent/directory/
rm -r fail_reads/
rm hifi_reads/*.hifi_reads.unassigned.bam*
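Because rm is irreversible, one option is to list what the two commands above would remove before running them. A minimal sketch, assuming the directory layout described in this step; preview_cleanup is a hypothetical helper name:

```shell
# Sketch: show what the cleanup step would remove, without deleting anything.
# preview_cleanup is a hypothetical helper name.
preview_cleanup() {
    parent="$1"
    # Total size of the fail_reads/ directory, if it exists.
    [ -d "$parent/fail_reads" ] && du -sh "$parent/fail_reads"
    # Unassigned HiFi files (BAM plus index) matched by the rm pattern.
    for f in "$parent"/hifi_reads/*.hifi_reads.unassigned.bam*; do
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Example call; the path is a placeholder.
# preview_cleanup /path/to/parent/directory
```

If the listing contains only fail_reads/ and the unassigned files, the rm commands should be safe to run.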
Rename files
The files we get back from UW are named with a unique barcode that UW assigns, which is meaningless to us. We rename all of these files based on the sample names we provided. The information connecting barcodes to sample names can be found in the SequencingReport.pdf that they send (e.g. 240724_UW_LongRead_SequencingReport_Baczenas.J(Stanford).pdf).
2) Copy the renaming script to the directory with the reads
cd /path/to/directory/named/hifi_reads/
cp /home/groups/schumer/shared_bin/Lab_shared_scripts/rename_PacBio_UW_reads.sh .
3) Make a file called sample.key that contains each UW barcode and its sample name, separated by "@", one pair per line. Double and triple check that this file is correct!
$ cat sample.key
bc2008@xvar-JUCH-S265-6-V-24-M01
bc2014@xvar-JUCH-S178-6-V-24-M01
bc2015@xvar-JUCH-S250-6-V-24-M01
bc2016@xvar-JUCH-S250-6-V-24-M02
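Part of the double-checking can be automated. A minimal sketch that checks only the barcode@sample layout shown above and flags duplicates; validate_sample_key is a hypothetical helper name, and it does not check the names against the sequencing report, which still has to be done by eye:

```shell
# Sketch: sanity-check sample.key before renaming.
# validate_sample_key is a hypothetical helper; it checks the
# "barcode@sample" layout only, and treats blank lines as errors.
validate_sample_key() {
    awk -F'@' '
        NF != 2 || $1 == "" || $2 == "" {
            printf "line %d: expected barcode@sample, got: %s\n", NR, $0
            bad = 1
            next
        }
        # Post-increment is nonzero only from the second occurrence onward.
        seen_bc[$1]++   { printf "line %d: duplicate barcode %s\n", NR, $1; bad = 1 }
        seen_name[$2]++ { printf "line %d: duplicate sample name %s\n", NR, $2; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Example call from the hifi_reads directory:
# validate_sample_key sample.key && echo "sample.key layout looks OK"
```

A duplicated sample name is flagged because two barcodes renamed to the same sample would presumably collide.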
4) Run the renaming. It should take ~3 min per sample
sh rename_PacBio_UW_reads.sh
5) Make sure there were no errors and that the job wasn't killed. Also check that the md5sums of the renamed files match those of the original files, to confirm that nothing was altered.
6) You are all done!
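One way to do the md5sum check independently of file names is to record the checksums before and after the renaming and compare the two lists. A minimal sketch, assuming the script only renames files and does not change their contents; snapshot_md5 and the /tmp output paths are hypothetical:

```shell
# Sketch: confirm that file contents are unchanged by the renaming.
# snapshot_md5 is a hypothetical helper; it writes the sorted checksums
# of every file matching *.bam* in a directory, ignoring file names.
snapshot_md5() {
    dir="$1"
    out="$2"
    ( cd "$dir" && md5sum ./*.bam* | awk '{print $1}' | sort ) > "$out"
}

# Before renaming:  snapshot_md5 . /tmp/before.md5
# After renaming:   snapshot_md5 . /tmp/after.md5
# Identical lists mean every file kept its contents:
#   diff /tmp/before.md5 /tmp/after.md5 && echo "checksums match"
```

Because the lists are sorted by checksum rather than by name, they match whenever the same set of files exists before and after, whatever the files are called. Checksumming large BAM files takes a while, so the snapshots are best run inside a job rather than on a login node.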