Schumer lab: Backing up raw data
Back up your raw data on external hard drives or the SRA
It is very important to regularly back up your raw data!
There are two major options for backing up your raw fastq data or large processed data files:
1) External hard drives
Large hard drives for individual use can be purchased through Amazon on Quartzy.
One of the lab jobs (currently managed by Gabe and JJ!) is to back up OAK every 6 months. This is a big job, so it doesn't always happen on schedule. Ask Gabe and Quinn if you need to access something from a previous backup.
2) Back up your data on the SRA
It can be a bit of a pain to get data uploaded properly to the SRA, but after that it will be available to you forever so it is worth the trouble!
- Note: you can set the release date on the SRA several years in the future to protect sensitive data until publication.
Workflow of backing up $OAK data
This should happen every six months and should cover any new raw data, plus a sync of all processed and collaboration data.
Optional: Check folder sizes and which folders need to be backed up
You can check which folders on OAK have changed by comparing the folder sizes recorded at the previous backup with the current disk usage. The following Slurm batch script records the current sizes:
#!/bin/bash
#SBATCH --job-name=oakSize
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16000
#SBATCH -p schumer,hns,owners

now=$(date +"%m_%d_%Y")
du --exclude="lab_member_folders" -ach /oak/stanford/groups/schumer/data > schumerOakDataSizes_"$now".txt
You can then compare the current folder size with the folder size recorded in the Excel sheet. Example:
grep "Processed_files" schumerOakDataSizes_09_05_2023.txt | tail
With this you can determine which folders have had data added to them and need to be synced on the backup hard drives.
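If size listings from two backup rounds are both available as text files, the comparison can also be scripted rather than done by eye. The following is a minimal sketch, assuming both files are in the tab-separated `size<TAB>path` format that `du` writes. The function name `changed_folders` and the `NEW`/`CHANGED` labels are made up for illustration and are not part of the lab's existing workflow.

```shell
#!/bin/bash
# Hypothetical helper: list paths whose recorded size differs between two
# du listings, or that appear only in the newer listing.
# Usage: changed_folders OLD_SIZES.txt NEW_SIZES.txt
changed_folders() {
    local old_file="$1"
    local new_file="$2"
    awk -F'\t' '
        # Skip the grand-total line that "du -c" appends.
        $2 == "total" { next }
        # First file: remember the recorded size for each path.
        FILENAME == ARGV[1] { old[$2] = $1; next }
        # Second file: report paths that are new or whose size string changed.
        !($2 in old)  { print "NEW\t" $2 "\t" $1; next }
        old[$2] != $1 { print "CHANGED\t" $2 "\t" old[$2] " -> " $1 }
    ' "$old_file" "$new_file"
}
```

Because `du -h` rounds sizes to human-readable strings, small additions to a large folder may not change the string and would go unreported. Running `du` without `-h` gives exact block counts if that matters.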
Get the external hard drives set up
The external hard drives we use for OAK backups currently live next to the tapestation computer, which can be used during the backup process. Once the drives are plugged in and turned on, you can check the free space on each drive (using Disk Management via Computer Management) and see which OAK/data folder is on which drive (all folders have a one-to-one copy on the backup drives).
We use Globus to back up OAK data, so the “Globus Connect Personal” app must be running on the tapestation computer, and you must have access to the “tape_station_backup” Globus endpoint. More information about using Globus with Sherlock can be found here (including a quick link to the OAK endpoint): https://www.sherlock.stanford.edu/docs/storage/data-transfer/#globus
Syncing folders with new data
Now you can use the Globus browser interface to sync folders with new data.
1) Select the OAK/data folder you wish to sync.
2) Navigate to the tapestation drive ("Tapestation backup") that contains that folder.
3) Name your transfer something meaningful to you and select “sync” and “Skip files on source with errors”.
4) Then you can start the sync.
5) You can check the progress of the sync under the “Activity” button, where you can see the number of files transferred or skipped and check whether there were any errors.
These syncs usually take a day or two and you will get an email confirmation when they finish.
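The browser steps above are the lab's documented method. For reference, a roughly equivalent transfer could be expressed with the Globus command-line client (`globus transfer`), if it is installed and logged in. The sketch below only prints the command and does not run it. The function name `globus_sync_cmd`, the endpoint IDs, the paths, and the label are all placeholders, and the `mtime` sync level is an assumption about which sync criterion is chosen in the web interface.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) a Globus CLI command that mirrors
# the web-interface settings "sync" and "Skip files on source with errors".
# Usage: globus_sync_cmd SRC_ENDPOINT_ID SRC_PATH DST_ENDPOINT_ID DST_PATH LABEL
globus_sync_cmd() {
    local src_id="$1"
    local src_path="$2"
    local dst_id="$3"
    local dst_path="$4"
    local label="$5"
    # --sync-level mtime is an assumed choice; other levels are exists, size, checksum.
    printf 'globus transfer --recursive --sync-level mtime --skip-source-errors --label "%s" "%s:%s" "%s:%s"\n' \
        "$label" "$src_id" "$src_path" "$dst_id" "$dst_path"
}
```

Review the printed command before running it. Adding `--dry-run` to it makes the client show what would be submitted without starting a transfer.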