Schumer lab: Depositing data on Oak
Introduction
Oak is a large storage resource where we keep large raw and processed data files. You can create symbolic links to files on Oak from your own directory on Sherlock.
For example, if you wanted to link to this file on Oak, you could navigate to your local directory and type:
ln -s /oak/stanford/groups/schumer/data/High_coverage_whole_genome_quail_data/Xcortezi_whole_genome_data_sc_project_HUIC_sc_and_wt/HUICXI17JM06wt_read_1_allcombined.fastq.gz ./
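A symbolic link breaks silently if the target path is mistyped or the file is later moved, so it can be worth checking a link after creating it. The following is a minimal sketch of such a check. It uses temporary directories and a made-up file name (sample_read_1.fastq.gz) as stand-ins for Oak and your working directory, so it can be run anywhere without touching real data.

```shell
# Stand-ins for an Oak folder and a working directory (hypothetical, temporary).
oak_dir=$(mktemp -d)
work_dir=$(mktemp -d)
touch "$oak_dir/sample_read_1.fastq.gz"

# Same pattern as the ln -s command above: link into the current directory.
cd "$work_dir"
ln -s "$oak_dir/sample_read_1.fastq.gz" ./

# -L is true if the entry is a symlink; -e follows the link and is false
# when the target does not exist.
if [ -L sample_read_1.fastq.gz ] && [ -e sample_read_1.fastq.gz ]; then
    link_status="ok"
else
    link_status="broken"
fi
echo "$link_status"
```

For a real link, the same two tests can be run on the linked file name in your Sherlock directory.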
Examples of file types that should be deposited on Oak:
- fastq.gz files
- associated metadata files for the fastq.gz files
- sam/bam files
- Ancestry tsv files from completed ancestryHMM runs. Make sure to use an informative name and copy a matching cfg file.
Ancestry tsv files deposited on Oak can be found here:
/oak/stanford/groups/schumer/data/Processed_files/Ancestry_tsv_results_files
sam/bam files and vcf files will be in different subfolders of Processed_files depending on the reference they were mapped to. For example:
/oak/stanford/groups/schumer/data/Processed_files/bams_mapped_to_xbir-pacbio-v2018
Current organization system on Oak
Oak is currently organized into raw and processed data folders, with raw data further grouped by data type.
We are in the process of documenting all the data available on Oak here: https://docs.google.com/spreadsheets/d/1dUBAmaz2bNa-LppWxFZ7Vymld-jxrDbh-HC_WFjXSzA/edit#gid=89601434
Raw data:
All_swordtail_low_coverage_Tn5_data
ATACseq_data
ChipSeq_data
collaboration_data
High_coverage_whole_genome_quail_data
MyBaits_data
Pacbio_data
RNAseq_data
Processed data:
Processed_files
lab_member_folders
Processed_files contains resources to be used by anyone in the lab; lab_member_folders contains personal backups.
Moving data to or from our lab directory on Oak
Important! Make sure to create a new data folder for each deposited dataset, and give your raw data folder an informative name.
For example:
Xbirchmanni_10Xchromium_Hudsonalpha_July2018_raw_data
This name gives the species, the technology used for library prep, the company that did the sequencing, and the date of sequencing.
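To keep folder names consistent, the four components can be joined by a small helper. This is only a sketch of the naming pattern shown in the example above; the function name make_dataset_name and the fixed _raw_data suffix are assumptions for illustration, not an existing lab script.

```shell
# Hypothetical helper: build a raw data folder name from
# species, library prep technology, sequencing company, and sequencing date.
make_dataset_name() {
    printf '%s_%s_%s_%s_raw_data\n' "$1" "$2" "$3" "$4"
}

make_dataset_name Xbirchmanni 10Xchromium Hudsonalpha July2018
```

The call above prints the example name, Xbirchmanni_10Xchromium_Hudsonalpha_July2018_raw_data, which could then be passed to mkdir.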
- Note: swordtail Tn5 data is stored in the following subdirectory:
/oak/stanford/groups/schumer/data/All_swordtail_low_coverage_Tn5_data
There are two ways to move data to our lab directory on Oak:
1) Using scp:
To copy files to Oak through the Sherlock login node (replace user with your username):
scp myfile user@login.sherlock.stanford.edu:/oak/stanford/groups/schumer/data/mydirectory
To copy files from Oak to your current local directory:
scp user@login.sherlock.stanford.edu:/oak/stanford/groups/schumer/data/mydirectory/myfile ./
2) Using Globus:
Sign up for a Globus account.
Oak is linked to Globus, so you can navigate and download files through the Globus interface.
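The scp commands in option 1 copy single files. To copy a whole dataset folder, scp needs the -r (recursive) flag. The sketch below only prints the command so it can be checked before running it. The helper name oak_upload_cmd is hypothetical, and user, mydirectory, and the folder name are placeholders.

```shell
# Hypothetical helper: print (do not run) an scp command that uploads a folder.
# Arguments: username, local folder, destination directory under .../schumer/data
oak_upload_cmd() {
    printf 'scp -r %s %s@login.sherlock.stanford.edu:/oak/stanford/groups/schumer/data/%s\n' "$2" "$1" "$3"
}

oak_upload_cmd user Xbirchmanni_10Xchromium_Hudsonalpha_July2018_raw_data mydirectory
```

Once the printed command looks right, paste it into your terminal to start the transfer.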
Important: Documenting data you have uploaded
If you are the one depositing new data on Oak, place it in the appropriate data type directory, in a folder named with the full library name, data type, and month and year sequenced. For example:
Xpygmaeus_VCHO_AGCZ_COCA_whole_genome_sequences_Admera_health_June2020
or
Xcor-SMAR-IV-06_Xcor-SMAR-fromMM_Xcor-OCTZ-VI-22_Xvar-JUCH-II-22_Xcor-CHPL-V-17_CPXT-22-V-21_STACHW102-XII-03_PMHS-XII-03_Tn5_Admera-Nov2022
Important! Please update the Google spreadsheet so that others can easily find the data:
https://docs.google.com/spreadsheets/d/1dUBAmaz2bNa-LppWxFZ7Vymld-jxrDbh-HC_WFjXSzA/edit?usp=sharing
If you're depositing fastq files, make sure to deposit the appropriate metadata files in the same directory!
For example, if your fastq files are named:
COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_I1_alllanes_combined.fastq.gz
COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_I2_alllanes_combined.fastq.gz
COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_R1_alllanes_combined.fastq.gz
COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015_Tn5_S0_R2_alllanes_combined.fastq.gz
and are the result of two plates of Tn5 prep, put a single i5 file and two i7 files with informative names in the same directory, e.g.:
i5_library_COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015
20180626_Tn5_library_COAC_VI_2018_i7_barcodes.txt
20180718_Tn5_library_ACUA_VI_2015_CHAF_XI_2017_COAC_XI_2017_i7_barcodes.txt
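Before announcing a deposit, it can help to confirm that the barcode files actually sit next to the fastq files. The following is a rough sketch of such a check, not an existing lab tool. It assumes that barcode files can be recognized by "i5" or "i7" in their names, as in the example above, and it only checks that at least one of each is present, since the expected number of i7 files depends on how many plates were prepped. The function names are hypothetical, and the demo uses a temporary directory with empty files.

```shell
# Count how many of the given paths exist (an unmatched glob stays literal and is skipped).
count_existing() {
    n=0
    for f in "$@"; do
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Hypothetical check: a directory holding fastq.gz files should also hold
# at least one i5 and one i7 barcode file (matched by "i5"/"i7" in the name).
check_tn5_metadata() {
    dir="$1"
    n_fastq=$(count_existing "$dir"/*.fastq.gz)
    n_i5=$(count_existing "$dir"/*i5*)
    n_i7=$(count_existing "$dir"/*i7*)
    if [ "$n_fastq" -gt 0 ] && { [ "$n_i5" -eq 0 ] || [ "$n_i7" -eq 0 ]; }; then
        echo "missing barcode files: $n_fastq fastq, $n_i5 i5, $n_i7 i7"
        return 1
    fi
    echo "ok: $n_fastq fastq, $n_i5 i5, $n_i7 i7"
}

# Demo: recreate the example file names as empty files in a temporary directory.
demo_dir=$(mktemp -d)
prefix=COACVI2018_CHAFV2018_ACUAV2018_ACUAVI2015
for read in I1 I2 R1 R2; do
    touch "$demo_dir/${prefix}_Tn5_S0_${read}_alllanes_combined.fastq.gz"
done
touch "$demo_dir/i5_library_${prefix}"
touch "$demo_dir/20180626_Tn5_library_COAC_VI_2018_i7_barcodes.txt"
touch "$demo_dir/20180718_Tn5_library_ACUA_VI_2015_CHAF_XI_2017_COAC_XI_2017_i7_barcodes.txt"
check_tn5_metadata "$demo_dir"
```

To check a real deposit, call check_tn5_metadata with the path to your dataset folder on Oak instead of the demo directory.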