Schumer lab: Where can I find commonly used files?
Most genome assemblies, recombination maps, simulation tools, annotation files, etc. can be found in our shared_resources folder on Sherlock (also available in Box):
/home/groups/schumer/shared_bin/shared_resources
The most important resources are outlined below.
Where can I find up-to-date files for local ancestry inference?
The most up-to-date files for running local ancestry inference can be found here:
/home/groups/schumer/shared_bin/shared_resources/Base_ancestry_run_files_August2020
This folder is set up so that you can copy all of its contents to the directory on scratch where you are running local ancestry inference. For example:
cp /home/groups/schumer/shared_bin/shared_resources/Base_ancestry_run_files_August2020/* ./
Which genome assembly and annotation file should I use?
- For X. birchmanni the current assembly is:
/home/groups/schumer/shared_bin/shared_resources/xiphophorus_birchmanni_10x_12Sep2018_yDAA6.fasta
and the annotation files are:
/home/groups/schumer/shared_bin/shared_resources/Xbirchmanni-10x_12Sep2018_yDAA6-annotation-output
- For X. malinche the current assembly is:
/home/groups/schumer/shared_bin/shared_resources/xiphophorus_malinche_10x_14Nov2018_HyZ96.fasta
and the annotation files are:
/home/groups/schumer/shared_bin/shared_resources/xiphophorus_malinche_14Nov2018_HyZ96_annotation_output
- For X. maculatus the current assembly is:
/home/groups/schumer/shared_bin/shared_resources/xma_washu_4.4.2-jhp_0.1_combined-unplaced-mito.fa
and the annotation files are:
/home/groups/schumer/shared_bin/shared_resources/Xiphophorus_maculatus_LG.Xipmac4.4.2.81.gtf
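One way to avoid retyping these long paths is to store them in shell variables at the top of a script. The sketch below does this for the three assemblies listed above and reports whether each one is visible from the machine you are on. The variable names (shared, xbir_genome, xmal_genome, xmac_genome) are illustrative assumptions, not a lab convention.

```shell
# Base folder named in the text above.
shared=/home/groups/schumer/shared_bin/shared_resources

# Current assemblies, copied from the list above.
# The variable names are hypothetical; rename them to suit your script.
xbir_genome="$shared/xiphophorus_birchmanni_10x_12Sep2018_yDAA6.fasta"
xmal_genome="$shared/xiphophorus_malinche_10x_14Nov2018_HyZ96.fasta"
xmac_genome="$shared/xma_washu_4.4.2-jhp_0.1_combined-unplaced-mito.fa"

# Report whether each assembly is visible from the current machine.
for genome in "$xbir_genome" "$xmal_genome" "$xmac_genome"; do
    if [ -e "$genome" ]; then
        echo "found: $genome"
    else
        echo "missing: $genome"
    fi
done
```

A "missing" line usually means you are not logged in to Sherlock or the path was mistyped.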
Where can I find i5/i7 barcode files?
Tn5 libraries
For Tn5 libraries, i5 barcode sequences can be found in Box under this path:
Schumer_lab_resources/Extractions_and_library_preps/Tn5_i5_indices.xls
For Tn5 libraries, i7 barcode sequences can be found in Box under this path:
Schumer_lab_resources/Extractions_and_library_preps/Tn5_i7_indices.xls
Quail libraries
Quail libraries only have an i7 index, which can be found here:
Schumer_lab_resources/Extractions_and_library_preps/Quail_FC1_index_sequence.xls
RNAseq libraries
i5 and i7 indices for KAPA RNAseq libraries can be found here:
Schumer_lab_resources/Extractions_and_library_preps/IDX Unique Dual Indexes_forkapa_RNAseq.xls
Which recombination map should I use?
- The current LD recombination map for X. birchmanni can be found here:
/home/groups/schumer/shared_bin/shared_resources/Xbirchmanni_LD_recombination_map_10xgenome_March2019
- The current crossover map for F2s can be found here:
/home/groups/schumer/shared_bin/shared_resources/Xbirchmanni_Xmalinche_F2_map_March2019
- The current crossover map for all artificial hybrids can be found here:
/home/groups/schumer/shared_bin/shared_resources/Xbirchmanni_Xmalinche_all_artificial_hybrids_map_March2019
Where can I find previously run ancestry.tsv files?
Results of previous AncestryHMM runs can be found here:
/oak/stanford/groups/schumer/data/Ancestry_tsv_results_files
Make sure to put your matching configuration files here too!
Where can I find the X. birchmanni genome split into windows?
BED-formatted files of the X. birchmanni reference genome split into windows, annotated with recombination rate, number of coding and conserved basepairs, and number of synonymous and non-synonymous basepairs, can be found here:
/home/groups/schumer/shared_bin/shared_resources/xbir_genome_windowed/
Window sizes are: 5kb, 10kb, 50kb, 100kb, 250kb, 500kb, 1Mb, 0.05cM, 0.1cM, 0.25cM, 0.5cM, 1cM
For basepair-windowed data, the basic files have three columns: scaffold, start, end
Files annotated with extra information are named *recRate_codingFeats_wConservedBPs_wSynNonsyn.bed
These have eight columns: scaffold, start, end, mean recombination rate, number of coding bps, number of conserved bps, number of synonymous bps, and number of non-synonymous bps
For centimorgan-windowed data, the basic files have five columns: scaffold, start, end, mean recombination rate in that window, number of SNPs
Files annotated with extra information are named *codingFeats_wConservedBPs_wSynNonsyn.bed
These have nine columns: scaffold, start, end, number of SNPs, mean recombination rate, number of coding bps, number of conserved bps, number of synonymous bps, and number of non-synonymous bps
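As a minimal sketch of working with the eight-column annotated basepair-window files, the snippet below uses awk to keep only windows that contain coding sequence. It assumes the files are tab-delimited with no header line and follow the column order listed above. The demo file, scaffold names, and values are invented for illustration, so substitute a real file from xbir_genome_windowed/.

```shell
# Build a tiny demo file with the eight columns described above.
# Scaffold names and all values here are made up for illustration.
demo=$(mktemp)
printf 'scaffoldA\t0\t50000\t0.002\t1200\t3000\t300\t900\n' > "$demo"
printf 'scaffoldA\t50000\t100000\t0.004\t0\t150\t0\t0\n' >> "$demo"
printf 'scaffoldB\t0\t50000\t0.001\t800\t2100\t190\t610\n' >> "$demo"

# Keep windows with at least one coding basepair (column 5) and print
# scaffold, start, end, and mean recombination rate (columns 1-4).
coding_windows=$(awk -F'\t' -v OFS='\t' '$5 > 0 {print $1, $2, $3, $4}' "$demo")
echo "$coding_windows"

# Remove the temporary demo file.
rm -f "$demo"
```

With the demo rows, the two windows that contain coding basepairs are printed. For the nine-column centimorgan-windowed files the column positions shift by one, because number of SNPs comes before mean recombination rate, so check the column numbers before reusing the filter.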