User:Timothee Flutre/Notebook/Postdoc/2012/02/01: Difference between revisions
From OpenWetWare
Revision as of 17:52, 1 February 2012
Find SNPs in cis of genes
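* download the Ensembl transcript annotations (hg19) from the UCSC server:
 wget -O Ensembl_hg19_UCSC_20111019.txt.gz ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ensGene.txt.gz
* convert transcripts to BED format, and then gather coordinates at the gene level (TSS and TES):
 zcat Ensembl_hg19_UCSC_20111019.txt.gz | awk '{print $3"\t"$5"\t"$6"\t"$13"|"$2}' | gzip > Ensembl_transcripts.bed.gz
 transcripts2genes.py Ensembl_hg19_UCSC_20111019.txt.gz Ensembl_genes.bed.gz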
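The source of transcripts2genes.py is not shown in this entry, so the following is only a sketch of what a transcript-to-gene step could look like, not the actual script. It assumes the standard 16-column ensGene layout (chrom in column 3, txStart and txEnd in columns 5 and 6, gene ID in column 13), that all transcripts of a gene lie on one chromosome, and that the gene span is the smallest txStart and the largest txEnd over its transcripts. The function names and the 4-column output are hypothetical.

```python
def collapse_transcripts(rows):
    """Collapse ensGene rows into one [chrom, start, end, strand] per gene.

    Assumption: the gene span is min(txStart)..max(txEnd) over all of its
    transcripts. On the + strand the TSS is the start and the TES the end;
    on the - strand it is the other way round.
    """
    genes = {}
    for row in rows:
        f = row.rstrip("\n").split("\t")
        chrom, strand = f[2], f[3]
        tx_start, tx_end = int(f[4]), int(f[5])
        gene_id = f[12]
        if gene_id not in genes:
            genes[gene_id] = [chrom, tx_start, tx_end, strand]
        else:
            # Widen the gene span to cover this transcript.
            genes[gene_id][1] = min(genes[gene_id][1], tx_start)
            genes[gene_id][2] = max(genes[gene_id][2], tx_end)
    return genes


def to_bed_lines(genes):
    """Format genes as 4-column BED lines sorted by chromosome name and start."""
    records = sorted((chrom, start, end, gene_id)
                     for gene_id, (chrom, start, end, _strand) in genes.items())
    return ["%s\t%d\t%d\t%s" % rec for rec in records]
```

To apply it to the downloaded file, the rows could be read with `gzip.open("Ensembl_hg19_UCSC_20111019.txt.gz", "rt")`. Note that the real Ensembl_genes.bed.gz may carry more columns than this sketch writes.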
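* identify SNPs in cis of each gene (within 500 kb upstream of the TSS or downstream of the TES), assuming the SNP coordinates are taken from a file in the IMPUTE format:
 for i in {1..22}; do echo "chr"${i}"..."; awk -v i=${i} -F" " '{print "chr"i"\t"$3-1"\t"$3"\t"$2}' /path/to/chr${i}.impute | \
 windowBed -w 500000 -a Ensembl_genes.bed.gz -b stdin | \
 awk '{print $4"\t"$9"|"$8}' | \
 gzip > chr${i}_genes_cisSNPs.txt.gz; done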
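To make the window logic explicit, here is a small sketch of what the loop computes for each gene and SNP pair. It rests on two assumptions: the IMPUTE file has the rsID in column 2 and the 1-based position in column 3 (as the awk command implies), and windowBed extends each gene by the window on both sides and reports any SNP interval that overlaps the extended span. Exact boundary handling may differ in BEDTools itself, and the function names are hypothetical.

```python
WINDOW = 500000  # same value as the -w option in the shell loop


def impute_to_bed(line, chrom):
    """Turn one IMPUTE line into a 0-based, half-open BED tuple."""
    fields = line.split()
    rsid, pos = fields[1], int(fields[2])
    return (chrom, pos - 1, pos, rsid)


def in_cis(gene_start, gene_end, snp_start, snp_end, window=WINDOW):
    """True if the SNP interval overlaps the gene span extended by `window`."""
    win_start = max(0, gene_start - window)
    win_end = gene_end + window
    return snp_start < win_end and snp_end > win_start


def cis_pairs(genes, snps, window=WINDOW):
    """Return (gene, SNP) name pairs on the same chromosome and in cis.

    Brute-force pairing, meant to show the logic only; windowBed is far
    more efficient on real data.
    """
    pairs = []
    for g_chrom, g_start, g_end, g_name in genes:
        for s_chrom, s_start, s_end, s_name in snps:
            if g_chrom == s_chrom and in_cis(g_start, g_end, s_start, s_end, window):
                pairs.append((g_name, s_name))
    return pairs
```

SNPs located inside the gene body also satisfy this overlap test, so under these assumptions they are reported as cis SNPs along with the flanking ones.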