User:Lindenb/Notebook/UMR915/20101115
From OpenWetWare
Working for Cedric: wrote a small tool that prints the feature annotations of a Swiss-Prot entry as ASCII tracks above the sequence, 60 residues per line:
Z3H7B_HUMAN
http://www.uniprot.org/uniprot/Q9UGR2.xml
============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...................................========================= [36-69] TPR 1 repeat
.....................................==..................... [38-39] In Ref. 5; BAA82983. sequence conflict
MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGNDLFREKDYKQALVQYM 1-60

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
=========................................................... [36-69] TPR 1 repeat
.....................==================================..... [82-115] TPR 2 repeat
.......................................................===== [116-149] TPR 3 repeat
...........................=................................ [88] In Ref. 4; AAF05541. sequence conflict
EGLNVADYAASDQVALPRELLCKLHVNRAACYFTMGLYEKALEDSEKALGLDSESIRALF 61-120

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
=============================............................... [116-149] TPR 3 repeat
RKARALNELGRHKEAYECSSRCSLALPHDESVTQLGQELAQKLGLRVRKAYKRPQELETF 121-180

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
....................................................=....... [233] Phosphoserine modified residue
............................================................ [209-224] In isoform 2. splice variant
...................................................=........ [232] In Ref. 4; AAF05541. sequence conflict
SLLSNGTAAGVADQGTSNGLGSIDDIETGNVPDTREQVEIGAPRDCYVDPRGSPALLPST 181-240

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.......................=========............................ [264-272] LD motif; interaction with NSP3 short sequence motif
................===......................................... [257-259] Almost no effect on NSP3 binding. mutagenesis site
...........................===.............................. [268-270] Complete loss of NSP3 binding. mutagenesis site
......=..................................................... [247] In Ref. 3; AAI52559. sequence conflict
PTMPLFPHVLDLLAPLDSSRTLPSTDSLDDFSDGDVFGPELDTLLDSLSLVQGGLSGSGV 241-300

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
PSELPQLIPVFPGGTPLLPPVVGGSIPVSSPLPPASFGLVMDPSKKLAASVLDALDPPGP 301-360

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
......................................................=..... [415] Phosphoserine modified residue
..................=......................................... [379] In dbSNP:rs9607793. sequence variant
.........................=.................................. [386] In Ref. 4; AAF05541. sequence conflict
.............................................=.............. [406] In Ref. 3; AAI52559. sequence conflict
TLDPLDLLPYSETRLDALDSFGSTRGSLDKPDSFMEETNSQDHRPPSGAQKPAPSPEPCM 361-420

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
PNTALLIKNPLAATHEFKQACQLCYPKTGPRAGDYTYREGLEHKCKRDILLGRLRSSEDQ 421-480

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...................=========================................ [500-524] C3H1-type 1 zinc finger region
TWKRIRPRPTKTSFVGSYYLCKDMINKQDCKYGDNCTFAYHQEEIDVWTEERKGTLNRDL 481-540

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
LFDPLGGVKRGSLTIAKLLKEHQGIFTFLCEICFDSKPRIISKGTKDSPSVCSNLAAKHS 541-600

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...............................=======================...... [632-654] C3H1-type 2 zinc finger region
FYNNKCLVHIVRSTSLKYSKIRQFQEHFQFDVCRHEVRYGCLREDSCHFAHSFIELKVWL 601-660

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...=........................................................ [664] Phosphotyrosine modified residue
...............=............................................ [676] In Ref. 4; AAF05541. sequence conflict
LQQYSGMTHEDIVQESKKYWQQMEAHAGKASSSMGAPRTHGPSTFDLQMKFVCGQCWRNG 661-720

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.................................................=========== [770-798] C3H1-type 3 zinc finger region
QVVEPDKDLKYCSAKARHCWTKERRVLLVMSKAKRKWVSVRPLPSIRNFPQQYDLCIHAQ 721-780

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
==================.......................................... [770-798] C3H1-type 3 zinc finger region
NGRKCQYVGNCSFAHSPEERDMWTFMKENKILDMQQTYDMWLKKHNPGKPGEGTPISSRE 781-840

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.................=========================.................. [858-882] C2H2-type zinc finger region
GEKQIQMPTDYADIMMGYHCWLCGKNSNSKKQWQQHIQSEKHKEKVFTSDSDASGWAFRF 841-900

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.=============================.............................. [902-930] C3H1-type 4 zinc finger region
.....................===================================.... [922-956] null coiled-coil region
.................=.......................................... [918] In Ref. 3; AAI52559. sequence conflict
PMGEFRLCDRLQKGKACPDGDKCRCAHGQEELNEWLDRREVLKQKLAKARKDMLLCPRDD 901-960

================================= [1-993] Zinc finger CCCH domain-containing protein 7B chain
.....................========.... [982-989] Poly-Ala compositionally biased region
..........=...................... [971] In Ref. 1; BAG37501 and 3; AAI52559. sequence conflict
DFGKYNFLLQEDGDLAGATPEAPAAAATATTGE 961-993
Source code
import java.net.URL;
import java.util.List;

import javax.xml.bind.JAXBContext;

import uniprot.Entry;
import uniprot.FeatureType;
import uniprot.LocationType;
import uniprot.PositionType;
import uniprot.Uniprot;

// The classes in package "uniprot" are generated by JAXB:
// xjc -p "uniprot" "http://www.uniprot.org/support/docs/uniprot.xsd"
public class UniprotAscii {
    private UniprotAscii() throws Exception {
    }

    /** Downloads the XML record for one accession and prints its features as ASCII tracks. */
    private void run(String id) throws Exception {
        final int line_length = 60;
        JAXBContext jc = JAXBContext.newInstance("uniprot");
        String uri = "http://www.uniprot.org/uniprot/" + id + ".xml";
        Uniprot uniprot = (Uniprot) jc.createUnmarshaller().unmarshal(new URL(uri));
        for (Entry entry : uniprot.getEntry()) {
            System.out.println(entry.getName().get(0));
            System.out.println(uri);
            // strip the whitespace embedded in the <sequence> element
            String sequence = entry.getSequence().getValue().replaceAll("[ \n\t\r]", "");
            List<FeatureType> features = entry.getFeature();
            int start = 0;
            // print the sequence in chunks [start,end), 0-based
            while (start < sequence.length()) {
                int end = Math.min(start + line_length, sequence.length());
                for (FeatureType feat : features) {
                    int x0 = 0;
                    int x1 = 0;
                    LocationType t = feat.getLocation();
                    if (t == null) continue;
                    PositionType begT = t.getBegin();
                    PositionType endT = t.getEnd();
                    PositionType posT = t.getPosition();
                    String range = null;
                    if (begT != null && endT != null) {
                        // a range: 1-based, both ends inclusive
                        x0 = begT.getPosition().intValue();
                        x1 = endT.getPosition().intValue() + 1;
                        range = "[" + begT.getPosition() + "-" + endT.getPosition() + "]";
                    } else if (posT != null) {
                        // a single residue
                        x0 = posT.getPosition().intValue();
                        x1 = x0 + 1;
                        range = "[" + posT.getPosition() + "]";
                    } else {
                        System.err.println("BOUM");
                        continue;
                    }
                    // convert to 0-based half-open [x0,x1) BEFORE testing the overlap
                    // with [start,end); testing the 1-based values dropped a feature
                    // starting on the last residue of a line and printed an empty
                    // row for a feature ending on the last residue of the previous line
                    x0--;
                    x1--;
                    if (x0 >= end) continue;
                    if (x1 <= start) continue;
                    int x = start;
                    x0 = Math.max(start, x0);
                    x1 = Math.min(end, x1);
                    while (x < x0) { x++; System.out.print("."); }
                    while (x < x1) { x++; System.out.print("="); }
                    while (x < end) { x++; System.out.print("."); }
                    System.out.println(" " + range + " " + feat.getDescription() + " " + feat.getType());
                }
                System.out.println(sequence.subSequence(start, end) + " " + (start + 1) + "-" + end);
                System.out.println();
                start += line_length;
            }
        }
        //System.err.println("OK");
    }

    public static void main(String[] args) {
        try {
            UniprotAscii app = new UniprotAscii();
            int optind = 0;
            while (optind < args.length) {
                if (args[optind].equals("-h")) {
                    return;
                } else if (args[optind].equals("-L")) {
                    //app.readLength=Integer.parseInt(args[++optind]);
                } else if (args[optind].equals("--")) {
                    optind++;
                    break;
                } else if (args[optind].startsWith("-")) {
                    System.err.println("Unknown option: " + args[optind]);
                    return;
                } else {
                    break;
                }
                ++optind;
            }
            if (optind == args.length) {
                return;
            } else {
                // every remaining argument is a UniProt accession, e.g. Q9UGR2
                while (optind < args.length) {
                    String inputName = args[optind++];
                    app.run(inputName);
                }
            }
        } catch (Throwable err) {
            err.printStackTrace();
        }
    }
}
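The coordinate handling is the only delicate part of the tool: features use 1-based inclusive positions, while a line of sequence is a 0-based half-open window. Below is a minimal sketch of that step alone, without JAXB or network access. The class `FeatureTrack` and its method `render` are hypothetical names introduced here for illustration; they are not part of the tool above.

```java
import java.util.Arrays;

/** Hypothetical helper: draws one feature row for one line of sequence. */
final class FeatureTrack {
    /**
     * @param begin     feature start, 1-based inclusive
     * @param end       feature end, 1-based inclusive
     * @param lineStart 0-based offset of the first residue of the line
     * @param lineEnd   0-based offset just past the last residue of the line
     * @return the row of '.' and '=', or null if the feature is not on this line
     */
    static String render(int begin, int end, int lineStart, int lineEnd) {
        // 1-based inclusive [begin,end] is 0-based half-open [begin-1,end)
        int x0 = Math.max(lineStart, begin - 1);
        int x1 = Math.min(lineEnd, end);
        if (x0 >= x1) return null; // no overlap with this line
        char[] row = new char[lineEnd - lineStart];
        Arrays.fill(row, '.');
        Arrays.fill(row, x0 - lineStart, x1 - lineStart, '=');
        return new String(row);
    }

    public static void main(String[] args) {
        // TPR 1 repeat [36-69] on the first two 60-residue lines
        System.out.println(render(36, 69, 0, 60));
        System.out.println(render(36, 69, 60, 120));
        // a single-residue feature at position 60 belongs to the first line only
        System.out.println(render(60, 60, 0, 60));
        System.out.println(render(60, 60, 60, 120));
    }
}
```

The first two rows reproduce the TPR 1 rows of the Z3H7B_HUMAN output; the last call returns null because residue 60 is not on the second line.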
Received update from IG
for the indels: new vs old headers
 1 Position.Build36       1 Position.Build36
 2 chrom                  2 chrom
 3 Depth                  3 Depth
 4 CIGAR                  4 CIGAR
 5 ref_upstream           5 ref_upstream
 6 ref.indel              6 ref.indel
 7 ref_downstream         7 ref_downstream
 8 Q.indel.               8 Q.indel.
 9 max_gtype              9 max_gtype
10 Q.max_gtype.          10 Q.max_gtype.
11 max2_gtype            11 max2_gtype
12 bp1_reads             12 bp1_reads
13 ref_reads             13 ref_reads
14 indel_reads           14 indel_reads
15 other_reads           15 other_reads
16 repeat_unit           16 repeat_unit
17 ref_repeat_count      17 ref_repeat_count
18 indel_repeat_count    18 indel_repeat_count
19 Gene.name             19 Gene.name
20 Gene.start            20 Gene.start
21 Gene.end              21 Gene.end
22 Strand                22 Strand
23 Nbr.exon              23 Nbr.exon
24 refseq                24 refseq
25 UCSC.ID               25 type
26 type                  26 type.pos
27 type.pos              27 Intron.start
28 Intron.start          28 Intron.end
29 Intron.end            29 region.splice
30 region.splice
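The two header lists are identical up to column 24; the first list then has an extra UCSC.ID column at position 25, which shifts every later column by one. A small sketch of how such a difference can be found programmatically, using only the tail of each list copied from above. The class `HeaderDiff` and the method `onlyIn` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: compares two lists of column names. */
final class HeaderDiff {
    /** Returns the names present in a but absent from b, in the order of a. */
    static List<String> onlyIn(List<String> a, List<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.removeAll(b);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // columns 24 onward of the first (left) and second (right) header list
        List<String> left = Arrays.asList("refseq", "UCSC.ID", "type", "type.pos",
                "Intron.start", "Intron.end", "region.splice");
        List<String> right = Arrays.asList("refseq", "type", "type.pos",
                "Intron.start", "Intron.end", "region.splice");
        System.out.println("only in left:  " + onlyIn(left, right));
        System.out.println("only in right: " + onlyIn(right, left));
    }
}
```

Any script that reads the indel files by column index rather than by column name would need its indexes for columns 25 and beyond adjusted.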