User:Lindenb/Notebook/UMR915/20101115

Working for Cedric, I wrote a small tool that prints the annotations of a Swiss-Prot entry as ASCII tracks above the sequence, 60 residues per line:
Z3H7B_HUMAN
http://www.uniprot.org/uniprot/Q9UGR2.xml
============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...................................========================= [36-69] TPR 1 repeat
.....................................==..................... [38-39] In Ref. 5; BAA82983. sequence conflict
MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGNDLFREKDYKQALVQYM 1-60

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
=========................................................... [36-69] TPR 1 repeat
.....................==================================..... [82-115] TPR 2 repeat
.......................................................===== [116-149] TPR 3 repeat
...........................=................................ [88] In Ref. 4; AAF05541. sequence conflict
EGLNVADYAASDQVALPRELLCKLHVNRAACYFTMGLYEKALEDSEKALGLDSESIRALF 61-120

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
=============================............................... [116-149] TPR 3 repeat
RKARALNELGRHKEAYECSSRCSLALPHDESVTQLGQELAQKLGLRVRKAYKRPQELETF 121-180

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
....................................................=....... [233] Phosphoserine modified residue
............................================................ [209-224] In isoform 2. splice variant
...................................................=........ [232] In Ref. 4; AAF05541. sequence conflict
SLLSNGTAAGVADQGTSNGLGSIDDIETGNVPDTREQVEIGAPRDCYVDPRGSPALLPST 181-240

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.......................=========............................ [264-272] LD motif; interaction with NSP3 short sequence motif
................===......................................... [257-259] Almost no effect on NSP3 binding. mutagenesis site
...........................===.............................. [268-270] Complete loss of NSP3 binding. mutagenesis site
......=..................................................... [247] In Ref. 3; AAI52559. sequence conflict
PTMPLFPHVLDLLAPLDSSRTLPSTDSLDDFSDGDVFGPELDTLLDSLSLVQGGLSGSGV 241-300

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
PSELPQLIPVFPGGTPLLPPVVGGSIPVSSPLPPASFGLVMDPSKKLAASVLDALDPPGP 301-360

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
......................................................=..... [415] Phosphoserine modified residue
..................=......................................... [379] In dbSNP:rs9607793. sequence variant
.........................=.................................. [386] In Ref. 4; AAF05541. sequence conflict
.............................................=.............. [406] In Ref. 3; AAI52559. sequence conflict
TLDPLDLLPYSETRLDALDSFGSTRGSLDKPDSFMEETNSQDHRPPSGAQKPAPSPEPCM 361-420

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
PNTALLIKNPLAATHEFKQACQLCYPKTGPRAGDYTYREGLEHKCKRDILLGRLRSSEDQ 421-480

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...................=========================................ [500-524] C3H1-type 1 zinc finger region
TWKRIRPRPTKTSFVGSYYLCKDMINKQDCKYGDNCTFAYHQEEIDVWTEERKGTLNRDL 481-540

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
LFDPLGGVKRGSLTIAKLLKEHQGIFTFLCEICFDSKPRIISKGTKDSPSVCSNLAAKHS 541-600

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...............................=======================...... [632-654] C3H1-type 2 zinc finger region
FYNNKCLVHIVRSTSLKYSKIRQFQEHFQFDVCRHEVRYGCLREDSCHFAHSFIELKVWL 601-660

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...=........................................................ [664] Phosphotyrosine modified residue
...............=............................................ [676] In Ref. 4; AAF05541. sequence conflict
LQQYSGMTHEDIVQESKKYWQQMEAHAGKASSSMGAPRTHGPSTFDLQMKFVCGQCWRNG 661-720

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.................................................=========== [770-798] C3H1-type 3 zinc finger region
QVVEPDKDLKYCSAKARHCWTKERRVLLVMSKAKRKWVSVRPLPSIRNFPQQYDLCIHAQ 721-780

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
==================.......................................... [770-798] C3H1-type 3 zinc finger region
NGRKCQYVGNCSFAHSPEERDMWTFMKENKILDMQQTYDMWLKKHNPGKPGEGTPISSRE 781-840

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.................=========================.................. [858-882] C2H2-type zinc finger region
GEKQIQMPTDYADIMMGYHCWLCGKNSNSKKQWQQHIQSEKHKEKVFTSDSDASGWAFRF 841-900

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.=============================.............................. [902-930] C3H1-type 4 zinc finger region
.....................===================================.... [922-956] null coiled-coil region
.................=.......................................... [918] In Ref. 3; AAI52559. sequence conflict
PMGEFRLCDRLQKGKACPDGDKCRCAHGQEELNEWLDRREVLKQKLAKARKDMLLCPRDD 901-960

================================= [1-993] Zinc finger CCCH domain-containing protein 7B chain
.....................========.... [982-989] Poly-Ala compositionally biased region
..........=...................... [971] In Ref. 1; BAG37501 and 3; AAI52559. sequence conflict
DFGKYNFLLQEDGDLAGATPEAPAAAATATTGE 961-993
Source code
import java.net.URL;
import java.util.List;
import javax.xml.bind.JAXBContext;
import uniprot.Entry;
import uniprot.FeatureType;
import uniprot.LocationType;
import uniprot.PositionType;
import uniprot.Uniprot;

// The classes of the "uniprot" package are generated from the UniProt XML schema:
// xjc -p "uniprot" "http://www.uniprot.org/support/docs/uniprot.xsd"
// javax.xml.bind is bundled with Java 6 to 8; later JDKs need a separate JAXB library.
public class UniprotAscii
{
    private UniprotAscii() throws Exception
    {
    }

    /** Downloads the UniProt XML record for 'id' and prints its features as ASCII tracks. */
    private void run(String id) throws Exception
    {
        final int line_length=60; // residues per printed line
        JAXBContext jc = JAXBContext.newInstance("uniprot");
        String uri="http://www.uniprot.org/uniprot/"+id+".xml";
        Uniprot uniprot=(Uniprot)jc.createUnmarshaller().unmarshal(new URL(uri));
        for(Entry entry:uniprot.getEntry())
        {
            System.out.println(entry.getName().get(0));
            System.out.println(uri);
            // the sequence text may contain whitespace: remove it
            String sequence= entry.getSequence().getValue().replaceAll("[ \n\t\r]", "");
            List<FeatureType> features=entry.getFeature();
            int start=0; // 0-based offset of the first residue of the current line
            while(start< sequence.length())
            {
                int end=Math.min(start+line_length, sequence.length()); // exclusive
                for(FeatureType feat:features)
                {
                    int x0=0;
                    int x1=0;
                    LocationType t=feat.getLocation();
                    if(t==null) continue;
                    PositionType begT=t.getBegin();
                    PositionType endT=t.getEnd();
                    PositionType posT=t.getPosition();
                    String range=null;
                    // UniProt positions are 1-based and inclusive:
                    // convert them to a 0-based half-open interval [x0,x1)
                    if(begT!=null && endT!=null && begT.getPosition()!=null && endT.getPosition()!=null)
                    {
                        x0=begT.getPosition().intValue()-1;
                        x1=endT.getPosition().intValue();
                        range="["+begT.getPosition()+"-"+endT.getPosition()+"]";
                    }
                    else if(posT!=null && posT.getPosition()!=null)
                    {
                        x0=posT.getPosition().intValue()-1;
                        x1=x0+1;
                        range="["+posT.getPosition()+"]";
                    }
                    else
                    {
                        System.err.println("Skipping a feature without a known position: "+feat.getType());
                        continue;
                    }
                    // ignore the features that do not overlap the current line
                    if(x0>=end) continue;
                    if(x1<=start) continue;
                    // clip the feature to the current line
                    x0=Math.max(start, x0);
                    x1=Math.min(end,x1);
                    int x=start;
                    while(x<x0) {x++;System.out.print(".");}
                    while(x<x1) {x++;System.out.print("=");}
                    while(x<end) {x++;System.out.print(".");}
                    System.out.println(" "+range+" " + feat.getDescription()+" "+feat.getType());
                }
                // the residues, followed by their 1-based range
                System.out.println(sequence.substring(start, end)+" "+(start+1)+"-"+end);
                System.out.println();
                start+=line_length;
            }
        }
    }

    public static void main(String[] args)
    {
        try
        {
            UniprotAscii app=new UniprotAscii();
            int optind=0;
            while(optind<args.length)
            {
                if(args[optind].equals("-h"))
                {
                    System.err.println("Usage: java UniprotAscii [-h] id1 id2 ...");
                    return;
                }
                else if(args[optind].equals("-L"))
                {
                    // placeholder: this option is not implemented
                    //app.readLength=Integer.parseInt(args[++optind]);
                }
                else if(args[optind].equals("--"))
                {
                    optind++;
                    break;
                }
                else if(args[optind].startsWith("-"))
                {
                    System.err.println("Unknown option: "+args[optind]);
                    return;
                }
                else
                {
                    break;
                }
                ++optind;
            }
            if(optind==args.length)
            {
                System.err.println("No UniProt identifier was given.");
                return;
            }
            else
            {
                // each remaining argument is a UniProt accession or entry name
                while(optind< args.length)
                {
                    String inputName=args[optind++];
                    app.run(inputName);
                }
            }
        }
        catch(Throwable err)
        {
            err.printStackTrace();
        }
    }
}
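The coordinate handling of the inner loop is easier to check in isolation. The following is a minimal sketch and not part of the tool itself: the class `FeatureTrack` and its method `render` are hypothetical names, and the method takes plain integers instead of the JAXB objects so that it runs without the network or the generated `uniprot` classes. It assumes that feature positions are 1-based and inclusive, while the line offsets are 0-based and half-open.

```java
// Hypothetical helper illustrating how a feature is clipped to one printed line.
final class FeatureTrack {
    /**
     * @param begin     feature start, 1-based inclusive
     * @param end       feature end, 1-based inclusive
     * @param lineStart 0-based offset of the first residue of the line
     * @param lineEnd   0-based offset just after the last residue of the line
     * @return the '.'/'=' track, or null if the feature does not overlap the line
     */
    static String render(int begin, int end, int lineStart, int lineEnd) {
        int x0 = begin - 1; // 0-based inclusive start
        int x1 = end;       // 0-based exclusive end
        if (x0 >= lineEnd || x1 <= lineStart) return null;
        x0 = Math.max(lineStart, x0);
        x1 = Math.min(lineEnd, x1);
        StringBuilder sb = new StringBuilder(lineEnd - lineStart);
        for (int x = lineStart; x < lineEnd; x++) {
            sb.append(x >= x0 && x < x1 ? '=' : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // TPR 1 repeat [36-69] on the lines 1-60 and 61-120
        System.out.println(render(36, 69, 0, 60));
        System.out.println(render(36, 69, 60, 120));
        // a single residue on the last column of the first line
        System.out.println(render(60, 60, 0, 60));
    }
}
```

The first two calls reproduce the two TPR 1 repeat tracks of the output above. The third call covers the boundary case, a feature sitting on the last residue of a line, which is where a mix of 1-based and 0-based values is most likely to go wrong.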
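Received an update from IG for the indels. The new file has 30 columns instead of 29: UCSC.ID is inserted at column 25, which shifts type, type.pos, Intron.start, Intron.end and region.splice by one.
New vs. old headers: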
 #  New                  #  Old
 1  Position.Build36     1  Position.Build36
 2  chrom                2  chrom
 3  Depth                3  Depth
 4  CIGAR                4  CIGAR
 5  ref_upstream         5  ref_upstream
 6  ref.indel            6  ref.indel
 7  ref_downstream       7  ref_downstream
 8  Q.indel.             8  Q.indel.
 9  max_gtype            9  max_gtype
10  Q.max_gtype.        10  Q.max_gtype.
11  max2_gtype          11  max2_gtype
12  bp1_reads           12  bp1_reads
13  ref_reads           13  ref_reads
14  indel_reads         14  indel_reads
15  other_reads         15  other_reads
16  repeat_unit         16  repeat_unit
17  ref_repeat_count    17  ref_repeat_count
18  indel_repeat_count  18  indel_repeat_count
19  Gene.name           19  Gene.name
20  Gene.start          20  Gene.start
21  Gene.end            21  Gene.end
22  Strand              22  Strand
23  Nbr.exon            23  Nbr.exon
24  refseq              24  refseq
25  UCSC.ID             25  type
26  type                26  type.pos
27  type.pos            27  Intron.start
28  Intron.start        28  Intron.end
29  Intron.end          29  region.splice
30  region.splice
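
A header change like this one can also be detected programmatically instead of by reading the two lists side by side. The following is a minimal sketch, assuming that the two header lines have already been split into lists of column names; the class `HeaderDiff` and the method `addedColumns` are hypothetical names, and the two lists are simply copied from the table above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports the columns that exist only in the new header.
final class HeaderDiff {
    static final List<String> NEW_HEADER = Arrays.asList(
        "Position.Build36", "chrom", "Depth", "CIGAR", "ref_upstream", "ref.indel",
        "ref_downstream", "Q.indel.", "max_gtype", "Q.max_gtype.", "max2_gtype",
        "bp1_reads", "ref_reads", "indel_reads", "other_reads", "repeat_unit",
        "ref_repeat_count", "indel_repeat_count", "Gene.name", "Gene.start",
        "Gene.end", "Strand", "Nbr.exon", "refseq", "UCSC.ID", "type", "type.pos",
        "Intron.start", "Intron.end", "region.splice");

    static final List<String> OLD_HEADER = Arrays.asList(
        "Position.Build36", "chrom", "Depth", "CIGAR", "ref_upstream", "ref.indel",
        "ref_downstream", "Q.indel.", "max_gtype", "Q.max_gtype.", "max2_gtype",
        "bp1_reads", "ref_reads", "indel_reads", "other_reads", "repeat_unit",
        "ref_repeat_count", "indel_repeat_count", "Gene.name", "Gene.start",
        "Gene.end", "Strand", "Nbr.exon", "refseq", "type", "type.pos",
        "Intron.start", "Intron.end", "region.splice");

    /** Maps the 1-based index in newHeader to each column name missing from oldHeader. */
    static Map<Integer, String> addedColumns(List<String> newHeader, List<String> oldHeader) {
        Map<Integer, String> added = new LinkedHashMap<>();
        for (int i = 0; i < newHeader.size(); i++) {
            String name = newHeader.get(i);
            if (!oldHeader.contains(name)) added.put(i + 1, name);
        }
        return added;
    }

    public static void main(String[] args) {
        System.out.println(addedColumns(NEW_HEADER, OLD_HEADER)); // prints {25=UCSC.ID}
    }
}
```

Any script that reads the indel file by column index has to be updated for columns 25 and above; reading the columns by name avoids the problem.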