User:Lindenb/Notebook/UMR915/20101115

From OpenWetWare
Revision as of 13:40, 15 November 2010 by Lindenb


Working for Cedric, I wrote a small tool that prints the annotations of a Swiss-Prot entry as ASCII text, 60 residues per line:

Z3H7B_HUMAN http://www.uniprot.org/uniprot/Q9UGR2.xml

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...................................========================= [36-69] TPR 1 repeat
.....................................==..................... [38-39] In Ref. 5; BAA82983. sequence conflict
MERQKRKADIEKGLQFIQSTLPLKQEEYEAFLLKLVQNLFAEGNDLFREKDYKQALVQYM 1-60

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
=========................................................... [36-69] TPR 1 repeat
.....................==================================..... [82-115] TPR 2 repeat
.......................................................===== [116-149] TPR 3 repeat
...........................=................................ [88] In Ref. 4; AAF05541. sequence conflict
EGLNVADYAASDQVALPRELLCKLHVNRAACYFTMGLYEKALEDSEKALGLDSESIRALF 61-120

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
=============================............................... [116-149] TPR 3 repeat
RKARALNELGRHKEAYECSSRCSLALPHDESVTQLGQELAQKLGLRVRKAYKRPQELETF 121-180

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
....................................................=....... [233] Phosphoserine modified residue
............................================................ [209-224] In isoform 2. splice variant
...................................................=........ [232] In Ref. 4; AAF05541. sequence conflict
SLLSNGTAAGVADQGTSNGLGSIDDIETGNVPDTREQVEIGAPRDCYVDPRGSPALLPST 181-240

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.......................=========............................ [264-272] LD motif; interaction with NSP3 short sequence motif
................===......................................... [257-259] Almost no effect on NSP3 binding. mutagenesis site
...........................===.............................. [268-270] Complete loss of NSP3 binding. mutagenesis site
......=..................................................... [247] In Ref. 3; AAI52559. sequence conflict
PTMPLFPHVLDLLAPLDSSRTLPSTDSLDDFSDGDVFGPELDTLLDSLSLVQGGLSGSGV 241-300

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
PSELPQLIPVFPGGTPLLPPVVGGSIPVSSPLPPASFGLVMDPSKKLAASVLDALDPPGP 301-360

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
......................................................=..... [415] Phosphoserine modified residue
..................=......................................... [379] In dbSNP:rs9607793. sequence variant
.........................=.................................. [386] In Ref. 4; AAF05541. sequence conflict
.............................................=.............. [406] In Ref. 3; AAI52559. sequence conflict
TLDPLDLLPYSETRLDALDSFGSTRGSLDKPDSFMEETNSQDHRPPSGAQKPAPSPEPCM 361-420

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
PNTALLIKNPLAATHEFKQACQLCYPKTGPRAGDYTYREGLEHKCKRDILLGRLRSSEDQ 421-480

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...................=========================................ [500-524] C3H1-type 1 zinc finger region
TWKRIRPRPTKTSFVGSYYLCKDMINKQDCKYGDNCTFAYHQEEIDVWTEERKGTLNRDL 481-540

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
LFDPLGGVKRGSLTIAKLLKEHQGIFTFLCEICFDSKPRIISKGTKDSPSVCSNLAAKHS 541-600

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...............................=======================...... [632-654] C3H1-type 2 zinc finger region
FYNNKCLVHIVRSTSLKYSKIRQFQEHFQFDVCRHEVRYGCLREDSCHFAHSFIELKVWL 601-660

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
...=........................................................ [664] Phosphotyrosine modified residue
...............=............................................ [676] In Ref. 4; AAF05541. sequence conflict
LQQYSGMTHEDIVQESKKYWQQMEAHAGKASSSMGAPRTHGPSTFDLQMKFVCGQCWRNG 661-720

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.................................................=========== [770-798] C3H1-type 3 zinc finger region
QVVEPDKDLKYCSAKARHCWTKERRVLLVMSKAKRKWVSVRPLPSIRNFPQQYDLCIHAQ 721-780

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
==================.......................................... [770-798] C3H1-type 3 zinc finger region
NGRKCQYVGNCSFAHSPEERDMWTFMKENKILDMQQTYDMWLKKHNPGKPGEGTPISSRE 781-840

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.................=========================.................. [858-882] C2H2-type zinc finger region
GEKQIQMPTDYADIMMGYHCWLCGKNSNSKKQWQQHIQSEKHKEKVFTSDSDASGWAFRF 841-900

============================================================ [1-993] Zinc finger CCCH domain-containing protein 7B chain
.=============================.............................. [902-930] C3H1-type 4 zinc finger region
.....................===================================.... [922-956] null coiled-coil region
.................=.......................................... [918] In Ref. 3; AAI52559. sequence conflict
PMGEFRLCDRLQKGKACPDGDKCRCAHGQEELNEWLDRREVLKQKLAKARKDMLLCPRDD 901-960

================================= [1-993] Zinc finger CCCH domain-containing protein 7B chain
.....................========.... [982-989] Poly-Ala compositionally biased region
..........=...................... [971] In Ref. 1; BAG37501 and 3; AAI52559. sequence conflict
DFGKYNFLLQEDGDLAGATPEAPAAAATATTGE 961-993
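The only fiddly part of this layout is the coordinate conversion. UniProt feature positions are 1-based and inclusive, while each printed line covers a 0-based, half-open slice of the sequence. The sketch below isolates that step in a hypothetical helper, FeatureTrack.render, which is not part of the original tool. It converts the coordinates before testing for overlap, so a feature on residue 60 is drawn on line 1-60 and never on line 61-120.

```java
/**
 * Hypothetical helper (not part of the original UniprotAscii tool).
 * Renders one annotation row for the residue window [start,end)
 * (0-based, half-open), for a feature spanning the 1-based inclusive
 * interval [begin,last].
 */
public class FeatureTrack {

    /** Returns a row of '.' and '=', or null if the feature misses the window. */
    public static String render(int start, int end, int begin, int last) {
        int x0 = begin - 1; // 1-based inclusive -> 0-based half-open
        int x1 = last;
        if (x0 >= end || x1 <= start) {
            return null; // the feature does not overlap this line
        }
        // clip the feature to the current line
        x0 = Math.max(start, x0);
        x1 = Math.min(end, x1);
        StringBuilder row = new StringBuilder(end - start);
        for (int x = start; x < end; x++) {
            row.append(x >= x0 && x < x1 ? '=' : '.');
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // TPR 1 repeat [36-69] on the lines 1-60 and 61-120
        System.out.println(render(0, 60, 36, 69));
        System.out.println(render(60, 120, 36, 69));
        // a one-residue feature at position 60: drawn on line 1-60 only
        System.out.println(render(0, 60, 60, 60));
        System.out.println(render(60, 120, 60, 60));
    }
}
```

For the TPR 1 repeat [36-69], this reproduces the rows shown above: 35 dots and 25 '=' on line 1-60, then 9 '=' and 51 dots on line 61-120.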

Source code

import java.net.URL;
import java.util.List;

import javax.xml.bind.JAXBContext;




import uniprot.Entry;
import uniprot.FeatureType;
import uniprot.LocationType;
import uniprot.PositionType;
import uniprot.Uniprot;


// generate the uniprot.* classes with JAXB's binding compiler before compiling:
// xjc -p "uniprot"  "http://www.uniprot.org/support/docs/uniprot.xsd"
public class UniprotAscii
	{
	/** number of residues printed per line (option -L) */
	private int lineLength=60;
	
	private UniprotAscii()
		{
		}
	
	/** returns the 1-based position, or null if it is missing/unknown */
	private static Integer pos(PositionType p)
		{
		if(p==null || p.getPosition()==null) return null;
		return p.getPosition().intValue();
		}
	
	private void run(String id)  throws Exception
		{
		JAXBContext jc = JAXBContext.newInstance("uniprot");
		String uri="http://www.uniprot.org/uniprot/"+id+".xml";
		Uniprot uniprot=(Uniprot)jc.createUnmarshaller().unmarshal(new URL(uri));
		for(Entry entry:uniprot.getEntry())
			{
			System.out.println(entry.getName().get(0));
			System.out.println(uri);
			//remove the white spaces from the sequence
			String sequence= entry.getSequence().getValue().replaceAll("\\s", "");
			List<FeatureType> features=entry.getFeature();
			//print the sequence by blocks of 'lineLength' residues
			for(int start=0;start< sequence.length();start+=lineLength)
				{
				int end=Math.min(start+lineLength, sequence.length());
				for(FeatureType feat:features)
					{
					LocationType t=feat.getLocation();
					if(t==null) continue;
					Integer b=pos(t.getBegin());
					Integer e=pos(t.getEnd());
					Integer p=pos(t.getPosition());
					//[x0,x1): 0-based, half-open interval covered by the feature
					int x0,x1;
					String range;
					if(b!=null && e!=null)
						{
						x0=b-1;
						x1=e;
						range="["+b+"-"+e+"]";
						}
					else if(p!=null)
						{
						x0=p-1;
						x1=p;
						range="["+p+"]";
						}
					else
						{
						System.err.println("Skipping feature without a known location: "+feat.getType());
						continue;
						}
					//skip the features that do not overlap the current line
					if(x0>=end || x1<=start) continue;
					x0=Math.max(start, x0);
					x1=Math.min(end,x1);
					StringBuilder row=new StringBuilder();
					for(int x=start;x<end;++x)
						{
						row.append(x>=x0 && x<x1?'=':'.');
						}
					//some features (e.g. coiled-coil regions) have no description
					String desc=(feat.getDescription()==null?"":feat.getDescription()+" ");
					System.out.println(row+" "+range+" "+desc+feat.getType());
					}
				System.out.println(sequence.substring(start, end)+" "+(start+1)+"-"+end);
				System.out.println();
				}
			}
		}
	
	public static void main(String[] args)
		{
		try
			{
			UniprotAscii app=new UniprotAscii();
			int optind=0;
			while(optind<args.length)
				{
				if(args[optind].equals("-h"))
					{
					System.err.println("Usage: java UniprotAscii [-L line-length] uniprot-acc...");
					return;
					}
				else if(args[optind].equals("-L"))
					{
					if(++optind==args.length)
						{
						System.err.println("Option -L requires a value.");
						return;
						}
					app.lineLength=Integer.parseInt(args[optind]);
					}
				else if(args[optind].equals("--"))
					{
					optind++;
					break;
					}
				else if(args[optind].startsWith("-"))
					{
					System.err.println("Unknown option: "+args[optind]);
					return;
					}
				else
					{
					break;
					}
				++optind;
				}
			if(optind==args.length)
				{
				System.err.println("No UniProt accession given.");
				return;
				}
			while(optind< args.length)
				{
				app.run(args[optind++]);
				}
			}
		catch(Throwable err)
			{
			err.printStackTrace();
			System.exit(-1);
			}
		}
	}

Received update from IG

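For the indels, new vs. old column headers (the new file adds a UCSC.ID column at position 25, so every later column moves one position to the right):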

 #   new header           old header
 1   Position.Build36     Position.Build36
 2   chrom                chrom
 3   Depth                Depth
 4   CIGAR                CIGAR
 5   ref_upstream         ref_upstream
 6   ref.indel            ref.indel
 7   ref_downstream       ref_downstream
 8   Q.indel.             Q.indel.
 9   max_gtype            max_gtype
10   Q.max_gtype.         Q.max_gtype.
11   max2_gtype           max2_gtype
12   bp1_reads            bp1_reads
13   ref_reads            ref_reads
14   indel_reads          indel_reads
15   other_reads          other_reads
16   repeat_unit          repeat_unit
17   ref_repeat_count     ref_repeat_count
18   indel_repeat_count   indel_repeat_count
19   Gene.name            Gene.name
20   Gene.start           Gene.start
21   Gene.end             Gene.end
22   Strand               Strand
23   Nbr.exon             Nbr.exon
24   refseq               refseq
25   UCSC.ID              type
26   type                 type.pos
27   type.pos             Intron.start
28   Intron.start         Intron.end
29   Intron.end           region.splice
30   region.splice        -
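Because the new file inserts UCSC.ID at position 25, any script that reads type, type.pos, Intron.start and the following fields by column number will silently pick the wrong field. The sketch below lists the added columns and looks columns up by name instead of by position. It assumes each file starts with a single tab-separated header line, since the delimiter is not stated here. HeaderDiff is a hypothetical name, and the headers in main are shortened for illustration. If a column name appears twice, the later index wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: index header columns by name and find added columns. */
public class HeaderDiff {

    /** Maps each column name to its 0-based index (assumes tab-separated headers). */
    public static Map<String, Integer> index(String headerLine) {
        Map<String, Integer> map = new LinkedHashMap<>();
        String[] cols = headerLine.split("\t", -1);
        for (int i = 0; i < cols.length; i++) {
            map.put(cols[i], i);
        }
        return map;
    }

    /** Column names present in newHeader but absent from oldHeader, in new-file order. */
    public static List<String> added(String newHeader, String oldHeader) {
        List<String> result = new ArrayList<>(index(newHeader).keySet());
        result.removeAll(index(oldHeader).keySet());
        return result;
    }

    public static void main(String[] args) {
        // shortened versions of the two headers above
        String oldH = "Gene.name\trefseq\ttype\ttype.pos";
        String newH = "Gene.name\trefseq\tUCSC.ID\ttype\ttype.pos";
        System.out.println(added(newH, oldH)); // [UCSC.ID]
        System.out.println(index(oldH).get("type") + " -> " + index(newH).get("type")); // 2 -> 3
    }
}
```

Reading a field through index(header).get("type") keeps a parser working whichever version of the file it receives.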