Wikiomics:Cloning in silico

From OpenWetWare
Revision as of 08:03, 12 July 2007 by Darked (talk) (+drafts)

Cloning in silico is the procedure of obtaining the full or partial cDNA sequence of a gene using a computer only.

There are several variants:

  • discovery of new splice forms of a known gene
  • cloning a novel orthologous gene in a new species
  • cloning new gene(s) using an EST database alone (EST clustering)

Procedures for cloning new genes from ESTs (Expressed Sequence Tags).

Getting EST sequences/traces for gene assembly

getting sequences from multiple ESTs

Sometimes you have no other option but to work with plain EST sequences with no traces. To get them easily from the BLAST output, we prepare a file containing the Acc. Nos. for Batch Entrez, processing the BLAST output manually:

  • open nedit using the nc command
  • mark the hits in the BLAST output window (firefox etc.)
  • copy them to the editor:
gb|AA449543|AA449543 zx08a09.r1 Soares total fetus Nb2HF8 9w Ho... 194

gb|AA007668|AA007668 zh99g06.r1 Soares fetal liver spleen 1NFLS... 100

gb|W88626|W88626 zh73b12.r1 Soares fetal liver spleen 1NFLS S1 ... 101

gb|AA465253|AA465253 aa33a08.r1 NCI_CGAP_GCB1 Homo sapiens cDNA... 62
  • replace pipe symbol "|" with spaces:

gb AA449543 AA449543 zx08a09.r1 Soares total fetus Nb2HF8 9w

gb AA007668 AA007668 zh99g06.r1 Soares fetal liver spleen

gb W88626 W88626 zh73b12.r1 Soares fetal liver spleen 1NFLS
  • copy the column containing the Acc. Nos. to the final file:

AA449543



  • start firefox and go to the Batch Entrez page: [1]

Retrieve all sequences from the file of GIs/Accessions using Format: FASTA. You have to select: Browse -> final_file.txt

  • save the result file as ESTs.current_date.fasta
  • assemble the sequences using phrap:
phrap ESTs.current_date.fasta
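
The manual editing steps above (replacing the pipe symbols and copying the accession column) can also be scripted. The following is only a minimal sketch, not part of the original procedure: it assumes the hit lines look like the examples shown earlier (a database tag such as gb, then the accession between pipe symbols), and the function names extract_accessions and write_accession_file are hypothetical.

```python
import re

# Assumption: hit lines start with a database tag (gb, emb or dbj),
# followed by "|ACCESSION|", as in "gb|AA449543|AA449543 zx08a09.r1 ...".
HIT_LINE = re.compile(r"^\s*(?:gb|emb|dbj)\|([A-Za-z]+\d+)(?:\.\d+)?\|")


def extract_accessions(blast_text):
    """Return accession numbers from BLAST hit lines, in order, without duplicates."""
    seen = set()
    accessions = []
    for line in blast_text.splitlines():
        match = HIT_LINE.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            accessions.append(match.group(1))
    return accessions


def write_accession_file(accessions, path="final_file.txt"):
    """Write one accession per line, the layout used here for the Batch Entrez upload."""
    with open(path, "w") as handle:
        handle.write("\n".join(accessions) + "\n")


if __name__ == "__main__":
    hits = (
        "gb|AA449543|AA449543 zx08a09.r1 Soares total fetus Nb2HF8 9w Ho... 194\n"
        "gb|AA007668|AA007668 zh99g06.r1 Soares fetal liver spleen 1NFLS... 100\n"
        "gb|W88626|W88626 zh73b12.r1 Soares fetal liver spleen 1NFLS S1 ... 101\n"
    )
    write_accession_file(extract_accessions(hits))
```

Lines that do not match the assumed pattern (blank lines, headers, alignments) are skipped, so the whole BLAST output can be pasted in without trimming it first.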

importing human, mouse and zebrafish EST trace files

For a significant subset of human, mouse and zebrafish ESTs, trace files and even experiment files are available. For sane gene cloning we need them because:

  • sequences in GenBank are usually shorter than the original trace files
  • there is no way you can detect a sequencing error in a plain text/FASTA file without looking at the trace file
  • you can get sequence from the other end of the clone (also possible with some ESTs)

ESTs for which we may get trace files have a naming convention:

  • human ESTs start with "a", "y" or "z" (like aa09h01.r1, ye12c01.s1, ze34c06.r1)
  • mouse ESTs start with "m", "u" or "v". You can get 5'-ends only.
  • zebrafish ESTs start with "f"
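
The naming convention above can be used to pre-sort a list of EST names before searching for traces. This is a minimal sketch that relies only on the first-letter rule stated here; the function name guess_species is hypothetical, and a matching first letter does not guarantee that a trace file actually exists.

```python
# First letter of the EST name -> species, as listed in the naming convention.
TRACE_PREFIXES = {
    "a": "human", "y": "human", "z": "human",
    "m": "mouse", "u": "mouse", "v": "mouse",
    "f": "zebrafish",
}


def guess_species(est_name):
    """Return the species suggested by the EST name, or None if the prefix is unknown."""
    if not est_name:
        return None
    return TRACE_PREFIXES.get(est_name[0])


if __name__ == "__main__":
    for name in ("aa09h01.r1", "ye12c01.s1", "ze34c06.r1"):
        print(name, guess_species(name))
```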

In order to get them one can search for relevant trace files using Sanger's Trace server: