OpenWetWare:Software/Online Database Access/GenBank Accession Numbers

From OpenWetWare
Jump to navigationJump to search

GenBank Accession Numbers


The unique identifier for a sequence record. An accession number applies to the complete record and is usually a combination of a letter(s) and numbers, such as a single letter followed by five digits (e.g., U12345) or two letters followed by six digits (e.g., AF123456). Some accessions might be longer, depending on the type of sequence record.

Accession numbers do not change, even if information in the record is changed at the author's request. Sometimes, however, an original accession number might become secondary to a newer accession number, if the authors make a new submission that combines previous sequences, or if for some reason a new submission supercedes an earlier record.

Records from the RefSeq database of reference sequences have a different accession number format that begins with two letters followed by an underscore bar and six or more digits, for example:

  • NT_123456 constructed genomic contigs
  • NM_123456 mRNAs
  • NP_123456 proteins
  • NC_123456 chromosomes

(a complete list of RefSeq accession number prefixes is available at NCBI).

Note: compare accession number with Sequence Identifiers such as Version and GI for nucleotide sequences and protein_id and GI for amino acid sequences.

Entrez Search Field: Accession [ACCN] Search Tip: The letters in the accession number can be written in upper- or lowercase. RefSeq accessions must contain an underscore bar between the letters and the numbers, e.g., NM_002111.