keith at thale.life.nottingham.ac.uk
Tue Sep 26 07:55:16 EST 2000
> It is a bit harsh to say that Keith's letter should have been removed
> by the moderator. I believe moderation of this news group should be
> to keep junk mail and off topic messages out of the group, not to
> remove misunderstanding, which would be a daunting duty.
Maybe I should elaborate on why I asked the question. In addition to
reading many thousands of EMBL sequences into my ACEDB database, we have
also started to receive many 'insert sequences' from groups, which have
not yet been published in GenBank or EMBL. These (usually short)
sequences reflect the sequencing of genomic DNA adjacent to
(deliberately engineered) transposon insertions.
We recently received a lot of these sequences and checked them (by eye).
They looked OK (i.e. just A, T, C, G and N), but we later found that
some of them contained X's. I guess this is because some researchers who
are not familiar with the IUPAC code just use 'X' to represent any base.
I also believe (but am not certain) that some (old) sequencing machines
can return an 'X' in their output. We only found this out when we
noticed that AceBrowser was treating them strangely.
So whilst 'X' may not be a standard IUPAC code, it is the case that some
researchers will use it. Maybe acedb could run a simple check when
sequences are loaded and reject any sequence containing a non-standard
code, or at least inform the curator when something non-standard is read.
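
To illustrate the kind of load-time check suggested above, here is a
minimal sketch in Python. It is not acedb code: the function names and
the name-to-sequence dictionary are hypothetical, and it assumes the
accepted alphabet is the IUPAC nucleotide codes (A, C, G, T, U plus the
ambiguity symbols), with gap characters treated as non-standard.

```python
# IUPAC nucleotide codes: the bases plus the ambiguity symbols.
# Assumption: gap characters ('-', '.') are NOT accepted here.
IUPAC_NUCLEOTIDES = frozenset("ACGTURYSWKMBDHVN")


def find_nonstandard(sequence):
    """Return (1-based position, character) for each non-IUPAC symbol.

    Whitespace is ignored and the comparison is case-insensitive.
    """
    return [
        (pos, char)
        for pos, char in enumerate(sequence, start=1)
        if not char.isspace() and char.upper() not in IUPAC_NUCLEOTIDES
    ]


def check_sequences(sequences):
    """Map each sequence name to its offending symbols.

    `sequences` is a hypothetical dict of name -> sequence string.
    Sequences with no non-standard symbols are left out of the report.
    """
    report = {}
    for name, seq in sequences.items():
        bad = find_nonstandard(seq)
        if bad:
            report[name] = bad
    return report
```

A curator could then list the sequences in the report for manual
inspection, or refuse to load them, depending on how strict the check
should be.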
> 'X' is not an IUPAC
> nucleotide ambiguity code (I think it is the protein ambiguity code).
And I've also seen it used to represent a stop codon in a peptide sequence
(along with asterisks).
~ Keith Bradnam - Developer, Arabidopsis Genome Resource (AGR)
~ Nottingham Arabidopsis Stock Centre - http://nasc.nott.ac.uk/
~ University Park, University of Nottingham, NG7 2RD, UK
~ Tel: (0115) 951 3091