alcat at cyberspace.org (Alberto Catalano) wrote:
>Am I right in assuming that STS's developed in other centres are not
>referred to by AFM numbers?
>
Yes! Others I know about include WI for Whitehead Institute markers, CHLC
(e.g. CHLC.ATA22D12) from the Cooperative Human Linkage Center, STSG for STSs
from the Sanger Centre EST mapping project, etc.
>If not then what is the standard for naming STS sequences or primer pairs?
>I originally thought that D#S### type nomenclature (e.g. D12S1057) was the
>standard, but the databases are saturated with AFM type names that don't
>seem to have a corresponding D number.
You are partly right to think that the D number system is the standard.
Originally the scheme was to assign all the anonymous DNA markers these D
numbers, and this is now done through the central auspices of GDB. However the
recent acceleration in the number of markers generated and access via WWW/ftp
sites to marker lists means that many markers are getting widely distributed
before D numbers are assigned. Eventually most of the markers do end up with at
least two names, their lab name and the D number of the "locus" they detect.
Furthermore there is some question about whether the system will be maintained
- see http://info.gdb.org/showschema/nar/nar1995.html for (lengthy) discussion.
>If there isn't a standard, or a (group of) committee(s) that controls the
>nomenclature, how can we be sure that someone doesn't duplicate a name for
>an STS.
To my knowledge this hasn't happened very often, if at all. It'd be wise to
name your STS with a unique system until you get D numbers, but at the end of
the day the oligos are the unique fingerprint of all (single copy) STSs.
There is a real problem, but it generally occurs in a slightly different way.
When another lab picks up an STS oligo pair they have a (very natural) tendency
to rename it to suit their lab system. If their data then goes out on to
WWW/ftp sites without the original name and the D number if it exists,
confusion reigns. You also get problems when the oligos for a particular
sequence have been redesigned so that, say, one oligo is the same and the
other is shifted 5 or 10 nucleotides. I have seen this happen in several ways
over the past two years, and it is only to be expected with the volumes of data
that are being released. Here's how we deal with it in our local analyses and
in data releases ....
1. We always store all names that we know about in our ACEDB databases. There
are tags available for most of the permutations. We also store the EMBL
accession numbers for the sequence from which an oligo pair was derived, and
other sequences which contain exact matches to the oligo pair. We store the
origin of the STS using the tags "Location" and "Originator". We name the STS
by the name used by its originators e.g. AFM268yg1 instead of D22S1170.
2. We try to detect all cases where oligo pairs are the same or when one oligo
of the pair is the same between STSs that have been obtained from different
sources, and then resolve the namings by hand. For STSs/oligo pairs that are
different but have exact matches to the same sequence, we record this
information and can display it graphically when looking at the sequence.
3. When an STS is newly designed here from an EMBL sequence or our
own sequence we assign a name of the type stSG-1234 and the sequence accession
number is stored as above. As the STS accumulates other names such as
a GDB D number, we record that information.
4. On releasing data we try to document all the known names.
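To make the bookkeeping in points 1-4 concrete, here is a minimal Python sketch of the idea: one record per STS, named as its originators named it, with every other known name kept as an alias. This is not our ACEDB setup. The class names, field names and the case-insensitive lookup are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class STSRecord:
    """One STS, keyed by the name its originators gave it (hypothetical layout)."""
    primary_name: str
    aliases: set = field(default_factory=set)      # D numbers, other labs' names
    accessions: set = field(default_factory=set)   # sequence accession numbers
    originator: str = ""


class STSRegistry:
    """Maps every known name to a single record and refuses duplicate names."""

    def __init__(self):
        self._by_name = {}   # upper-cased name -> STSRecord (case folding is an assumption)

    def _check_free(self, name, record):
        owner = self._by_name.get(name.upper())
        if owner is not None and owner is not record:
            raise ValueError(f"{name} is already used by {owner.primary_name}")

    def add(self, record):
        names = [record.primary_name, *sorted(record.aliases)]
        for name in names:               # check everything first, then register
            self._check_free(name, record)
        for name in names:
            self._by_name[name.upper()] = record
        return record

    def add_alias(self, known_name, new_name):
        record = self.lookup(known_name)
        self._check_free(new_name, record)
        record.aliases.add(new_name)
        self._by_name[new_name.upper()] = record

    def lookup(self, name):
        return self._by_name[name.upper()]


registry = STSRegistry()
registry.add(STSRecord("AFM268yg1"))
registry.add_alias("AFM268yg1", "D22S1170")      # D number arrives later
print(registry.lookup("D22S1170").primary_name)  # prints AFM268yg1
```

Looking a marker up by any of its names returns the same record, and trying to register a name that already belongs to a different STS raises an error instead of silently creating a duplicate.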
The key really is to use the DNA sequences (if they are available) as your
cross-reference. A simple check of all the oligos against each other is often
sufficient.
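As a rough illustration of that check, the Python sketch below compares every oligo against every other by exact string match and reports which STSs share both oligos (probably the same STS under two names) or just one (probably a redesigned pair). The function names, the marker names and the oligo sequences are invented for the example. It only catches exact matches; an oligo shifted by a few nucleotides would need to be compared against the underlying sequence instead.

```python
from collections import defaultdict
from itertools import combinations


def normalise(oligo):
    """Strip whitespace and upper-case so formatting differences do not matter."""
    return "".join(oligo.split()).upper()


def cross_check(sts_oligos):
    """sts_oligos maps an STS name to its (oligo, oligo) pair.

    Returns a sorted list of (name_a, name_b, n_shared), where n_shared is
    2 if both oligos are identical and 1 if only one is.
    """
    by_oligo = defaultdict(set)
    for name, pair in sts_oligos.items():
        for oligo in pair:
            by_oligo[normalise(oligo)].add(name)

    shared = defaultdict(int)
    for names in by_oligo.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return [(a, b, n) for (a, b), n in sorted(shared.items())]


# Made-up names and sequences, purely for demonstration.
example = {
    "labA-1": ("ACGTTGCAAGCTTGACCTGA", "TTGACCGATCGGATCCAAGT"),
    "labB-7": ("ttgaccgatcggatccaagt", "acgttgcaagcttgacctga"),  # same pair, renamed
    "labC-3": ("ACGTTGCAAGCTTGACCTGA", "GACCGATCGGATCCAAGTCA"),  # one oligo redesigned
    "labD-9": ("GGCATTCAGGTACCATGCAA", "CCTTAGGACATGCAGTTCAG"),  # unrelated
}

for a, b, n in cross_check(example):
    print(a, b, "same pair" if n == 2 else "one oligo in common")
```

Anything this flags still has to be resolved by hand, but it narrows the job down to a short list of candidate clashes.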
Cheers
Ian
--------------------------------------------------------------------------
\
// Ian Dunham Tel: 01223 494948
\ Sanger Centre FAX: 01223 494919
// Hinxton Hall
\ Hinxton
// Cambridge Email: id1 at sanger.ac.uk
\ CB10 1RQ idunham at hgmp.mrc.ac.uk
// www: http://www.sanger.ac.uk/~id1
==========================================================================