Database of repetitive sequences?

CLARK at SALK-SC2.SDSC.EDU
Mon Mar 23 22:13:38 EST 1992


Bruce Roe suggests:

/1. Run Strings:
/
/        Search for the keyword "repeat" and search the GenEMBL
/        database with the output set to GENEMBL.STRINGS
/
/2. Run DataSet
/
/        To create the GCG data library from the set of sequences
/        in GCG format obtained as output from STRINGS
/
/        Assemble DATASET from what sequence(s) ?  @genembl.strings
/
/        What should I call the data library ?  repeats
/
/3. Sit back and watch all the work get done for you.

Bruce,

	This is easier and more difficult than you indicate. Easier
because you don't need to create a dataset (GCG's implementation of fasta
will accept "@genembl.strings" [or even *.seq] as the name of the database
to search). More difficult because it is *NEVER* a good idea to pull a
bunch of sequences out of GenBank and assume you have what you want (the
descriptions of them just aren't very good). Unless it's something I don't
care about (in which case I don't do it anyway :-) ) I always look at each
sequence to make sure it is what I want. Also, in this particular instance,
I don't need a zillion examples of Alu repeats. Furthermore, by looking at
the sequences you get a good idea of which repeats weren't found by
StringSearch because your keyword "repeat" wasn't present. At the very least
I would look for "REPETITIVE" as well.
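
To illustrate the keyword point: the word "repetitive" does not contain the
string "repeat", so a search on "repeat" alone passes over any entry described
only as "repetitive". The sketch below is plain Python, not GCG; the function
name find_candidates and the four entries are made up for illustration and are
not real database records. It only narrows the list of candidates, and each
hit still needs to be looked at by hand.

```python
def find_candidates(entries, keywords=("repeat", "repetitive")):
    """Return (name, description) pairs whose description contains any keyword.

    Matching is a case-insensitive substring test, so "repeat" also
    matches "repeats" and "repeated", but it does NOT match "repetitive".
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for name, description in entries:
        text = description.lower()
        if any(k in text for k in lowered):
            hits.append((name, description))
    return hits


# Hypothetical entries, invented for this example only.
entries = [
    ("SEQ1", "Human Alu repeat, clone 1"),
    ("SEQ2", "Mouse REPETITIVE element B1"),
    ("SEQ3", "Yeast actin gene"),
    ("SEQ4", "Rat interspersed repetitive DNA"),
]

# Candidates for manual inspection, one per line.
for name, description in find_candidates(entries):
    print(name, description)
```

With keywords=("repeat",) only SEQ1 comes back; adding "repetitive" also
picks up SEQ2 and SEQ4.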


Steve Clark

clark at salk-sc2.sdsc.edu  (Internet)
clark at salk               (Bitnet)
