Searching GenBank for nucleotide repeats
Keith Robinson
keith at bones.biochem.ualberta.ca
Tue Mar 1 16:52:54 EST 1994
A staff member here wants to search all of the rodent cDNA sequences in
GenBank to get an idea of the frequency of occurrence of:
- Single-base repeats (e.g. GGGGGG) of length 6 to 20 inclusive,
  for all 4 possible combinations
- Double-base repeats (e.g. GAGAGAGAGAGA) of length 6 to 20, for
  all 16 possible combinations
- Triple-base repeats (e.g. GATGATGATGATGATGAT) of length 6 to 20,
  for all 64 possible combinations
We use GCG here, and it is possible to perform this search with GCG's
"findpatterns" command (e.g. searching for G repeats can be done with
the pattern ~GG{6,20}~G), but this is time-consuming (in both human and
computer time), and processing the resulting output file is rather
tedious. Before attempting to write our own program, does anyone know of
any software which would make setting up these searches and interpreting
their results easier?
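
For reference, here is a minimal sketch of what the core of a home-grown
program for this search might look like, in Python. It is an illustration
only, not a tested tool. The names count_repeats and tally are hypothetical,
and it makes two assumptions: that "length" counts repeat units (as the
examples above suggest, each having 6 units), and that runs longer than 20
units are skipped rather than truncated. Reading the GenBank entries
themselves is left out.

```python
import re
from collections import Counter

MIN_UNITS = 6   # assumption: "length" means number of repeat units
MAX_UNITS = 20  # assumption: longer runs are skipped, not truncated


def count_repeats(seq, unit_len):
    """Count tandem repeats whose unit is unit_len bases long.

    Returns a Counter keyed by (unit, number_of_units). Each run is
    reported once, in the phase where the left-to-right scan first
    finds it (so a GA run preceded by an A is reported as an AG run).
    """
    seq = seq.upper()
    # One unit of unit_len bases, then at least MIN_UNITS - 1 more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, MIN_UNITS - 1))
    counts = Counter()
    for match in pattern.finditer(seq):
        n_units = len(match.group(0)) // unit_len
        if n_units <= MAX_UNITS:
            counts[(match.group(1), n_units)] += 1
    return counts


def tally(sequences):
    """Sum single-, double- and triple-base repeat counts over many sequences."""
    totals = {1: Counter(), 2: Counter(), 3: Counter()}
    for seq in sequences:
        for unit_len in totals:
            totals[unit_len].update(count_repeats(seq, unit_len))
    return totals
```

Calling tally() on a list of sequence strings gives one table per unit size.
Note that the double- and triple-base tables also include homopolymer units
such as GG or GGG, in keeping with the "all 16" and "all 64" combinations
asked for.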
Keith
--
Keith Robinson Dept. of Biochemistry
The University of Alberta Edmonton, Alberta Canada
........................................................
"The information highway is like teenagers and sex -
all talk, but no action." overheard