? tool to remove redundancy from sequence set
arlin at is.dal.ca
Wed Jul 3 19:13:37 EST 1996
Geoff Barton wrote:
> Steven Brenner wrote:
> > There are two subtle issues which are involved here:
> As Steve points out, this is not a straightforward problem to solve.
[subtle and complicated issues deleted]
These considerations are valid, in principle, but don't apply to
my problem. I'm just looking at thousands of splice junction sequences
with the intention of analyzing informational signals, and I
don't want the results to be biased by large sets of nearly
identical sequences (e.g., human antibody genes). All of the
sequence fragments are exactly the same length, there are
no gaps, and a simple measurement of nucleotide identity is
sufficient to quantify the relationship between any two
sequences (unless I want to take into account base composition).
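For equal-length, ungapped fragments like these, the filtering could be sketched roughly as follows. This is only an illustrative greedy filter, not the method used by any particular program; the function names, the 0.9 identity threshold, and the "keep the first sequence seen" rule are all assumptions made for the example, and base composition is ignored.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def remove_redundant(seqs, threshold=0.9):
    """Greedy filter (hypothetical helper): keep a sequence only if its
    identity to every sequence kept so far is below `threshold`.

    The threshold value is an assumption; choose it to suit the data.
    """
    kept = []
    for s in seqs:
        if all(identity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


if __name__ == "__main__":
    fragments = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC", "ACGTACGTAC"]
    for s in remove_redundant(fragments, threshold=0.8):
        print(s)
```

Note that the result depends on input order (the first member of each near-identical group is the one retained), and the all-against-kept comparison is quadratic in the worst case, which should still be manageable for a few thousand short fragments.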
One possible solution is the CLEANUP program (brought to my
attention by Sabino Liuni, one of the developers):
Grillo, et al. 1996, CABIOS 12(1):1, CLEANUP: a fast computer
program for removing redundancies from nucleotide sequence databases.
CLEANUP depends on the GCG and NCBI libraries (I don't have the
GCG libraries yet, but I'm working on it).
Department of Biochemistry
Halifax, Nova Scotia B3H 4H7 CANADA
(email) arlin at is.dal.ca