large DNA sequences --> smaller overlapping sequences

Roland Walker walker at ncbi.nlm.nih.gov
Thu Jun 11 09:48:56 EST 1998


kappe wrote:
> Does anybody know of a program to split up a large DNA-sequence file
> (4 Mb) into smaller files/sequences of 200 kb with 10 kb overlap?

The SEALS package

  http://www.ncbi.nlm.nih.gov/Walker/SEALS/index.html

contains a number of little widgets for common sequence
manipulations such as this one.

If your sequence were in FASTA format in a file called 'chromosome.fa',
you could split it with this command:

  fenestrate chromosome.fa -window= 200_000 -overlap=10_000 
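
For anyone curious about the arithmetic behind a window/overlap split,
here is a rough Python sketch. It is not how fenestrate is implemented;
the function name and the rule that each window starts (window - overlap)
bases after the previous one are my own assumptions, as is letting the
last window simply be shorter.

```python
def overlapping_windows(length, window=200_000, overlap=10_000):
    """Yield (start, end) half-open, 0-based coordinates covering 0..length.

    Hypothetical helper: consecutive windows share `overlap` positions,
    and the final window is truncated at the end of the sequence.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap  # distance between successive window starts
    start = 0
    while start < length:
        end = min(start + window, length)
        yield start, end
        if end == length:
            break  # the sequence is fully covered
        start += step


if __name__ == "__main__":
    # Print the window coordinates for a 4 Mb sequence.
    for start, end in overlapping_windows(4_000_000):
        print(start, end)
```

With these assumptions each window advances by 190 kb, so adjacent
pieces share exactly 10 kb.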

To save each subsequence in a separate file you might try

  fenestrate chromosome.fa -window= 200_000 -overlap=10_000 | \
  shatter -word= -5
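
If SEALS is not at hand, the same idea can be sketched in plain Python.
This is only an illustration under several assumptions of mine: the
input holds a single FASTA record, the function name split_fasta is
hypothetical, and the output file names (record name plus 1-based
coordinates) are invented here and are not what shatter produces.

```python
import os


def split_fasta(path, out_dir, window=200_000, overlap=10_000):
    """Write overlapping pieces of a single-record FASTA file into out_dir.

    Returns the list of files written. Assumes one record per input file.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    with open(path) as handle:
        header = handle.readline().rstrip("\n")
        if not header.startswith(">"):
            raise ValueError("expected a FASTA header line")
        # Join the sequence lines into one string, dropping whitespace.
        seq = "".join(line.strip() for line in handle)
    fields = header[1:].split()
    name = fields[0] if fields else "seq"

    os.makedirs(out_dir, exist_ok=True)
    written = []
    step = window - overlap  # distance between successive window starts
    start = 0
    while start < len(seq):
        end = min(start + window, len(seq))
        label = f"{name}_{start + 1}-{end}"  # 1-based, inclusive coordinates
        out_path = os.path.join(out_dir, label + ".fa")
        with open(out_path, "w") as out:
            out.write(f">{label}\n")
            for i in range(start, end, 60):  # wrap sequence at 60 columns
                out.write(seq[i:min(i + 60, end)] + "\n")
        written.append(out_path)
        if end == len(seq):
            break  # the sequence is fully covered
        start += step
    return written
```

Calling split_fasta('chromosome.fa', 'pieces') would then leave one small
FASTA file per window in the 'pieces' directory.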

Roland



