Randomised sequences

Brian Fristensky frist at cc.umanitoba.ca
Thu Feb 27 05:29:36 EST 2003

Hans Stenvien wrote:
> I have data sets consisting of several thousand unique coding sequences. For
> each sequence, I would like to generate a number of randomised sequences
> with the same length and nucleotide composition. Does anyone know of
> appropriate online service/software allowing me to input all of my sequences
> from one file? It would be great if one file is generated containing all the
> random sequences for all original sequences. I have so far only found online
> services/softwares enabling me to shuffle nucleotide order in one sequence
> at a time (i.e. manual input of each sequence). I am afraid I am not able to
> do any programming myself.
> Any help is appreciated.
> Best regards,
> Hans

The shuffle program from XYLEM will do that. The manual page is appended
below. XYLEM can be downloaded from:


Binaries are available for Solaris and Linux, and source
code should compile readily on other platforms, since
the code is pretty simple.

Brian Fristensky (ON SABBATICAL til July 1, 2003)
Department of Plant Science
University of Manitoba
Winnipeg, MB R3T 2N2  CANADA
frist at cc.umanitoba.ca
Sabbatical phone: 204-474-6724
Voicemail:        204-474-6085
Home phone:       204-261-3960
FAX:              204-474-7528

The most unforgivable sin of all is being right too soon.


      shuffle.doc                                           update 3 Feb 94

            shuffle -sn [-wn -on]

           Shuffles sequences locally. See Lipman DJ, Wilbur WJ, Smith TF
           and Waterman MS (1984) On the statistical significance of nucleic
           acid similarities. Nucl. Acids Res. 12:215-226. 

           -sn    n is a random integer between 0 and 32767. This number
                  must be provided for each run.

           -wn    n is an integer, indicating the width of the window for
                  random localization. If w exceeds the length of a
                  sequence or is negative, the entire sequence is scrambled
                  as a single window. This is also the case if w is not
                  specified.

           -on    n is an integer, indicating the number of nucleotides
                  overlap between adjacent windows. It should never exceed
                  the window size.  o defaults to 0 if not specified.

           If w and o are specified, overlapping windows of w nucleotides
           are shuffled, thus preserving the local characteristic base
           composition. Windows overlap by o nucleotides. 

           If w and o are not specified, each sequence is shuffled as a
           whole, thus preserving the overall base composition, but not
           the local variations in composition.
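
           The two modes described above can be sketched in Python. This
           is only an illustration of one plausible reading of the text,
           not the XYLEM source: the function name shuffle_windows, the
           step of (window - overlap) between window starts, and the
           treatment of a zero window as "whole sequence" are all
           assumptions. The text says the overlap should never exceed
           the window size; the sketch additionally rejects an overlap
           equal to the window, since the windows would then never
           advance.

```python
import random


def shuffle_windows(seq, window=None, overlap=0, rng=None):
    """Shuffle seq inside successive windows, or as a whole if window is unset.

    Illustrative only; window placement is an assumption, not XYLEM's code.
    """
    rng = rng or random.Random()
    chars = list(seq)
    n = len(chars)

    # Documented fallback: an unset, negative, or oversized window means the
    # entire sequence is one window. Treating 0 the same way is an assumption.
    if window is None or window <= 0 or window > n:
        rng.shuffle(chars)
        return "".join(chars)

    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than the window")

    # Assumed layout: each window starts (window - overlap) after the last.
    step = window - overlap
    for start in range(0, n, step):
        block = chars[start:start + window]
        rng.shuffle(block)
        # The final window may be shorter; the slice lengths still match.
        chars[start:start + window] = block
    return "".join(chars)
```

           Passing a seeded generator, for example random.Random(9873),
           makes a run repeatable, which is the role the -s option plays
           for shuffle itself. In both modes the output has the same
           length and overall composition as the input.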

           Any number of sequences may be processed from a single input
           file.  In Pearson-format files, each new sequence begins with a
           '>' comment line, indicating the name and a short description of
           the sequence.
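
           For readers who want to see how a multi-sequence file could be
           handled, here is a minimal Python sketch that reads
           Pearson-format records and emits several whole-sequence
           shuffles of each one. It is not part of XYLEM. The function
           names, the " shuffled_N" suffix added to each name, and the
           70-column line width are all hypothetical choices made for
           the example.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from Pearson/FASTA-format lines."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A '>' line closes the previous record and opens a new one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def shuffled_copies(records, copies, seed):
    """Yield `copies` whole-sequence shuffles of every record."""
    rng = random.Random(seed)  # one seeded generator keeps the run repeatable
    for header, seq in records:
        for i in range(1, copies + 1):
            chars = list(seq)
            rng.shuffle(chars)
            yield f"{header} shuffled_{i}", "".join(chars)


def format_fasta(records, width=70):
    """Yield output lines, wrapping each sequence at `width` characters."""
    for header, seq in records:
        yield f">{header}\n"
        for i in range(0, len(seq), width):
            yield seq[i:i + width] + "\n"
```

           With these in place, something like
           sys.stdout.writelines(format_fasta(shuffled_copies(read_fasta(sys.stdin), 10, 9873)))
           (after importing sys) would write ten randomised versions of
           every input sequence to a single output stream.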

           No distinction is made between protein or nucleic acid sequences.
           That is, shuffle will read any of the following characters as


           where '*' is the result of translating a stop codon, and '-'
           is a gap generated during sequence alignment. Lowercase is
           also accepted.

           A sample output file is shown below. Note that the first two
           lines of output are comment lines, listing the version of the
           program and the parameters used in the run.

           >SHUFFLE                   VERSION 11/ 8/93
           >RANDOM SEED:     9873          WINDOW:   12 
           >BAZFAZ - Borborigmus azerbi F-actin-zeta gene

        Dr. Brian Fristensky
        Dept. of Plant Science
        University of Manitoba
        Winnipeg, MB  Canada  R3T 2N2
        Phone: 204-474-6085
        FAX: 204-474-7528
        frist at cc.umanitoba.ca

        Fristensky, B. (1993) Feature expressions: creating and manipulating
        sequence datasets. Nucleic Acids Research 21:5997-6003.
