Bootscanning for recombination
msalminen at pasteur.hjf.org
Sat Jul 6 15:02:27 EST 1996
The software package for performing "Bootscanning" is now available from
our website at
"Bootscanning" is a method for detecting recombination in viral
sequences. We have only
applied it to HIV-1, but it should work for any genes that contain
You will need GDE and Phylip to run the package, and at the current
time, we only provide
SUN executables. However, the source-code is also included, so anyone
who would care to
compile on another system capable of running GDE and Phylip is free to
Below is an excerpt from some of the documentation:
Bootscanning Package v. 1.0beta1
Principal Idea and Approach: Mika Salminen, msalminen at hiv.hjf.org
Design and programming: Wayne Cobb, wcobb at reed.hjf.org
(c) 1994-1996: Mika Salminen, Wayne Cobb, Henry M. Jackson Foundation
Bootscanning is a method for analysis of viral recombination. It can be used to
compare an unknown, suspected recombinant sequence to a set of
potential parental sequences. It should be independent of organism, but we have
only used it for HIV-1 and suspect that it only works for sufficiently
In the case of HIV-1 there are predefined genetic subtypes which have
called A-H in analyses based on the envelope and gag genes, with the
exception that the E-subtype does not exist in gag. Viruses with
envelopes group with the A-type viruses in gag.
Bootscanning relies on the alignment of a suspected recombinant sequence
a set of potential parental reference sequences (groups of sequences
different subtypes, or consensus sequences created from sets of reference
sequences). After optimal alignment, the alignment is broken into
overlapping segments (or windows) which are fed to a program for
analysis (any of the sequence programs of Phylip could be used, we have
menus for three methods). Bootstrapped phylogenetic trees are built for
segment and finally the bootstrap value for placing the unknown with
the reference sequences/sequence groups is tabulated and plotted along
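
The windowing step described above can be sketched roughly as follows. This
is only an illustration in Python, not the code of the package's own "chop"
program: the function names, the 0-based coordinates and the default window
and step sizes (200 and 50 columns) are assumptions made for the example.

```python
def overlapping_windows(alignment_length, window=200, step=50):
    """Yield (start, end, midpoint) for each overlapping window.

    Coordinates are 0-based column indices, end-exclusive.
    The default window and step sizes are illustrative assumptions.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # An alignment shorter than one window still yields a single window.
    last_start = max(alignment_length - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, alignment_length)
        yield start, end, (start + end) // 2


def chop_alignment(alignment, window=200, step=50):
    """Split a dict of name -> aligned sequence into per-window segments.

    Returns a list of (midpoint, sub_alignment) pairs, one per window.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    (length,) = lengths
    return [
        (mid, {name: seq[start:end] for name, seq in alignment.items()})
        for start, end, mid in overlapping_windows(length, window, step)
    ]
```

Each sub-alignment would then be handed to the bootstrap and tree-building
programs separately, and the midpoint kept as the x-coordinate of that window.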
The assumption is that the unknown will always reach high bootstrap
its parental subtypes in windows covering areas of that subtype in the
When a recombination breakpoint is reached, the bootstrap value for one
parent+the unknown will go down, and the bootstrap value for the second
the unknown should go up.
Therefore we should find the breakpoint in the intersection of the
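
As a rough illustration of this idea, the sketch below looks for the point
where the bootstrap curves of two parental groups cross. It is not part of
the package: the function name is hypothetical, and linear interpolation
between adjacent window midpoints is an assumption made here for simplicity.

```python
def find_crossovers(positions, parent_a, parent_b):
    """Estimate positions where two bootstrap curves cross.

    positions: window midpoints along the alignment
    parent_a, parent_b: bootstrap values for the unknown grouping with
    each parent, one value per window.
    """
    if not (len(positions) == len(parent_a) == len(parent_b)):
        raise ValueError("inputs must have the same length")
    crossings = []
    for i in range(len(positions) - 1):
        d0 = parent_a[i] - parent_b[i]
        d1 = parent_a[i + 1] - parent_b[i + 1]
        # A strict sign change means the curves cross between windows i and
        # i+1; windows where the two values are exactly equal are ignored.
        if d0 * d1 < 0:
            frac = d0 / (d0 - d1)  # linear interpolation (assumption)
            crossings.append(positions[i] + frac * (positions[i + 1] - positions[i]))
    return crossings
```

A crossing found this way is only a candidate breakpoint; its resolution is
limited by the window and step sizes used for the scan.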
Reference alignments of non-recombinant gag and env genes and
sequences are included in the package. However, be aware that the B and
D-subtypes are sometimes difficult to separate in gag and that the
some regions separates to at least 2 subclusters. Therefore we have
consensus sequences A1 and A2 in the consensus alignments.
The package contains the following components:
Menufile for GDE:
Insert the menus in this file into your ".GDEmenus"-file.
Shell-scripts (place all in /GDE/bin/):
Performs Maximum Likelihood Bootscanning analysis using SEQBOOT,
DNADIST, FITCH and CONSENSE
Performs Neighbor Joining Bootscanning analysis using SEQBOOT, DNADIST,
NEIGHBOR and CONSENSE
Performs Parsimony Bootscanning analysis using SEQBOOT, DNAPARS
Shell script to run the program analyze.
Shell script to run the program chop
Executables (place all in /GDE/bin/):
readseq Fixed sequence format converter.
"PhyloMask" Creates masks to exclude gaps (menu available in
"chop" Breaks up masked alignment into individual segments which are
created as sequentially numbered input files to the Phylip
programs in a subdirectory of your home directory.
Each window is numbered at the midpoint of the segment in the
alignment. For each segment a corresponding outfile and treefile
is created which will contain the bootstrap values and the
consensus trees. The programs 'analyze' and 'report' are
used to extract and tabulate the bootstrap values of specified
groups of sequences (taxa or clades). Plotting the tables
using the alignment as the x-axis and the bootstrap value as the
y-axis can be used to identify recombination points.
"analyze" Extracts bootstrap values from outfiles. Use analyze.sh to
"report" Collects bootstrap values in tab-delimited table.
Attached to this file is also an example of a bootscan-plot of an
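
For readers who want to picture the kind of table that can be plotted this
way, here is a small Python sketch that writes one tab-delimited row per
window. It does not reproduce the actual output format of "report"; the
column layout, the "midpoint" header and the function name are assumptions.

```python
import csv
import io


def write_bootscan_table(rows, groups, out):
    """Write a tab-delimited table: window midpoint, then one bootstrap
    value per reference group.

    rows: iterable of (midpoint, {group_name: bootstrap_value}) pairs
    groups: column order for the reference groups
    out: any writable text file object
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["midpoint"] + list(groups))
    for midpoint, values in rows:
        # A group with no value for this window is left blank (assumption).
        writer.writerow([midpoint] + [values.get(g, "") for g in groups])


# Usage with made-up values for two hypothetical reference groups.
buffer = io.StringIO()
write_bootscan_table(
    [(100, {"A": 95, "B": 3}), (200, {"A": 20, "B": 75})],
    ["A", "B"],
    buffer,
)
table_text = buffer.getvalue()
```

The resulting text can be loaded into any spreadsheet or plotting program,
with the midpoint column on the x-axis and each group as a separate curve.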
We hope that you will be able to use the method to produce some useful
and would certainly be happy to hear comments and critique about the
and certainly also bug-reports. We would especially be delighted to hear
anyone who has managed to successfully install the package.
We also acknowledge that the package is crude and simple, we have not
a lot of effort into getting it very elegant, but it works for us, and
that it will for other people, too. We have tried to remove any bugs
have crept up during development, but there will certainly be more.
the package will do so at their own risk; we take no responsibility for
loss of data or hardware that may result from the use of the package
there will be none!).
Finally, some acknowledgements and disclaimers:
This work was supported in part by cooperative agreement N.
between the United States Army Medical Research and Materiel Command and
Henry M. Jackson Foundation for the Advancement of Military Medicine,
grant No. 19191 from the Finnish Academy of Science (Mika Salminen). The
views and opinions expressed herein are those of the authors and do not
purport to reflect the official policy or position of the US Army or of
the Department of Defense.
Please refer to the following publications if using the package:
Mika O. Salminen, Jean K. Carr, Donald S. Burke and Francine E.
(1995). Identification of Breakpoints in Intergenotypic Recombinants of
HIV-1 by Bootscanning. AIDS Research and Human Retroviruses, 11,
Smith S, Overbeek R, Woese CR, Gilbert W and Gillevet P. (1994) The
Genetic Data Environment: An expandable GUI for Multiple Sequence
Analysis. Comp Appl Biol Sci, 10, 671-675.
Felsenstein, J. (1991). "Phylip Manual", v. 3.4, University Herbarium,
University of California, Berkeley, California.