Bootscanning for recombination

Mika Salminen msalminen at pasteur.hjf.org
Sat Jul 6 15:02:27 EST 1996


Announcement: 

The software package for performing "Bootscanning" is now available from
our website at
http://hivgenome.hjf.org/

"Bootscanning" is a method for detecting recombination in viral
sequences. We have only applied it to HIV-1, but it should work for any
genes that contain sufficient phylogenetic signal.

You will need GDE and Phylip to run the package. At present we provide
only SUN executables, but the source code is included, so anyone who
would care to compile on another system capable of running GDE and
Phylip is free to do so.

Below is an excerpt from some of the documentation:

Bootscanning Package v. 1.0beta1
Principal Idea and Approach: 	Mika Salminen, msalminen at hiv.hjf.org
Design and programming:		Wayne Cobb, wcobb at reed.hjf.org
(c) 1994-1996: Mika Salminen, Wayne Cobb, Henry M. Jackson Foundation

Description:

Bootscanning is a method for analysis of viral recombination. It can be
used to compare an unknown, suspected recombinant sequence to a set of
predefined potential parental sequences. It should be independent of
organism, but we have only used it for HIV-1 and suspect that it only
works for sufficiently variable genes.

In the case of HIV-1 there are predefined genetic subtypes, which have
been called A-H in analyses based on the envelope and gag genes, with
the notable exception that the E-subtype does not exist in gag. Viruses
with E-subtype envelopes group with the A-type viruses in gag.

Bootscanning relies on the alignment of a suspected recombinant sequence
with a set of potential parental reference sequences (groups of
sequences from the different subtypes, or consensus sequences created
from sets of reference sequences). After optimal alignment, the
alignment is broken into sequential, overlapping segments (or windows),
which are fed to a program for phylogenetic analysis (any of the
sequence programs of Phylip could be used; we have included menus for
three methods). Bootstrapped phylogenetic trees are built for each
segment, and finally the bootstrap value for placing the unknown with
each of the reference sequences or sequence groups is tabulated and
plotted along the genome.
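
The windowing step described above can be sketched in a few lines of
Python. This is only an illustration, not the code of the package: the
function names, the window size and the step size are assumptions made
for the example, and the real "chop" program may handle the ends of the
alignment differently.

```python
def sliding_windows(alignment_length, window, step):
    """Yield (start, end, midpoint) for overlapping windows.

    Positions are 0-based and `end` is exclusive. Windows overlap
    whenever `step` is smaller than `window`.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start + window <= alignment_length:
        yield start, start + window, start + window // 2
        start += step


def chop_alignment(alignment, window, step):
    """Cut every aligned sequence into the same set of windows.

    `alignment` maps a sequence name to its aligned sequence; all
    sequences must have equal length. Returns a list of
    (midpoint, {name: segment}) pairs, one per window.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    return [
        (mid, {name: seq[start:end] for name, seq in alignment.items()})
        for start, end, mid in sliding_windows(length, window, step)
    ]


if __name__ == "__main__":
    # Toy alignment (hypothetical names and sequences).
    toy = {"unknown": "ACGTACGTAC", "refA": "ACGTTCGTAC", "refD": "ACGAACGTTC"}
    for midpoint, segments in chop_alignment(toy, window=4, step=2):
        print(midpoint, segments["unknown"])
```

Each window would then be bootstrapped and analysed separately, with
the midpoint serving as its position along the genome.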

The assumption is that the unknown will always reach high bootstrap
values with its parental subtypes in windows covering areas of that
subtype in the unknown. When a recombination breakpoint is reached, the
bootstrap value for one parent plus the unknown will go down, and the
bootstrap value for the second parent plus the unknown should go up.

Therefore we should find the breakpoint at the intersection of the
plotted bootscanning lines.
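
As a rough illustration of reading a breakpoint off two bootscanning
lines, the sketch below looks for the positions where the bootstrap
curves of two candidate parents cross. It is not part of the package.
The function name and the example numbers are made up, and linear
interpolation between window midpoints is an assumption of this sketch;
the resolution of the real estimate is limited by the window and step
sizes.

```python
def find_crossings(midpoints, parent_a, parent_b):
    """Estimate positions where two bootstrap curves intersect.

    midpoints -- window midpoints in ascending order
    parent_a  -- bootstrap value (unknown + parent A) for each window
    parent_b  -- bootstrap value (unknown + parent B) for each window

    The crossing is placed by linear interpolation between the two
    nearest windows in which the curves differ.
    """
    if not (len(midpoints) == len(parent_a) == len(parent_b)):
        raise ValueError("all three inputs must have the same length")
    diffs = [a - b for a, b in zip(parent_a, parent_b)]
    crossings = []
    prev = None  # index of the last window where the curves differed
    for i, d in enumerate(diffs):
        if d == 0:
            continue  # curves touch here; wait for the next clear difference
        if prev is not None and diffs[prev] * d < 0:
            d0 = diffs[prev]
            frac = d0 / (d0 - d)
            crossings.append(
                midpoints[prev] + frac * (midpoints[i] - midpoints[prev])
            )
        prev = i
    return crossings


if __name__ == "__main__":
    # Invented values: support shifts from parent A to parent D.
    mids = [100, 200, 300, 400]
    with_a = [95, 90, 20, 5]
    with_d = [3, 8, 70, 92]
    print(find_crossings(mids, with_a, with_d))
```

A crossing found this way is only a candidate breakpoint; it is worth
checking that both curves actually reach high bootstrap values on
their respective sides.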

Reference alignments of non-recombinant gag and env genes and
subtype-reference sequences are included in the package. However, be
aware that the B and D-subtypes are sometimes difficult to separate in
gag and that the A-subtype in some regions separates into at least 2
subclusters. Therefore we have included consensus sequences A1 and A2
in the consensus alignments.

The package contains the following components:

Menufile for GDE:

"GDEmenus.Bootscan"
Insert the menus in this file into your ".GDEmenus"-file. 

Shell-scripts (place all in /GDE/bin/):

"mlbootscan.sh"
Performs Maximum Likelihood Bootscanning analysis using SEQBOOT,
DNADIST, FITCH and CONSENSE

"njbootscan.sh"
Performs Neighbor Joining Bootscanning analysis using SEQBOOT, DNADIST,
NEIGHBOR and CONSENSE

"parsboot.sh"
Performs Parsimony Bootscanning analysis using SEQBOOT, DNAPARS
and CONSENSE

"analyze.sh"
Shell script to run the program analyze.

"genchop.sh"
Shell script to run the program chop.

Executables (place all in /GDE/bin/):

readseq		Fixed sequence format converter.
"PhyloMask"	Creates masks to exclude gaps (menu available in 
		GDEmenus.Bootscan).	

"chop"		Breaks up masked alignment into individual segments which are 
		created as sequentially numbered input files to the Phylip 
		programs in a subdirectory of your home directory. 
		Each window is numbered at the midpoint of the segment in the
		alignment. For each segment a corresponding outfile and treefile
		is created, which will contain the bootstrap values and the
		consensus trees. The programs 'analyze' and 'report' are
		used to extract and tabulate the bootstrap values of specified
		groups of sequences (taxa or clades). Plotting the tables
		using the alignment as the x-axis and the bootstrap value as the
		y-axis can be used to identify recombination points.

"analyze"		Extracts bootstrap values from outfiles. Use analyze.sh to
		run it.

"report"		Collects bootstrap values in tab-delimited table.
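
To make the extract-and-tabulate step more concrete, here is a small
Python sketch of how per-window bootstrap support for "unknown plus
reference group" could be counted and written as a tab-delimited table.
It is not the code of 'analyze' or 'report', and it does not read
Phylip outfiles. The data structure (one set of groupings per bootstrap
replicate), the function names and the column headings are all
assumptions made for illustration.

```python
def support_percent(replicate_clades, unknown, group):
    """Percent of bootstrap replicates that group `unknown` with `group`.

    replicate_clades -- one collection of frozensets per replicate,
                        each frozenset being a grouping found in that
                        replicate's tree
    unknown          -- name of the suspected recombinant
    group            -- names of the reference sequences of one subtype
    """
    if not replicate_clades:
        return 0.0
    target = frozenset(group) | {unknown}
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


def tabulate(windows, unknown, groups):
    """Build a tab-delimited table with one row per window.

    windows -- list of (midpoint, replicate_clades) pairs
    groups  -- dict mapping a column label to its reference names
    """
    labels = list(groups)
    lines = ["\t".join(["midpoint"] + labels)]
    for midpoint, replicate_clades in windows:
        values = [
            f"{support_percent(replicate_clades, unknown, groups[label]):.0f}"
            for label in labels
        ]
        lines.append("\t".join([str(midpoint)] + values))
    return "\n".join(lines)


if __name__ == "__main__":
    # Four invented replicates for a single window centred at 150.
    reps = [
        {frozenset({"U", "A1"})},
        {frozenset({"U", "A1"})},
        {frozenset({"U", "D1"})},
        {frozenset({"U", "A1"})},
    ]
    print(tabulate([(150, reps)], "U", {"A": ["A1"], "D": ["D1"]}))
```

A table of this shape can be loaded into any plotting program, with
the midpoint column as the x-axis and one line per reference group.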

Attached to this file is also an example of a bootscan-plot of an
A/D-recombinant virus.


We hope that you will be able to use the method to produce some useful
data, and would be happy to hear comments and critique about the
package, as well as bug reports. We would especially be delighted to
hear from anyone who has managed to successfully install the package.

We also acknowledge that the package is crude and simple. We have not
put a lot of effort into making it elegant, but it works for us, and we
hope that it will for other people, too. We have tried to remove any
bugs that crept in during development, but there will certainly be
more. Anyone using the package does so at their own risk; we take no
responsibility for any loss of data or hardware that may result from
the use of the package (hopefully there will be none!).

Finally, some acknowledgements and disclaimers:

This work was supported in part by cooperative agreement No.
DAMD17-93-V-3004 between the United States Army Medical Research and
Materiel Command and the Henry M. Jackson Foundation for the
Advancement of Military Medicine, and by grant No. 19191 from the
Finnish Academy of Science (Mika Salminen). The views and opinions
expressed herein are those of the authors and do not purport to reflect
the official policy or position of the US Army or of the Department of
Defense.

Please refer to the following publications if using the package:

Mika O. Salminen, Jean K. Carr, Donald S. Burke and Francine E.
McCutchan. (1995). Identification of Breakpoints in Intergenotypic
Recombinants of HIV-1 by Bootscanning. AIDS Research and Human
Retroviruses, 11, 1423-25.

Smith S, Overbeek R, Woese CR, Gilbert W and Gillevet P. (1994). The
Genetic Data Environment: An expandable GUI for Multiple Sequence
Analysis. Comput Appl Biosci, 10, 671-675.

Felsenstein, J. (1991). "Phylip Manual", v. 3.4, University Herbarium, 
University of California, Berkeley, California.