Oswaldo Trelles wrote:
> ... we have developed a new method for the parallel implementation
> of the DNAml program, which constructs phylogenetic trees
> from DNA sequences (please see references). We have applied it
> successfully to the analysis of ribosomal RNA data. Now we
> are interested in applying it to other interesting
> data sets consisting of large numbers of DNA
> sequences, and in constructing phylogenetic trees from them.
> Can anyone point us to such data sets?
There are over 30,000 sequences from primate immunodeficiency
viruses (HIV-1, HIV-2 and various SIVs). The complete genomes
are roughly 10,000 bp in length. All are apparently derived
from a common ancestor, a lentivirus.
The Gag, Pol, Env and other genes from the primate
lentiviruses can be reasonably aligned. The LTR and
other non-coding regions cannot be unambiguously aligned.
Attempting to build a phylogenetic tree from 30,000 or
even 3,000 sequences would be a bit ridiculous. But
it is not uncommon for us to want to build a tree from
50 to 300 sequences. For example, we have nearly 90 complete
genomes from HIV-1 isolates (The LTRs from within one type
of immunodeficiency virus can be aligned, it is only
aligning HIV-1 to HIV-2 or to SIVs that is difficult).
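To put rough numbers on why 3,000 or 30,000 taxa is out of reach while 50 to 300 is merely demanding: the number of distinct unrooted, bifurcating trees for n labelled sequences is (2n-5)!! = 1 x 3 x 5 x ... x (2n-5). The Python sketch below computes that count (the function names are hypothetical, not part of DNAml or PHYLIP). Note that DNAml and similar programs use heuristic searches rather than enumerating every tree, so this figure illustrates the size of the search space, not the actual run time.

```python
import math

def unrooted_tree_count(n: int) -> int:
    """Exact number of distinct unrooted, bifurcating trees with n labelled
    tips: (2n-5)!! = 1 * 3 * 5 * ... * (2n-5), defined here for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

def log10_unrooted_tree_count(n: int) -> float:
    """Same quantity on a log10 scale, so large n needs no huge integers."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return sum(math.log10(k) for k in range(3, 2 * n - 4, 2))

if __name__ == "__main__":
    for n in (50, 90, 300, 3000, 30000):
        exponent = log10_unrooted_tree_count(n)
        print(f"{n:>6} taxa: about 10^{exponent:.0f} unrooted trees")
```

Working in log10 keeps the 30,000-taxon case cheap; the exact integer version is only practical for small n.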
For an example of a data set, see the alignments at:
Select HIV-1 env DNA to see an alignment of the envelope
genes from about 220 different HIV-1 isolates. The alignments
can be downloaded in IntelliGenetics or FASTA formats, or
viewed as printable text.
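If you take the FASTA download, a minimal Python reader like the one below is enough to load the alignment and confirm that every row has the same length before converting it for DNAml. This is a sketch under assumptions: it treats each full header line as the sequence name, keeps gap characters such as '-' as they are, and the file name in the comment is hypothetical; it does not reflect any particular header convention used by the database.

```python
import io
from typing import Dict, Iterable

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}. Sequence lines are joined,
    and alignment gap characters such as '-' are kept unchanged.
    Assumes headers are unique; a repeated header overwrites the earlier one."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def alignment_length(records: Dict[str, str]) -> int:
    """Return the common length of an alignment, or raise if rows differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have differing lengths: {sorted(lengths)}")
    return lengths.pop()

if __name__ == "__main__":
    # Tiny made-up example; a real download would be opened with
    # open("hiv1_env.fasta") (hypothetical file name) instead of StringIO.
    sample = io.StringIO(">isolate_A\nACGT-A\nCC\n>isolate_B\nACGTTACC\n")
    aln = read_fasta(sample)
    print(len(aln), "sequences,", alignment_length(aln), "columns")
```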
|Brian T. Foley btf at t10.lanl.gov |
|HIV Database (505) 665-1970 |
|Los Alamos National Lab http://hiv-web.lanl.gov/index.html |
|Los Alamos, NM 87544 U.S.A. http://www.t10.lanl.gov/~btf/home.html |