Benchmark protein-sequence data sets

Des Higgins fatherdes at eircom.net
Thu Feb 15 16:36:08 EST 2001


"Fernando Gonzalez" <fernando.gonzalez at uv.es> wrote in message
news:96fegm$njh$1 at mercury.hgmp.mrc.ac.uk...
>
>
> Mark Ragan wrote:
>
> > Colleagues,
> >
> > Can anyone point me to sets (matrices) of aligned protein sequences
> > for use in benchmarking software for phylogenetic inference?
> >
> > I'm hoping to find something similar to the Green Plant phylogeny
> > group's 232-sequence rRNA "challenge" data set -- only with protein
> > sequences.
> >
> > Something on the order of 12 to 40 protein sequences, each 100-400 amino
> > acids long and without too many alignment gaps, would be ideal.
> > The data might be either real, or generated according to a specified
> > model.
> >
> > As I'm not a subscriber to this list, please email me directly at:
> >
> > m.ragan at imb.uq.edu.au
> >
> > Many thanks,
> >
> > Mark Ragan
> > Institute for Molecular Bioscience
> > The University of Queensland
> > Brisbane, Qld 4072 Australia
> >
> > http://www.imb.uq.edu.au/Ragan.html
> >
> >
> > ---
> You can find a reference database with 142 protein alignments (Thompson
> et al., NAR 27:2682, 1999) at
> http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE
>
> Good luck!

Those alignments are meant for benchmarking alignment software, not
phylogenetic inference. They do not come with a reference phylogeny, and in
some cases the sequences are very highly divergent.

Des Higgins

>
> --
> Fernando Gonzalez
> --
> **************************************************************
> Dr. Fernando Gonzalez Candelas
> Instituto Cavanilles de Biodiversidad y Biologia Evolutiva
> Dept. de Genetica / Serv. Bioinformatica
> Universitat de Valencia       Phone: (+34) 963 983 653
> Apartado de Correos 22085     FAX (+34) 963 983 670
> E-46071 Valencia SPAIN        e-mail: Fernando.Gonzalez at uv.es
> **************************************************************
>
>
>



More information about the Mol-evol mailing list