Sequence editor that can delete columns?

Arthur Schuessler schueslr at sun0.urz.uni-heidelberg.de
Thu Feb 27 13:39:59 EST 1997


"sequence aligners",
if you have access to any IBM-compatible PC (from 386 to NT workstations)
you can use the >very< useful alignment editor written by Dominik
Hepperle. It's great, and problems like the one described in this
discussion are easy to solve with it. Moreover, the program has many
more features. I list some below.

The program was written by a friend of mine (I "cooperated" in testing
it) during the last two years. It is for the PC (Win3.x, Win95/NT). Two
versions are available, one 16-bit and one 32-bit. The 16-bit version
can be tested as a demo version available on the WWW (see below); it
runs very well under Win3.1x and Win95.
It is not free; he currently charges US $100 for it, but I think it is
one of the best (for me it's the best) alignment editors available. It
will probably become more expensive in a few months. It also handles
protein sequences and is fully color coded. It is not only for
alignment, but also searches for restriction sites, makes consensus
sequences, computes statistics, and so on. It has convenient features,
like search and replace of nucleotides (e.g. T->U), making masks and
replacing non-masked regions (!), translation of DNA or RNA -> protein,
RNA <-> DNA conversion, showing open reading frames, editable codon
usage, and so on. One can also easily include secondary structure
elements in the alignment - they can be removed again for analysis.
>>>>>>>> I have no commercial interest <<<<<<<< - I'm just informing you. 
The address, e-mail etc. of Dominik Hepperle (he wrote the program)
are shown in the DEMO version available at the WWW address given below.
This version can >not< save files (the program is not free); all
other features can be tested. It handles e.g. PAUP, PHYLIP and MEGA
files.
It is really a good program!
It has no problems with, let's say, 1000 LSU rRNAs (the number depends
only on the memory you have). I easily aligned 80 fungal SSU rRNAs
(about 2000 sites) with the program on a 486/33 PC with 8 MB RAM (my old
private computer!). I didn't test how many are possible; probably many
more!
The program was written during the last 2-3 years, and I have been using
it for about one year. All serious bugs are fixed; it causes no problems
and is very stable. He had to buy the programming software and spent
much time on it, so $100 is not very much.
But, just try it.

WWW address:
http://www.winsite.com/info/pc/win3/demo/align39.zip/

Ciao,
Arthur

PS: he is implementing new features in the program all the time, so if
you or others have suggestions or special needs, he will simply
include them, if possible (most things are possible). E.g. special data
formats can be included, if needed.
Dr. Arthur Schuessler
University of Heidelberg 
Zellenlehre
Im Neuenheimer Feld 230
D-69120 Heidelberg
Germany
E-mail: schueslr at sun0.urz.uni-heidelberg.de
FAX: 06221 / 54 49 13

mathog at seqaxp.bio.caltech.edu wrote:
> 
> In article <3314A3F7.1C33 at freenet.carleton.ca>, ac562 at freenet.carleton.ca (Robert J. Forster) writes:
> >I have quite a few alignments of 16S rRNA sequences from which I would
> >like to delete a highly variable region before analysis.  I have used
> >ClustalX and Seqpup on a mac to produce the alignments. When I select a
> >block of sequences in seqpup and then hit the Edit clear or cut
> >functions nothing happens.  I can delete the region one sequence at a
> >time, but with hundreds of sequences I am searching for an easier way.
> >I have put GDE on an HP-UX machine and DCSE on Linux, but both of these
> >installations are not exactly stable, and the documentation for these
> >programs does not indicate whether the proposed task would be very easy.
> >If anyone knows of a program that could help me out I would appreciate
> >some pointers.
> 
> If you have access to EGCG you will find a program CREFORMAT, which is a
> variant of GCG's REFORMAT that I wrote specifically to address the needs of
> a user here who has zillions of aligned tRNAs - a situation fairly similar
> to yours.  CREFORMAT adds these switches to the standard ones:
> 
>                             file
> /BEGin           beginning of range, defaults to 1
> /END             end of range, defaults to maximum sequence length
>    Use these to extract a subsequence from a sequence or MSF file.
> /DELete          delete the subsequence in the range, leave the rest
> /REVerse         return the reverse strand
> /LOOKup="U.,TZ"  convert characters in first string to matching character
>                  in second string.
> 
> You can use this to automate column deletions for use in batch files and so
> forth, or do it interactively from the command line.  It will operate on
> any sequence file that REFORMAT can (*.seq, whatever.msf{*}, @file.list).
> 
> So in your case, you could do:
> 
> $ creformat/infile=whatever.msf{*}/msf/begin=90/end=100/delete
> 
> and that would remove columns 90 through 100 inclusive.  (Note that when
> doing multiple columns you specify the regions to remove back to front
> so the numbering doesn't change as you go along.)
> 
> It's also handy for picking out a column of data, like this:
> 
> $ creformat/infile=whatever.msf{*}/msf/begin=90/end=100/outfile=thin.msf
> 
> or for just yanking a subregion out of a database entry, when you know
> a priori where the region of interest is, as here, when yanking the CDS
> for the glucose transporter gene out of the 338234 bp entry for the
> Bithorax Complex:
> 
> $ creformat/infile=GB_IN:DMU31961/begin=193566/end=195089 -
>   /reverse/outf=glucose.seq
> 
> If you don't have EGCG you could use NEDIT, which will also do column cuts
> and pastes, and is free.
> 
> Regards,
> 
> David Mathog
> mathog at seqaxp.bio.caltech.edu
> Manager, sequence analysis facility, biology division, Caltech
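
For readers without EGCG, the back-to-front column deletion described
in the quoted message can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the alignment is assumed
to be already loaded as a mapping from sequence name to aligned string,
column numbers are 1-based and inclusive as in the CREFORMAT example,
and delete_columns is a hypothetical helper name, not part of CREFORMAT,
GCG or any other program mentioned in this thread.

```python
def delete_columns(alignment, ranges):
    """Remove 1-based, inclusive column ranges from every aligned sequence.

    alignment: dict mapping sequence name -> aligned sequence string
    ranges:    list of (begin, end) tuples, numbered against the ORIGINAL
               alignment; they must not overlap
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("sequences are not all the same length")

    # Work from the highest-numbered range down to the lowest, so that
    # removing one range never shifts the numbering of a range still to do.
    ordered = sorted(ranges, reverse=True)

    for begin, end in ordered:
        if begin < 1 or end < begin:
            raise ValueError(f"invalid range {begin}-{end}")
    for later, earlier in zip(ordered, ordered[1:]):
        if earlier[1] >= later[0]:
            raise ValueError(f"ranges {earlier} and {later} overlap")

    result = dict(alignment)
    for begin, end in ordered:
        # Columns begin..end (1-based) are string indices begin-1..end-1.
        result = {name: seq[:begin - 1] + seq[end:]
                  for name, seq in result.items()}
    return result


if __name__ == "__main__":
    demo = {"seqA": "ACGTACGTAC", "seqB": "AC-TACGTTC"}
    # Delete columns 3-4 and 8-9, both numbered against the original.
    for name, seq in delete_columns(demo, [(3, 4), (8, 9)]).items():
        print(name, seq)
```

Sorting the ranges in descending order is what lets all ranges be given
in the numbering of the original alignment; reading and writing MSF or
other alignment formats is deliberately left out of the sketch.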
