[Bio-software] MIRA V3.0.0 sequence assembler
Bastien Chevreux
via bio-soft@net.bio.net
(by bach from chevreux.org)
Sun Jan 31 16:06:50 EST 2010
Dear all
I am pleased to announce that the MIRA 3.0.0 sequence assembler is available
at SourceForge:
http://sourceforge.net/projects/mira-assembler/
Documentation for MIRA can be found both in the binary packages and on the
MIRA-Wiki, the latter being here:
http://sourceforge.net/apps/mediawiki/mira-assembler/
So, what's new in this release? Actually, a lot!
The 3.0 version of MIRA is the result of a long development effort to make
de-novo and mapping assemblies of Sanger, 454 and Solexa (Illumina) data as
easy and straightforward as possible while maintaining maximum accuracy.
Another focus was to make it possible to use results from Solexa mapping
projects in current finishing programs, not only in viewers. To that end, MIRA
introduces the notion of coverage equivalent reads (CERs), which reduce the
data volume by 70 to 90%. This allows painless use of such data sets in
"gap4" and "consed".
A third focus was to reduce the time researchers need to find and evaluate
differences in mapping assemblies. MIRA supports this by setting tags at
places of interest and by creating easy-to-understand HTML files and tables
ready to be imported into spreadsheet programs.
As a lot has changed since the 2.8.x series of MIRA, the following list
contains just a few highlights that were introduced during the 2.9.x series:
- sequencing technologies: MIRA handles different sequencing technologies
independently from each other and has specialised routines for working with
each of them.
- command-line parameters: MIRA now has a handful of "Do-What-I-Mean"
one-stop switches which allow configuring the assembler for 90% of all use
cases. Furthermore, many parameters can be adjusted for each sequencing
technology so that the assembly engine can be tweaked for very specialised
cases if needed.
- all sequencing technologies (Sanger, 454, Solexa) now have:
  - recognition of chimeras
  - new assembly routines for improved repeat resolution
  - improved data preprocessing that gets rid of low-quality data and
    sequencing errors at the ends of reads ... even when no quality data is
    available.
- 454 data:
  - fully developed capability for de-novo and mapping assembly of 454 data
    (paired and non-paired)
  - automated contig editor to remove the most obvious and/or annoying
    sequencing errors
  - improved consensus calling streamlined to minimise the dreaded
    homopolymer problem
- Solexa data:
  - can handle Solexa data of any length, with no restriction to very short
    sequences.
  - memory/space saving: MIRA has a special mapping mode which creates data
    that widely used finishing tools like gap4 and consed can load while
    still remaining fairly quick
- alignments enriched with features: MIRA adds information like
repetitiveness or repeat marker bases as tags in the assembly so that
these can be used during finishing
- assembly information files: MIRA writes more information files which can
be easily parsed and/or read
- mapping assemblies: MIRA has a full SNP analysis for prokaryotic data
- comprehensive tables and HTML result files (mapping assembly): the
convert_project program can now create easy-to-use tables and HTML files
which present the data in a way suited to less computer-oriented people
(biologists etc. :-)
- memory management: MIRA can now be told to respect an upper limit on RAM
usage.
- file formats: MIRA can now parse or write more file formats. Notable are
the change from SSAHA to SSAHA2 for clipping, FASTQ for data input, and the
MAF format for output.
- MacOS support: MIRA now compiles on Mac OS X.
- speed and memory: compared to 2.8.x, MIRA now uses much less memory and is
a lot faster.
- tons of other features, tweaks and bug fixes. See the CHANGES_old.txt file.
Have a lot of fun with MIRA :-)
Bastien