[Bio-software] MIRA V3.0.0 sequence assembler

Bastien Chevreux via bio-soft@net.bio.net (by bach from chevreux.org)
Sun Jan 31 16:06:50 EST 2010


Dear all

I am pleased to announce that the MIRA 3.0.0 sequence assembler is available 
at SourceForge:

  http://sourceforge.net/projects/mira-assembler/

Documentation for MIRA can be found both in the binary packages and on the 
MIRA-Wiki, the latter being here:

  http://sourceforge.net/apps/mediawiki/mira-assembler/

So, what's new in this release? Actually, a lot!

The 3.0 version of MIRA is the result of a long development effort to make
de-novo and mapping assemblies of Sanger, 454 and Solexa (Illumina) data as
easy and straightforward as possible while maintaining maximum accuracy.

Another focus was to make it possible to use results from Solexa mapping
projects in current finishing programs, not only viewers. For this, MIRA
introduces the notion of coverage equivalent reads (CER), which reduce the
data volume by 70 to 90%. This allows painless use of such data sets in
"gap4" and "consed".
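
To give a rough feeling for the idea behind coverage equivalent reads, here
is a toy sketch in Python. It is NOT MIRA's actual algorithm: it assumes
that every read matches the reference perfectly, represents reads as
hypothetical half-open (start, end) intervals, and simply replaces them by
fewer, longer intervals that reproduce exactly the same per-position
coverage. All function names are made up for this illustration.

```python
def coverage_profile(reads, length):
    """Per-position coverage for half-open (start, end) read intervals."""
    cov = [0] * length
    for start, end in reads:
        for pos in range(start, end):
            cov[pos] += 1
    return cov


def coverage_equivalent_reads(reads, length):
    """Toy reduction (an assumption, not MIRA's method): for every depth
    level d, emit one interval per maximal run of positions whose coverage
    is at least d. The resulting intervals stack up to the same coverage
    profile as the original reads."""
    cov = coverage_profile(reads, length)
    cers = []
    for depth in range(1, max(cov, default=0) + 1):
        run_start = None
        # Iterate one position past the end so an open run gets closed.
        for pos in range(length + 1):
            covered = pos < length and cov[pos] >= depth
            if covered and run_start is None:
                run_start = pos
            elif not covered and run_start is not None:
                cers.append((run_start, pos))
                run_start = None
    return cers


# Hypothetical data: one 10-base read starting at every position 0..50.
reads = [(s, s + 10) for s in range(51)]
cers = coverage_equivalent_reads(reads, 60)
print(len(reads), "reads reduced to", len(cers), "coverage equivalent reads")
```

In this toy model the coverage seen by a finishing program stays identical,
while the number of objects it has to load shrinks. How large the reduction
is depends entirely on the data; the figures quoted above refer to MIRA
itself, not to this sketch.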

A third focus was to reduce the time researchers need to find and evaluate
differences in mapping assemblies. MIRA supports this by setting tags and
places of interest and by creating easy-to-understand HTML files and tables
ready to be imported into spreadsheet programs.

As a lot has changed since the 2.8.x series of MIRA, the following list
contains just a few highlights that were introduced during the 2.9.x series:

- sequencing technologies: MIRA handles different sequencing technologies
  independently of each other and has specialised routines for working with
  each of them.
- command-line parameters: MIRA now has a handful of "Do-What-I-Mean"
  one-stop switches which allow you to configure the assembler for 90% of
  all use cases. Furthermore, many parameters can be adjusted for each
  sequencing technology so that the assembly engine can be tweaked for very
  specialised cases if needed.
- all sequencing technologies (Sanger, 454, Solexa) now have
  - recognition of chimeras
  - new assembly routines for improved repeat resolution
  - improved data preprocessing that gets rid of low quality data and
    sequencing errors at ends of reads ... even when no quality data is
    available.
- 454 data:
  - fully developed capability for de-novo and mapping assembly of 454 data
    (paired and non-paired)
  - automated contig editor to remove the most obvious and/or annoying
    sequencing errors
  - improved consensus calling streamlined to minimise the dreaded
    homopolymer problem
- Solexa data:
  - can handle Solexa data of any length, no restriction to very short
    sequences.
  - memory/space saving: MIRA has a special mapping mode which creates data
    in such a way that widely used finishing tools like gap4 and consed can
    load these projects and still be fairly quick
- alignments enriched with features: MIRA adds information like
  repetitiveness or repeat marker bases as tags in the assembly so that
  these can be used during finishing
- assembly information files: MIRA writes more information files which can
  be easily parsed and/or read
- mapping assemblies: MIRA has a full SNP analysis for prokaryotic data
- comprehensive tables and HTML result files (mapping assembly): the
  convert_project program can now create easy to use tables and HTML files
  which show the data in a way suited for less computer-interested people
  (biologists etc. :-)
- memory management: MIRA can now be given an upper limit on RAM usage.
- file formats: MIRA can now parse or write more file formats. Notable are
  the change from SSAHA to SSAHA2 for clipping, FASTQ for data input and the
  MAF format for output.
- MacOS support: MIRA now compiles on MacOS-X
- speed and memory: compared to 2.8.x, MIRA now uses way less memory and is
  a lot faster.
- tons of other features, tweaks and bug fixes. See the CHANGES_old.txt file.


Have a lot of fun with MIRA :-)

Bastien


