FASTA format - proposed max line limit

Simon R Tomlinson plxsrt at pln1.nott.ac.uk
Wed Dec 9 08:57:43 EST 1998


You really have to consider why FASTA format is so useful and
therefore so commonly used.  One reason for its popularity is that
it is a simple file format.  Programs that stuff a lot of detail
into the header are bound to make the format more complex, and I
would suggest that if the format becomes more complex then it will
become less popular.

In the extreme case you could take the whole header from any sequence
record and stuff it onto a single line.  But surely that is just
reinventing an old format without the end-of-line breaks (and without
the limit of one sequence per file)?!  I don't think this is desirable.

A lot of programs that I use will truncate a long header line anyway
(e.g. clustalw), so you lose the long header details.  To avoid this I
usually truncate the header myself to give the record a unique
identifier.  I usually use the accession number or sequence name.
This gives the best of both worlds: the FASTA format remains simple,
but I still have access to the original record to retrieve any
additional details that I require.  What is wrong with using FASTA
this way?
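
For what it's worth, here is a rough sketch of that kind of header
truncation in Python.  It is only an illustration, not a description
of any particular tool: the function name shorten_headers and the
example record are made up, and it assumes the identifier you want to
keep (accession number or sequence name) is the first
whitespace-delimited word after the ">".

```python
import io


def shorten_headers(lines):
    """Yield FASTA lines with each header cut down to its first word.

    Assumption: the first whitespace-delimited word after '>' is the
    identifier worth keeping (e.g. an accession number).
    """
    seen = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            ident = tokens[0] if tokens else ""
            # The short name is only useful if it stays unique.
            if ident in seen:
                raise ValueError("duplicate identifier: " + repr(ident))
            seen.add(ident)
            yield ">" + ident + "\n"
        else:
            # Sequence lines are passed through untouched.
            yield line


# Made-up record, purely for illustration.
example = io.StringIO(
    ">X12345 hypothetical protein, made-up description\n"
    "ACGTACGTAC\n"
    "GTACGT\n"
)
result = "".join(shorten_headers(example))
print(result, end="")
```

The original file is left alone, so the full header can still be
looked up by identifier whenever the extra details are needed.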

Simon T 




