FASTA format - proposed max line limit

Jerry Learn learn at u.washington.edu
Sat Dec 5 14:59:01 EST 1998


In article <74a30m$sjg at gap.cco.caltech.edu>, mathog at seqaxp.bio.caltech.edu
wrote:

> In article <73mjou$tpm$1 at news.fas.harvard.edu>, "tendo" <tendo at fas.harvard.edu> writes:
> >There is a very good standard that only one comment line which starts with
> >'>' character is allowed for each sequence.
> >If lengths are really between 80 and 1000 chars, a 1002-byte buffer should be
> >enough for reading, or 2000 bytes for safety.
> >It's not a problem at all for any kind of recent computers, is it?
> 
> Yes, it is, but not in the sense you meant it.  The fundamental problem
> with lines >80 characters is that there is no consistency in how they will
> be displayed.  They might wrap, they might truncate, they might be scrolled
> off the right hand side of the screen (which an end user might not notice
> when scanning quickly through a 100 entry FASTA file with a tool like
> "nedit" or "notepad").  There are even a few tools around which will do
> nasty things when they encounter overly long "text" records, for instance
> EDT on VMS will truncate them to 255 characters. 
> 
> >By my understanding, David's proposal is mainly focused on easy handling of
> >FASTA format data in programs and compatibility with available programs.
> 
> and consistency of display with a variety of tools.
> 
> FASTA is a TEXT format, so fasta files should look very much the same with
> the widest range of existing text tools.  Long lines are not compatible 
> with that goal.
> 
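A line-length limit like the one argued for above is easy to check mechanically before a file is shared. The following is a minimal Python sketch, assuming the 80-character limit proposed in this thread; the function name long_lines and the MAX_WIDTH constant are hypothetical, not part of any existing tool.

```python
# Hypothetical checker: report every line of a FASTA text that exceeds
# a maximum width (assumed here to be the 80 characters proposed in this thread).
MAX_WIDTH = 80

def long_lines(text, max_width=MAX_WIDTH):
    """Return (line_number, length) for each line longer than max_width."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_width:
            problems.append((number, len(line)))
    return problems

if __name__ == "__main__":
    # A header that is far too long, followed by a normal sequence line.
    sample = ">seq1 " + "x" * 90 + "\n" + "ACGT" * 15 + "\n"
    for number, length in long_lines(sample):
        print(f"line {number}: {length} characters")
```

The check applies to header and sequence lines alike, since either one can break display in the text tools mentioned above.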
One other benefit of constraining the comment line to 80 characters is
that the sequence files are e-mailable. Often I ask collaborators to send
me sequence data via email; it is usually the most convenient
lowest-common-denominator (LCD) method of exchanging data. If the comment
line is longer than 80 characters, the mail software may wrap it, and the
wrapped remainder, which no longer begins with '>', is then read as
sequence data.
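The failure described above can be reproduced in a few lines. This Python sketch rests on two assumptions: the mail client is modelled crudely as hard-wrapping every line at 80 columns (real mailers differ), and parse_fasta is a hypothetical minimal reader that treats any line not starting with '>' as sequence. Neither function corresponds to a specific program.

```python
import textwrap

def mail_wrap(text, width=80):
    """Crude model of a mail client: hard-wrap every line at `width` columns."""
    out = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep a blank line instead.
        out.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(out) + "\n"

def parse_fasta(text):
    """Minimal reader: '>' starts a record; every other line is sequence."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            # Drop all whitespace so only residue characters are kept.
            records[-1][1] += "".join(line.split())
    return [(header, seq) for header, seq in records]

if __name__ == "__main__":
    # A 105-character header followed by a short sequence line.
    original = ">seq1 " + " ".join(["word"] * 20) + "\nACGTACGT\n"
    for label, text in (("as sent", original), ("after wrapping", mail_wrap(original))):
        for header, seq in parse_fasta(text):
            print(f"{label}: header={len(header)} chars, sequence={seq!r}")
```

After wrapping, the tail of the description ("word" repeated five times) is glued onto the front of the sequence, which is exactly the corruption that keeping the comment line within 80 characters avoids.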

Jerry Learn



