FASTA format - proposed max line limit
learn at u.washington.edu
Sat Dec 5 14:59:01 EST 1998
In article <74a30m$sjg at gap.cco.caltech.edu>, mathog at seqaxp.bio.caltech.edu writes:
> In article <73mjou$tpm$1 at news.fas.harvard.edu>, "tendo"
> <tendo at fas.harvard.edu> writes:
> >There is a very good standard: only one comment line, starting with the
> >'>' character, is allowed for each sequence.
> >If lengths are really between 80 and 1000 chars, all you need is a
> >1002-byte buffer for reading, or 2000 bytes to be safe.
> >That's not a problem at all for any recent computer, is it?
> Yes, it is, but not in the sense you meant it. The fundamental problem
> with lines >80 characters is that there is no consistency in how they will
> be displayed. They might wrap, they might truncate, they might be scrolled
> off the right hand side of the screen (which an end user might not notice
> when scanning quickly through a 100 entry FASTA file with a tool like
> "nedit" or "notepad"). There are even a few tools around which will do
> nasty things when they encounter overly long "text" records; for instance,
> EDT on VMS will truncate them to 255 characters.
> >As I understand it, David's proposal is mainly focused on easy handling of
> >FASTA format data in programs and compatibility with available programs.
> and consistency of display with a variety of tools.
> FASTA is a TEXT format, so fasta files should look very much the same with
> the widest range of existing text tools. Long lines are not compatible
> with that goal.
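A quick way to see whether a file meets the proposed limit is to scan it for over-wide lines. The sketch below is a hypothetical Python helper (`long_lines` and `MAX_WIDTH` are names made up for illustration, not part of any existing tool); it assumes the 80-column limit under discussion and treats any line starting with '>' as a header.

```python
# Sketch: flag FASTA lines wider than an assumed 80-column limit.
# MAX_WIDTH = 80 is the limit being proposed here, not a published standard.
from typing import Iterable, List, Tuple

MAX_WIDTH = 80


def long_lines(lines: Iterable[str], max_width: int = MAX_WIDTH) -> List[Tuple[int, str, int]]:
    """Return (line_number, kind, length) for every line longer than max_width.

    kind is "header" for lines starting with '>' and "sequence" otherwise.
    Line numbers are 1-based; trailing newline characters are not counted.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if len(line) > max_width:
            kind = "header" if line.startswith(">") else "sequence"
            problems.append((number, kind, len(line)))
    return problems


if __name__ == "__main__":
    example = [
        ">seq1 short description\n",
        "ACGT" * 15 + "\n",          # 60 residues: within the limit
        ">seq2 " + "x" * 90 + "\n",  # 96-character header: too long
        "ACGT" * 25 + "\n",          # 100 residues: too long
    ]
    for number, kind, length in long_lines(example):
        print(f"line {number}: {kind} line is {length} characters")
```

Run on a real file, the same function can take an open file object directly, since iterating over it yields lines.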
One other benefit of constraining the comment line to 80 characters is
that the sequence files can be sent by e-mail. I often ask collaborators to
send me sequence data by e-mail; it is usually the most convenient
lowest-common-denominator (LCD) way to exchange data. If the comment line is
longer than 80 characters, the mail software may wrap it, and the wrapped
remainder, which no longer starts with '>', is read as sequence data.
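To make the e-mail problem concrete, the following Python sketch hard-wraps a long header at 80 columns and then parses the result. Both `simulate_mail_wrap` and `parse_fasta` are hypothetical helpers written for this illustration, and real mail software wraps in different ways (or not at all); the point is only that the overflow loses its leading '>' and lands in the sequence.

```python
# Sketch: how hard-wrapping a long FASTA header (as some mail software does
# at about 80 columns) turns the overflow into "sequence" data.
# The 80-column wrap width and the minimal parser are assumptions.
import textwrap
from typing import Dict, List, Optional


def simulate_mail_wrap(text: str, width: int = 80) -> str:
    """Hard-wrap every line longer than width at word boundaries, roughly as a mailer might."""
    wrapped: List[str] = []
    for line in text.splitlines():
        if len(line) > width:
            wrapped.extend(textwrap.wrap(line, width=width))
        else:
            wrapped.append(line)
    return "\n".join(wrapped) + "\n"


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal parser: one '>' header line per record; every other line is sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.strip()
    return records


if __name__ == "__main__":
    original = ">id1 " + "word " * 20 + "\nACGTACGT\n"  # 105-character header line
    print(parse_fasta(original))                      # sequence is just ACGTACGT
    print(parse_fasta(simulate_mail_wrap(original)))  # overflow words join the sequence
```

In the wrapped version the last five words of the description end up in front of the residues (the sequence becomes "word word word word wordACGTACGT"), and nothing in the file flags the damage. Keeping the header within 80 columns avoids this.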