Minor READSEQ bug and fix.

John Powell jip at helix.nih.gov
Mon Oct 30 16:00:38 EST 1995


We have detected a minor bug in readseq and offer a simple one-line
fix.  The problem occurs when the last line of an input sequence does not
end with an end of line character.  Although this is NOT a likely occurrence,
it can happen when moving files between platforms, cutting and pasting, using
Web browsers, etc. - it happened at our site.

Two examples which exhibit the bug are given here:

Example 1:

The input sequence is in fasta format with 1419 bases.  The sequence has
50 characters per line but NO end of line character on the last line:

% cat -e example.one
>HIL15COM Human interleukin 15 (IL15) mRNA, complete cds., 1419 bases, 99E46F98 checksum.$
TGTCCGGCGCCCCCCGGGAGGGAACTGGGTGGCCGCACCCTCCCGGCTGC$
GGTGGCTGTCGCCCCCCACCCTGCAGCCAGGACTCGATGGAGAATCCATT$
-CUT-
TAATTTAGTTATTGATGTATAAAGCAACTGTTATGAAATAAAGAAATTGC$
AATAAAAAAAAAAAAAAAA    

% readseq -f5 example.one -pipe > example_one.gcg
% cat example_one.gcg
HIL15COM Human interleukin 15 (IL15) mRNA, complete cds.
    HIL15COM  Length: 1400  (today)  Check: 5081  ..
    1  TGTCCGGCGC CCCCCGGGAG GGAACTGGGT GGCCGCACCC TCCCGGCTGC
   51  GGTGGCTGTC GCCCCCCACC CTGCAGCCAG GACTCGATGG AGAATCCATT
   -CUT-
 1351  TAATTTAGTT ATTGATGTAT AAAGCAACTG TTATGAAATA AAGAAATTGC

NOTE: The sequence is truncated at 1400 (the last line with an end of
line character).


Example 2:

The input sequence is in fasta format with 1419 bases.  The entire sequence
is on a single line with no end of line character.

% cat -e example.two
>HIL15COM Human interleukin 15 (IL15) mRNA, complete cds., 1419 bases, 99E46F98 checksum.$
TGTCCGGCGCCCCCCGGGAGGGAACTGGGTGGCCGCACCCTCCCGGCTGC -CUT- ATAAAAAAAAAAAAAAAA

% readseq -f5 example.two -pipe > example_two.gcg
% cat example_two.gcg
HIL15COM Human interleukin 15 (IL15) mRNA, complete cds.
    HIL15COM  Length: 1275  (today)  Check: 1094  ..
    1  TGTCCGGCGC CCCCCGGGAG GGAACTGGGT GGCCGCACCC TCCCGGCTGC
   51  GGTGGCTGTC GCCCCCCACC CTGCAGCCAG GACTCGATGG AGAATCCATT
   -CUT-
 1201  TAATGCTGCA GGTCAACAGC TATGCTGGTA GGCTGAACCA CTGACTACTG
 1251  GCTCCCATTG ACTTCCTTCA TAAGC


NOTE: The sequence is truncated at 1275 (the last multiple of 255, i.e.
five full 255-character "fgets" reads; the final partial read is lost).
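
To make the 1275 figure concrete, here is a minimal sketch (not readseq
code) that reads a 1419-character line with no end of line character using
the same fgets(s, 256, f) call and counts the characters read before the
first call that leaves feof(f) set.  The helper names
chars_before_eof_flag and one_line_no_eol are hypothetical, invented for
this illustration, and the input is just a run of 'A' characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read f in fgets(buf, 256, f) chunks and return the
 * number of characters read before the first call that leaves feof(f) set. */
static long chars_before_eof_flag(FILE *f)
{
    char buf[256];
    long total = 0;

    rewind(f);                           /* also clears the EOF indicator */
    while (fgets(buf, sizeof buf, f) != NULL) {
        if (feof(f))
            break;                       /* this read ran into end of file */
        total += (long)strcspn(buf, "\n");
    }
    return total;
}

/* Hypothetical helper: temporary file holding n 'A' characters on a single
 * line with NO end of line character. */
static FILE *one_line_no_eol(long n)
{
    FILE *f = tmpfile();
    long i;

    assert(f != NULL);
    for (i = 0; i < n; i++)
        fputc('A', f);
    return f;
}
```

Each call returns at most 255 characters, so five calls deliver 1275
characters with the EOF flag still clear; the sixth call returns the
remaining 144 characters and sets the flag.  For a 1419-character input the
function above therefore returns 1275.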

A fix, provided by Rao Parasa of our group, is to add the line that follows
the comment in the readline routine in ureadseq.c, shown here:

Local void readline(FILE *f, char *s, long *linestart)
{
  char  *cp;

  *linestart= ftell(f);
  if (NULL == fgets(s, 256, f))
    *s = 0;
  else {
    cp = strchr(s, '\n');
    if (cp != NULL) *cp = 0;
    /*
     *  Following line fixes BUG when last line in the input does not have
     *  an EOL character
     */
    if (feof(f)) clearerr(f);
    }

}
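
The effect of the added line can be checked outside readseq with a small
sketch like the one below.  It is NOT readseq's own calling code: the loop
in count_bases is an ASSUMPTION about how a caller might behave (it stops
as soon as feof(f) is set after a read and discards that read), and the
names readline_demo, count_bases, make_input and the apply_fix switch are
hypothetical, introduced only for this comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* readline as posted above, minus the linestart bookkeeping.  apply_fix is
 * a hypothetical switch that turns the added line on or off. */
static void readline_demo(FILE *f, char *s, int apply_fix)
{
    char *cp;

    if (NULL == fgets(s, 256, f))
        *s = 0;
    else {
        cp = strchr(s, '\n');
        if (cp != NULL) *cp = 0;
        if (apply_fix && feof(f)) clearerr(f);
    }
}

/* ASSUMPTION: the caller stops once feof(f) is set after a read and throws
 * away whatever that read returned. */
static long count_bases(FILE *f, int apply_fix)
{
    char line[256];
    long total = 0;

    rewind(f);
    for (;;) {
        readline_demo(f, line, apply_fix);
        if (feof(f))
            break;
        total += (long)strlen(line);
    }
    return total;
}

/* Hypothetical helper: temporary file with n 'A' characters, per_line per
 * line, and NO end of line character after the last line.  per_line <= 0
 * puts everything on one line. */
static FILE *make_input(long n, long per_line)
{
    FILE *f = tmpfile();
    long i;

    assert(f != NULL);
    for (i = 0; i < n; i++) {
        fputc('A', f);
        if (per_line > 0 && (i + 1) % per_line == 0 && i + 1 < n)
            fputc('\n', f);
    }
    return f;
}
```

Under that assumption, count_bases(make_input(1419, 50), 0) gives 1400 and
count_bases(make_input(1419, 0), 0) gives 1275, matching the two examples,
while passing 1 for apply_fix gives 1419 in both cases.  With the fix the
loop still terminates: the next fgets call returns NULL and sets the EOF
flag again.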


-- 
--------
	John Powell                     phone: (301) 496-2963
	Building 12A, Room 2033		FAX: (301) 402-2867
	National Institutes of Health
	Bethesda, MD 20892		Internet: jip at helix.nih.gov



