GCG Format

Mr G Slater gslater at hgmp.mrc.ac.uk
Fri May 15 08:46:28 EST 1998


Jennifer Hallinan wrote:
> 
> Can anyone tell me how to compute the checksum in the header of a GCG
> format DNA sequence file?
> 
> Thanks,
> 
> Jennifer

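GCG checksums are calculated with a simple hash, much like the
hash function examples in K&R: each character's code is multiplied
by a position weight that cycles from 1 to 57, and the sum is
taken modulo 10000.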

Here's an example in C, with SwissProt:CALM_HUMAN as the test sequence.
The checksum it prints should be 2160.

Hope this helps,

Guy.
--

/* START EXAMPLE */

#include <stdio.h>
#include <ctype.h>

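/* Weighted sum of the letters in seq, modulo 10000.  The weight cycles
   1..57 with the character index; letters are upper-cased so that
   lower-case input gives the same result. */
static int CheckSumGCG(const char *seq){
    int i, check = 0;
    for(i = 0; seq[i] != '\0'; i++)
        /* cast: a negative char passed to <ctype.h> is undefined */
        if(isalpha((unsigned char)seq[i]))
            check += ((i % 57) + 1) * toupper((unsigned char)seq[i]);
    return check % 10000;
    }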

int main(){
    register char *calm_human =
        "ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD"
        "MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI"
        "SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK";

    printf("Human Calmodulin GCG Checksum = %d\n", 
            CheckSumGCG(calm_human) );
    return 0;
    }

/* END EXAMPLE */

-- 
 ----------------------------------------------------------------------
 Guy St.C. Slater,                              Tel : (44) 1223 494 565
 Human Genome Mapping Project Resource Centre,  Fax : (44) 1223 494 512
 Wellcome Trust Genome Campus,            mailto:gslater at hgmp.mrc.ac.uk
 Hinxton, Cambridge, CB10 1SB.      http://www.hgmp.mrc.ac.uk/~gslater/
 ----------------------------------------------------------------------



