GCG Format

Peter Rice pmr at sanger.ac.uk
Fri May 15 10:40:24 EST 1998


Mr G Slater <gslater at hgmp.mrc.ac.uk> writes:

> GCG checksums are calculated by a simple hashing, much like
> the hash function examples in K&R.
> 
> Here's an example in C, with SwissProt:CALM_HUMAN as the test sequence.
> The Checksum should be 2160.
> 
> Hope this helps,
> 
> Guy.
> --
> 
> /* START EXAMPLE */
> 
> #include <stdio.h>
> #include <ctype.h>
> 
> static int CheckSumGCG(char *seq){
>     register int i, check = 0;
>     for(i = 0; seq[i] != '\0'; i++)
>         if(isalpha(seq[i]))
>             check += ((i % 57) + 1) * seq[i];
>     return check % 10000;
>     }
> 
> int main(){
>     register char *calm_human =
>         "ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD"
>         "MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI"
>         "SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK";
> 
>     printf("Human Calmodulin GCG Checksum = %d\n", 
>             CheckSumGCG(calm_human) );
>     return 0;
>     }
> 
> /* END EXAMPLE */

Sorry Guy - that may be close but it ain't close enough.

Try "fetch test.seq" in GCG and see whether you get the same checksum.

For example, the following modification to your code gives an answer
of 2584 for gcg_chars but GCG reformat gives an answer of 7132. GCG's
reformat has to be the authority here - anything except the
reformatted value will be rejected by GCG programs.

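Your version fails on simple lower case, because it multiplies by the
raw character code without folding case. The lower-case calmodulin
sequence below should still return 2160. That's before worrying about
the other 'valid' GCG sequence characters.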

/* Replaces main() in the quoted example; CheckSumGCG() is unchanged. */
int main(){

    register char *calm_human =
        "adqlteeqiaefkeafslfdkdgdgtittkelgtvmrslgqnpteaelqd"
        "minevdadgngtidfpefltmmarkmkdtdseeeireafrvfdkdgngyi"
        "saaelrhvmtnlgekltdeevdemireadidgdgqvnyeefvqmmtak";

    register char *gcg_chars =
     "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
     "abcdefghijklmnopqrstuvwxyz.*~&@";

    printf("Human Calmodulin GCG Checksum = %d\n", 
            CheckSumGCG(calm_human) );
    printf("GCG sequence characters Checksum = %d\n", 
            CheckSumGCG(gcg_chars) );
    return 0;
    }
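
A minimal sketch of one way to close both gaps, assuming the
differences come only from case folding and from skipping non-letters.
That is an assumption: GCG's own source is not quoted in this thread,
and the name CheckSumGCGSketch is made up for illustration. It folds
every character to upper case and weights every character, not just
letters. Worked by hand for gcg_chars, the sum is 127132, i.e. 7132,
which agrees with the reformat value above. That is one data point
only, so check any real use against GCG reformat.

```c
#include <ctype.h>

/* Sketch only, not GCG's own code. Assumes seq holds sequence
 * characters only (no whitespace, digits or line numbers). */
static int CheckSumGCGSketch(const char *seq){
    int i, check = 0;
    for(i = 0; seq[i] != '\0'; i++){
        /* fold case so "adql..." and "ADQL..." give the same value */
        int c = toupper((unsigned char)seq[i]);
        /* weight cycles 1..57; reducing at each step avoids overflow
         * on long sequences and does not change the final result */
        check = (check + ((i % 57) + 1) * c) % 10000;
    }
    return check;
}
```

Calling CheckSumGCGSketch(gcg_chars) from the main() above should then
print 7132 rather than 2584, and the lower-case calmodulin string gives
the same value as the upper-case one.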



-- 
----------------------------------------------------------------------
Peter Rice                | Informatics Division, The Sanger Centre,
E-mail: pmr at sanger.ac.uk  | Wellcome Trust Genome Campus,
Tel: (44) 1223 494967     | Hinxton, Cambridge, CB10 1SA, England
Fax: (44) 1223 494919     | URL: http://www.sanger.ac.uk/Users/pmr/



