GCG Format

Guy St.C. Slater gslater at hgmp.mrc.ac.uk
Mon May 18 12:15:40 EST 1998


Peter Rice wrote:
>
<SNIP>
>
> Sorry Guy - that may be close but it ain't close enough.
> 
> Try "fetch test.seq" in GCG and see whether you get the same checksum.
> 
> For example, the following modification to your code gives an answer
> of 2584 for gcg_chars but GCG reformat gives an answer of 7132. GCG's
> reformat has to be the authority here - anything except the
> reformatted value will be rejected by GCG programs.
> 
> Your version fails on simple lower case. It should still return 2160.
> That's before worrying about the other 'valid' GCG sequence characters.
> 
<SNIP>
>

OK; it was a rather clumsy solution used for writing
sequences which I knew to contain only upper case A-Z.

Below is a revised function which works for the cases you mentioned.

Guy.
--

/* START REVISED EXAMPLE */

#include <stdio.h>

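/* Compute the GCG checksum of a NUL-terminated sequence.
 * The table maps each byte to its checksum value: letters map to
 * upper case, ". * ~ & @" map to themselves, and '-' marks bytes
 * that are skipped (their positions are still counted). */
static int CheckSumGCG(const char *seq){
    register int i, ch, check = 0;
    register const char *index =
    "--------------------------------------&---*---.-----------------"
    "@ABCDEFGHIJKLMNOPQRSTUVWXYZ------ABCDEFGHIJKLMNOPQRSTUVWXYZ---~-"
    "----------------------------------------------------------------"
    "----------------------------------------------------------------";
    for(i = 0; seq[i] != '\0'; i++)
        /* the cast avoids a negative subscript where char is signed */
        if((ch = index[(unsigned char)seq[i]]) != '-')
            check += ((i % 57) + 1) * ch;
    return check % 10000;
    }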

int main(){
    register char *calm_human =
        "adqlteeqiaefkeafslfdkdgdgtittkelgtvmrslgqnpteaelqd"
        "minevdadgngtidfpefltmmarkmkdtdseeeireafrvfdkdgngyi"
        "saaelrhvmtnlgekltdeevdemireadidgdgqvnyeefvqmmtak";
    register char *test_seq =
        "GCTGCCGCAGCGGCXGATGACAATAACRAYTGTTGCTGYGATGACGAYGA"
        "AGAGGARTTTTTCTTYGGTGGCGGAGGGGGXCATCACCAYATTATCATAA"
        "THAAAAAGAARTTGTTACTTCTCCTACTGTTRCTXYTAYTGYTRYTXATG"
        "AATAACAAYCCTCCCCCACCGCCXCAACAGCARCGTCGCCGACGGCGGAG"
        "AAGGCGXAGRMGAMGGMGRMGXTCTTCCTCATCGAGTAGCTCXAGYWSXA"
        "CTACCACAACGACXGTTGTCGTAGTGGTXTGGXXXTATTACTAYGAAGAG"
        "CAACAGSARTAATAGTGATARTRATRRABCDEFGHIJKLMNOPQRSTUVW"
        "XYZ.~@&*abcdefghijklmnopqrstuvwxyz*@&~.";
    register char *gcg_chars =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz.*~&@";

    printf("Human Calmodulin GCG Checksum = %d\n",
            CheckSumGCG(calm_human) ); /* 2160 */
    printf("GCG test sequence Checksum = %d\n", 
            CheckSumGCG(test_seq) );   /* 3365 */
    printf("GCG sequence characters Checksum = %d\n", 
            CheckSumGCG(gcg_chars) );  /* 7132 */
    return 0;
    }

/* END REVISED EXAMPLE */
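
The same rule can also be written without the 256-byte table, which may
make the arithmetic easier to follow. The sketch below is only a
restatement of the function posted above, not GCG's own code. The names
gcg_value and gcg_checksum_sketch are hypothetical, and the code assumes
an ASCII character set. It reduces the running sum modulo 10000 at every
step, which gives the same result as a single reduction at the end but
keeps the sum small for very long sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value a byte contributes to the checksum,
 * or 0 if the byte is ignored. Assumes ASCII. */
static int gcg_value(unsigned char c){
    if(c >= 'a' && c <= 'z')
        return c - 'a' + 'A';       /* lower case counts as upper case */
    if(c >= 'A' && c <= 'Z')
        return c;
    if(c != '\0' && strchr(".*~&@", c) != NULL)
        return c;                   /* the five special characters */
    return 0;                       /* anything else is skipped */
}

/* Hypothetical restatement of CheckSumGCG from the post above. */
static int gcg_checksum_sketch(const char *seq){
    long check = 0;
    size_t i;
    assert(seq != NULL);
    for(i = 0; seq[i] != '\0'; i++){
        int value = gcg_value((unsigned char)seq[i]);
        /* weights cycle 1..57 by position; skipped bytes still
         * advance the position */
        if(value != 0)
            check = (check + (long)((i % 57) + 1) * value) % 10000;
    }
    return (int)check;
}
```

For the 57 GCG sequence characters used in the example above, the
weighted sum works out to 127132, so this sketch returns 7132, the value
quoted for GCG reformat. Because lower case is folded to upper case,
"acgt" and "ACGT" give the same result.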

-- 
 ----------------------------------------------------------------------
 Guy St.C. Slater,                              Tel : (44) 1223 494 565
 Human Genome Mapping Project Resource Centre,  Fax : (44) 1223 494 512
 Wellcome Trust Genome Campus,            mailto:gslater at hgmp.mrc.ac.uk
 Hinxton, Cambridge, CB10 1SB.      http://www.hgmp.mrc.ac.uk/~gslater/
 ----------------------------------------------------------------------



