Counting tripeptide frequencies

Randal M. Henne henne at zgi.com
Thu Feb 11 19:11:09 EST 1999


Andrew Dalke wrote:

> Rich Dudley asked:
> > Does anyone know of a program (WW or Windows) that can enumerate the di-
> > and tri-peptide frequencies in a protein?  Ideally, it would construct
> > a table at the end of the input and have the sequence and number of
> > occurrences.
> >
>
> This isn't any help, but I figured my code was hard enough to
> understand that I would post it anyway <grin>.
>
> Here's a perl script for dipeptide pair counts, assuming single
> letter sequences on one line per record.
>
> perl -ne '%dict=();
>   s/(..)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
>   s/(..)/$dict{$1}++,$1/ge;
>   foreach $k (keys %dict) {print "$k $dict{$k}\n"}'
>
> ANAANOPOANO
> OA 1
> AA 1
> NO 2
> OP 1
> NA 1
> PO 1
> AN 3
>
> (Yeah! And "O" is the 21st Beatle^H^H^H^H^H^Hamino acid :)
>
> For tripeptides that's:
> perl -ne '%dict=();
>   s/(...)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
>   s/(...)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
>   s/(...)/$dict{$1}++/ge;
>   foreach $k (keys %dict) {print "$k $dict{$k}\n"}'
>
> ANANAPANA
> ANA 3
> APA 1
> NAN 1
> PAN 1
> NAP 1
>
>   Intuitively obvious to the most casual of observers, yes?
>
>                                                 Andrew

Yes it is . . . but Perl and not Python, Andrew?
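
For comparison, here is a rough Python sketch of the same overlapping
count. It is not a translation of the one-liners above: instead of
shifting the line and re-running a substitution, it slides a window of
width k along the sequence. The name count_peptides is made up for this
example, and it assumes one single-letter sequence per string, as in the
quoted script.

```python
from collections import Counter


def count_peptides(sequence, k):
    """Count every overlapping window of length k in a one-letter sequence.

    count_peptides is a hypothetical helper name used only for this sketch.
    """
    # Drop the trailing newline and any surrounding whitespace from the record.
    sequence = sequence.strip()
    # One window starts at each position that still has k letters after it.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    # Tripeptides (k=3) for the example record from the quoted message.
    for peptide, count in count_peptides("ANANAPANA", 3).most_common():
        print(peptide, count)
```

Calling count_peptides(line, 2) should give the dipeptide table, and the
tallies for the two example records should agree with the listings quoted
above, although the order of the rows may differ.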
