Counting tripeptide frequencies
Randal M. Henne
henne at zgi.com
Thu Feb 11 19:11:09 EST 1999
Andrew Dalke wrote:
> Rich Dudley asked:
> > Does anyone know of a program (WWW or Windows) that can enumerate the di-
> > and tri-peptide frequencies in a protein? Ideally, it would construct
> > a table at the end of the input and have the sequence and number of
> > occurrences.
> >
>
> This isn't any help, but I figured my code was hard enough to
> understand that I would post it anyway <grin>.
>
> Here's a perl script for dipeptide pair counts, assuming single
> letter sequences on one line per record.
>
> perl -ne '%dict=();
> s/(..)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
> s/(..)/$dict{$1}++,$1/ge;
> foreach $k (keys %dict) {print "$k $dict{$k}\n"}'
>
> ANAANOPOANO
> OA 1
> AA 1
> NO 2
> OP 1
> NA 1
> PO 1
> AN 3
>
> (Yeah! And "O" is the 21st Beatle^H^H^H^H^H^Hamino acid :)
>
> For tripeptides that's:
> perl -ne '%dict=();
> s/(...)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
> s/(...)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
> s/(...)/$dict{$1}++/ge;
> foreach $k (keys %dict) {print "$k $dict{$k}\n"}'
>
> ANANAPANA
> ANA 3
> APA 1
> NAN 1
> PAN 1
> NAP 1
>
> Intuitively obvious to the most casual of observers, yes?
>
> Andrew
Yes it is... but Perl and not Python, Andrew?
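
For the Python-inclined, a rough equivalent of the overlapping-window counting in the quoted one-liners might look like the sketch below. It is not from Andrew's post: the function name count_kmers and the choice of collections.Counter are assumptions made for illustration, and like the Perl it assumes one single-letter sequence per record. Passing k=2 gives dipeptide counts and k=3 gives tripeptide counts.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping window of length k in a one-letter sequence.

    Hypothetical helper for illustration; k=2 counts dipeptides and
    k=3 counts tripeptides.
    """
    seq = sequence.strip()  # drop the trailing newline of a record
    # Slide a window of width k one residue at a time along the sequence.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # The two example sequences from the quoted message.
    for k, seq in ((2, "ANAANOPOANO"), (3, "ANANAPANA")):
        print(seq)
        for peptide, n in sorted(count_kmers(seq, k).items()):
            print(peptide, n)
```

Since a Counter can be added to another Counter, summing the per-record results would give one table at the end of the input, which is closer to what Rich originally asked for.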