
Processing Batches Of Mutation Data Trace Files

jkb at mrc-lmb.cam.ac.uk
Mon Apr 7 08:42:59 EST 2003

In <CED81D34E37D5043A1211565277A51E53F24A9 at exchkc02.stowers-institute.org> MEC at Stowers-Institute.org ("Cook, Malcolm") writes:

> The limitation is expressed on page 213 of
> http://www.mrc-lmb.cam.ac.uk/pubseq/ftp/docs/manual_unix.pdf as:
> "For any batch of readings the reference traces are defined within pregap4's
> "Reference Traces" module. Note that this mode of operation, by allowing the
> specification of only one forward and one reverse trace, limits each batch of
> traces processed to those which correspond to a given pair of reference
> traces."

Yes, it is indeed a limitation unless you take the route you outlined below. The
goal was to keep the user interface simple, without imposing a hard-coded
restriction on sequence names.

> However, if I'm understanding Staden's architecture correctly, it should be
> possible to mix and process traces corresponding to any number of
> reference traces.  The approach needs to be one of defining a WT_com proc in
> the [global variables] section of the pregap.config file.


> The WT_com
> proc would access global $lines(ID) and 'figure out' based on some
> convention what file should be used as the reference trace.

However, it is easier to get the naming scheme code to auto-generate the WT_com
field for you. Take for example the standard "mutation" naming scheme. The
file specifies:

set ns_name "Mutation detection naming scheme"
set ns_regexp {(.*)([FfRr])(_[0-9]+)?$}
set ns_lt(TN) {$1}
set ns_lt(PR) {subst {$2 {[fF] 1} {[rR] 2} 0}}

This is really just a piece of Tcl code, with "set_name_scheme" being a
procedure to generate *_com procedures. In this case it generates:

proc PR_com {} {
    global lines
    if {[regexp {(.*)([FfRr])(_[0-9]+)?$} $lines(ID) matched 1 2 3] == 0} {
        # Name does not fit the scheme
        return ""
    }
    if {[string match {[fF]} $2]} {return 1}
    if {[string match {[rR]} $2]} {return 2}
    return 0
}

proc TN_com {} {
    global lines
    if {[regexp {(.*)([FfRr])(_[0-9]+)?$} $lines(ID) matched 1 2 3] == 0} {
        # Name does not fit the scheme
        return ""
    }
    return $1
}
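
Before relying on the scheme, it can help to check what the regexp actually
captures for your own trace names. A minimal sketch for tclsh follows; the
probe_name proc and the sample names are made up for illustration and are not
part of the package.

```tcl
# Probe the naming-scheme regexp against a few sample names.
# The names below are invented; substitute your own.
set ns_regexp {(.*)([FfRr])(_[0-9]+)?$}

# Hypothetical helper: report what each capture group holds for one name.
proc probe_name {re name} {
    if {![regexp $re $name matched 1 2 3]} {
        return "no match"
    }
    return "TN=$1 strand=$2 suffix=$3"
}

foreach name {exon5_F exon5_r_2 control} {
    puts "$name -> [probe_name $ns_regexp $name]"
}
```

Note that with this regexp the first capture keeps any separator before the
strand letter, so "exon5_F" yields "exon5_" rather than "exon5".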

You could of course manually edit the PR_com and TN_com procs, or write your
own WT_com proc. However, it's probably easier to work out a regular expression
that matches your sequence names (if one fits) and express WT that way.
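
For the hand-written route, a WT_com proc could follow the same pattern as the
PR_com and TN_com procs. The sketch below is an illustration only. It assumes
that WT_com should return the reference trace filename, and the file naming
convention it uses (<prefix>wt_F.ztr and <prefix>wt_R.ztr) is invented here,
so adapt both to your own layout.

```tcl
# Hypothetical hand-written WT_com, modelled on the PR_com/TN_com pattern.
# Assumed convention (not part of the package): the reference trace for a
# reading is named <prefix>wt_F.ztr or <prefix>wt_R.ztr, where <prefix> is
# whatever precedes the strand letter in the reading name.
proc WT_com {} {
    global lines
    if {[regexp {(.*)([FfRr])(_[0-9]+)?$} $lines(ID) matched 1 2 3] == 0} {
        # Name does not fit the scheme, so no reference can be chosen
        return ""
    }
    # Normalise the strand letter so f/F and r/R pick the same reference
    return "${1}wt_[string toupper $2].ztr"
}

# Quick check outside pregap4, with a made-up reading name
set lines(ID) "exon5_F_12"
puts [WT_com]   ;# prints exon5_wt_F.ztr
```

Readings from different amplicons can then sit in one batch, since each name
selects its own pair of reference traces.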

James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Fax: (+44) 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/
