Processing Batches Of Mutation Data Trace Files

Cook, Malcolm MEC at
Mon Apr 28 03:12:55 EST 2003

James (and fellow Staden Users),

> However it is easier to get the naming scheme code to auto-generate the WT_com 
> field for you. 

Thanks so much for your suggestion and encouragement.

In my case, the first 'field' (of 6) in our sequence identifiers encodes our 'SubjectID', and Subject 61 is my 'reference subject', so I construct the name of the WT 'reference chromatogram' in the naming convention as follows:

set ns_lt(WT) {"61\_$2\_$3\_$4$5\_$6\.ab1"}; # Subject # 61 is reference subject

So I can avoid the pregap4 module named "Reference Traces and Reference Sequences" altogether, provided I can also supply the reference sequence through the naming convention. Since $3 encodes my gene and my annotated reference sequences are stored in files named <genename>.refseq.embl, I can do this as follows:

set ns_lt(WT) {"$3.refseq.embl"}; # $3 encodes the gene name
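To make the field substitution concrete, here is a small plain-Tcl sketch (not pregap4 code) that applies the regular expression from the WT_com procedure below to a made-up read ID and fills in both name templates. It assumes the naming scheme's $1 to $6 correspond to that regexp's six capture groups; the read ID and the helper procs split_read_id, wt_name and rs_name are hypothetical, and lassign needs Tcl 8.5 or later.

```tcl
# Sketch only: shows how the six-field regexp splits a read ID and how the
# fields fill the WT and reference-sequence templates. Not pregap4 code;
# the helper procs and the sample read ID are hypothetical.

proc split_read_id {id} {
    # Same pattern as WT_com: six fields, field 5 is the R/F direction
    if {![regexp {(.+)[-_](.+)[_-](.+)[_-](.*)([RF])[_-](.*)} $id -> f1 f2 f3 f4 f5 f6]} {
        return {}
    }
    return [list $f1 $f2 $f3 $f4 $f5 $f6]
}

proc wt_name {id} {
    set f [split_read_id $id]
    if {[llength $f] == 0} { return "" }
    lassign $f f1 f2 f3 f4 f5 f6
    # Field 1 (SubjectID) is replaced by 61, the reference subject
    return "61_${f2}_${f3}_${f4}${f5}_${f6}.ab1"
}

proc rs_name {id} {
    set f [split_read_id $id]
    if {[llength $f] == 0} { return "" }
    # Field 3 encodes the gene; reference sequences live in <gene>.refseq.embl
    return "[lindex $f 2].refseq.embl"
}

set id "17_A01_GENEX_3F_01"   ;# hypothetical read ID
puts [wt_name $id]            ;# -> 61_A01_GENEX_3F_01.ab1
puts [rs_name $id]            ;# -> GENEX.refseq.embl
```

Because every read of a given gene yields the same field 3, all of its reads point at the same reference sequence file while each keeps its own subject-61 reference trace.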

The net result is that I can keep all 7 PCR fragments of a gene, on which I am running the trace-diff approach to mutation detection, in a single gap4 project. It works quite nicely.

Oh. I have learned that I could also have specified the WT in the #[global_variables] section of my pregap4 config file as:

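proc WT_com {} {
	global lines
	# Split the read ID into its six fields; return "" if it does not match
	if {[regexp {(.+)[-_](.+)[_-](.+)[_-](.*)([RF])[_-](.*)} $lines(ID) matched 1 2 3 4 5 6] == 0} {
		return ""
	}
	# Subject 61 is the reference subject
	return "61\_$2\_$3\_$4$5\_$6\.ab1"
}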

I may adopt this approach if we later choose to implement a strategy for dynamically selecting the reference trace (say, using the highest-quality sequence).

If the Staden package gets a new breath of life, may I suggest the following:

	Allow the WT-labelled reference traces to be respected in gap4 when 'Settings > Trace Display > Auto-diff Traces' is turned on. Currently I must manually 'set as reference trace', but with the strategy outlined above I may be reviewing, in a single assembly, trace-diff output based on comparison with multiple reference traces.

	Allow the reference sequence file(s) identified in the experiment files' RT lines to be included automatically when assembling into the gap4 database.

Finally, if you're still with me, I'd like to repeat an earlier query of mine that has so far gone unaddressed in this forum:

>I thought I would try another approach to allowing reads from separate primer pairs to be mixed in a single pregap4 mutation detection session.
>I thought I would create a simple text database holding my WT (wild-type trace) and RS (reference sequence) values.
>But, lo, though WT appears as a 'LINE TYPE' when I choose to 'ADD COLUMN', 'RS' does not!
>Am I missing something, or is the Staden code?

Thanks for your attention and consideration. I will keep checking the Staden website for any way I can contribute to the continued success of this fine analysis suite...


Malcolm Cook
Database Applications Manager
Stowers Institute for Medical Research
1000 E 50th Street
Kansas City, MO 64110
tel: 816-926-4449
fax: (816) 926-2098
