efficient use of gcg

Stuart Rison stuart at ludwid.ucl.ac.uk
Fri Jul 25 04:08:40 EST 1997


In article <eisenman-ya02408000R2207971725150001 at nntp.stanford.edu>,
eisenman at cmgm.stanford.edu (David M. Eisenmann) wrote:

In article <newitt-ya02408000R2207971736280001 at news.nih.gov>,
newitt at nih.gov (John A. Newitt) wrote:

> In article <01BC943C.EC4625A0.mdorbell at es.co.nz>,
> "diane at phyton.otago.ac.nz" <diane at phyton.otago.ac.nz> wrote:
> 
> > I would like to search a large number of sequences to determine their 
> > susceptibility to being cut by two restriction digests.
> > 
> > I know that the map command can be used to search a particular sequence to
> > find restriction digest sites but is there a command that I can use to
> > search multiple sequences at once?  It seems pretty tedious to have to 
> > search each sequence individually for the same enzyme sites.

> It used to be the case in an older version of GCG that if your directory
> contained all the sequence files, you could then make one new file (e.g.,
> call it List) that was just a list of file names, and when the program (in
> this case MAP) asked you which file to use, you entered the name of that
> new file preceded by an at symbol (@List).  It then ran the program on all
> the files whose names were listed in @List.  I don't know if this still
> works, but I bet the answer would be found in the appendixes of GCG.

I don't know where I'm picking up this thread, so my apologies if this has
already been answered. I recently had the same problem. Although the
@list "technique" is very useful in GCG (e.g. when creating pile-ups), it
does not work with all programs. For example, it does not work with MAP
(at least not with my version, 7.3.4). Nor, alas, can you use wildcards
(e.g. map *.seq).
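For programs that do accept the @list form (such as PileUp, mentioned above), the list file can be generated instead of typed by hand. The following is a minimal sketch, not a tested GCG recipe: it writes the names of all *.seq files in a directory, one per line, to a file called List, following the description quoted above. The helper names seq_files and write_list are hypothetical, and whether your GCG version expects anything else in a list file (a header, for instance) is an assumption you should check in its documentation.

```perl
#!/usr/local/bin/perl
# Sketch: write the names of all *.seq files in a directory, one per
# line, to a list file that can then be given to GCG as @List.
# Assumption: a plain list of file names is enough for your GCG version.
use strict;
use warnings;

# Return the sorted names of regular files in $dir that end in ".seq".
sub seq_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot read $dir: $!\n";
    my @names = grep { /\.seq$/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return sort @names;
}

# Write one file name per line to $listFile.
sub write_list {
    my ($listFile, @names) = @_;
    open(my $fh, '>', $listFile) or die "cannot write $listFile: $!\n";
    print $fh "$_\n" for @names;
    close($fh) or die "cannot close $listFile: $!\n";
}

# The directory defaults to the current one if none is given.
my $dir = @ARGV ? shift(@ARGV) : '.';
write_list("$dir/List", seq_files($dir));
```

You would then answer the program's input prompt with @List, as described in the quoted message.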

I have written a small and messy (but functional) Perl script that finds
all files matching *.seq (i.e. any name with the extension .seq) and
creates a map for each one.
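The file-name test used in the script below can be tried on its own. This sketch, built around a hypothetical helper map_name_for, anchors the pattern so that a name must end in exactly .seq. An unanchored pattern such as (.*)\.seq would also match backup files like gene1.seq.bak and hand them to MAP.

```perl
#!/usr/local/bin/perl
# Sketch: decide which file names should be passed to MAP and what
# output name each should get. map_name_for is a hypothetical helper
# written for this illustration only.
use strict;
use warnings;

# Return "<base>.map" for a name ending in ".seq", else undef.
# Anchoring with ^...$ keeps names like "x.seq.bak" from matching.
sub map_name_for {
    my ($fileName) = @_;
    return $fileName =~ /^(.+)\.seq$/ ? "$1.map" : undef;
}

# Report the decision for every name given on the command line.
for my $name (@ARGV) {
    my $out = map_name_for($name);
    print(defined($out) ? "$name -> $out\n" : "$name skipped\n");
}
```

For example, running it (saved under any name you like) with the arguments gene1.seq and gene1.seq.bak prints "gene1.seq -> gene1.map" and then "gene1.seq.bak skipped".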

------ Program starts ------

#!/usr/local/bin/perl
# Run GCG's MAP on every *.seq file in a project directory.
use strict;
use warnings;

my $workingDir = @ARGV ? shift(@ARGV) : '.'; # i.e. the project directory

# MAP is given bare file names, so run it from inside the project directory
chdir($workingDir) or die "cannot change to $workingDir: $!\n";

opendir(my $dh, '.') or die "cannot read $workingDir: $!\n";
my @files = grep { -f $_ } readdir($dh);
closedir($dh);

foreach my $fileName (sort @files) {
    print "Dealing with $fileName ... ";
    if ($fileName =~ /^(.+)\.seq$/) {
        my $base = $1;
        print "running map with $fileName\n";
        system('/gcg/gcgsoft/gcgcore/execute/map', $fileName,
               "-out=$base.map", '-enz=', '-men=t', '-D') == 0
            or warn "map failed on $fileName\n";
        # In the line above I give the explicit path to the map command;
        # your path might be different, or map may already be on your
        # PATH. Ask your system administrator if in doubt.

        # Also, in the line above, you can set all the normal MAP
        # parameters after the sequence name:
        # -out= --- sets the output file name. Here, the sequence name
        #           with a .map extension
        # -enz= --- sets the enzymes. Here it is left empty, i.e. no enzymes
        # -men= --- sets the translation option. Here 't', i.e. the three
        #           forward frames
        # -D    --- uses the default settings, i.e. standard translation,
        #           starting at base 1 and ending at the last base
        # You can also set a number of other parameters (type genhelp man,
        # topic command-line-summary), e.g. -beg= for the beginning of the
        # map and -end= for its end.
    } else {
        print "file $fileName does not match expected pattern... skipped\n";
    }
}

------ Program ends -------

If you don't have Perl, you could write a similar program as a shell script.

I hope this helps.

Cheers,

Stuart.


+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
| Tel. (0171) 878 4127    | Courtauld Building                   |
| Fax. (0171) 878 4040    | 91 Riding House Street               |
+-------------------------+ London, W1P 8BT                      |
| stuart at ludwig.ucl.ac.uk | UNITED KINGDOM.                      |
+-------------------------+--------------------------------------+



