gcg package

Reinhard Doelz doelz at comp.bioz.unibas.ch
Thu Aug 5 12:45:29 EST 1993

In article <2033 at alsys1.aecom.yu.edu>, zucker at leper1.ca.aecom.yu.edu (Thomas Zucker-Scharff x3513) writes:
|> In article 19595 at gserv1.dl.ac.uk, bss1jm at surrey.ac.uk (Dr Johnjoe Mcfadden) writes:
|> >Does anybody know how much memory do we need to run the 'gcg package'
|> >from Genetics Computer Group Inc?
|> I would suggest running GCG from a Sparcstation 10 with at least 48mb of ram,
|> preferably more.  The pamphlet includes the following info:
|> We are just switching over from the VMS version to the UNIX version on an SGI
|> Challenge M series with 96mb of ram and 5gig of diskspace.

How much memory is _required_ is basically irrelevant. What you have to
account for is the number of USERS and the number of jobs you want to
run simultaneously. I personally run GCG on an AXP 4xxx (128 MB), a
VAXCluster (7xxx (256 MB), 6xxx (64 MB)) and a Silicon Graphics UNIX
cluster (Crimson 64 MB, and some Indigos down to 16 MB). We have about
300 users in the GCG accounting file, about 80 of them 'power users'
who use the system daily or several times a week. Disk space is on the
order of > 15 GB.

As a rule of thumb, 8 MB of RAM and 50 MB of disk per head are a good
start for a workgroup (5 to 20 people). If you have more users, count
RAM per simultaneously _active_ head, but account for the same disk
space per head. Finally, add at least 1 Gigabyte for scratch space and
backup (in case a disk crashes). To hold all sequence databases you
will need about 8 Gigabytes by the end of 1994 (I might be wrong, but
my schedule accounts for this).
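
The rule of thumb above can be turned into a small calculation. This is
only a sketch: the function name and parameters are made up for
illustration, 1 Gigabyte is taken as 1024 MB, and the space for the
sequence databases themselves is not included.

```python
def estimate_resources(users, active_users=None,
                       ram_mb_per_head=8, disk_mb_per_head=50,
                       scratch_mb=1024):
    """Return (ram_mb, disk_mb) following the rule of thumb.

    users        -- everybody who has an account (drives disk space)
    active_users -- heads working at the same time (drives RAM);
                    for a small workgroup this defaults to all users
    """
    if active_users is None:
        active_users = users
    ram_mb = ram_mb_per_head * active_users
    # Disk is counted per head for all users, plus scratch/backup space.
    disk_mb = disk_mb_per_head * users + scratch_mb
    return ram_mb, disk_mb


# Workgroup of 20 people, all counted as active.
print(estimate_resources(20))                     # (160, 2024)

# Larger site: 300 accounts, 80 assumed active at once (hypothetical).
print(estimate_resources(300, active_users=80))   # (640, 16024)
```

The per-head figures are defaults only and should be replaced by
whatever you measure on your own system.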

Big memory is needed for all programs that use virtual memory to access
indices. This is not specific to GCG and applies to most search
programs that make heavy use of dynamic memory allocation. The
'QUICKSEARCH' program of GCG is a memory hog, and you will not
necessarily want to run it. BLAST (from NCBI) certainly also runs
much faster if you give it more memory to live in.

Another figure worth looking at is the I/O speed of the disk
(sub)system and the caching options of the processor. Depending on the
code, searches might become I/O bound. If you run off a small disk
in PC-style fashion, your CPU might sit idle, wasting time waiting for
I/O. Again, (whatever)-table oriented software will need some time to
load the data, while the actual computation is a breeze.
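
Whether a search is I/O bound can be checked roughly by comparing the
time needed to read the data with the time needed to compute on it.
The sketch below assumes that reading and computing do not overlap;
the function name and all numbers are hypothetical and not measured on
any of the systems mentioned here.

```python
def search_time_estimate(db_mb, disk_mb_per_s, cpu_s):
    """Return (io_seconds, bottleneck) for one pass over a database.

    db_mb         -- amount of data to be read, in MB
    disk_mb_per_s -- sustained read rate of the disk (sub)system
    cpu_s         -- pure computation time for the search, in seconds
    """
    io_s = db_mb / disk_mb_per_s
    bottleneck = "I/O" if io_s > cpu_s else "CPU"
    return io_s, bottleneck


# Slow disk: reading dominates and the CPU idles.
print(search_time_estimate(1000, 1.0, 300))   # (1000.0, 'I/O')

# Faster disk subsystem: the same search becomes CPU bound.
print(search_time_estimate(1000, 5.0, 300))   # (200.0, 'CPU')
```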

Do not seriously expect to run a cluster via NFS for sequence
searching if you have only Ethernet. Today's workstations splatter
2 MBit/sec on a routine basis, and I managed to crash an Ethernet with
too many collisions by running FASTA across the net on 8 workstations.
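
The arithmetic behind this is simple. The sketch below assumes a
shared 10 MBit/sec Ethernet segment (the usual figure, but not stated
above) and takes 2 MBit/sec per workstation from the text; the function
name is made up for illustration.

```python
def ethernet_load(workstations, mbit_per_station=2.0, link_mbit=10.0):
    """Return (offered_mbit, utilisation) for a shared segment.

    link_mbit=10.0 is an assumption about the network in question.
    """
    offered = workstations * mbit_per_station
    return offered, offered / link_mbit


offered, utilisation = ethernet_load(8)
print(f"{offered:.0f} Mbit/s offered, {utilisation:.0%} of link capacity")
# prints: 16 Mbit/s offered, 160% of link capacity
```

On a shared medium, collisions eat up throughput well before the
nominal capacity is reached, so an offered load above 100% leaves no
headroom at all.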

And, last but not least, don't forget to order a batch system as well.
UNIX might be great, but it usually does not come with a generic batch
system. Unless you have very few users, you might want to restrict the
use of CPU-intensive software and send it to batch (this is easily
achieved in most software, GCG included).


|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
                     ftp mirror at nic.switch.ch 
