looking for protein sequences

Reinhard Doelz doelz at comp.bioz.unibas.ch
Mon Dec 6 02:47:54 EST 1993


In article <2dsrc0$j26 at reseau.cict.fr>, maveyrau at ecstasy.NoSubdomain.NoDomain (Laurent Maveyraud) writes:
|> Query from mcpalmie at FOX.CCE.USP.BR (Mauricio Cesar Palmieri): 
|> ...
|> >ones. Could somebody there tell me the easiest way to proceed with a 
|> >rough search? (database, software, time 
|> >required, etc.)

The software you choose trades speed against sensitivity, and the right 
choice depends very much on the question you ask. There are three options:

(1) Rely on the network
(2) Rely on your site
(3) Do everything yourself

Option three, obviously, is time-consuming and usually does not pay 
off. If you work in a larger environment you should not need to install 
software and databases yourself, because central services are 
cost-effective and are therefore usually provided. If you do not enjoy 
such comfort but need to set up such a service, the best approach is to 
visit a site where one is already running (better still, two or three 
sites running different environments). 

Option one, the network, may turn out to be a sword of Damocles: 
you get what you pay for, and most of the services are "free". 
Many of the services provided on the net are spun off from research 
activities and therefore suffer continuously from funding problems; 
they are constantly in danger of degrading or ceasing to exist. 

There are a few exceptions, though, such as the services of the 
EMBL Data Library, the national EMBnet services in Europe, 
and the NCBI in the US. 

A list of the services available on the network has been compiled by 
Amos Bairoch and is available on various servers; on our mirror server, 
nic.switch.ch, you will find it as
 mirror/embnet-ch/other-data/info/serv_ema.txt.Z

With respect to option two, the central services, you might want 
to consider contacting your national center in Brazil. Goran Neshich 
(address below) will answer your questions, whether you would like 
to use the services there or simply get advice. 

Every EMBnet node has an entry in 'The Nodes of EMBnet' database, which 
is available in plain ASCII and PostScript format on nic.switch.ch in 
 mirror/embnet-ch/info/embnet
The embnet.dat file is plain ASCII, and nicely formatted PostScript 
versions with additional information are available as well. 

There is a Brazilian node in Brasilia, the Brazilian Molecular Biology 
and Biotechnology Network (BMBBNet), which serves the community with a 
structure similar to that of the EMBnet nodes in Europe. For more 
information, contact:

 ------------------------------------------------------------------------
| Goran Neshich                    |                                     |
| CENARGEN/EMBRAPA                 | e-mail: neshich at cenargen.embrapa.br | 
| S.A.I.N. Parque Rural, Final W5  |                                     |
| Asa Norte                        |  Phone:  +55 (61)273-0100 ext 127   |
| 70770-900, Brasilia-D.F.-BRASIL  |  Fax:    +55 (61)274-3212           |
 ------------------------------------------------------------------------

|> 
|> 
|> One of the numerous ways is to use the UWGCG package, with the database 
|> Genembl. The option is TFASTA. This goes through the whole DNA database,

TFASTA belongs to the FASTA package written by William R. Pearson, 
whose method was originally described in Science (Lipman and Pearson 
(1985) Science 227:1435-1441). The Genetics Computer Group, Inc. 
(Madison, Wisconsin) distributes a version adapted to its package. 
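For readers wondering how a protein query can be run against a DNA 
database at all: TFASTA-style searches compare the peptide with the 
database translated in all six reading frames. The sketch below shows 
only that translation step, using the standard genetic code; the exact 
substring test in frames_containing() is a hypothetical stand-in for 
illustration, not the FASTA scoring that TFASTA actually applies to 
each frame.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations of frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    revcomp = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, revcomp)
            for offset in (0, 1, 2)]


def frames_containing(peptide, dna):
    """Hypothetical helper: frames whose translation contains the peptide exactly."""
    labels = ["+1", "+2", "+3", "-1", "-2", "-3"]
    return [label for label, prot in zip(labels, six_frames(dna))
            if peptide in prot]


print(six_frames("ATGAAAACC"))  # ['MKT', '*K', 'EN', 'GFH', 'VF', 'FS']
```

A real search scores similarity rather than identity in each frame, 
which is why a translated search costs roughly six times as much as 
the equivalent protein-vs.-protein search.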

...
|> Software : UWGCG (University of Wisconsin Genetic Computer Group) it is
|> available nearly everywhere there are molecular biology labs

The package in question arose from an academic group (UWGCG) but is now
commercially managed (Info at beers at gcg.com). Owing to its history and 
to economic factors, it is very widely installed. We have had 
excellent results running this package since 1987. However,
it is important to emphasize that other packages are available, such
as the IntelliGenetics software package (known as IG Suite, Info 
at ig-consultant at presto.ig.com), and others. 

Many packages are also available in the public domain, though these 
usually cover only one particular activity (but cover it very 
extensively), e.g. sequence searching: FASTA was mentioned above, and 
the NCBI has released the 'BLAST' suite of programs. 
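Both FASTA and BLAST gain their speed by first looking for short 
identical words shared by the query and a database sequence, instead of 
aligning every residue against every other. A minimal sketch of the 
FASTA-like first step, assuming exact k-tuple matches and ignoring the 
later rescoring stages: index the query's words, scan the library 
sequence, and count hits per diagonal (library position minus query 
position). The function names and the default k=2 are illustrative.

```python
from collections import Counter, defaultdict


def word_index(seq, k):
    """Map every k-letter word of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, library, k=2):
    """Count shared k-tuples on each diagonal (j - i)."""
    index = word_index(query, k)
    hits = Counter()
    for j in range(len(library) - k + 1):
        for i in index.get(library[j:j + k], ()):
            hits[j - i] += 1
    return hits


def best_diagonal(query, library, k=2):
    """Return (diagonal, hit count) with the most hits, or None if no hits."""
    hits = diagonal_hits(query, library, k)
    if not hits:
        return None
    return hits.most_common(1)[0]


print(best_diagonal("ACDEFGHIK", "XXACDEFGHIKYY"))  # (2, 8)
```

A dense diagonal marks a region worth examining with a proper alignment; 
sequences with no dense diagonals are skipped, which is where the 
"quick-but-dirty" speed comes from.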

|> 
|> Database : Genembl, which is also from UWGCG, and is updated every 15 days (I think...)

'Genembl' is the name of a merged database used with the GCG package. 
Entries duplicated between GenBank and EMBL are removed on the basis of 
accession numbers: in the US you would normally run GenBank together 
with an EMBL exclusion set (the EMBL entries not already in GenBank), 
and in Europe you would run EMBL with a GenBank exclusion set. The 
update frequency is up to your administrator; we update EMBL daily and 
GenBank weekly, and most sites in Switzerland maintain a weekly GCG 
GENEMBL update. There are other merged collections out there as well, 
such as PATCHX and OWL. 
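The exclusion-set idea can be sketched in a few lines, assuming each 
database is held as a mapping from accession number to entry. The 
dictionaries and accession strings below are made-up placeholders, not 
real GenBank or EMBL records, and GCG's own tools work on its formatted 
database files rather than anything like this.

```python
def exclusion_set(secondary, primary):
    """Entries of `secondary` whose accession numbers are absent from `primary`."""
    return {acc: entry for acc, entry in secondary.items() if acc not in primary}


def merge(primary, secondary):
    """Primary database plus the secondary's exclusion set; primary wins on overlap."""
    merged = dict(primary)
    merged.update(exclusion_set(secondary, primary))
    return merged


# Hypothetical toy data keyed by made-up accession numbers.
genbank = {"A00001": "genbank copy", "A00002": "genbank copy"}
embl = {"A00002": "embl copy", "A00003": "embl copy"}

us_style = merge(genbank, embl)  # GenBank + EMBL exclusion set
print(sorted(us_style))          # ['A00001', 'A00002', 'A00003']
```

Swapping the arguments gives the European arrangement, EMBL plus a 
GenBank exclusion set.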

|> 
|> Time : well, this depends mainly on the machine you are using. It can range from 1 hour to 15 min.

BLAST (software from the NCBI) takes only a few seconds for a straight 
peptide-vs.-peptide search, and a Smith-Waterman search on a 
massively parallel machine (software from the Edinburgh team) doesn't 
take longer either. The same Smith-Waterman search run on four-year-old 
deskside hardware might easily take 5 hours of CPU time to complete. 
The available search software ranges from identity scans to 
sophisticated fragment-pattern searches, and from quick-but-dirty 
database browsing to extensive homology evaluations using profiles. 
ISEARCH (PIR), (T)FASTA (Pearson), BLAST (NCBI), WORDSEARCH (UWGCG), 
TWORDSEARCH (Rice), BLAZE (IntelliGenetics), MPsearch (Collins et al., 
also known as BLITZ), PROFILE methods (Gribskov), FLASH (IBM), and 
many, many other programs have a wide area of application. 
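The reason a Smith-Waterman search is so much slower than BLAST on 
ordinary hardware is that it fills a full dynamic-programming matrix, 
one cell per pair of residues. A minimal sketch of the local-alignment 
score, assuming simple match/mismatch scores and a linear gap penalty; 
production protein searches use substitution matrices (PAM, BLOSUM) 
and affine gap penalties instead, and the parameter values here are 
arbitrary.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a vs. b (linear gaps, two-row DP)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment start anywhere: that is what
            # makes it *local* rather than global.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GTTAC", "GTTGAC", 3, -3, -2))  # 13
```

Every query residue meets every database residue, so the work grows 
with the product of the two lengths; no word lookup prunes the search.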

So the running time depends on:
	* algorithm used 
	* hardware used 
	* question asked 
	* (occasionally) length of the sequence 
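For the exhaustive dynamic-programming methods, the factors in that 
list combine as simple arithmetic: the number of matrix cells is the 
query length times the total residues in the database, and the hardware 
sets how many cells per second can be computed. The throughput figures 
below are hypothetical placeholders, not measurements of any real 
machine; word-based methods such as BLAST do not follow this formula, 
since they skip most of the cells.

```python
def estimate_dp_seconds(query_len, db_residues, cells_per_second):
    """Rough run time of a full DP search: (query x database) cells / throughput."""
    cells = query_len * db_residues
    return cells / cells_per_second


# Hypothetical example: 360-residue query, 10 million database residues.
slow = estimate_dp_seconds(360, 10_000_000, 1_000_000)    # assumed 1e6 cells/s
fast = estimate_dp_seconds(360, 10_000_000, 100_000_000)  # assumed 100x faster
print(slow / 3600, "hours;", fast, "seconds")  # 1.0 hours; 36.0 seconds
```

Doubling the query length or the database size doubles the estimate, 
which is why sequence length matters only "occasionally" for fast 
heuristics but always for exhaustive searches.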


Regards
Reinhard

-- 
+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
                     ftp mirror at nic.switch.ch 
               -----------------------------------------




More information about the Bio-soft mailing list