From owner-bio-srs@hgmp.mrc.ac.uk  Thu Nov  9 13:25:26 2000
Return-Path: <owner-bio-srs@hgmp.mrc.ac.uk>
Received: by mercury.hgmp.mrc.ac.uk (Postfix, from userid 110)
	id 8D63417AFC; Thu,  9 Nov 2000 13:25:25 +0000 (GMT)
Received: by mercury.hgmp.mrc.ac.uk (Postfix, from userid 6015)
	id 0C4F817A6A; Thu,  9 Nov 2000 13:25:22 +0000 (GMT)
Received: by mercury.hgmp.mrc.ac.uk (Postfix, from userid 60001)
	id 131F717AD4; Thu,  9 Nov 2000 11:40:49 +0000 (GMT)
To: bionet-software-srs@net.bio.net
From: ichan-Yoshihiro@im.ac.cn (ICHIYANAGI Yoshihiro)
Newsgroups: bionet.software.srs
Subject: SRS indexing in parallel on distributed machines
Organization: IM, CAS
Message-ID: <3A0A8C39.50E1BFC4@im.ac.cn>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
X-Received: by mercury.hgmp.mrc.ac.uk (Postfix, from userid 6024)
	id A688017A89; Thu,  9 Nov 2000 11:40:43 +0000 (GMT)
X-Received: from mail.im.ac.cn (mail.im.ac.cn [159.226.66.130])
	by mercury.hgmp.mrc.ac.uk (Postfix) with SMTP id 28D7A415EA
	for <bio-srs@net.bio.net>; Thu,  9 Nov 2000 11:37:58 +0000 (GMT)
X-Received: (qmail 24101 invoked by uid 0); 9 Nov 2000 11:33:12 -0000
X-Received: from bio-mirror.im.ac.cn (HELO im.ac.cn) (159.226.80.8)
  by mail.im.ac.cn with SMTP; 9 Nov 2000 11:33:12 -0000
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: ja, en
X-To: bio-srs@net.bio.net
Date: Thu,  9 Nov 2000 13:25:22 +0000 (GMT)
Sender: owner-bio-srs@hgmp.mrc.ac.uk
Precedence: bulk

Dear all,

We are now using SRS on Bio-mirror to make the mirrored
databases searchable.
SRS requires an index for each database, and indexing large
databanks such as DDBJ (48GB), GENBANK (39GB) and EMBL (37GB)
takes a long time.
Indexing them quickly in parallel needs a powerful computer
with many CPUs, which is very expensive.
So I wrote programs that build SRS databank indices
concurrently on distributed machines, using Java, HORB (a kind
of Java Object Request Broker), NFS, Perl and shell scripts
on RedHat 6.2.

I know that SRS has a function that parallelizes indexing
on a multiprocessor machine. I used this function
(parallelType:files in the Icarus files) to build a prototype
system on distributed computers, and got pretty good results.

This time I built the DDBJNEW indices with four Linux
machines. There are 102 DDBJNEW files on our ftp site.
I also built the Genbank indices with three Linux
machines. There are 75 Genbank files on our ftp site.

The results were as follows:
--------------------------------------------------------------
DDBJNEW:
Linux No.1:1CPU(PIII 733MHz)    mem512M created 28 files' indices
Linux No.2:1CPU(PIII 600MHz)    mem512M created 38 files' indices
Linux No.3:1CPU(Celeron 500MHz) mem128M created 29 files' indices
Linux No.4:1CPU(PIII 450MHz)    mem128M created  7 files' indices
--------------------------------------------------------------
                                       total 102 files

Indexing in parallel on four machines (TIME) : 03 hours 52 minutes
Merging indices on Linux No.1         (TIME) : 00 hours 50 minutes
--------------------------------------------------------------
Genbank:
Linux No.1:                          created 25 files' indices
Linux No.2:                          created 28 files' indices
Linux No.3:                          created 22 files' indices
--------------------------------------------------------------
                                       total 75 files

Indexing in parallel on three machines (TIME) : 03 hours 54 minutes
Merging indices on Linux No.1          (TIME) : 00 hours 39 minutes

No.1 is the SRS web server, No.2 is an ftp and web server,
No.3 is a mail server and No.4 is just my PC.
In this system No.1 is the destination server and the others
are remote agent machines that do the indexing.
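
The post does not say how files were assigned to machines. One common
way to share per-file work among unequal machines is a single queue
from which each agent pulls the next file when it is free. The sketch
below simulates that with threads; `run_agents` and `index_file` are
hypothetical names for illustration, not part of SRS or HORB.

```python
import queue
import threading


def run_agents(files, agents, index_file):
    """Let each agent pull files from one shared queue until it is empty.

    index_file(agent, name) is a hypothetical callback standing in for
    whatever actually indexes one file on one machine.
    """
    pending = queue.Queue()
    for name in files:
        pending.put(name)

    # Each agent records the files it handled in its own list.
    done = {agent: [] for agent in agents}

    def worker(agent):
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to index
            index_file(agent, name)
            done[agent].append(name)

    threads = [threading.Thread(target=worker, args=(a,)) for a in agents]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

With this kind of scheme a faster or less loaded machine simply ends up
taking more files, so the per-machine counts need not be equal.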

Before this, it took more than 13 hours on the No.1 machine alone.
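
As a rough check of the figures reported above, the short sketch below
adds the indexing and merging times for each run and compares the total
with the 13-hour figure. It assumes that figure is comparable to each
run, which the post does not state explicitly, and treats it as a lower
bound because the post says "more than 13 hours".

```python
def to_minutes(hours, minutes):
    """Convert an 'HH hours MM minutes' figure to whole minutes."""
    return hours * 60 + minutes


def summarize(index_time, merge_time, baseline):
    """Return (total minutes, speedup relative to the baseline)."""
    total = index_time + merge_time
    return total, baseline / total


# Lower bound only: the post says "more than 13 hours" on No.1 alone.
# Assumption: that figure is comparable to each run below.
BASELINE = to_minutes(13, 0)

# (parallel indexing time, merge time on No.1), taken from the post.
RUNS = {
    "DDBJNEW": (to_minutes(3, 52), to_minutes(0, 50)),
    "Genbank": (to_minutes(3, 54), to_minutes(0, 39)),
}

for name, (index_time, merge_time) in RUNS.items():
    total, speedup = summarize(index_time, merge_time, BASELINE)
    print(f"{name}: {total // 60}h{total % 60:02d}m total, "
          f"at least {speedup:.1f}x faster")
```

Under that assumption the totals come to 4h42m for DDBJNEW and 4h33m for
Genbank, a little under a threefold improvement in each case.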

For the remote agent machines, the destination server must export
the DATABANKS data directories to be indexed, the $SRSICA and $SRSDB
configuration files so that indexing uses the same configuration,
and the $SRSINX directory, which is mounted so that the distributed
indices do not have to be gathered separately. This is a little
messy to set up.

Please give me some advice if you are interested in this subject
or know of approaches similar to mine.

Wishes,
ichan
---------------------------
ICHIYANAGI Yoshihiro
Institute of Microbiology,
Chinese Academy of Sciences


---




From owner-bio-srs@hgmp.mrc.ac.uk  Fri Nov 17 15:20:39 2000
Return-Path: <owner-bio-srs@hgmp.mrc.ac.uk>
Received: by mercury.hgmp.mrc.ac.uk (Postfix, from userid 110)
	id 57D1617B41; Fri, 17 Nov 2000 15:20:39 +0000 (GMT)
Received: by mercury.hgmp.mrc.ac.uk (Postfix, from userid 6015)
	id 42A0617B40; Fri, 17 Nov 2000 15:20:37 +0000 (GMT)
Received: by mercury.hgmp.mrc.ac.uk (Postfix, from userid 6024)
	id 61BFD17BDD; Fri, 17 Nov 2000 11:37:43 +0000 (GMT)
Received: from niobium.hgmp.mrc.ac.uk (niobium [193.62.192.41])
	by mercury.hgmp.mrc.ac.uk (Postfix) with ESMTP id AB369415D4
	for <bionet-software-srs@net.bio.net>; Fri, 17 Nov 2000 11:37:17 +0000 (GMT)
Received: (from news@localhost)
	by niobium.hgmp.mrc.ac.uk (8.9.3+Sun/8.8.8) id LAA19327
	for bionet-software-srs@net.bio.net; Fri, 17 Nov 2000 11:37:16 GMT
To: bionet-software-srs@net.bio.net
From: Heikki Lehvaslaiho <heikki@ebi.ac.uk>
Newsgroups: bionet.software.srs
Subject: dbSNP in SRS
Organization: EMBL - EBI
Message-ID: <3A15186C.427B43B5@ebi.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: niobium.hgmp.mrc.ac.uk 974461036 19325 193.62.199.73 (17 Nov 2000 11:37:16 GMT)
X-Complaints-To: news@net.bio.net
NNTP-Posting-Date: 17 Nov 2000 11:37:16 GMT
	Steve Sherry <sherry@ray.nlm.nih.gov>
X-Mailer: Mozilla 4.72 [en] (X11; I; IRIX64 6.5 IP28)
X-Accept-Language: en
Date: Fri, 17 Nov 2000 15:20:37 +0000 (GMT)
Sender: owner-bio-srs@hgmp.mrc.ac.uk
Precedence: bulk


Dear all,

There has been quite a lot of interest in getting dbSNP into SRS.
I've now written the first version of parsers for the new dbSNP
distribution (XML) format. Since the XML support for SRS is not yet
ready and I had parsing code for it anyway, I created three flat files
from the XML and wrote parsers for them; quick and easy. Note that the
XML will change and have a richer representation of the dbSNP
relational database schema in the near future. Also, note that the
current dump on the NCBI ftp server contains only data submitted
before September 5th.
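
To illustrate the general idea of flattening XML into a line-oriented
file that a flat-file parser can index, here is a minimal sketch in
Python (the actual converter is the Perl code linked below). The
element names, the two-letter line codes and the "//" entry terminator
are all assumptions made for this example; they are not the real dbSNP
schema or the layout of the three flat files.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: these element and attribute names are invented for
# illustration and are not the real dbSNP XML schema.
SAMPLE_XML = """
<records>
  <record id="rec1">
    <organism>Homo sapiens</organism>
    <method>sequencing</method>
  </record>
  <record id="rec2">
    <organism>Mus musculus</organism>
  </record>
</records>
"""

# Hypothetical mapping from XML element name to a two-letter line code.
LINE_CODES = {"organism": "OS", "method": "ME"}


def xml_to_flat(xml_text):
    """Write one 'CODE value' line per known element, '//' after each entry."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter("record"):
        lines.append(f"{'ID':<5}{record.get('id')}")
        for child in record:
            code = LINE_CODES.get(child.tag)
            if code is not None and child.text:
                lines.append(f"{code:<5}{child.text.strip()}")
        lines.append("//")  # end-of-entry marker
    return "\n".join(lines) + "\n"


print(xml_to_flat(SAMPLE_XML), end="")
```

A fixed line code at the start of every line keeps the flat file easy
to tokenize, which is the usual reason for choosing such a layout.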

The URLs:

The EBI SRS server:
	http://srs.ebi.ac.uk/

dbSNP distribution files:
	ftp://ftp.ncbi.nlm.nih.gov/snp/pc_compressed/XML_exchange/

Perl XML parser (needs XML::Parser and XML::Node) and icarus files:
	http://www.ebi.ac.uk/mutations/progs/dbsnp.tar.gz


  -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________





