From iain.m.wallace at gmail.com Mon Oct 10 07:57:13 2005
From: iain.m.wallace at gmail.com (Iain Wallace)
Date: Mon Oct 10 08:02:32 2005
Subject: [Bio-srs]
Message-ID: <8cff3eb80510100557g705a7ffco699fac8aad0ea0b6@mail.gmail.com>

Hi,

I am trying to return the embl entries for a list of uniprot entries.
I use the following command:

getz '(@testing > embl)'

where the file testing contains:
uniprot:CYGB_MOUSE
uniprot:GLB1_SCAIN

The output is:
EMBL:AK019410
EMBL:MMU315163
EMBL:BC055040

Is there any way of viewing the Uniprot IDs as well as the EMBL IDs?
My ideal output would be
EMBL:AK019410 UNIPROT:CYGB_MOUSE
EMBL:MMU315163 UNIPROT:CYGB_MOUSE
EMBL:BC055040 UNIPROT:CYGB_MOUSE

I have tried getz '(@testing > embl) > uniprot'
but this only returns one entry, rather than three.

I want to parse out the results into individual files according to the
uniprot id.

I believe it is possible using views and wgetz, but I would prefer not
to use wgetz.

Any help would be greatly appreciated.

Iain

From hpm at ebi.ac.uk Mon Oct 10 09:38:59 2005
From: hpm at ebi.ac.uk (Hamish McWilliam)
Date: Mon Oct 10 19:57:31 2005
Subject: [Bio-srs] Uniprot and EMBL question
In-Reply-To:
References:
Message-ID: <434A7D03.2050402@ebi.ac.uk>

Hi Iain,

> I am trying to return the embl entries for a list of uniprot entries.
> I use the following command:
> getz '(@testing > embl)'
> where the file testing contains:
> uniprot:CYGB_MOUSE
> uniprot:GLB1_SCAIN
>
> The output is:
> EMBL:AK019410
> EMBL:MMU315163
> EMBL:BC055040
>
> Is there any way of viewing the Uniprot IDs as well as the EMBL IDs?
> My ideal output would be
> EMBL:AK019410 UNIPROT:CYGB_MOUSE
> EMBL:MMU315163 UNIPROT:CYGB_MOUSE
> EMBL:BC055040 UNIPROT:CYGB_MOUSE
>
> I have tried getz '(@testing > embl) > uniprot'
> but this only returns one entry, rather than three.
>
> I want to parse out the results into individual files according to the
> uniprot id.
>
> I believe it is possible using views and wgetz, but I would prefer not
> to use wgetz.

A simple solution is to use a shell script to do the relevant
processing. For example:

#!/bin/sh
# Literal tab character; printf is portable, whereas echo "\t" prints
# a backslash and a "t" in some shells.
tab=$(printf '\t')
# One getz call per id in the file "testing"; append a tab and the
# source id to every line of output.
for ln in $(cat testing); do
  getz "[$ln]>embl" | sed "s#\$#$tab$ln#"
done

This produces your desired result, but is inefficient for large lists of
ids because each id is processed by a separate getz call. If your set of
ids is the product of a query, you could use an Icarus script to do the
processing instead and avoid some of the overhead of the getz calls.

Hamish

--
============================================================
Mr Hamish McWilliam
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD, UK
URL: http://www.ebi.ac.uk/
============================================================
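
The original question also asks for the results to be split into individual files by uniprot id. A minimal sketch of that step follows, assuming the input is the tab-separated "EMBL id, uniprot id" output of the script above. The function name split_by_uniprot, the output directory and the ".embl" file suffix are hypothetical choices made for illustration; nothing here calls getz.

```shell
#!/bin/sh
# split_by_uniprot (hypothetical helper): read "EMBL_ID<TAB>UNIPROT_ID"
# lines on stdin and append each EMBL id to a file named after the
# uniprot id, inside the directory given as the first argument.
split_by_uniprot() {
  outdir=$1
  mkdir -p "$outdir"
  awk -F '\t' -v dir="$outdir" '
    NF == 2 {
      name = $2
      sub(/^[^:]*:/, "", name)   # drop the "uniprot:" databank prefix
      out = dir "/" name ".embl" # assumed naming scheme
      print $1 >> out
      close(out)                 # avoid running out of open files
    }'
}

# Sample input taken from the output shown in the question.
printf 'EMBL:AK019410\tuniprot:CYGB_MOUSE\nEMBL:MMU315163\tuniprot:CYGB_MOUSE\nEMBL:BC055040\tuniprot:CYGB_MOUSE\n' |
  split_by_uniprot by_uniprot
```

In practice the printf line would be replaced by the loop from the reply, piped into split_by_uniprot. Files are opened in append mode, so the output directory should be emptied before a rerun.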