From Yair.Pilpel from mpimf-heidelberg.mpg.de Fri Apr 4 05:19:19 2008
From: Yair.Pilpel from mpimf-heidelberg.mpg.de (Yair Pilpel)
Date: Fri Apr 4 10:27:48 2008
Subject: [Protein-analysis] T7 terminator sequence
Message-ID:

Dear Prof Gegenheimer,

My name is Yair Pilpel, and I am a post-doctoral researcher at the Max Planck Institute in Heidelberg, Germany. I am sorry to disturb you, but I have seen your thread on the T7 RNA polymerase terminator sequence.

I have been working with a T7-based expression system in mammalian cells, and have taken the promoter, followed by an IRES sequence and my protein, and then the T7 terminator sequence. Thus far it seems to be working, even rather well. However, I have noticed that I made a mistake in my cloning. Instead of the ("classic") terminator sequence:

TAGCAtaaCCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

I cloned the following sequence:

TAGCAggcatgcCCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTG

Don't ask me how that happened, long story... I've marked the differences between my sequence and the classic one (lower case). In any event, I am now unsure whether or not my expression would be reduced by this change. Since I've already used this basic vector to clone in a pretty large number of proteins, I am naturally averse to recloning this sequence. However, it can (and will) be done if necessary. Do you think that this terminator sequence should still be working?

Another question that I have regards the exact site of termination. I was thinking of using the T7 system to clone short-hairpin RNAs. The thing here is that I would then need the termination point to be very close to the beginning of the termination sequence. At the least, I would need to know that no secondary hairpins are formed after my RNA sequence. I guess that this is probably not the best expression system for shRNAs, but I would greatly appreciate your opinion.

Thanks a lot,
Yair

From nq_flipper from yahoo.com Tue Apr 8 08:12:18 2008
From: nq_flipper from yahoo.com (Chris X.
Weichenberger)
Date: Tue Apr 8 19:29:59 2008
Subject: [Protein-analysis] Errors in protein structures
Message-ID:

Dear all,

We announce the release of NQ-Flipper, a public service for the visualization and correction of unfavorable/incorrect asparagine and glutamine side-chain rotamers.

For further information on the NQ-rotamer problem and the NQ-Flipper service see

Weichenberger, Byzia & Sippl (2008), Visualization of unfavorable interactions in protein folds. Bioinformatics Advance Access published online on March 29, 2008.
http://dx.doi.org/10.1093/bioinformatics/btn108

Weichenberger & Sippl (2007) Recognition and Correction of Erroneous Asparagine and Glutamine Side Chain Rotamers in Protein Structures. Nucleic Acids Res., 35(Web Server issue), W403-406.
http://dx.doi.org/10.1093/nar/gkm263

NQ-Flipper is available as a public web service at http://flipper.services.came.sbg.ac.at or may be downloaded as a standalone application for Linux systems.

Enjoy!

The NQ-Flipper Team
CAME - Center of Applied Molecular Engineering
University of Salzburg
Austria

From ke_lu from yahoo.com Tue Apr 15 14:31:32 2008
From: ke_lu from yahoo.com (Luke)
Date: Tue Apr 15 16:36:28 2008
Subject: [Protein-analysis] unknown tag sequence
Message-ID:

We got a vector from another lab.
After sequencing, I found a fragment beside the target gene:
SDPLVQCGGILQISSTVAAARGHPFEGKPIPNPLLGLDSTRTG

Does anybody know the name of this tag?
I tried BLAST, but still have no idea.

Luke

From darek.kedra from gmail.com Wed Apr 16 05:56:45 2008
From: darek.kedra from gmail.com (darked)
Date: Wed Apr 16 12:54:48 2008
Subject: [Protein-analysis] Re: unknown tag sequence
References:
Message-ID:

On Apr 15, 8:31 pm, "Luke" wrote:
> We got a vector from another lab.
> After sequencing, I found a fragment beside the target gene:
> SDPLVQCGGILQISSTVAAARGHPFEGKPIPNPLLGLDSTRTG
>
> Does anybody know the name of this tag?
> I tried BLAST, but still have no idea.
>
> Luke

It is in your blastp results:
i.e.: AAU14794.
Reports neurotrophin 3 pr...[gi:51949752]

V5 epitope tag (GKPIPNPLLGLDST).

Best,

darked
http://openwetware.org/wiki/Wikiomics
http://openwetware.org/wiki/Wikiomics:Bioinfo_tutorial

From ke_lu from yahoo.com Wed Apr 16 09:58:09 2008
From: ke_lu from yahoo.com (Luke)
Date: Wed Apr 16 12:55:01 2008
Subject: [Protein-analysis] Re: unknown tag sequence
References:
Message-ID:

darked,

Thanks very much for your help.
Is SDPLVQCGGILQISSTVAAARGHPFE a tag?

Luke

From blackhole from abuse.plus.com Wed Apr 16 10:09:44 2008
From: blackhole from abuse.plus.com (Duncan Clark)
Date: Wed Apr 16 12:55:06 2008
Subject: [Protein-analysis] Re: unknown tag sequence
References:
Message-ID: <63ozkcB4ahBIFA3f@abuse.plus.com>

Historians believe that in newspost on Wed, 16 Apr 2008, Luke penned the following literary masterpiece:
>Thanks very much for your help.
>Is SDPLVQCGGILQISSTVAAARGHPFE a tag?

It doesn't appear to be one according to BLAST, but try blasting the DNA sequence rather than the protein sequence.

Duncan
--
I love deadlines. I especially like the whooshing noise they make as they go flying by.

Duncan Clark
GeneSys Ltd.

From fapalida from proteinlabs.com Wed Apr 16 11:23:52 2008
From: fapalida from proteinlabs.com (F. A. Palida)
Date: Wed Apr 16 12:55:11 2008
Subject: [Protein-analysis] Protein elution from SDS-PAGE gel ??
Message-ID: <48062818.6050003@proteinlabs.com>

Hello,

Can you please send me more details on how to elute from SDS-PAGE using hydroxylapatite?

Thank you very much
Palida

From zambonel from gmail.com Thu Apr 17 12:35:14 2008
From: zambonel from gmail.com (Carlo Zambonelli)
Date: Fri Apr 18 00:12:02 2008
Subject: [Protein-analysis] irreversible protein aggregate
Message-ID: <48078A52.90001@gmail.com>

Hi,

I am trying to purify a 135 kDa protein, predicted pI=5.2. I can efficiently express my protein as a fusion with MBP. Affinity purification on amylose resin works fine at pH 6.2, while at pH 8.0 most of the MBP fusion does not bind to the column. This has to do with charge distribution/neutralization but it is (probably) irrelevant to my problem.

After elution from amylose resin, I cleave off MBP and the protein I obtain is a large Mw aggregate. Purity determined after SDS-PAGE is no more than 60%. I tried up to 10% glycerol, 1 M NaCl, 10 mM DTT and a number of detergents. I obtained some very slight improvement with the above conditions and CHAPS as the detergent, but I still have large aggregates as determined both by GPC and DLS.

Desperate, I decided to try and unfold/refold my protein, which is NOT an enzyme and for which I do NOT have a functional assay! So far I have tried denaturing with 1% SDS, 8 M urea, and 6 M guanidine. In the presence of SDS I seem to obtain some reduction in the size of the aggregate (GPC, DLS), while with urea and guanidine I do not see any change (DLS) in the size of the aggregate!

Any comment or suggestion? Is it possible that proteins aggregate irreversibly?

Thanks,
Carlo

From sticher from bioc.unizh.ch Fri Apr 18 08:05:06 2008
From: sticher from bioc.unizh.ch (Patrick Sticher)
Date: Fri Apr 18 11:53:30 2008
Subject: [Protein-analysis] 6th International NCCR Symposium on New Trends in Structural Biology
Message-ID: <48089C82.8010700@bioc.unizh.ch>

Dear colleagues,

Please be informed that registration for the

6th International NCCR Symposium on New Trends in Structural Biology
8 + 9 September 2008, University of Zürich, Lecture Hall KOH-B10, Zürich, Switzerland

is now open. Online registration is possible directly from the symposium website:

www.structuralbiology.uzh.ch/symposium2008.asp

where you will also find further information about this event.

Confirmed plenary lecturers to date:
Stephen C. Kowalczykowski, Keiichi Namba, Poul Nissen, Andrej Sali, Titia Sixma, Jeffrey Skolnick, A. Joshua Wand

Please do not hesitate to contact me anytime if you need further information (sticher@bioc.uzh.ch).

With best regards,
Patrick Sticher

The NCCR Structural Biology is a research initiative of the Swiss Science Foundation. Its research encompasses the fields of recombinant protein technologies, macromolecular structure determination and computational biomolecular sciences with a special focus on membrane proteins and supramolecular assemblies/interactions. 19 research groups from Swiss Universities and Research Institutions participate in this network.
www.structuralbiology.uzh.ch/

_________________________________
Dr.
Patrick Sticher Moser
NCCR Scientific Officer
Institute of Biochemistry
University of Zürich
Winterthurerstrasse 190
CH - 8057 Zürich
Phone +41 / (0)44 / 635 54 84
Fax +41 / (0)44 / 635 59 08

From narges.habibi from gmail.com Wed Apr 23 06:25:42 2008
From: narges.habibi from gmail.com (Narges Habibi)
Date: Wed Apr 23 11:51:27 2008
Subject: [Protein-analysis] About PDB Files and Secondary Structures
Message-ID: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>

Hi all,

I'm doing a project on "Protein Contact Map Prediction" and I use some features for neural network's input, including Secondary Structure of a given Amino Acid. There are several ways:

1- getting dssp file for each pdb file (from ftp server)
2- extracting from pdb file (The HELIX and SHEET and TURN section)
3- getting ss file from www.pdb.org (as I see the given sequences in this file don't match with the pdb files, why?)

What do you suggest? What method is more accurate?

Thanks in advance

--
Narges Habibi

From dan.bolser from gmail.com Wed Apr 23 07:12:53 2008
From: dan.bolser from gmail.com (Dan Bolser)
Date: Wed Apr 23 11:51:41 2008
Subject: [Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures
In-Reply-To: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
Message-ID: <2c8757af0804230512u1e1e9ac9p1224e446160acb15@mail.gmail.com>

2008/4/23 Narges Habibi :
> Hi all,
>
> I'm doing a project on "Protein Contact Map Prediction" and I use some
> features for neural network's input, including Secondary Structure of a
> given Amino Acid. There are several ways:
>
> 1- getting dssp file for each pdb file (from ftp server)

This method has the advantage of giving you a consistent definition of what is helix / sheet / turn / etc., but the disadvantage of sometimes missing short regions of sheet in particular.

> 2- extracting from pdb file (The HELIX and SHEET and TURN section)

The problem here is that the given definitions may be inconsistently applied, depending on the tastes of the particular PDB author. I don't know to what extent the annotations in the PDB entries match DSSP.

> 3- getting ss file from www.pdb.org (as I see the given sequences in this
> file don't match with the pdb files, why?)

I don't know about this method.

> What do you suggest? What method is more accurate?

If we knew that we wouldn't have more than one method ;-)

In reality what you mean by 'accurate' varies depending on the application.

> Thanks in advance
>
> --
> Narges Habibi

Dan.

From rhuehne from fli-leibniz.de Wed Apr 23 07:22:40 2008
From: rhuehne from fli-leibniz.de (Rolf Huehne)
Date: Wed Apr 23 11:51:46 2008
Subject: [Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures
In-Reply-To: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
Message-ID: <480F2A10.8040003@fli-leibniz.de>

Narges Habibi wrote:
> Hi all,
>
> I'm doing a project on "Protein Contact Map Prediction" and I use some
> features for neural network's input, including Secondary Structure of a
> given Amino Acid. There are several ways:
>
> 1- getting dssp file for each pdb file (from ftp server)
> 2- extracting from pdb file (The HELIX and SHEET and TURN section)
> 3- getting ss file from www.pdb.org (as I see the given sequences in this
> file don't match with the pdb files, why?)
>
> What do you suggest? What method is more accurate?
>

You should be careful if you use the secondary structure assignments provided by the authors in the PDB file. You might expect that they are of high quality because they were curated by people who know the proteins quite well.
But I noticed that there are overlaps between different secondary structure elements, for example in entry '1AMR':

  HELIX 13 HM PHE A 352 LYS A 355 1
  TURN 16 T16 ILE A 353 GLN A 356 TYPE I

(The helix at position 352-355 overlaps with a turn at position 353-356!?)

I analyzed this a few weeks ago for all remediated PDB files and detected overlaps within about 2000 entries. A summary text file (15 KB) containing the PDB codes and the corresponding number of overlapping residues is (temporarily) available at this URL:

http://www.fli-leibniz.de/~rhuehne/jmol/analyze_sec_struct-2008_02_26b-overlap.txt

(Note: The numbers are not always correct because, for simplification and speed-up, I didn't take into account numbering irregularities in the non-border residues of an element, like insertion codes or nice residue number sequences like "28,328,29" in entry '1BLB'.)

The full output (9 MB) is (temporarily) available at this URL:

http://www.fli-leibniz.de/~rhuehne/jmol/analyze_sec_struct-2008_02_26b-log.txt

Besides, not all PDB entries contain secondary structure assignments.

Regards,
Rolf

From msauder from sgxpharma.com Wed Apr 23 10:16:38 2008
From: msauder from sgxpharma.com (Michael Sauder)
Date: Wed Apr 23 11:51:50 2008
Subject: [Protein-analysis] RE: pdb-l: About PDB Files and Secondary Structures
In-Reply-To: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
Message-ID: <9A698C4AB21AE2449A18EF2E9ABDDEAA03EAEFF5@exchange.ds.stromix.com>

The best source of sequence files based on PDB structures is either the Astral database or Roland Dunbrack's lab. He calls his PDB sequence resource PISCES: http://dunbrack.fccc.edu/PISCES.php. It is updated weekly and all PDB protein sequences are available at various %identity and resolution cutoffs. His S2C files also provide a mapping between SEQRES and ATOM residues, which might solve your other problem.
In addition, it lists the secondary structure as defined in the PDB file and as calculated by a single program (STRIDE). So this also might save you a lot of time. You can download his S2C files for the entire PDB.

His sequence files correctly translate modified residues to their nearest equivalent, as opposed to other sites like NCBI, which translate them as X's. (e.g., selenomethionine is M instead of X, phosphorylated tyrosine is Y instead of X, etc.)

There are two sequences that are relevant to any given chain in a PDB file. One is the "SEQRES" sequence, and one is what I call the "ATOM" sequence, or the residues that are visible in the electron density and reported in the ATOM records. The SEQRES sequence (from the SEQRES records in the PDB file) should, in theory, reflect all the residues in the protein that crystallized, including those that might be disordered. The "ATOM" sequence is derived from the ATOM/HETATM coordinate records and reflects only what could be fit from the experimental electron density.

For example, your structure might be a homotetramer. There will be four sets of SEQRES records, chains A,B,C,D, all of which are identical. On the other hand, different residues in each of the four monomers might be disordered in the electron density, so the sequences corresponding to the "ATOM sequence" of the four chains may all be different.

Dunbrack's website provides a mapping of the residue numbering between the SEQRES sequence and the ATOM sequence:

http://dunbrack.fccc.edu/Guoli/s2c/index.php

For example, the SEQRES sequence will always start at 1 and go to n, where n is the length of the SEQRES sequence. This would be the numbering if you used the sequence for a BLAST search. On the other hand, the ATOM residue numbering might be based (as it should) on the full length biologically relevant protein. For example, Q9KL26 from V.
cholerae is predicted to be a membrane bound protein and the structure of only the N-terminal half of the protein is in the PDB (3c8c). If any residues were disordered, these would be listed in the SEQRES column. The S2C file provides the residue number mapping between the SEQRES and ATOM sequence, as well as the secondary structure and %solvent accessibility:

  SEQCRD A S SER SER 1 61 H T 97
  SEQCRD A L LEU LEU 2 62 H T 25
  SEQCRD A R ARG ARG 3 63 H T 82
  SEQCRD A S SER SER 4 64 H T 47
  SEQCRD A M MSE MSE 5 65 - - -
  SEQCRD A V VAL VAL 6 66 H H 6
  ...

Notice how the selenomethionine (MSE) is translated as M, but the secondary structure is not reported since a lot of these programs (e.g., STRIDE) don't know how to handle the HETATM records as part of the chain. That's probably why STRIDE annotates the first 4 residues as "turn" as opposed to "helix".

Mike
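
The SEQCRD records quoted in the message above can be read with a few lines of Python. This is only a minimal sketch based on the six-line excerpt: it assumes whitespace-separated fields in the order record name, chain, one-letter code, SEQRES residue, ATOM residue, SEQRES number, ATOM number, PDB-file secondary structure, STRIDE secondary structure, and solvent accessibility, with "-" meaning "not reported". The order of the two secondary structure columns is inferred from the remark about STRIDE and "turn"; real S2C files may contain additional fields or fixed-width columns (for example a blank chain ID) that this sketch does not handle. The names `SeqCrd`, `parse_seqcrd` and `seqres_to_atom` are hypothetical helpers, not part of any S2C tool.

```python
from typing import NamedTuple, Optional

# The excerpt from the message, used here as sample input.
EXCERPT = """\
SEQCRD A S SER SER 1 61 H T 97
SEQCRD A L LEU LEU 2 62 H T 25
SEQCRD A R ARG ARG 3 63 H T 82
SEQCRD A S SER SER 4 64 H T 47
SEQCRD A M MSE MSE 5 65 - - -
SEQCRD A V VAL VAL 6 66 H H 6
"""


class SeqCrd(NamedTuple):
    chain: str
    one_letter: str
    seqres_name: str
    atom_name: str
    seqres_num: int
    atom_num: Optional[str]       # kept as text: may carry an insertion code
    ss_pdb: str                   # assumed: assignment from the PDB file
    ss_stride: str                # assumed: assignment from STRIDE
    accessibility: Optional[int]  # assumed: % solvent accessibility


def _dash_to_none(token):
    """Treat a lone '-' as 'value not reported'."""
    return None if token == "-" else token


def parse_seqcrd(line):
    """Parse one SEQCRD line; return None for anything else."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "SEQCRD":
        return None
    (_, chain, one, seqres_name, atom_name,
     seqres_num, atom_num, ss_pdb, ss_stride, acc) = fields
    acc = _dash_to_none(acc)
    return SeqCrd(chain, one, seqres_name, atom_name, int(seqres_num),
                  _dash_to_none(atom_num), ss_pdb, ss_stride,
                  None if acc is None else int(acc))


def seqres_to_atom(text, chain):
    """Map SEQRES numbering to ATOM numbering for one chain."""
    mapping = {}
    for line in text.splitlines():
        rec = parse_seqcrd(line)
        if rec is not None and rec.chain == chain and rec.atom_num is not None:
            mapping[rec.seqres_num] = rec.atom_num
    return mapping


records = [parse_seqcrd(line) for line in EXCERPT.splitlines()]
mapping = seqres_to_atom(EXCERPT, "A")
```

With the excerpt as input, `mapping` relates SEQRES positions 1 to 6 to ATOM residue numbers 61 to 66, and the MSE row comes back with no accessibility value. Keeping the ATOM number as text is a deliberate choice, since PDB residue numbers can carry insertion codes.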

From karplus from soe.ucsc.edu Wed Apr 23 12:20:59 2008
From: karplus from soe.ucsc.edu (Kevin Karplus)
Date: Wed Apr 23 14:39:17 2008
Subject: [Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures
In-Reply-To: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com> (narges.habibi@gmail.com)
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
Message-ID: <200804231720.m3NHKxrT032725@cheep.cse.ucsc.edu>

Narges Habibi wrote
> I'm doing a project on "Protein Contact Map Prediction" and I use some
> features for neural network's input, including Secondary Structure of a
> given Amino Acid. There are several ways:
>
> 1- getting dssp file for each pdb file (from ftp server)
> 2- extracting from pdb file (The HELIX and SHEET and TURN section)
> 3- getting ss file from www.pdb.org (as I see the given sequences in this
> file don't match with the pdb files, why?)
>
> What do you suggest? What method is more accurate?

None of the above.

Predicting contact maps using known structure is cheating. You should be predicting the local structure, not extracting it from known structures. Any way that data from known structures can creep into your inputs invalidates your testing, and makes it impossible to say with confidence that your method does anything useful. Given the rather low quality of contact prediction at the current state of the art, even small amounts of information from the real structure can make a big difference.

The following paper by my student is a pretty good summary of the best method as of CASP7---improvements since then have been modest:

  George Shackelford and Kevin Karplus.
  Contact Prediction using Mutual Information and Neural Nets.
  Proteins: Structure, Function, and Bioinformatics,
  69(S8):159-164, 2007. (CASP7 special issue).
  doi:10.1002/prot.21791

I see a lot of "prediction" work that is complete garbage, because the authors fooled themselves by using data that could only come from knowing the real structures. The even more common problem is insufficient separation of train and test sets, in which computer scientists assume that the random partition of a data set is all that is needed---but the data sets we have aren't independent samples, so one has to go to some effort to ensure that the test set does not contain examples that are very close to training set examples.

------------------------------------------------------------
Kevin Karplus karplus@soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate Director, Bioinformatics
(Senior member, IEEE) (Board of Directors & Chair of Education Committee, ISCB)
Affiliations for identification only.

From otastan from gmail.com Wed Apr 23 16:00:05 2008
From: otastan from gmail.com (Oznur Tastan)
Date: Wed Apr 23 19:39:15 2008
Subject: [Protein-analysis] About PDB Files and Secondary Structures
In-Reply-To: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com>
Message-ID: <63aeb8db0804231400m477ff1d6lf088c4a1e1b689a2@mail.gmail.com>

Hi Narges,

I recommend having a look at the following article for a comparison of different methods and for other secondary structure assignment methods that you have not listed:

Zhang W, Dunker AK, Zhou Y. Assessing secondary structure assignment of protein structures by using pairwise sequence-alignment benchmarks. Proteins. 2008 Apr;71(1):61-7.

Different methods succeed and fail in different cases, so I would build the model by incorporating the assignments of multiple methods.

Hope that helps,
Oznur Tastan

From pcxpj1 from nottingham.ac.uk Wed Apr 23 15:27:28 2008
From: pcxpj1 from nottingham.ac.uk (Pooja Jain)
Date: Thu Apr 24 11:43:45 2008
Subject: [Protein-analysis] 3D Structural Properties : From DSSP or PDB or ?
In-Reply-To: <200804231720.m3NHKxrT032725@cheep.cse.ucsc.edu>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com> <200804231720.m3NHKxrT032725@cheep.cse.ucsc.edu>
Message-ID: <2A7C9524-109B-4374-9977-2ADA2BF617C6@nottingham.ac.uk>

Ciao,

To continue this discussion, I am very much interested in what others say about the best approach to getting the 3-D structural properties for disordered regions or secondary structure elements, maybe for the purpose of training some machine learning algorithm or guiding MD simulations of a deemed homologous protein with unknown structure.

Should it be DSSP or PDB or something else that I am not aware of?

Thank you.

-Pooja

From pcxpj1 from nottingham.ac.uk Wed Apr 23 15:34:07 2008
From: pcxpj1 from nottingham.ac.uk (Pooja Jain)
Date: Thu Apr 24 11:43:52 2008
Subject: [Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures
In-Reply-To: <2c8757af0804230512u1e1e9ac9p1224e446160acb15@mail.gmail.com>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com> <2c8757af0804230512u1e1e9ac9p1224e446160acb15@mail.gmail.com>
Message-ID: <673416FE-3DBF-4900-94A1-61CF8069A98E@nottingham.ac.uk>

Hi Dan,

>> I'm doing a project on "Protein Contact Map Prediction" and I use some
>> features for neural network's input, including Secondary Structure of a
>> given Amino Acid. There are several ways:
>>
>> 1- getting dssp file for each pdb file (from ftp server)
>
> This method has the advantage of giving you a consistent definition of
> what is helix / sheet / turn / etc., but the disadvantage of sometimes
> missing short regions of sheet in particular.
>

Which regions do you mean DSSP might miss? The terminal ends of the strands in a sheet, the terminal strands of the sheet, or perhaps the interrupted strands in a sheet?

Would you please share further details, any statistics, or any reference that discusses these missing regions?

Many thanks,

-Pooja

From dan.bolser from gmail.com Thu Apr 24 09:39:40 2008
From: dan.bolser from gmail.com (Dan Bolser)
Date: Thu Apr 24 11:44:06 2008
Subject: [Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures
In-Reply-To: <673416FE-3DBF-4900-94A1-61CF8069A98E@nottingham.ac.uk>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com> <2c8757af0804230512u1e1e9ac9p1224e446160acb15@mail.gmail.com> <673416FE-3DBF-4900-94A1-61CF8069A98E@nottingham.ac.uk>
Message-ID: <2c8757af0804240739r7e238aa4s5f2f7d2ae11f3620@mail.gmail.com>

On 23/04/2008, Pooja Jain wrote:
> Hi Dan,
>
> > > I'm doing a project on "Protein Contact Map Prediction" and I use some
> > > features for neural network's input, including Secondary Structure of a
> > > given Amino Acid. There are several ways:
> > >
> > > 1- getting dssp file for each pdb file (from ftp server)
> >
> > This method has the advantage of giving you a consistent definition of
> > what is helix / sheet / turn / etc., but the disadvantage of sometimes
> > missing short regions of sheet in particular.
>
> Which region do you mean DSSP might miss ? The terminal ends of the
> strands in a sheet, the terminal strands of the sheet or perhaps the
> interrupted strands in a sheet ?
>
> Would you please share further details, any statistics or any reference
> that discusses these missing regions please ?

I am glad you brought this up, because the truth is that I don't really know! I have only ever heard this said - I have not seen any statistics or examples. It would be great to get more information from the list about the identification (and validity) of short regions of secondary structure.

Sorry for any confusion,
Dan.

> Many thanks,
>
> -Pooja

--
hello

From karplus from soe.ucsc.edu Thu Apr 24 10:15:17 2008
From: karplus from soe.ucsc.edu (Kevin Karplus)
Date: Thu Apr 24 11:44:12 2008
Subject: [Protein-analysis] Re: 3D Structural Properties : From DSSP or PDB or ?
In-Reply-To: <2A7C9524-109B-4374-9977-2ADA2BF617C6@nottingham.ac.uk> (message from Pooja Jain on Wed, 23 Apr 2008 22:27:28 +0200)
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com> <200804231720.m3NHKxrT032725@cheep.cse.ucsc.edu> <2A7C9524-109B-4374-9977-2ADA2BF617C6@nottingham.ac.uk>
Message-ID: <200804241515.m3OFFH46024405@cheep.cse.ucsc.edu>

For training secondary structure predictors, DSSP is an adequate method. STRIDE is about as good, though DSSP and STRIDE disagree on the boundaries of helices.

Our group has taken to using more finely divided alphabets, since there is more predictable from local context than DSSP defines, and the extra information is useful in fold recognition and alignment. Our most successful alphabet mainly subdivides the beta strands, predicting parallel/antiparallel/mixed (see http://www.soe.ucsc.edu/research/compbio/SAM_T06/STR.html for information and definitions---str2 is our most successful local structure alphabet, of the dozens we have tried). The str2 alphabet can be defined based solely on the full DSSP output.

For burial/exposure, we have found neighborhood-count measures more conserved and predictable than surface area measures. The simplest one that works well is CB14 (the number of beta carbons within 14 Angstroms of the beta carbon), but we have another (near-backbone-11) that is a little better.

------------------------------------------------------------
Kevin Karplus karplus@soe.ucsc.edu http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate Director, Bioinformatics
Affiliations for identification only.
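
The CB14 burial measure mentioned in the message above (the number of beta carbons within 14 Angstroms of a residue's beta carbon) is simple enough to sketch in a few lines of Python. This is an illustrative reading of the one-sentence definition, not the SAM group's code: it assumes the beta-carbon coordinates have already been extracted from the structure, that a residue does not count itself, and that "within" means a distance of at most 14 Angstroms. How glycine (which has no beta carbon) is handled is not stated in the message and is left to the caller. The function name `cb14` is hypothetical.

```python
import math


def cb14(cb_coords, radius=14.0):
    """Count, for each beta carbon, the other beta carbons within `radius`.

    cb_coords: list of (x, y, z) tuples in Angstroms, one per residue.
    Returns a list of neighbor counts in the same order.
    """
    counts = []
    for i, a in enumerate(cb_coords):
        neighbors = 0
        for j, b in enumerate(cb_coords):
            # Assumption: the residue itself is excluded from its own count.
            if i != j and math.dist(a, b) <= radius:
                neighbors += 1
        counts.append(neighbors)
    return counts


# Toy example: four points on a line, 10 Angstroms apart.
toy = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_counts = cb14(toy)
```

In the toy example only directly adjacent points lie within 14 Angstroms of each other, so the two end points get a count of 1 and the two inner points a count of 2. The quadratic loop is fine for single chains; a spatial grid or k-d tree would be the usual choice for large complexes.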

From geoff from compbio.dundee.ac.uk Thu Apr 24 10:35:23 2008
From: geoff from compbio.dundee.ac.uk (Geoff Barton)
Date: Thu Apr 24 11:44:19 2008
Subject: [Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures
In-Reply-To: <2c8757af0804240739r7e238aa4s5f2f7d2ae11f3620@mail.gmail.com>
References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com> <2c8757af0804230512u1e1e9ac9p1224e446160acb15@mail.gmail.com> <673416FE-3DBF-4900-94A1-61CF8069A98E@nottingham.ac.uk> <2c8757af0804240739r7e238aa4s5f2f7d2ae11f3620@mail.gmail.com>
Message-ID:

>>>> I'm doing a project on "Protein Contact Map Prediction" and I use some
>>>> features for neural network's input, including Secondary Structure of a
>>>> given Amino Acid. There are several ways:
>>>>
>>>> 1- getting dssp file for each pdb file (from ftp server)
>>>
>>> This method has the advantage of giving you a consistent definition of
>>> what is helix / sheet / turn / etc., but the disadvantage of sometimes
>>> missing short regions of sheet in particular.
>>
>> Which region do you mean DSSP might miss ? The terminal ends of the
>> strands in a sheet, the terminal strands of the sheet or perhaps the
>> interrupted strands in a sheet ?
>>
>> Would you please share further details, any statistics or any reference
>> that discusses these missing regions please ?
>
> I am glad you brought this up, because the truth is that I don't
> really know! I have only ever heard this said - I have not seen any
> statistics or examples. It would be great to get more information from
> the list about the identification (and validity) of short regions of
> secondary structure.
>
> Sorry for any confusion,
>
> Dan.

We did a comparison of DSSP, STRIDE and DEFINE in connection with secondary structure prediction (see Cuff and Barton, 1999, Proteins, 34, 508-519) which may give you some information on this.
For example, it shows the different helix-length distributions found for STRIDE and DSSP. If you can't get that paper, let me know. You might also find the paper by Colloc'h et al (http://www.ncbi.nlm.nih.gov/pubmed/8332595) interesting as this compared a number of definition methods. The original DSSP paper by Kabsch and Sander is a good read since this gives a very clear explanation of how the method works (but don't print out the secondary structure dictionary that is on the end!) DSSP defines secondary structure by a set of rules based on hydrogen bonding. It is a nice algorithm since it only has one adjustable parameter (the H-bond energy cutoff) and the rules are relatively simple and hierarchical. For example, beta-strands are not defined directly, but only as part of a sheet which in turn is defined by joining beta ladders, which are defined as sequences of beta bridges. For high "quality" crystal structures, the DSSP definitions usually reproduce what you would expect by eye, but as with any automatic procedure, for some structures you may not agree with the definition. For example, loss of a single hydrogen bond due to a poorly modelled residue can lead to truncation of a helix, or splitting of a sheet into two sheets. Whether this is important to you or not will depend on the problem you are interested in. STRIDE starts with DSSP, but then applies geometric criteria to see if strands or helices should be extended. For this reason, some people prefer STRIDE definitions. DEFINE, which is not used these days, worked from completely different principles. The advantage of using DSSP or STRIDE over the structure-author definitions is that they follow a consistent model. Of course, many authors base their secondary structure definitions on those from DSSP. Geoff. -- Geoff Barton, Professor of Bioinformatics, School of Life Sciences University of Dundee, Scotland, UK.
geoff@compbio.dundee.ac.uk Tel:+44 1382 385860/388731 (Fax:385764) www.compbio.dundee.ac.uk The University of Dundee is registered Scottish charity: No.SC015096 From microsathese from hotmail.com Wed Apr 23 22:42:30 2008 From: microsathese from hotmail.com (Niraikulam Ayyadurai) Date: Thu Apr 24 16:52:15 2008 Subject: [Protein-analysis] Protocol needed from study the protein aggregation Message-ID: Hi all, Recently I got a mutant gene that shows higher solubility than the wild type. We are assuming that our mutant prevents the aggregation of the protein. I would like to study the thermal and chemical protein aggregation. I would be happy if anyone could provide me a detailed protocol to study this experiment. I look forward to replies from the group. Thanks From otastan from gmail.com Thu Apr 24 12:34:25 2008 From: otastan from gmail.com (Oznur Tastan) Date: Thu Apr 24 16:52:32 2008 Subject: [Protein-analysis] Re: pdb-l: About PDB Files and Secondary Structures In-Reply-To: <2c8757af0804240739r7e238aa4s5f2f7d2ae11f3620@mail.gmail.com> References: <4f8cb9900804230425q621e90a7wa41467d7263badf8@mail.gmail.com> <2c8757af0804230512u1e1e9ac9p1224e446160acb15@mail.gmail.com> <673416FE-3DBF-4900-94A1-61CF8069A98E@nottingham.ac.uk> <2c8757af0804240739r7e238aa4s5f2f7d2ae11f3620@mail.gmail.com> Message-ID: <63aeb8db0804241034j3fc33a9ar1f250d91a3a44e45@mail.gmail.com> The following recent article provides a comparison: Zhang W, Dunker AK, Zhou Y. Assessing secondary structure assignment of protein structures by using pairwise sequence-alignment benchmarks. Proteins. 2008 Apr;71(1):61-7.
---- Oznur Tastan PhD Candidate, Computer Science Department Carnegie Mellon University On Thu, Apr 24, 2008 at 10:39 AM, Dan Bolser wrote: > On 23/04/2008, Pooja Jain wrote: > > Hi Dan, > > > > > > > > > > > > > > > > > I'm doing a project on "Protein Contact Map Prediction" and I use some > > > > features for neural network's input, including Secondary Structure of a > > > > given Amino Acid. There are several ways: > > > > > > > > 1- getting dssp file for each pdb file (from ftp server) > > > > > > > > > > This method has the advantage of giving you a consistent definition of > > > what is helix / sheet / turn / etc., but the disadvantage of sometimes > > > missing short regions of sheet in particular. > > > > > > > > > > Which region do you mean DSSP might miss ? The terminal ends of the > > strands in a sheet, the terminal strands of the sheet or perhaps the > > interrupted strands in a sheet ? > > > > Would you please share further details, any statistics or any reference > > that discusses these missing regions please ? > > I am glad you brought this up, because the truth is that I don't > really know! I have only ever heard this said - I have not seen any > statistics or examples. It would be great to get more information from > the list about the identification (and validity) of short regions of > secondary structure. > > Sorry for any confusion, > > Dan.
> > > > > Many thanks, > > > > -Pooja > > > > > > > > > -- > hello > > _______________________________________________ > Proteins mailing list > Proteins@net.bio.net > http://www.bio.net/biomail/listinfo/proteins > -- oznur From sticher from bioc.unizh.ch Fri Apr 25 09:28:15 2008 From: sticher from bioc.unizh.ch (Patrick Sticher) Date: Fri Apr 25 10:32:03 2008 Subject: [Protein-analysis] First Announcement - Practical Course in 2D Membrane Protein Crystallization and Observation Message-ID: <4811EA7F.3060008@bioc.unizh.ch> Dear colleagues, I would like to inform you about the following course: 7th NCCR Practical Course and EMBN Summer School PRACTICAL COURSE IN 2D MEMBRANE PROTEIN CRYSTALLIZATION AND OBSERVATION October 20 - 24, 2008, Basel, Switzerland Topics include membrane protein expression, solubilization and purification | detergent properties and interactions with lipids and proteins | functional analysis of membrane proteins | 2D-crystallization of membrane proteins | electron microscopy of 2D crystals | image processing | application of other visualization techniques: AFM, IR-spectroscopy, solid-state NMR. The course is directed primarily to PhD students and postdocs with some previous experience in the structural biology of membrane proteins or related fields. The number of participants will be limited to 25. This course is supported by the European Union Marie Curie Conferences and Training Courses Program, and the Swiss NCCR Structural Biology. Junior scientists wishing to attend this course may apply for an EU Marie Curie Fellowship covering course fees and accommodation costs, and providing support towards travel costs. For further information, please see the course website www.structuralbiology.uzh.ch/membranecourse2008.asp Online registration will open at the end of May. With best regards, Patrick Sticher _________________________________ Dr.
Patrick Sticher Moser NCCR Scientific Officer Institute of Biochemistry University of Zürich Winterthurerstrasse 190 CH - 8057 Zürich Phone +41 / (0)44 / 635 54 84 Fax +41 / (0)44 / 635 59 08 Mail sticher@bioc.uzh.ch From RoyalOui from gmail.com Sun Apr 27 19:01:33 2008 From: RoyalOui from gmail.com (Berkeley Brett) Date: Sun Apr 27 19:51:24 2008 Subject: [Protein-analysis] The Protein Synthesis Jabberwocky Dance (YAY!!); 1971 (+ capturing YouTube videos) Message-ID: <6829fa18-e321-4458-a424-84e7ed0e1d17@a9g2000prl.googlegroups.com> This may bring back memories for some of you.... In 1971, Stanford Chemistry Professor Robert Alan Weiss assembled a troupe of dancers to demonstrate the process of protein synthesis -- and the "Protein Synthesis Dance -- an Epic on the Cellular Level" was born! Brimming with the exuberance of the time (: an exuberance that is still present in the hearts and minds of many :), the video of the Protein Synthesis Dance has been used very widely at universities, colleges, and even some high schools to introduce the subject of protein synthesis to students. So here it is, in all its glory -- The Protein Synthesis Dance! (Of course, the poem on which the narration is modelled is Lewis Carroll's "Jabberwocky," which I've included as a PS below; also, if memory serves, each puff of smoke you see from the GTP dancer represents a release of energy from hydrolysis (?? It's been awhile....) ) : http://www.youtube.com/watch?v=Nmqhdozuf7Y The FULL version of the video, with a more conventional introduction (to "people portraying molecules using the dance idiom") by (subsequent Nobel Prize in Chemistry winner) Paul Berg ( http://nobelprize.org/nobel_prizes/chemistry/laureates/1980/berg-autobio.html and http://en.wikipedia.org/wiki/Paul_Berg ), may be seen here: http://www.youtube.com/watch?v=u9dhO0iCLww&feature=related Peace, Love, and Wholesome Protein Synthesis to you all! P.S.
You can capture most YouTube videos (in AVI format, which can be played with the Windows Media Player, among many other players) by using the free "VDownloader" program, available here: http://www.vdownloader.es/ Just click the large green arrow in the lower right part of the window to download. It's pretty easy to use and mostly intuitive. P.S.2. Here's the text of Lewis Carroll's delightful nonsense poem, Jabberwocky. (Not surprisingly, my spell-checker objects muchly to it (as it does to the word 'muchly'!)): Jabberwocky 'Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe. "Beware the Jabberwock, my son! The jaws that bite, the claws that catch! Beware the Jubjub bird, and shun The frumious Bandersnatch!" He took his vorpal sword in hand: Long time the manxome foe he sought -- So rested he by the Tumtum tree, And stood awhile in thought. And as in uffish thought he stood, The Jabberwock, with eyes of flame, Came whiffling through the tulgey wood, And burbled as it came! One, two! One, two! and through and through The vorpal blade went snicker-snack! He left it dead, and with its head He went galumphing back. "And hast thou slain the Jabberwock? Come to my arms, my beamish boy! O frabjous day! Callooh! Callay!" He chortled in his joy. 'Twas brillig, and the slithy toves Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe. [end] -- Brett http://www.100bestwebsites.org/ "The 100 finest sites on the Web, all in one place!"
Widely-watched non-profit ranking of top Internet sites From engelbert_buxbaum from hotmail.com Mon Apr 28 14:28:48 2008 From: engelbert_buxbaum from hotmail.com (Dr Engelbert Buxbaum) Date: Tue Apr 29 11:32:52 2008 Subject: [Protein-analysis] Re: Protocol needed from study the protein aggregation References: Message-ID: On 23.04.2008 at 23:42, Niraikulam Ayyadurai wrote: > > Hi all > Recently I got a mutant gene that shows higher solubility than the wild > type. We are assuming that our mutant prevents the aggregation of the > protein. > I would like to study the thermal and chemical protein aggregation. > I would be happy if anyone could provide me a detailed protocol to study this > experiment. Differential scanning calorimetry would be my method of choice. No point in posting the protocol, you need to find somebody with the required equipment to collaborate with. Plan on sending them plenty of protein (several 100 mg), there is a wide gap between what biophysicists and biochemists consider "a small sample". And biophysicists don't do protein purification and hence have absolutely no respect for the work you put in ;-( From vincenza.nardicchi from unipg.it Mon Apr 28 09:01:18 2008 From: vincenza.nardicchi from unipg.it (Vincenza Nardicchi) Date: Tue Apr 29 11:42:42 2008 Subject: [Protein-analysis] PREDITOP Message-ID: I'm Vincenza Nardicchi from Perugia (Italy). I found that you asked about the free availability of the program PREDITOP and managed to get it. Would you mind sharing the link with me, as I am also looking for it? Thank you very much in advance PhD Vincenza Nardicchi Dip. Medicina Interna Sez. Biochimica Università
degli Studi di Perugia vincenza.nardicchi@unipg.it From itisam.sarangi from gmail.com Mon Apr 28 12:16:01 2008 From: itisam.sarangi from gmail.com (Itisam Sarangi) Date: Tue Apr 29 11:42:48 2008 Subject: [Protein-analysis] DNA binding protein purification Message-ID: <25de2fb70804281016i5c40b09mb79b5ca4ff75029f@mail.gmail.com> Dear all, I am currently trying to purify a DNA-binding protein from E. coli and to remove the contaminating DNA. I am using DNase to remove the contaminating DNA, but the protein precipitates on contact with the divalent cations needed for DNase activity. If anyone has suggestions, please feel free to share. itisam From clement from bio.mls.eng.osaka-u.ac.jp Tue Apr 29 19:20:11 2008 From: clement from bio.mls.eng.osaka-u.ac.jp (Clement Angkawidjaja) Date: Wed Apr 30 12:50:50 2008 Subject: [Protein-analysis] Re: Protocol needed from study the protein aggregation References: <200804291703.m3TH3vQ25923@net.bio.net> Message-ID: <003101c8aa57$f46a1a10$0e00a8c0@CLEMENT> > Hi all > Recently I got a mutant gene that shows higher solubility than the wild > type. We are assuming that our mutant prevents the aggregation of the > protein. > I would like to study the thermal and chemical protein aggregation. > I would be happy if anyone could provide me a detailed protocol to study this > experiment. Besides DSC, you can also do CD spectroscopy. It is easier and needs a smaller amount of protein (10 mg for far-UV should be sufficient). As Dr. Engelbert Buxbaum commented, you will need to collaborate with someone who has the appropriate instrument. Clement