Likely artifactual S. cerevisiae ORFs

JEAN HANI HANI at MIPS16.dnet.mips.biochem.mpg.de
Tue Mar 25 07:40:13 EST 1997


The following information was distributed by: Jim Freeman

>The BioMolecular Engineering Research Center has identified a large
>set of ORFs in S. cerevisiae which are likely to be artifactual.

>        NOTE 1: There are 6275 ORF's originally identified by MIPS in
>yeast genome that theoretically could encode proteins longer than 99
>amino acid residues (Goffeau et al., Science, 1996, Vol. 274, page
>546-567). About 45% of them have been functionally annotated,
>experimentally or matching previously assigned proteins. The rest of
>the ORF's remain "hypothetical", and many of them are not real
>ORFs. Our statistics based on sequence length distribution alone
>predicts that over 400 of these hypothetical ORFs between 100 and 110
>amino acid residues long are not likely to code for proteins (Das et
>al., Nature, 1997, Vol. 385, page 29-30).  

>In addition, when two unannotated ORFs overlap one should anticipate
>that one or the other is likely an artifact, although which is
>generally indeterminate. However when one, or both, of the pairs has a
>length of between 100 and 110 there is a strong case for its
>artifactual nature.
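
The overlap rule quoted above can be sketched in a few lines of Python. This is only an illustration of the stated heuristic, not the procedure used by the BioMolecular Engineering Research Center: the record layout, the field names and the example ORFs are hypothetical, the 100 to 110 residue window is treated as inclusive, and all ORFs are assumed to lie on the same chromosome.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Orf:
    name: str        # hypothetical identifier
    start: int       # inclusive nucleotide coordinate (assumed layout)
    end: int         # inclusive nucleotide coordinate (assumed layout)
    length_aa: int   # protein length in amino acid residues
    annotated: bool  # True if the ORF has a functional annotation

def overlaps(a: Orf, b: Orf) -> bool:
    """True if the two coordinate ranges share at least one position."""
    return a.start <= b.end and b.start <= a.end

def is_short(orf: Orf) -> bool:
    """True if the length falls in the 100-110 residue window (inclusive, assumed)."""
    return 100 <= orf.length_aa <= 110

def overlapping_unannotated_pairs(orfs):
    """Return (name_a, name_b, short_members) for each overlapping pair of
    unannotated ORFs. short_members lists the pair members in the 100-110
    window, i.e. those with the stronger case for being artifactual."""
    flagged = []
    for a, b in combinations(orfs, 2):
        if a.annotated or b.annotated:
            continue
        if not overlaps(a, b):
            continue
        flagged.append((a.name, b.name, [o.name for o in (a, b) if is_short(o)]))
    return flagged

# Made-up records, for illustration only.
example = [
    Orf("orfA", 100, 430, 110, False),
    Orf("orfB", 400, 1000, 200, False),
    Orf("orfC", 2000, 2400, 133, False),
    Orf("orfD", 900, 1300, 133, True),
]
result = overlapping_unannotated_pairs(example)
```

An empty third element means the pair overlaps but neither member is short, so which of the two is the artifact stays indeterminate.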

The existence of questionable ORFs is well known, and an excellent, detailed
treatment can be found in: Termier M and Kalogeropoulos A., Yeast 1996, Vol. 12,
pages 369-384.

The following facts were not taken into account:

1) The paper from Goffeau et al. states that 6275 ORFs were extracted and that
   390 of these were assigned as questionable ORFs. This leaves 5885
   hypothetical proteins in yeast.
   (Goffeau et al., Science, 1996, Vol. 274, page 546-567)
   
2) The genome overview at the MIPS WWW site is constantly updated and can be
   accessed at: http://speedy.mips.biochem.mpg.de/mips/yeast/inventy.html
   At the moment 6287 ORFs are extracted and 434 are annotated as questionable
   ORFs according to the criteria provided by Termier and Kalogeropoulos. These
   have to be subtracted, which leaves 5853 hypothetical proteins.

3) At MIPS a questionable ORF is defined by a combination of the following
   attributes: low CAI (codon adaptation index) value, partial overlap with a
   longer or known ORF, no similarity to other ORFs.
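
The bookkeeping in points 1 and 2 and the attribute combination in point 3 can be sketched as follows. This is a minimal illustration, not the MIPS annotation procedure: the record fields, the CAI cutoff of 0.1 and the choice to require all three attributes are assumptions, since the text above gives neither a numeric threshold nor the exact way the attributes are combined.

```python
from dataclasses import dataclass

def remaining_after_questionable(extracted: int, questionable: int) -> int:
    """ORFs left once the questionable ones are subtracted."""
    return extracted - questionable

@dataclass
class OrfRecord:
    name: str                                       # hypothetical identifier
    cai: float                                      # codon adaptation index
    partial_overlap_with_longer_or_known_orf: bool
    similar_to_other_orfs: bool

ASSUMED_CAI_CUTOFF = 0.1  # assumption: no cutoff is stated in the text

def questionable_attributes(orf: OrfRecord, cai_cutoff: float = ASSUMED_CAI_CUTOFF):
    """List which of the three attributes from point 3 apply to this ORF."""
    hits = []
    if orf.cai < cai_cutoff:
        hits.append("low CAI")
    if orf.partial_overlap_with_longer_or_known_orf:
        hits.append("partial overlap")
    if not orf.similar_to_other_orfs:
        hits.append("no similarity")
    return hits

def is_questionable(orf: OrfRecord, min_attributes: int = 3,
                    cai_cutoff: float = ASSUMED_CAI_CUTOFF) -> bool:
    """Assumption: an ORF is questionable if at least min_attributes apply."""
    return len(questionable_attributes(orf, cai_cutoff)) >= min_attributes

print(remaining_after_questionable(6275, 390))  # Goffeau et al. counts
print(remaining_after_questionable(6287, 434))  # current MIPS counts
```

Lowering `min_attributes` gives a looser reading of "combination"; which reading matches the MIPS practice is not stated here.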

Nevertheless, we have scrutinized the information provided by J. Freeman.
We inspected the list of hypothetical ORFs and found that most of the ORFs
mentioned had previously been assigned as questionable ORFs, with the
following 50 exceptions:

29 ORFs are annotated at MIPS as hypothetical proteins. These have been checked,
and for 22 of them we had to make corrections: 3 are located on chromosome X,
18 on chromosome XV (in fact our information for chromosome XV was
incomplete), and 1 on chromosome XVI:

YJL009W  hypothetical protein (has to be annotated as questionable ORF)
YJL119C  hypothetical protein (has to be annotated as questionable ORF)
YJL135W  hypothetical protein (has to be annotated as questionable ORF)
YOL035C  hypothetical protein (has to be annotated as questionable ORF)
YOL050C  hypothetical protein (has to be annotated as questionable ORF)
YOL099C  hypothetical protein (has to be annotated as questionable ORF)
YOL150C  hypothetical protein (has to be annotated as questionable ORF)
YOR105W  hypothetical protein (has to be annotated as questionable ORF)
YOR121C  hypothetical protein (has to be annotated as questionable ORF)
YOR135C  hypothetical protein (has to be annotated as questionable ORF)
YOR139C  hypothetical protein (has to be annotated as questionable ORF)
YOR146W  hypothetical protein (has to be annotated as questionable ORF)
YOR169C  hypothetical protein (has to be annotated as questionable ORF)
YOR170W  hypothetical protein (has to be annotated as questionable ORF)
YOR218C  hypothetical protein (has to be annotated as questionable ORF)
YOR225W  hypothetical protein (has to be annotated as questionable ORF)
YOR282W  hypothetical protein (has to be annotated as questionable ORF)
YOR300W  hypothetical protein (has to be annotated as questionable ORF)
YOR331C  hypothetical protein (has to be annotated as questionable ORF)
YOR333C  hypothetical protein (has to be annotated as questionable ORF)
YOR345C  hypothetical protein (has to be annotated as questionable ORF)
YPL072W  hypothetical protein (has to be annotated as questionable ORF)
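
The chromosome of each ORF in the list above can be read from its systematic name, where the second letter encodes the chromosome (A = I through P = XVI). A short sketch that tallies the corrected ORFs per chromosome; the function and variable names are illustrative only:

```python
from collections import Counter

ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

def chromosome_of(orf_name: str) -> str:
    """Map a systematic ORF name such as YOR105W to its chromosome numeral."""
    name = orf_name.upper()
    index = ord(name[1]) - ord("A") if len(name) > 1 else -1
    if not name.startswith("Y") or not 0 <= index < len(ROMAN):
        raise ValueError(f"not a systematic ORF name: {orf_name!r}")
    return ROMAN[index]

# The ORFs listed above as needing correction.
corrected = """
YJL009W YJL119C YJL135W YOL035C YOL050C YOL099C YOL150C YOR105W
YOR121C YOR135C YOR139C YOR146W YOR169C YOR170W YOR218C YOR225W
YOR282W YOR300W YOR331C YOR333C YOR345C YPL072W
""".split()

tally = Counter(chromosome_of(name) for name in corrected)
for chromosome, count in tally.items():
    print(f"chromosome {chromosome}: {count}")
```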

The 7 remaining hypothetical proteins have been left unchanged because they do
not meet our definition of a questionable ORF:

YAR030C  hypothetical protein (113 aa) 
YBL018C  hypothetical protein (133 aa)
YGR290W  hypothetical protein (147 aa, helix-loop-helix motif)
YHL005C  hypothetical protein (130 aa)
YLR236C  hypothetical protein (107 aa)
YMR151W  hypothetical protein = YIM2
YNL303W  hypothetical protein (115 aa)

For 16 ORFs we have found other classifications because of similarities to
other proteins; some of them are known TY proteins:
                                                             
YAL004W   strong similarity to A.klebsiana glutamate dehydrogenase
YAR043c   not extracted at MIPS because internal to other bigger ORF
YAR052c   not extracted at MIPS because internal to other bigger ORF
YAR074c   not extracted at MIPS because internal to other bigger ORF
YAR073W   FUN63 strong similarity to IMP dehydrogenases
YAR075W   strong similarity to IMP dehydrogenases
YBL112C   strong similarity to subtelomeric encoded proteins
YCL013w   PART of BUD3, the sequence was corrected in September 1996
YCR013C   weak similarity to M.leprae B1496_F1_41 protein
YDL228C   similarity to A.klebsiana glutamate dehydrogenase
YER097W   weak similarity to ribosomal S3 proteins
YFL002w-A TY2B protein 
YFL065C   strong similarity to subtelomeric encoded proteins
YGR181W   similarity to YHR004c-a
YHR218W-A strong similarity to subtelomeric encoded proteins
YIL080W   Ty3-2 orf C fragment
YIL082w-A TY3B protein
YIL175W   putative pseudogene
YKL199C   might be C-terminal part of YKL198c due to a frameshift error
YLL037W   weak similarity to human platelet-activating factor receptor
YNL203C   weak similarity to B.subtilis CDP-diacylglycerol--serine 
          O-phosphatidyltransferase
YPR203W   strong similarity to subtelomeric encoded proteins


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*                                       |             E-mail:                *
*           Dr. Kaj Albermann           |    albermann at mips.embnet.org       *
*             Dr. Jean Hani             |        hani at mips.embnet.org        *
*            Dr. H.W. Mewes             |       mewes at mips.embnet.org        *
*                                       |                                    *
*                 MIPS                  |              Tel:                  *
* am Max-Planck-Institut fuer Biochemie |        +49 89 8578 2659            *
*          Am Klopferspitz 18a          |                                    *
*          D-82152 Martinsried          |              Fax:                  *
*                Germany                |        +49 89 8578 2655            *
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
* Internet connectivity to MIPS  - WWW:    WWW.MIPS.BIOCHEM.MPG.DE           *
*                                - FTP:    FTP.MIPS.EMBNET.ORG               *
*                                - Email:  username at MIPS.EMBNET.ORG          *
*----------------------------------------------------------------------------*


