Database search program?

Ken Wolfe khwolfe at tcd.ie
Fri Jul 7 09:26:43 EST 1995


>> From: champlin at GAS.UUG.Arizona.EDU (Jacob B Champlin)
>> Newsgroups: bionet.software
>> Subject: Database search program?
>> Date: 6 Jul 1995 06:57:24 GMT
>> I am looking for a program that will go through the yeast gene database and
>> retrieve this information:
>> 
>>         1.      The Name of every sequenced gene.
>> 
>>         2.      The base at the start of the translation.
>> 
>>         3.      The first 10 bases before the start of the translated region.
>> 
>>         4.      The first 12 bases after the start of the translated region.
>> 
>>         5.      Calculate the frequency of each of the four bases at each
>>                 position.
>> 


Jacob -

Here's an approximate answer to your 5th question.  It's the nucleotide
composition around the start codons of all yeast coding sequences (CDSs)
in GenBank release 89 (June 1995), excluding mitochondrial genes.  I
calculated this using ACNUC database access software and a simple Fortran
program.  
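
For anyone who wants to reproduce this kind of table without ACNUC or
Fortran, the counting step could be sketched in Python roughly as follows.
This is not the program used for the table in this message.  The function
names, the window layout (10 bases upstream followed by 12 bases beginning
at the first base of the start codon) and the example sequences are all
assumptions made up for illustration.

```python
from collections import Counter

BASES = "TCAG"  # same column order as the table in this message


def position_labels(upstream=10, downstream=12):
    """Return labels -10..-1 followed by +1..+12 (there is no position 0)."""
    return list(range(-upstream, 0)) + list(range(1, downstream + 1))


def count_bases_by_position(windows, upstream=10, downstream=12):
    """Count T, C, A and G at each position across equal-length windows.

    Each window is assumed to hold `upstream` bases before the start codon
    followed by `downstream` bases beginning at its first base.  Characters
    other than T/C/A/G (for example N) are skipped, so the total at a
    position can be smaller than the number of windows.
    """
    labels = position_labels(upstream, downstream)
    counts = {pos: Counter() for pos in labels}
    for window in windows:
        if len(window) != len(labels):
            raise ValueError("window has the wrong length: %r" % window)
        for pos, base in zip(labels, window.upper()):
            if base in BASES:
                counts[pos][base] += 1
    return counts


def proportions(counter):
    """Convert the counts at one position to proportions of its total."""
    total = sum(counter[b] for b in BASES)
    return {b: (counter[b] / total if total else 0.0) for b in BASES}


# Made-up example windows (NOT real yeast data): 10 upstream + 12 from ATG.
example_windows = [
    "AAAAAAAAAA" + "ATGTCTGCTAAA",
    "TTTTTTCAAA" + "ATGGCTNAAGGT",
]

counts = count_bases_by_position(example_windows)
for pos in position_labels():
    c = counts[pos]
    p = proportions(c)
    print("%+4d" % pos,
          *("%5d" % c[b] for b in BASES),
          *("%.6f" % p[b] for b in BASES),
          "%5d" % sum(c[b] for b in BASES))
```

Extracting the windows themselves (locating each CDS start in the database
entries) is the part ACNUC handled here and is not covered by this sketch.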

This analysis is somewhat inaccurate, because many yeast genes appear in
GenBank more than once and are therefore counted more than once.  GenBank
89 has 6774 start codons, compared to about 4000 genes in the YPD
non-redundant yeast database (http://siva.cshl.org).  You could use YPD to
get a non-redundant list of all sequenced yeast genes and their GenBank
accession numbers.
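
One way such a list might be applied, sketched in Python under stated
assumptions: the record layout (gene name, accession, window), the function
name and the placeholder identifiers below are hypothetical and do not
reflect any actual YPD or GenBank file format.  The idea is simply to keep
one start-codon window per (gene, accession) pair that appears on the
non-redundant list and to drop everything else.

```python
def keep_nonredundant(records, allowed_pairs):
    """Keep the first record for each (gene, accession) pair on the list.

    records:       iterable of (gene_name, accession, window) tuples
                   (a hypothetical layout chosen for this sketch).
    allowed_pairs: set of (gene_name, accession) tuples taken from a
                   non-redundant gene list.
    """
    seen = set()
    kept = []
    for gene_name, accession, window in records:
        key = (gene_name, accession)
        if key in allowed_pairs and key not in seen:
            seen.add(key)
            kept.append((gene_name, accession, window))
    return kept


# Placeholder identifiers and sequences, invented for illustration only.
records = [
    ("GENE1", "ACC0001", "AAAAAAAAAAATGTCTGCTAAA"),
    ("GENE1", "ACC0002", "AAAAAAAAAAATGTCTGCTAAA"),  # same gene, second entry
    ("GENE2", "ACC0003", "TTTTTTCAAAATGGCTAAAGGT"),
    ("GENE2", "ACC0003", "TTTTTTCAAAATGGCTAAAGGT"),  # repeated record
]
allowed_pairs = {("GENE1", "ACC0001"), ("GENE2", "ACC0003")}

unique_records = keep_nonredundant(records, allowed_pairs)
print(len(records), "records in,", len(unique_records), "kept")
```

Keying on the (gene, accession) pair rather than on the accession alone
allows for entries that contain more than one CDS.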

Ken Wolfe
University of Dublin

     actual numbers of bases                proportions
      ----------------------    ----------------------------------
pos.    T     C     A     G        T         C        A        G      total
 -10  1965  1203  2534  1072    0.290080 0.177591 0.374077 0.158252    6774
  -9  1872  1211  2614  1077    0.276351 0.178772 0.385887 0.158990    6774
  -8  1798  1268  2649  1059    0.265427 0.187186 0.391054 0.156333    6774
  -7  1888  1137  2683  1066    0.278713 0.167848 0.396073 0.157366    6774
  -6  1928  1113  2491  1242    0.284618 0.164305 0.367730 0.183348    6774
  -5  1817  1398  2453  1106    0.268231 0.206377 0.362120 0.163271    6774
  -4  1497  1317  2982   978    0.220992 0.194420 0.440213 0.144376    6774
  -3   827   650  4025  1272    0.122084 0.095955 0.594184 0.187777    6774
  -2  1631  1425  2804   914    0.240774 0.210363 0.413936 0.134928    6774
  -1  1416  1201  3093  1064    0.209035 0.177296 0.456599 0.157071    6774
  +1    31     8  6718    17    0.004576 0.001181 0.991733 0.002510    6774
   2  6703    20    20    31    0.989519 0.002952 0.002952 0.004576    6774
   3    19    18    30  6707    0.002805 0.002657 0.004429 0.990109    6774
   4  1850   917  2060  1947    0.273103 0.135371 0.304104 0.287423    6774
   5  1473  2575  1722  1004    0.217449 0.380130 0.254207 0.148214    6774
   6  2583  1292  1781  1117    0.381311 0.190729 0.262917 0.164895    6773
   7  1510  1008  2360  1896    0.222911 0.148804 0.348391 0.279894    6774
   8  1682  1645  2292  1155    0.248302 0.242840 0.338353 0.170505    6774
   9  2170  1341  2053  1210    0.320342 0.197963 0.303071 0.178624    6774
  10  1622  1205  2295  1652    0.239445 0.177886 0.338795 0.243874    6774
  11  1799  1753  2324   898    0.265574 0.258784 0.343076 0.132566    6774
  12  2138  1232  2311  1093    0.315619 0.181872 0.341157 0.161352    6774

-- 
Ken Wolfe
Department of Genetics
University of Dublin                    e-mail: khwolfe at tcd.ie
Trinity College                         phone:  +353-1-608-1253
Dublin 2, Ireland                       FAX:    +353-1-679-8558



