An item of possible interest to the Arabidopsis genome community...
> From BIOSCI-REQUEST at genbank.bio.net Fri Feb 21 16:33:06 1992
> Message-Id: <9202212110.AA29918 at genbank.bio.net>
> To: human-genome-program at genbank.bio.net
> From: steen at darwin.bu.edu (Steen Knudsen)
> Subject: NETGENE AND GENEID ONLINE SERVER
> Date: 21 Feb 92 21:03:30 GMT
> Sender: news at bu.edu
> Followup-To: bionet.software
>
> GENEID AND NETGENE ONLINE SYSTEMS FOR PREDICTION OF GENE STRUCTURE
> version 1.0 2/1/1992
> GeneID is an Artificial Intelligence system for analyzing vertebrate genomic
> DNA and predicting exons and gene structure (1). A prototype is implemented
> as a fast, automatic email-response system. Users have the option of having
> their DNA sequence analyzed by NetGene (2) simultaneously.
> Before or simultaneously with submitting a sequence for analysis, you need to
> register your name by sending a line with the word "register", followed by
> your name and address. Example:
>
> register, Don Johnson, Miami Vice, Bayview Marina Dock A12, Miami, FL 34566-1234, U.S.A.
>
> NOTE>> The line can be longer than 80 characters as long as it contains NO
> line breaks (that is, do NOT press the <Return> key until the end of the
> line).
>
> Send the line in a mail to: geneid at darwin.bu.edu. The registration
> information will only be used for maintaining a file of the number and
> geographic distribution of the users.
>
> SUBMITTING SEQUENCES:
> Your sequences must be submitted in the following format (approximately the
> same format as used for fasta, BLAST and GRAIL):
> You can submit only one sequence per mail. Put the sequence after the keyword
> "Genomic Sequence" as shown below:
>
> Genomic Sequence
>
> (Restrict the line length to 80 characters. The seqname is limited to 20
> characters.)
>
> NOTE>> IF YOUR MAIL DOES NOT CONTAIN THE KEYWORD "GENOMIC SEQUENCE", OR
> ANY OTHER KEYWORDS LISTED IN THIS FILE, NO MAIL WILL BE RETURNED TO YOU.
>
> If the reply file with the results would exceed the mail limit of 300
> kB, the reply will be split into several files. On a UNIX system you
> could send the File containing the sequence as follows:
>
>     mail -v geneid at darwin.bu.edu < File
>
> GeneID currently will not accept sequences smaller than 100 bp or larger
> than 20 kb.
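
The limits above can be checked before mailing. The following is a minimal sketch, assuming that "20 kb" means 20,000 bp; the function names are hypothetical, and the mail layout beyond the "Genomic Sequence" keyword is not reproduced here.

```python
from typing import List

GENEID_MIN_BP = 100      # stated lower limit
GENEID_MAX_BP = 20_000   # stated upper limit of 20 kb, assuming 1 kb = 1000 bp
MAX_LINE_LENGTH = 80     # stated line-length restriction


def clean_sequence(raw: str) -> str:
    """Keep letters only (drops spaces, digits, newlines) and upper-case them."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def check_geneid_length(sequence: str) -> None:
    """Raise ValueError if the sequence is outside the stated GeneID limits."""
    n = len(sequence)
    if n < GENEID_MIN_BP or n > GENEID_MAX_BP:
        raise ValueError(
            f"sequence is {n} bp; GeneID accepts {GENEID_MIN_BP} to {GENEID_MAX_BP} bp"
        )


def wrap_sequence(sequence: str, width: int = MAX_LINE_LENGTH) -> List[str]:
    """Split the sequence into lines of at most `width` characters."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]
```

A typical use would be to call `clean_sequence`, then `check_geneid_length`, and finally write the lines from `wrap_sequence` below the keyword line in the mail.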
> Your submitted sequence will be deleted automatically immediately after
> reception by GeneID.
> GeneID will scan your sequence for potential splice sites, start codons, and
> stop codons. Then it will try to assemble these into potential first exons,
> internal exons, and last exons. Exons will be evaluated according to a number
> of characteristics related to coding and splicing, and only likely exons will
> be kept. Mutually exchangeable exons (normally overlapping and in the same
> frame) will be put together in classes. Only the top 15 ranking first and
> last exon classes, and the top 35 ranking internal exon classes
> from each sequence will be kept and assembled into potential gene models with
> an open reading frame, which will be ranked according to the quality of the
> exons they contain. The top 20 models will be included in the return mail. Your
> return mail will also contain lists of the sites and exons created during the
> analysis. GeneID will not analyze the reverse complement of your sequence. If
> you suspect a gene on the other strand, submit the reverse complement sequence
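
Since only the submitted strand is analyzed, the reverse complement has to be produced on the user's side. A minimal sketch follows; the function name is hypothetical, and characters other than A, C, G and T (for example N) are simply left unchanged, which is an assumption rather than anything the server documents.

```python
# Translation table for the four standard bases, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a quick sanity check before submitting.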
>
> TIPS FOR USE OF GENEID:
> GeneID will try to identify first, internal, and last exons in each of the
> sequences you submit, and try to assemble these into models of ONE likely
> gene in each sequence. To avoid missing any exons, the number of exons will
> be vastly overpredicted, and only a few of them are likely to be true (they
> tend to be the top ranking exons, but a few true exons rank very low). But
> these few true exons are likely to be found in the gene models because they
> fit together to form a continuous open reading frame. Thus you should look to
> the gene models to find a probable coding region.
> If you submit a sequence that turns out to contain two genes, the behavior of
> GeneID is unpredictable. It could either predict one large gene containing
> both, or it could predict only the gene with the most typical characteristics.
> If you submit a sequence that contains only part of a gene, GeneID will try to
> identify an entire gene in this sequence. Thus the predicted first exon may
> actually be part of a true internal exon, or the predicted last exon may be
> part of a true internal exon. If GeneID fails to predict any genes, you might
> look at the potential exon lists.
> Thus you can experiment with input and response, by starting out with sequences
> that are not too long (for example less than 10 kb), and see if GeneID is
> able to extend the gene if you extend the sequence. If you have very large
> sequences, it may be a good idea to request analysis by NetGene first (see
> below). NetGene will analyze sequences up to 100 kb, and may find regions
> containing exons of very high likelihood. These regions can then be resubmitted
> to GeneID for further analysis.
> GeneID will not construct models with more than 22 exons.
> If the sequence contains frameshift errors in exons, then that may affect the
> quality of the prediction in the current implementation.
> In a test on 28 genes from GenBank, 91% of the nucleotides were correctly
> predicted as coding or non-coding. Since these two categories are unequally
> represented, a better measure of accuracy may be the correlation coefficient,
> which was found to be 0.68. See paper for details.
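
To illustrate why the two measures differ, here is a minimal sketch that scores a per-nucleotide prediction. It assumes the correlation coefficient is the standard phi (Matthews) coefficient over coding/non-coding labels; the announcement does not give the paper's exact definition, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def nucleotide_scores(true_coding: Sequence[bool],
                      predicted_coding: Sequence[bool]) -> Tuple[float, float]:
    """Return (fraction of nucleotides correct, correlation coefficient)."""
    if len(true_coding) != len(predicted_coding) or not true_coding:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_coding, predicted_coding):
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(true_coding)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # The coefficient is undefined when a row or column is empty; report 0.0 then.
    correlation = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, correlation
```

With 10 coding and 90 non-coding nucleotides, a predictor that calls everything non-coding is 90% correct but has a correlation of 0.0, which is why the coefficient is the more informative figure when the classes are unbalanced.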
>
> ANALYSIS TIME:
> Will depend on the load on the system and grows approximately linearly with
> the length of the sequence input. Expect at least 1 minute per kb. Longer
> response times can occur if the system is temporarily down (check with the
> UNIX command: "finger geneid at darwin.bu.edu").
>
> FURTHER INFORMATION:
> A preprint of a paper describing the development and testing of GeneID is
> available as a Stuffit.hqx file for Macintosh. Simply include the line:
>
> Preprint Request
>
> in your mail to geneid at darwin.bu.edu, and the manuscript will be mailed to you.
> Publication of output from GeneID must be referenced as follows:
> (1) Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992) Prediction of Gene
> Structure. Journal of Molecular Biology. In Press.
>
> PROBLEMS, COMMENTS, AND SUGGESTIONS:
> Can be mailed to steen at darwin.bu.edu.
>
> Users of the MBCRR and BMERC national computer resources have direct
> online access to GeneID from their account. Contact Tom Graf at
> tom at mbcrr.harvard.edu for information on these accounts.
> Users now have the option of having their submitted sequence analyzed by NetGene
> also. NetGene predicts splice sites and gives information about the likelihood
> of the prediction. NetGene detects both coding regions and splice signals, and
> combines that information to predict both small and large exons (it predicts one
> end of the exon, the acceptor or donor site).
>
> Simply include the keyword "NetGene" between the keyword "Genomic Sequence"
> and your sequence. The results of the NetGene analysis will be mailed to you
> separately. The only difference in sequence format is that NetGene will accept
> sequences UP TO 100 kb. Thus, NetGene can be used in conjunction with GeneID
> by first submitting a large sequence to NetGene (specify the keyword "NetGene";
> GeneID will not respond if the sequence is larger than 20 kb). Regions that
> show exons with very high likelihood can then be resubmitted to GeneID (<20kb)
> for further analysis. The minimum sequence length that NetGene will faithfully
> analyze is 451 bp.
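
When a region flagged by NetGene is cut out for resubmission to GeneID, the excerpt has to stay within the 20 kb limit. The helper below is a hypothetical sketch of one way to choose such a window, roughly centred on the region; it assumes 20 kb means 20,000 bp and uses 0-based, half-open coordinates, neither of which comes from the announcement.

```python
from typing import Tuple


def geneid_window(region_start: int, region_end: int,
                  sequence_length: int, max_bp: int = 20_000) -> Tuple[int, int]:
    """Return (start, end) of a window of at most max_bp that contains the region."""
    if not 0 <= region_start < region_end <= sequence_length:
        raise ValueError("region must lie inside the sequence")
    region_length = region_end - region_start
    if region_length > max_bp:
        raise ValueError("region is longer than the window limit")
    # Share the spare room equally between both flanks, then clip to the sequence.
    flank = (max_bp - region_length) // 2
    start = max(0, region_start - flank)
    end = min(sequence_length, start + max_bp)
    # If the window hit the right edge, slide it left to recover the lost room.
    start = max(0, end - max_bp)
    return start, end
```

The excerpt `sequence[start:end]` can then be submitted to GeneID on its own, without the "NetGene" keyword.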
>
> REFERENCING AND FURTHER INFORMATION
> Publication of output from NetGene must be referenced as follows:
> (2) Brunak, S., Engelbrecht, J., and Knudsen, S. (1991) Prediction of Human mRNA
> Donor and Acceptor Sites from the DNA Sequence. Journal of Molecular Biology
>
> PROBLEMS, COMMENTS AND SUGGESTIONS:
> Can be mailed to: steen at darwin.bu.edu