>BIOSCI note relating to patents raises some interesting questions about freedom
>of information in the increasingly market-driven, accountant-steered
>socio-economic and academic environment that most scientists find themselves in
>these days.
>(1) How much information is not getting into the public domain (and by public
>    I don't mean public to those who can afford to buy GENESEQ)?

It is true that sequences reported only in patent documents have not found their
way into the public sequence databanks. The data is in the public domain;
it's just that it's on paper. Speaking for EMBL at least, we have
talked for years with the European Patent Office (EPO) about this. Derwent
(producers of GENESEQ) were also involved until they lost patience and decided
to do their own thing. I think they have made a good product.
The EPO (and we, of course) believe this data should be public, that is, in
EMBL, GenBank etc. The problem is that it is buried in patent documents and
extremely expensive in labour terms to extract. It will be done. NCBI are
already stuffing some data from the US Patent Office into their GenInfo database
and we hope to agree with the EPO how to process their 'backfile' of data
within this year. Sequence data associated with patents from now on should
be largely in electronic form, submitted by the patent applicant using
PatentIn or similar software, derived from Authorin (from IntelliGenetics
under the GenBank contract), so this data should become public more easily.
>(2) This is yet another database format to contend with ...

I don't think so. The GENESEQ database format is very similar to the EMBL and
SWISS-PROT formats. They have, quite reasonably, added one or two line types
for patent-specific information, in fact much as we did when we generated some
samples a few years ago.
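
To illustrate why a couple of extra line types should be cheap to support,
here is a minimal sketch in Python of a reader for EMBL-style flat files, in
which each line starts with a two-letter line code and each entry ends with
"//". It is only an illustration under stated assumptions: the sample entry
is invented, and the "PN" code used for a patent number is a hypothetical
stand-in, not a statement of what GENESEQ actually uses.

```python
from collections import defaultdict


def parse_entries(text):
    """Split EMBL-style flat-file text into a list of entries.

    Each entry is a dict mapping a two-letter line code (e.g. "ID", "DE")
    to the list of values seen for that code. Unknown codes are kept
    as-is, so new line types need no changes to the reader.
    """
    entries = []
    current = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):          # entry terminator
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        if not line.strip():               # skip blank lines
            continue
        code, value = line[:2], line[2:].strip()
        current[code].append(value)
    if current:                            # entry without a final "//"
        entries.append(dict(current))
    return entries


# Invented sample entry; "PN" is a hypothetical patent-specific line code.
SAMPLE = """\
ID   EXAMPLE1   standard; DNA; 12 BP.
DE   Illustrative entry, not real data.
PN   XX0000000
//
"""

entries = parse_entries(SAMPLE)
```

Because the reader stores whatever line codes it meets, a program that only
looks at the usual EMBL codes can simply ignore the patent-specific ones.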
Peter Stoehr
EMBL Data Library

