software for reading sequence from PDF file
gilbertd at bio.indiana.edu
Fri Mar 19 17:54:47 EST 1999
Certainly what you ask (extracting specific text from PDF) can be done,
if the PDF docs are not encrypted/secured by their creators. Text is
stored as text in PDF, not as bitmap images (unless the PDF was created
from a bitmap image), so you can pull out the text with the right
tool. The PDF format is well documented by Adobe.
Here are some PDF links
See esp. here for extraction tools
I've written software to create PDF from various graphics/text.
It wasn't too hard. If you need to write it yourself, software to
extract text should be a straightforward programming project
for a software engineer. Java is a great match for PDF, since
Java's standard ZIP libraries (java.util.zip) can decompress the
Flate-compressed data in PDF files.
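
As a rough illustration of the Java point, here is a minimal sketch that
inflates a Flate-compressed stream with java.util.zip.Inflater and then
pulls out the strings shown by the PDF "Tj" text operator. The class and
method names (FlateSketch, inflate, deflate, shownStrings) are made up for
this example, and the sample content stream is invented. It assumes you
have already located the raw bytes of one stream in the file; it does not
parse the PDF object structure, and it ignores TJ arrays, nested
parentheses, escape decoding and font encodings.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateSketch {
    // Simplified pattern for "(literal string) Tj". It allows backslash
    // escapes but not nested unescaped parentheses (an assumption made
    // to keep the sketch short).
    private static final Pattern TJ =
        Pattern.compile("\\(((?:\\\\.|[^\\\\()])*)\\)\\s*Tj");

    // Decompress a zlib/deflate stream, the format used by Flate-compressed
    // PDF streams.
    public static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Compress bytes; used here only to build test input for the demo.
    public static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Return the raw (still escaped) strings shown with the Tj operator.
    public static List<String> shownStrings(String contentStream) {
        List<String> found = new ArrayList<>();
        Matcher m = TJ.matcher(contentStream);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Invented content stream standing in for one page of a real PDF.
        String page = "BT /F1 12 Tf 72 720 Td (ATGGCCATTG) Tj ET";
        byte[] compressed = deflate(page.getBytes(StandardCharsets.ISO_8859_1));
        String restored = new String(inflate(compressed), StandardCharsets.ISO_8859_1);
        System.out.println(shownStrings(restored));
    }
}
```

A real extractor would also need to read each stream's dictionary to see
which filter applies and how long the data is. None of this helps with
pages that are stored only as bitmap images.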
In article <717801BBC100D211B89500805F6FAD93047D56 at snap01.synapticcorp.com>,
<Tvenkatesh at synapticcorp.com> wrote:
>I would like to know if there is software that can convert PDF file into
>Specifically we want to extract sequences from patent documents which are
>stored as images in PDF format. We tried Acrobat Reader; it did not help.
>I appreciate your help.
>T. V. (Venky) Venkatesh, Ph D
>Senior Scientist (Bioinformatics and Molecular Biology)
>Synaptic Pharmaceutical Corporation
>215 College Road
>Paramus NJ 07652 - 1431
>Tvenkatesh at synapticcorp.com
-- gilbertd at bio.indiana.edu