pstotext - extracting plain text from PostScript


pstotext is still available from DEC/Compaq/HP, but the original authors no longer work there and they are no longer maintaining it.

An updated version, pstotext 1.9, is available from this site. Changes are:

2004-01-13, Ghostgum Software Pty Ltd.


Newsgroups: comp.text.pdf,comp.lang.postscript
From: birrell@pa.dec.com
Subject: pstotext - extracting plain text from PostScript
Date: Thu, 30 Nov 95 10:59:41 -0800

We've released a freeware utility, "pstotext", that works with GhostScript to extract plain text from PostScript files. We're releasing it publicly because it seems to do a good job on a much wider range of PostScript files than previous such programs.

We've tested pstotext on millions of lines of PostScript, including files generated by several versions of drivers from each of Windows, Macintosh, and dvips (TeX). It deals successfully with a wide variety of encoding vectors, and it re-assembles words that have been broken up for pair-kerning (it doesn't re-assemble words that have been hyphenated, though). It also works (though a little less reliably) on Acrobat PDF files.

We're distributing pstotext free, in source form. You'll need a copy of a recent version of GhostScript too (at least 3.33 for PostScript, at least 3.51 for PDF). Pstotext should work straightforwardly on any Unix system; we haven't done Windows or Macintosh ports.

You can read the pstotext documentation at:

http://www.research.digital.com/SRC/virtualpaper/manpages/pstotext.1.html

You can download pstotext from the following URL (after reading and agreeing to the license offered there):

http://www.research.digital.com/SRC/virtualpaper/pstotext.html

We aren't offering formal technical support for pstotext, but we'd be happy to receive your questions and comments by e-mail at the following locations:

http://www.research.digital.com/SRC/virtualpaper/comments.html
mailto:mcjones@pa.dec.com
mailto:birrell@pa.dec.com

Alternatively, if you think your comments on pstotext would be of more general interest, please post them to the newsgroup comp.lang.postscript.

We created pstotext as a spin-off from our Virtual Paper project, where we attempted to make on-line reading of lengthy material (like research reports, manuals, or entire books) comfortable. You can read about that project at:

http://www.research.digital.com/SRC/virtualpaper

Andrew Birrell & Paul McJones, Digital (Systems Research Center)