Un-searchable PDF
Dedicated MacNNer
Join Date: Jun 2006
Location: Chicago
Status: Offline
I have a PDF document that's over 100 pages long. Although the document is 5 years old, we need to search it on a regular basis. We've tried a number of ways to search within this PDF, but no joy. So far we've tried:
- Preview on a MBP (Leopard)
- Adobe on same MBP
- Adobe Reader on a WinXP laptop
- using Scansoft PDF Converter v3.0 to open the PDF in MS Word 2003
Any suggestions on how to make this file into a searchable PDF?
Administrator
Join Date: Jun 2000
Location: California
Status: Online
In Preview (or Reader for that matter) select the text tool. See if you can highlight text. If you can't, then the PDF is probably an assembly of images, such as the output of a scanner.
If it is images, then you'd have to OCR the document, rebuild the formatting as needed, and reassemble into a proper PDF.
It might also be a permissions issue. Whoever created the PDF may have flagged it as not allowing copying, which might turn searching off too. You can check this by pulling up the PDF's properties in Reader - not sure if Preview will show the fine-grained copy permission settings.
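For anyone who would rather script those two checks than click through Preview or Reader, here is a rough sketch in Python. It is a heuristic, not a real PDF parser, and the function name `inspect_pdf_bytes` is made up for illustration. It only scans the raw bytes for a few dictionary keys: font resources (which text pages normally have), image objects (which scans are made of), and an `/Encrypt` entry (which is present when the file carries security settings). It can be fooled, for example by PDFs that keep their objects inside compressed object streams, so treat the result as a hint only.

```python
import re


def inspect_pdf_bytes(data: bytes) -> dict:
    """Crude checks on raw PDF bytes. Heuristic only, not a PDF parser."""
    return {
        # Pages with real text reference font resources; pure scans usually do not.
        "has_fonts": re.search(rb"/Font\b", data) is not None,
        # Scanned pages are stored as image XObjects.
        "has_images": re.search(rb"/Subtype\s*/Image\b", data) is not None,
        # An /Encrypt entry means the file has security (permission) settings.
        "encrypted": re.search(rb"/Encrypt\b", data) is not None,
    }
```

To try it, read the file in binary mode and pass the bytes in, e.g. `inspect_pdf_bytes(open("manual.pdf", "rb").read())` (the file name is a placeholder). Images with no fonts points toward a scanned document that needs OCR; `encrypted` being true points toward the permissions explanation.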
Dedicated MacNNer
Join Date: Jun 2006
Location: Chicago
Status: Offline
It definitely has images.
Any suggestions to OCR the doc? The formatting is a fairly simple outline.
Administrator
Join Date: Jun 2000
Location: California
Status: Online
Professional OCR packages can usually read PDF files directly. Your complaint is a common problem. The more basic OCR packages sometimes included as freebies usually (always?) lack that functionality - so you'll have reason to buy the full package.
Chances are you have a basic OCR app already. You might have gotten one with a scanner, especially if one of your scanners is a cut above the bargain ones. See if it will open a PDF file to read the images. If not, save each page as a TIFF or PNG picture - you can do this with Preview, though it will be tedious for 100+ pages. Maybe there is some freeware utility that will save each page as a separate document. Don't save to JPEG - artifacts will decrease the OCR accuracy.
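If saving 100+ pages by hand sounds too tedious, the page export can be scripted. The sketch below is one possible approach, not the only one: it assumes the Poppler command-line tool `pdftoppm` is installed (that is my assumption, nothing in this thread requires it), and the function names are made up for illustration. It writes one PNG per page, which keeps the images lossless, at 300 DPI so the text stays clear.

```python
import shutil
import subprocess
from pathlib import Path


def export_command(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm call that writes one PNG per page."""
    # -png selects a lossless format (no JPEG artifacts); -r sets the DPI.
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]


def export_pages(pdf_path: str, out_dir: str, dpi: int = 300) -> list[Path]:
    """Export every page of the PDF to out_dir and return the files in page order."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm (Poppler) is not installed")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(export_command(pdf_path, str(out / "page"), dpi), check=True)
    # pdftoppm appends "-<page number>" to the prefix; sort on that number.
    return sorted(out.glob("page-*.png"), key=lambda p: int(p.stem.rsplit("-", 1)[1]))
```

Something like `export_pages("manual.pdf", "pages")` (placeholder names) would leave a folder of page images ready for an OCR program.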
If you do save them manually, make sure the page is scaled up enough so the text is clear. Run them through the OCR program one at a time. Copy the results into a text editor. You should proof it even with clear type - OCR makes mistakes here and there. When you're done with cleanup and restoring formatting, save the editable copy for future reference. Print to PDF and the resulting PDF will finally be searchable. The contents should be indexed by Spotlight too. And that final text PDF will be way smaller than the original.
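The one-page-at-a-time OCR pass can also be scripted, as a rough sketch. This assumes the page images already exist as PNG files and that the open-source Tesseract command-line tool is installed; that is my substitution for whatever OCR app came with your scanner, and the function names are hypothetical. It collects the text of each page into one editable document with page markers, so you can still proof it by hand before printing to PDF.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_page(image_path: Path) -> str:
    """OCR one page image with the Tesseract CLI (assumed to be installed)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed")
    # Using "stdout" as the output base makes Tesseract print the recognized text.
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def join_pages(texts: list[str]) -> str:
    """Combine per-page text into one editable document with page markers."""
    parts = [f"--- page {n} ---\n{t.strip()}\n" for n, t in enumerate(texts, start=1)]
    return "\n".join(parts)


def pages_containing(texts: list[str], term: str) -> list[int]:
    """Return 1-based page numbers whose text contains term, ignoring case."""
    needle = term.lower()
    return [n for n, t in enumerate(texts, start=1) if needle in t.lower()]
```

A typical run would call `ocr_page` on each image in page order, write `join_pages(...)` to a text file for cleanup, and use `pages_containing` as a quick check that a known phrase turns up on the page you expect.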
Edit: PDF2Image will export a PDF file to a succession of image files.
(Last edited by reader50; Oct 2, 2009 at 07:32 PM.)