Welcome to the MacNN Forums.



Un-searchable PDF
QSilver
Dedicated MacNNer
Join Date: Jun 2006
Location: Chicago
Oct 2, 2009, 05:07 PM
 
I have a PDF document that's over 100 pages long. Although the document is 5 years old, we need to search it on a regular basis, but none of the routes we've tried can search within it:
  1. Preview on an MBP (Leopard)
  2. Adobe on the same MBP
  3. Adobe Reader on a WinXP laptop
  4. ScanSoft PDF Converter v3.0, to open the PDF in MS Word 2003

Any suggestions on how to make this file into a searchable PDF?
     
reader50
Administrator
Join Date: Jun 2000
Location: California
Oct 2, 2009, 05:13 PM
 
In Preview (or Reader, for that matter), select the text tool and see if you can highlight text. If you can't, the PDF is probably just an assembly of images, such as the output of a scanner.
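If you'd rather check from a script than by clicking around, here is a rough Python sketch (my own illustration, not something from this thread; `looks_image_only` and the regexes are hypothetical names). It scans the raw file for image objects and for text-drawing operators (`BT` followed by `Tj`/`TJ`) inside content streams, inflating Flate-compressed streams with `zlib`. It is a heuristic, not a PDF parser: object streams, unusual filters, or encrypted files can fool it.

```python
import re
import zlib
from pathlib import Path

# Crude scan for "stream ... endstream" bodies; this does not parse PDF objects.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# Text objects start with BT and draw strings with Tj or TJ.
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")


def stream_bodies(data: bytes):
    """Yield each stream body, inflated when it looks Flate-compressed."""
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # decompressobj tolerates a truncated checksum at the end of the body
            yield zlib.decompressobj().decompress(body)
        except zlib.error:
            yield body  # not Flate data (e.g. raw or JPEG image): scan as-is


def looks_image_only(data: bytes) -> bool:
    """Heuristic: True if the file has images but no text-drawing operators."""
    has_images = IMAGE_RE.search(data) is not None
    has_text = any(TEXT_OP_RE.search(body) for body in stream_bodies(data))
    return has_images and not has_text


if __name__ == "__main__":
    # Hypothetical file name; point it at your own PDF.
    print(looks_image_only(Path("scan.pdf").read_bytes()))
```

A True result agrees with the "can't highlight anything" test: the pages are pictures of text, so OCR is the way forward.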

If it is images, then you'd have to OCR the document, rebuild the formatting as needed, and reassemble it into a proper PDF.

It might also be a permissions issue. Whoever created the PDF may have flagged it as not allowing copying, which might turn off searching too. You can check this by opening the PDF's document properties in Reader; I'm not sure whether Preview shows the finer copy-permission settings.
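For the permissions angle: in an encrypted PDF, the copy flag lives in the `/P` integer of the `/Encrypt` dictionary, one bit per permission (bit 5 is "copy or extract text and graphics" in the PDF specification). The Python sketch below decodes such a value. It assumes you've already read `/P` out of the file with some PDF tool, because this code does not parse the file itself, and `decode_permissions`/`is_encrypted` are just illustrative names. Whether a particular viewer also disables search when copying is off is up to that viewer.

```python
# 1-based bit positions of the /P entry (standard security handler, PDF spec).
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_extract": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def is_encrypted(data: bytes) -> bool:
    """Rough check: /P permissions only exist when the file has an /Encrypt dictionary."""
    return b"/Encrypt" in data


def decode_permissions(p: int) -> dict:
    """Map a /P value (stored as a signed 32-bit integer) to named allow/deny flags."""
    p &= 0xFFFFFFFF  # reinterpret negative values as unsigned 32-bit
    return {name: bool((p >> (bit - 1)) & 1) for name, bit in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # -20 has bit 5 cleared: printing allowed, copying/extraction not.
    print(decode_permissions(-20))
```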
     
QSilver  (op)
Dedicated MacNNer
Join Date: Jun 2006
Location: Chicago
Oct 2, 2009, 06:11 PM
 
It definitely has images.

Any suggestions for OCRing the doc? The formatting is a fairly simple outline.
     
reader50
Administrator
Join Date: Jun 2000
Location: California
Oct 2, 2009, 06:54 PM
 
Professional OCR packages can usually work on PDF files directly. Yours is a common problem. The more basic OCR packages sometimes included as freebies usually (always?) lack that feature, so you may have reason to buy the full package.

Chances are you already have a basic OCR app. You might have gotten one with a scanner, especially if one of your scanners is a cut above the bargain models. See if it will open a PDF file to read the images. If not, save each page as a TIFF or PNG image. You can do this with Preview, though it will be tedious for 100+ pages. Maybe there is a freeware utility that will save each page as a separate file. Don't save as JPEG: compression artifacts will reduce OCR accuracy.
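If you'd rather script the page export than do it by hand in Preview, one option not mentioned in this thread is `pdftoppm` from the Poppler utilities, which renders every page of a PDF to its own numbered PNG or TIFF file. The Python sketch below assumes `pdftoppm` is installed and on your PATH; `page_pixels`, `export_command` and `export_pages` are illustrative helper names. It renders at 300 dpi, a commonly recommended resolution for OCR, and refuses JPEG output for the reason given above. PDF pages are measured in points (72 per inch), so a US Letter page of 612 by 792 points comes out at 2550 by 3300 pixels at 300 dpi.

```python
import subprocess

POINTS_PER_INCH = 72  # PDF user-space units


def page_pixels(width_pt: float, height_pt: float, dpi: int) -> tuple:
    """Pixel size of a page rendered at `dpi`."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


def export_command(pdf_path: str, out_prefix: str, dpi: int = 300, fmt: str = "png") -> list:
    """Build a pdftoppm command line; only lossless output formats are allowed."""
    if fmt not in ("png", "tiff"):
        raise ValueError("use a lossless format; JPEG artifacts hurt OCR accuracy")
    return ["pdftoppm", "-r", str(dpi), f"-{fmt}", pdf_path, out_prefix]


def export_pages(pdf_path: str, out_prefix: str, dpi: int = 300, fmt: str = "png") -> None:
    """Render every page to numbered image files (requires pdftoppm on PATH)."""
    subprocess.run(export_command(pdf_path, out_prefix, dpi, fmt), check=True)


if __name__ == "__main__":
    print(page_pixels(612, 792, 300))
    # Hypothetical file names; writes something like page-001.png, page-002.png, ...
    export_pages("report.pdf", "page")
```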

If you do save them manually, make sure each page is scaled up enough that the text is clear. Run the images through the OCR program one at a time and copy the results into a text editor. Proofread the output even when the type is clear, because OCR makes mistakes here and there. When you're done with cleanup and restoring the formatting, save the editable copy for future reference. Then print it to PDF: the resulting PDF will finally be searchable, its contents should be indexed by Spotlight too, and that final text PDF will be much smaller than the original.
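To run the OCR one page at a time without babysitting it, here is a sketch that walks the exported images in page order and collects the text into a single file for proofreading. It assumes the free Tesseract OCR engine (not mentioned in the thread) as the OCR program, using the `tesseract image stdout` form that recent versions accept to print the recognized text; swap in whatever command-line OCR tool you actually have. `page_key` and `ocr_pages` are hypothetical names. The numeric sort keeps page-10 after page-9, and the page markers make it easier to check the text against the original while proofing.

```python
import re
import subprocess
from pathlib import Path


def page_key(path: Path) -> list:
    """Sort key that compares digit runs numerically, so page-2 sorts before page-10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.name)]


def ocr_pages(image_dir: str, out_txt: str, pattern: str = "*.png") -> None:
    """OCR each page image in order and append its text, with a page marker, to out_txt."""
    pages = sorted(Path(image_dir).glob(pattern), key=page_key)
    if not pages:
        raise FileNotFoundError(f"no {pattern} files in {image_dir}")
    with open(out_txt, "w", encoding="utf-8") as out:
        for n, page in enumerate(pages, start=1):
            # Assumes a tesseract build that accepts "stdout" as the output base.
            result = subprocess.run(["tesseract", str(page), "stdout"],
                                    capture_output=True, text=True, check=True)
            out.write(f"--- page {n} ({page.name}) ---\n{result.stdout}\n")


if __name__ == "__main__":
    # Hypothetical paths: the folder holding the exported page images.
    ocr_pages("pages", "document.txt")
```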

Edit: PDF2Image will export a PDF file as a series of image files.
( Last edited by reader50; Oct 2, 2009 at 07:32 PM. )
All times are GMT -4.
All contents of these forums © 1995-2017 MacNN. All rights reserved.