Welcome to the MacNN Forums.

Working with the CONTENT of a PDF?
Professional Poster
Join Date: Sep 2000
Location: San Francisco
Jul 6, 2003, 11:07 AM
 
I found some classes for converting to PDF, etc., but I'm interested in being able to search the content of a PDF. Are there classes for converting PDFs to text?

kman
     
Mac Elite
Join Date: Jul 2002
Jul 6, 2003, 02:44 PM
 
I don't think there's anything from Apple right now, but you may want to look at Preview in Panther and see if you can reverse engineer the searching in that.
     
Fresh-Faced Recruit
Join Date: Jun 2003
Jul 11, 2003, 07:25 AM
 
There are two things I know of that will do this:

http://www.foolabs.com/xpdf/
pdftotext, an open source (GPL) C++ command-line tool that ships as part of xpdf. It has no user interface, so compiling it on OS X isn't much of a problem. If memory serves, it needed one patch or so; I'll dig it out if you want.
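For what it's worth, here is a minimal sketch of driving pdftotext from a script and getting the text back as a string. It assumes pdftotext is already built and on your PATH. The helper names (`build_pdftotext_command`, `pdf_to_text`) are made up for illustration, and the `-enc` and `-layout` flags and the `-` output argument are as I know them from the pdftotext man page, so check them against the version you build.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=False):
    """Return the argument list for a pdftotext run that writes text to stdout.

    Hypothetical helper, not part of xpdf.
    """
    command = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout tries to preserve the physical page layout; plain reading
        # order is usually the better choice when the goal is searching.
        command.append("-layout")
    # A text-file argument of "-" sends the extracted text to stdout.
    command += [pdf_path, "-"]
    return command


def pdf_to_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `pdf_to_text("some.pdf")` should hand back the whole document as one string, which you can then search however you like. Shelling out to the tool rather than linking against the xpdf sources keeps your own code separate from the GPL'd code, though that is a design choice, not legal advice.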

The other is TextLightning (company name: Meta Projects or Meta something). It gives you a Cocoa service that converts PDFs to RTF. On the face of it, this does a better job than pdftotext, but it costs money, and it's a service, which I'm not so keen on. Also, the RTF, layout, and image parts will be of no use to you.

For building into your own code, I think pdftotext may be better for licensing and logistical reasons; I plan on doing this myself very soon. The only catch is that it didn't convert some PDFs quite as cleanly as I'd like. For searching, though, where appearance doesn't matter and a bit of extra white space here and there isn't a problem, I think it would be absolutely fine for you. I checked it out several months ago, so it may well have been refined since.
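To illustrate why stray white space shouldn't matter for searching, here is a rough sketch that collapses every run of spaces, tabs, and line breaks before comparing. The function names are hypothetical, and it assumes you already have the extracted text in a string. It does not rejoin words that were hyphenated across a line break.

```python
import re


def normalize(text):
    """Collapse every run of whitespace to one space and ignore case."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def contains_phrase(extracted_text, phrase):
    """Return True if phrase occurs in extracted_text, ignoring spacing and case.

    Hypothetical helper for illustration only.
    """
    return normalize(phrase) in normalize(extracted_text)
```

With this, `contains_phrase("Portable   Document\n   Format", "document format")` is true even though the extracted text has extra spaces and a line break in the middle of the phrase.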

If you find, or have found, anything further on this, I'd really appreciate it if you could post it here, as I'm very interested in getting access to the text in PDFs.
(Last edited by jBee; Jul 11, 2003 at 07:33 AM. )
All times are GMT -5.
All contents of these forums © 1995-2011 MacNN. All rights reserved.