Welcome to the MacNN Forums.


suggestion for a small useful program
Fresh-Faced Recruit
Join Date: Mar 2005
Mar 12, 2005, 05:24 PM
 
Hi! I'd like to have a program for quickly browsing through PDFs. I'm not a programmer and I don't know how to write one, but I'm posting this suggestion in case some bored developer (who could probably write such a simple program in almost no time) wants to pick it up. Here is my problem:

As a theoretical physicist, I get most of my information from preprint servers on the web, e.g. www.arxiv.org. In practice one sometimes downloads dozens of papers a day, and they are all stored in the downloads folder under some cryptic number like 0022354.pdf. But what if you're offline and looking for an article that you certainly downloaded, but whose number you forgot? I'd therefore like a tiny program that scans all PDFs in a given folder, extracts information such as author, title, and arXiv number (for example hep-th/xxyyzzz), and also offers a full-text search. In principle this shouldn't be difficult, because there are, e.g., UNIX tools that extract text from PDFs. Someone just needs to write a nice frontend and an algorithm that can extract the data I mentioned above. I think many people in the scientific community would be grateful for such a program.

Thanks,
Marco
     
Mac Elite
Join Date: Oct 1999
Location: San Jose, Ca
Mar 12, 2005, 07:52 PM
 
If you just wait for 10.4, you can use Spotlight for this. It has a default plugin that indexes the text in PDFs, so you will be able to search for these strings easily. It won't extract and label things like the name (that would probably prove very difficult to do programmatically anyway), but you can still search for it.
     
Professional Poster
Join Date: Oct 1999
Location: :ИOITAↃO⅃
Mar 13, 2005, 05:29 AM
 
I'm just now beginning work on such a program (it will probably be 10.4-only). I concur that machine-learning algorithms for extracting the author, title, etc. may not do very well. In the meantime, try BibDesk and its AutoFile function. It won't extract fields automatically, but as soon as you enter them, it will stash the PDF in the folder of your choosing.
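
For anyone who wants to experiment before such a program exists, here is a rough Python sketch of the core idea from the first post: scan a folder of PDFs, pull out arXiv numbers of the hep-th/xxyyzzz kind, and do a simple full-text search. It is only a sketch under some assumptions. It assumes the pdftotext command-line tool is installed and on your PATH (the script fails with FileNotFoundError otherwise), and the function names (pdf_to_text, find_arxiv_ids, build_index, search) are made up for this example. It makes no attempt at author or title extraction, which, as noted above, is the hard part.

```python
import re
import subprocess
from pathlib import Path

# Old-style arXiv identifiers such as hep-th/0503123 or math.AG/0401001:
# an archive name, an optional two-letter subject class, a slash, seven digits.
ARXIV_ID = re.compile(r"(?<![A-Za-z-])[a-z][a-z-]*(?:\.[A-Z]{2})?/\d{7}(?!\d)")


def pdf_to_text(pdf_path):
    """Return the text of a PDF using pdftotext (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],  # "-" sends the text to stdout
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return result.stdout


def find_arxiv_ids(text):
    """Return the distinct arXiv identifiers in text, in order of appearance."""
    found = []
    for candidate in ARXIV_ID.findall(text):
        if candidate not in found:
            found.append(candidate)
    return found


def build_index(folder, extract=pdf_to_text):
    """Map each PDF file name in folder to its text and arXiv identifiers."""
    index = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        text = extract(pdf)
        index[pdf.name] = {"text": text, "arxiv_ids": find_arxiv_ids(text)}
    return index


def search(index, query):
    """Return the file names whose text contains query, ignoring case."""
    needle = query.lower()
    return [name for name, entry in index.items()
            if needle in entry["text"].lower()]
```

To try it, you would call something like build_index on your downloads folder and then search(index, "some phrase"). The extract argument is there so the text extraction step can be swapped out, for instance for a different tool or for testing without any real PDFs.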
     
   
All times are GMT -5.