Bending export to your will with grep
Delicious Monster
Dedicated MacNNer
Join Date: Jun 2006
Status: Offline
Oct 31, 2006, 04:25 PM
 
Although we're working on better printing and export in Delicious Library 2, a lot of people ask me how they can, say, create a simple list of the titles they have. Printing yields an attractive but inflexible printout, and export spits out way more than you'd ever want to know about the data in your library.

My recommendation is usually the same: grep. This powerful tool is not only available from the command line, but also built in to my favorite free text editor, TextWrangler.

Still, it occurs to me that regular expressions may be a mystery to most people, so I thought I'd start a thread for people to request help, and to post their favorite grep-cipes.

To start things off, here are a few basics of regular expressions as they apply to the Delicious Library export file in TextWrangler.

Since the export is a tab-delimited text file, the basic unit is the expression [^\t]*\t, which is to say, zero or more characters that are NOT a tab, followed by a tab.

Because this pattern is repeated on every line, we want to specify a few more things in our expression, namely the beginning of the line ^ and "one or more characters that are not a line break" .+

So, let's say you wanted a list of titles, but Delicious Library gives you every darned field. You open your exported file in TextWrangler, then do a replace all with grep.

Title is the 11th field, so we want to find "10 units, followed by a unit, followed by whatever." That looks like this:

^([^\t]*\t){10}([^\t]*\t).+

As TextWrangler saves anything in parentheses, and the first set of parentheses is basically everything before the title, we want to replace this with the second saved thing:

\2

Of course, we don't really need that trailing tab, and since we're in TextWrangler, which supports Perl-style extensions, we can skip storing the first sub-expression by making it a non-capturing group (?: ... ). So to clean it up a bit we could say...

replace:
^(?:[^\t]*\t){10}([^\t]*).+

with:
\1
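
For anyone who wants to try the pattern outside TextWrangler, here is a minimal sketch in Python, whose re module accepts the same expression. The two sample rows and the make_row helper are made up for illustration (12 fields each, title in field 11); a real export has many more fields.

```python
import re

def make_row(title):
    """Build a hypothetical 12-field export row with the title in field 11."""
    fields = [f"f{n}" for n in range(1, 13)]
    fields[10] = title  # index 10 is the 11th field
    return "\t".join(fields)

export = "\n".join(make_row(t) for t in ["Snow Crash", "Dune"])

# Skip ten "run of non-tabs plus a tab" units, capture the eleventh run,
# then swallow the rest of the line. MULTILINE makes ^ match at every line.
pattern = re.compile(r"^(?:[^\t]*\t){10}([^\t]*).+", re.MULTILINE)

titles = pattern.sub(r"\1", export)
print(titles)
```

One caveat: [^\t]* will happily cross a line break, so a row with fewer than ten tabs can swallow part of the next row. Well-formed exports should not hit this.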

So here's a common request. People have a shelf of all the DVDs in their 200-disc changer, with the slot number in the location field. You want to print a list of titles and slot numbers, but you want the slot numbers to appear first (since the titles are of variable length).

replace:
^(?:[^\t]*\t){10}([^\t]*)\t(?:[^\t]*\t){22}([^\t]*\t).+

with:
\2\1
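
The same swap can be checked in Python. This is only a sketch: the 36-field row below is invented, with the title in field 11 and the slot number in field 34 to match the counts in the expression above.

```python
import re

# Hypothetical 36-field row: title in field 11, changer slot in field 34.
fields = [f"f{n}" for n in range(1, 37)]
fields[10] = "Heat"
fields[33] = "17"
row = "\t".join(fields)

# Group 1 is the title (no tab); group 2 is the slot WITH its trailing tab,
# so "\2\1" comes out as slot, tab, title.
pattern = re.compile(
    r"^(?:[^\t]*\t){10}([^\t]*)\t(?:[^\t]*\t){22}([^\t]*\t).+",
    re.MULTILINE,
)

slot_then_title = pattern.sub(r"\2\1", row)
print(slot_then_title)
```

Note that the trailing .+ needs at least one character after field 34, so the pattern will not match if the location is the last field on the line.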
     
Ed S
Fresh-Faced Recruit
Join Date: Oct 2006
Location: Los Alamos, NM, USA
Status: Offline
Oct 31, 2006, 06:37 PM
 
[Longtime Unix hacker, new to Mac]

grep is a little tough for this job. Unix provides lots of other tools better suited for this sort of thing. In particular, awk is more powerful and probably less intimidating.

Let's say you've already exported your library to Library Text Export.txt on your Desktop (the default). You then open a shell and cd ~/Desktop. The first thing you'll want to do is get the field number for each field:

$ head -1 Library\ Text\ Export.txt | tr '\t' '\n' | cat -n
The 'tr' converts tabs to newlines; cat -n assigns a line number. Now you know the number of each field. Want to print out author and title?
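
If you would rather not chain three commands, the same numbering can be sketched in Python. The three-column header here is a stand-in, and number_fields is a name chosen for this example; with a real export you would pass in the first line of the file.

```python
def number_fields(header_line):
    """Return (field_number, field_name) pairs, counting from 1 like cat -n."""
    names = header_line.rstrip("\n").split("\t")
    return list(enumerate(names, start=1))

# Hypothetical header; a real export has many more columns.
sample_header = "title\tcreator\tlocation\n"

numbered = number_fields(sample_header)
for number, name in numbered:
    print(f"{number:6d}\t{name}")
```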

$ awk -F'\t' '{print $38 ",", $11}' <Library\ Text\ Export.txt
awk handles delimited files. The -F'\t' tells it to use tab for the delimiter (the quotes keep the shell from eating the backslash before awk sees it). $38 and $11 are author and title respectively (we found that out using the cat -n above). Even neater: want to print out only the books where 'location in building' contains 'Bathroom'?

$ awk -F'\t' '$34 ~ /Bathroom/ {print $38 ",", $11}' <Library\ Text\ Export.txt
grep can come in handy to filter those results. And sort can reorder them. But those are subjects for another posting.
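
For comparison, here is roughly the same filter sketched in Python with the standard csv module. The function name and the three-column sample are assumptions made for the example; against a real export you would pass author_col=38, title_col=11 and location_col=34 as found above. It does a plain substring test rather than awk's regex match.

```python
import csv
import io

def authors_and_titles(export_file, author_col, title_col, location_col, needle):
    """Yield 'author, title' for rows whose location field contains needle.

    Column numbers are 1-based, matching awk's $1, $2, ... convention.
    """
    reader = csv.reader(export_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < max(author_col, title_col, location_col):
            continue  # skip blank or short lines
        if needle in row[location_col - 1]:
            yield f"{row[author_col - 1]}, {row[title_col - 1]}"

# Hypothetical data: title, author, location.
sample = io.StringIO(
    "Dune\tFrank Herbert\tBathroom shelf\n"
    "Emma\tJane Austen\tLiving room\n"
)

matches = list(
    authors_and_titles(sample, author_col=2, title_col=1,
                       location_col=3, needle="Bathroom")
)
print(matches)
```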
     
Delicious Monster  (op)
Dedicated MacNNer
Join Date: Jun 2006
Status: Offline
Nov 8, 2006, 05:00 PM
 
This came up recently in support email: what if you want to put the currency symbol and the currency amount in different columns?

This is a two-step process. First, you have to insert the new columns.

replace:
^((?:[^\t]*\t){18})(price\t)(currentValue\t)(.+)
with:
\1priceCurrency\t\2valueCurrency\t\3\4

Then you have to split the currency markers from the currency amounts.

replace:
^((?:[^\t]*\t){18})(?:(\$|CDN\$|&#xffe5;|&#x00a3;|EUR|¥|£) *([0-9., ]*\t))(?:(\$|CDN\$|&#xffe5;|&#x00a3;|EUR|¥|£)*([0-9., ]*\t))(.+)
with:
\1\2\t\3\4\t\5\6
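
Here is a sketch of the second step in Python, for anyone who wants to test it on a sample row first. It is an adaptation, not the exact expression above: the symbol list is trimmed to a few currencies, and both price fields allow optional spaces after the symbol. The row itself is invented, with 18 filler fields before the two prices.

```python
import re

# Hypothetical row: 18 leading fields, then price, current value, and one more field.
leading = [f"f{n}" for n in range(1, 19)]
row = "\t".join(leading + ["$12.99", "CDN$15.00", "rest"])

symbol = r"(\$|CDN\$|EUR|¥|£)"   # trimmed symbol list, an assumption for this sketch
amount = r"([0-9., ]*\t)"         # digits and separators, including the trailing tab

pattern = re.compile(
    r"^((?:[^\t]*\t){18})" + symbol + " *" + amount + symbol + " *" + amount + r"(.+)",
    re.MULTILINE,
)

# Put each symbol in its own column, followed by a tab and then the amount.
split_row = pattern.sub(r"\1\2\t\3\4\t\5\6", row)
columns = split_row.split("\t")
print(columns[18:])
```

Rows whose price fields are empty or use a symbol outside the list will not match and are left unchanged, so check the result for stragglers.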
     
Delicious Monster  (op)
Dedicated MacNNer
Join Date: Jun 2006
Status: Offline
Nov 9, 2006, 04:10 PM
 
Another one from support email.

Delicious Library exports dates in the so-called international format, YYYY-MM-DD HH:MM:SS ±HHMM (the last part being the offset from GMT), but the most common destination for exported text files, Excel, doesn't seem to understand this format.

To get things working, strip the time stamp.

replace:
\d\d:\d\d:\d\d (\+|-)\d\d\d\d
with:
(nothing)
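
A quick way to see what this leaves behind is to run it in Python. The sample date is invented. As written, the expression leaves the space that separated the date from the time; the second pattern below is a small variation (an assumption, not part of the recipe above) that removes that space too.

```python
import re

sample = "2006-10-31 16:25:00 -0500"  # hypothetical exported date

# The expression as given: time of day plus a signed four-digit offset.
time_and_offset = re.compile(r"\d\d:\d\d:\d\d (\+|-)\d\d\d\d")
date_with_space = time_and_offset.sub("", sample)

# Variation: also consume the space in front of the time.
with_leading_space = re.compile(r" \d\d:\d\d:\d\d (\+|-)\d\d\d\d")
date_only = with_leading_space.sub("", sample)

print(repr(date_with_space), repr(date_only))
```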
( Last edited by Delicious Monster; Nov 10, 2006 at 12:38 PM. )
     
Ed S
Fresh-Faced Recruit
Join Date: Oct 2006
Location: Los Alamos, NM, USA
Status: Offline
Nov 9, 2006, 04:41 PM
 
Delicious Library exports dates in the so-called international format: YYY/MM/DD HH:MM:SS ±GMT
Are you sure that's an international standard? ISO 8601 mandates hyphens, not slashes, e.g. YYYY-MM-DDTHH:MM:SS.
     
Delicious Monster  (op)
Dedicated MacNNer
Join Date: Jun 2006
Status: Offline
Nov 10, 2006, 12:44 PM
 
Whoops! [edited]

You're right, it should be hyphens. And years have four digits. And the "plus or minus" character doesn't seem to print.

Anyway, you're right that the standard used is not ISO 8601 as there's no T marker and there's an offset to GMT.

So who says this format is the "international format"?

Foundation NSDate class (Objective-C)

In other words, Delicious Library doesn't format dates. The display date is set by the user's localization settings and the export date is whatever Apple has determined the "standard" date description is.

The solution for stripping the time portion is the same.
     
Weezer
Mac Elite
Join Date: Jul 2002
Location: Syracuse
Status: Offline
Nov 12, 2006, 01:02 AM
 

Imac Core Duo 1.83/1.5 GB/20 inch cinema, ibook G4 1 ghz
     
illitrate
Junior Member
Join Date: Nov 2006
Status: Offline
Nov 22, 2006, 04:26 PM
 
Originally Posted by Weezer
wow - that's almost exactly what i was looking for at the start of summer.
downloaded it yesterday and tried it out and it does exactly what it says on the tin - great work! and a nice easy to use interface too

i wish it would extract more fields though - it'd be nice to get the summary text so i can give a description of what the films are about - but for the moment, it's made my life an awful lot simpler

thanks very much!!
     
tcobbs
Fresh-Faced Recruit
Join Date: Nov 2006
Status: Offline
Nov 27, 2006, 09:25 PM
 
While it's true that grep can do a whole lot of things, picking out fields in a tab-separated list will often be much easier using the cut command. If you want field 10, do the following:

cut -f 10 export.txt > field10.txt

If you want fields 5 and 10, use this:

cut -f 5,10 export.txt > fields5and10.txt

You can get 5 through 10 like this:

cut -f 5-10 export.txt > fields5through10.txt

There are some limitations, though. First of all, all the fields in the output will be separated by tab. If you don't like that, you're out of luck just using cut. Additionally, all the fields in the output will be in the same order they were in the input. So even if you use -f 10,5 on the command line, 5 will come first in the output, followed by 10.

Of course, the above won't be available in your text editor (unless your text editor supports shell escapes), but it's likely easier to remember. And awk is far more powerful, but I feel also more difficult to master.

Note that the limitations I listed above can be overcome using a combination of the cut and paste commands. (cut and paste are standard *nix commands that have been around for a long time: apparently paste was introduced in 1980, and cut was introduced in 1982. They have nothing to do with the clipboard.) However, if you're going to that much trouble, you're probably better off with awk. But for completeness, the following commands will give you column 10 followed by column 5:

cut -f 10 export.txt > col10.txt
cut -f 5 export.txt > col5.txt
paste col10.txt col5.txt > output.txt
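
If the temporary files bother you, the same reordering can be sketched in a few lines of Python. The reorder_columns name and the sample rows are made up for the example; columns are numbered from 1, as with cut -f.

```python
def reorder_columns(lines, columns):
    """Rebuild each line from the given 1-based columns, in the order requested."""
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        result.append("\t".join(fields[c - 1] for c in columns))
    return result

# Two hypothetical rows of ten fields each, labelled by row and column.
sample_rows = ["\t".join(f"r{r}c{c}" for c in range(1, 11)) for r in (1, 2)]

swapped = reorder_columns(sample_rows, [10, 5])
print("\n".join(swapped))
```

Unlike cut, this keeps the order you ask for, so [10, 5] really does put column 10 first.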
     
sabre 39
Fresh-Faced Recruit
Join Date: Dec 2006
Status: Offline
Dec 26, 2006, 11:26 AM
 
Thanks to everyone who has contributed to this thread. I've found it very informative, and based on the information here, I will probably purchase Delicious Library.

For a while now, I have been looking for a way to scan my library into my linux machine, and then run a script or application on the list to do an online look up of the Library of Congress or Dewey Decimal Classification number using the ISBN. The goal is to catalog and shelve my home library according to the LOC or DDC.

I had recently considered hacking an open source cuecat reader application to develop a driver to use with a USB barcode scanner I had bought. It would have been my first foray into driver writing, so I wasn't really looking forward to doing it. That meant that I kept putting it off, and my library isn't going to sort itself. Then I bought a MacBook and found Delicious Library. Although there was some initial irritation with the whole "Not Found" thing, I have found that it is helpful to cover up the UPC while scanning the ISBN, if both are printed side-by-side on the book's cover. I'm now getting pretty fair results when scanning.

After reading the posts here, I have a pretty good idea of how I could export the data into a text file in an appropriate format to run it through a bash, perl or python script to do an online look up. Since I am more familiar with scripting than I am with writing device drivers in C, and since you've provided some methods for separating out the data I need, the project should be much less daunting.
     
Delicious Monster  (op)
Dedicated MacNNer
Join Date: Jun 2006
Status: Offline
Dec 26, 2006, 12:35 PM
 
Glad to hear it. I, and I think a few people in this thread (which should be renamed Delicious Library for nerds^H^H^H^H^HPower Users) would be interested to see what you come up with.
     
 
   
All times are GMT -4.