Remove all non-ascii chars through CLI
Senior User
Jul 17, 2003, 07:02 PM
 
Well, the subject pretty much says it all. I have some janky XML docs with non-ASCII chars in them. I just want to remove those chars, or translate them to their &#NNN; equivalents. The big thing, though, is that I want the file to stay an XML file, so I don't want to convert the standard ASCII chars. I've found many Perl scripts that will convert the whole thing, but I really don't know how to do it for only the non-ASCII chars.
     
Moderator
Jul 18, 2003, 06:30 AM
 
Check out recode. It can convert one character set into pretty much any other; the recode documentation covers the details.

Download the source and build it yourself.

Before you compile recode, use

./configure --disable-nls

instead of just ./configure. Otherwise it won't compile.

Hope that helps.
     
cwasko  (op)
Jul 18, 2003, 08:33 AM
 
Thanks for the tip. Do you actually have it working on OS X? I tried compiling, and it first bombed saying it couldn't determine the host, so I copied in the /usr/share/libtool stuff. It then got most of the way through and bombed on another part. At this point, a binary would be nice (hint, hint), or even just a command-line shell script. I started playing around with tr and jot last night, but haven't finished things up yet.
     
cwasko  (op)
Jul 18, 2003, 10:59 AM
 
Well, I got it to compile by downloading the development version. However, I can't figure out how to make it strip the stuff I don't want. It seems to bomb when it reaches the parts of the file that I want to strip. Ugh.

recode -sf ..us < file.orig.xml > file.xml
     
cwasko  (op)
Jul 18, 2003, 03:39 PM
 
perl -pe 's/[^\040-\176\011\012\015]//g' inputfile > outputfile

This strips everything except printable ASCII characters (octal 040 through 176) and the whitespace characters tab (011), line feed (012), and carriage return (015). The odd thing is that this works beautifully on OS X, but on Red Hat Linux 8.0 it doesn't strip all the chars when run on the same file. Any ideas?
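
The same whitelist can also be expressed with tr, as a rough sketch. The helper name strip_non_ascii is hypothetical. Setting LC_ALL=C makes tr work on single bytes regardless of the user's locale; whether a locale difference explains the Red Hat result is only a guess, and the thread does not settle it.

```shell
#!/bin/sh
# Hypothetical helper: delete every byte that is NOT tab (011), LF (012),
# CR (015), or printable ASCII (040-176). Same whitelist as the Perl
# one-liner in the post.
strip_non_ascii() {
    # -c complements the listed set, -d deletes the result.
    # LC_ALL=C forces byte-wise processing independent of the locale.
    LC_ALL=C tr -cd '\011\012\015\040-\176'
}

# Demo: \351 and \374 are octal escapes for two non-ASCII bytes.
printf '<name>caf\351 M\374nchen</name>\n' | strip_non_ascii
```

To process a file rather than a pipe: strip_non_ascii < inputfile > outputfile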