Welcome to the MacNN Forums.


Perl capabilities?
Registered User
Join Date: Nov 2001
Location: Jersey
Apr 29, 2002, 10:05 PM
 
Can I write a Perl script that will retrieve the HTML from a page passed to it? I.e., if I go to
http://localhost/web/cgi-bin/geturl....www.apple.com/

...can it get the source from apple.com/index.html and return it in a page of my own? Essentially, what I'm trying to do is make a workaround for the stupid, stupid firewall on my school's network... it's got Penny-Arcade and Slashdot blocked! That's pure evil. I'd also like to fix images that have a path like /images/image.gif, but that's not the first priority. Anyone seen a script like this, or know how to get started retrieving URLs with Perl?
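For anyone wanting to try this, here is a minimal sketch of the fetch step in pure Perl. It assumes a Perl new enough to ship the core HTTP::Tiny module (5.14 or later) and that the target arrives as the raw query string, as in the URL above. The helper name `target_url` and its http/https-only check are illustrative assumptions, not anything from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: turn the raw query string into a URL to fetch.
# Accepts "www.apple.com/" or "http://www.apple.com/"; returns undef otherwise.
sub target_url {
    my ($query) = @_;
    return undef unless defined $query && length $query;
    $query = "http://$query" unless $query =~ m{^https?://}i;
    # Loose sanity check: scheme, host characters, optional port, optional path.
    return $query =~ m{^https?://[A-Za-z0-9.-]+(?::\d+)?(?:/\S*)?\z}i ? $query : undef;
}

# Only act when run by a web server (CGI servers set GATEWAY_INTERFACE).
if ($ENV{GATEWAY_INTERFACE}) {
    my $url = target_url($ENV{QUERY_STRING});
    print "Content-type: text/html\n\n";
    if (!defined $url) {
        print "<p>No usable URL given.</p>\n";
    } else {
        my $res = HTTP::Tiny->new(timeout => 10)->get($url);
        print $res->{success} ? $res->{content}
                              : "<p>Fetch failed: $res->{status} $res->{reason}</p>\n";
    }
}
```

Rejecting anything that is not plain http or https keeps the script from being pointed at local files or other schemes. Relative image paths in the returned page will still break.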
     
Senior User
Join Date: Mar 2001
Location: Bay Area, CA
Apr 30, 2002, 08:19 PM
 
Originally posted by command-tab:
"Can I write a perl script that will retrieve the HTML from a page passed to it? [...]"

Hmmm, I might be missing a bit of the point, but if what I understand is correct I don't think you need Perl. If all you want to do is download the webpage to disk, and view it offline, you can use "curl" from the command line.

"curl http://slashdot.org/ > index.html" will pull the Slashdot main page into a local index.html file. (Note that this did not work using www.slashdot.org for me.) It doesn't get the pictures this way, but you can probably get those with "curl" as well. This should work as long as the firewall only applies to web browsers.

"man curl" will reveal more. I also used to use "wget" (or "wget -r" for recursive downloads), but Apple no longer distributes wget with the OS, so you have to install it yourself.

Apologies if I'm way off base here.
     
Registered User
Join Date: Nov 2001
Location: Jersey
Apr 30, 2002, 09:26 PM
 
Here's what I've got so far:

```perl
#!/usr/bin/perl
# CGI header, followed by a blank line.
print "Content-type: text/html\n\n";
# system() returns curl's exit status, not the page, so read curl's output
# through a pipe instead. The list form of open runs curl without a shell.
open(my $curl, '-|', 'curl', '-s', $ENV{'QUERY_STRING'}) or die "Cannot run curl: $!";
print while <$curl>;
close($curl);
```

This gets the page, but if the images are linked using "/images/image.gif", they won't show up, for obvious reasons. But if they're written like "http://www.apple.com/images/image.gif", they work fine. How can I add the server root to all URLs? Also, can I pipe it through grep first, to filter out any foul language? If so, how?

Check it out... http://macgyvr64.homeftp.net/proxy.c...www.apple.com/

doesn't work: http://macgyvr64.homeftp.net/proxy.c...nny-arcade.com
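
On the foul-language question: grep would drop whole lines, which tends to mangle HTML, so a substitution inside the Perl script is probably the easier route. A rough sketch, assuming a placeholder word list and a hypothetical helper named `mask_words` (neither comes from this thread):

```perl
use strict;
use warnings;

# Placeholder word list (an assumption for illustration); substitute your own.
my @blocked = qw(darn heck);

# One case-insensitive pattern matching any listed word as a whole word.
my $pattern    = join '|', map { quotemeta($_) } @blocked;
my $blocked_re = qr/\b(?:$pattern)\b/i;

# Hypothetical helper: replace each blocked word with asterisks of equal length.
sub mask_words {
    my ($html) = @_;
    $html =~ s/($blocked_re)/'*' x length($1)/ge;
    return $html;
}

print mask_words("Well, heck. That darn firewall!\n");
# prints: Well, ****. That **** firewall!
```

This works on the raw text, so it would also mask a listed word that happens to appear inside a tag or URL. Avoiding that would need a real HTML parser.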
     
Registered User
Join Date: Nov 2001
Location: Jersey
May 7, 2002, 05:27 PM
 
Anyone have an idea how I could use grep (or some other command) to figure out the shortened links?
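
grep can find those links but cannot rewrite them, so a Perl substitution may be closer to what is needed here. A minimal sketch, assuming only root-relative `src` and `href` attributes (values starting with a single "/") need fixing; the helper name `absolutize_root_links` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: prefix root-relative src/href values ("/images/x.gif")
# with the site root. Absolute and protocol-relative ("//host/...") URLs are left alone.
sub absolutize_root_links {
    my ($html, $root) = @_;
    $root =~ s{/+\z}{};    # drop trailing slashes so the result has no "//"
    $html =~ s{\b((?:src|href)\s*=\s*["'])/(?!/)}{$1$root/}gi;
    return $html;
}

my $page = '<img src="/images/image.gif"> <a href="http://www.apple.com/store/">Store</a>';
print absolutize_root_links($page, 'http://www.apple.com/'), "\n";
# prints: <img src="http://www.apple.com/images/image.gif"> <a href="http://www.apple.com/store/">Store</a>
```

Paths relative to the current document (such as "images/image.gif" with no leading slash) and unquoted attribute values are not handled by this pattern.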
     
Fresh-Faced Recruit
Join Date: Jun 2001
May 10, 2002, 05:41 PM
 
To make relative images work again, you don't need to rewrite all the URLs. Just put <BASE HREF="http://www.foo.com/"> in the <head> (the value has to be a full URL, including the http://) and all relative URLs are resolved against that instead.
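
The BASE suggestion can be applied to the fetched page before printing it. A minimal sketch, assuming the page has an opening <head> tag; the helper name `add_base_href` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: add a <base> tag right after the opening <head> tag,
# unless the page already declares one.
sub add_base_href {
    my ($html, $base) = @_;
    return $html if $html =~ /<base\b/i;
    $html =~ s{(<head\b[^>]*>)}{$1<base href="$base">}i;
    return $html;
}

print add_base_href('<html><head><title>Apple</title></head></html>', 'http://www.apple.com/'), "\n";
# prints: <html><head><base href="http://www.apple.com/"><title>Apple</title></head></html>
```

If the page has no <head> tag at all, the input comes back unchanged.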
     
   