Perl capabilities?
Registered User
Join Date: Nov 2001
Location: Jersey
Can I write a Perl script that will retrieve the HTML from a page passed to it? I.e., if I go to...
http://localhost/web/cgi-bin/geturl....www.apple.com/
...can it get the source from apple.com/index.html and return it in a page of my own? Essentially, what I'm trying to do is make a workaround for the stupid stupid firewall on my school's network...it's got Penny-Arcade and Slashdot blocked! That's pure evil. I'd also like to fix images that have a path like /images/image.gif (just an example), but that's not the first priority. Anyone seen a script like this, or know how to get started retrieving URLs with Perl?
Senior User
Join Date: Mar 2001
Location: Bay Area, CA
Originally posted by command-tab:
"Can I write a Perl script that will retrieve the HTML from a page passed to it? I.e., if I go to...
http://localhost/web/cgi-bin/geturl....www.apple.com/
...can it get the source from apple.com/index.html and return it in a page of my own? Essentially, what I'm trying to do is make a workaround for the stupid stupid firewall on my school's network...it's got Penny-Arcade and Slashdot blocked! That's pure evil. I'd also like to fix images that have a path like /images/image.gif (just an example), but that's not the first priority. Anyone seen a script like this, or know how to get started retrieving URLs with Perl?"
Hmmm, I might be missing a bit of the point, but if I understand correctly, I don't think you need Perl. If all you want to do is download the webpage to disk and view it offline, you can use "curl" from the command line.
"curl http://slashdot.org/ > index.html" will pull the Slashdot main page into a local index.html file. (Note that this did not work for me using www.slashdot.org.) It doesn't get the pictures this way, but you can probably get those with "curl" as well. This should work as long as the firewall only applies to web browsers.
"man curl" will reveal more. I also used to use "wget" (or "wget -r" for recursive downloads), but Apple no longer distributes wget with the OS, so you have to install it yourself.
Apologies if I'm way off base here.
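For the Perl side of the original question, here is a minimal sketch of fetching a page from inside a script. It assumes Perl 5.14 or later, where HTTP::Tiny ships with the core distribution. The helper name fetch_html and the script name geturl.pl are made up for illustration, and it sticks to plain http URLs because https needs extra modules installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: return the body of $url, or undef if the request failed.
sub fetch_html {
    my ($url) = @_;
    my $response = HTTP::Tiny->new(timeout => 10)->get($url);
    return $response->{success} ? $response->{content} : undef;
}

# Only fetch when a URL is given on the command line, e.g.:
#   perl geturl.pl http://slashdot.org/ > index.html
if (@ARGV) {
    my $html = fetch_html($ARGV[0]);
    defined $html or die "Could not fetch $ARGV[0]\n";
    print $html;
}
```

Like the curl command, this only works if the firewall lets the machine running the script reach the site.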
Registered User
Join Date: Nov 2001
Location: Jersey
Here's what I've got so far:
```perl
#!/usr/bin/perl
# The URL to fetch arrives as the query string.
my $url = $ENV{'QUERY_STRING'} || '';
print "Content-type: text/html\n\n";
# Pass only http(s) URLs on, so the value cannot act as a curl option or a file:// path.
exit unless $url =~ m{^https?://}i;
# curl writes the page to STDOUT itself; system() returns only its exit status, so it is not printed.
system("curl", "-s", $url);
```
This gets the page, but images linked as "/images/image.gif" won't show up, for obvious reasons. Images linked with a full URL like "http://www.apple.com/images/image.gif" work fine. How can I add the server root to all URLs? Also, can I pipe it through grep first, to filter out any foul language? If so...how?
Check it out... http://macgyvr64.homeftp.net/proxy.c...www.apple.com/
doesn't work: http://macgyvr64.homeftp.net/proxy.c...nny-arcade.com
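On the grep question: grep works line by line, so "grep -v" would throw away every whole line of HTML that contains a match. A substitution inside the script can mask just the words instead. This is only a sketch: the function name mask_words and the word list are placeholders, and it will also mask matches that happen to sit inside tags or URLs.

```perl
use strict;
use warnings;

# Hypothetical word list; replace with whatever should be masked.
my @blocked = ('darn', 'heck');

# Replace each blocked word (whole words only, any case) with asterisks of the same length.
sub mask_words {
    my ($html, @words) = @_;
    return $html unless @words;
    my $pattern = join '|', map { quotemeta } @words;
    $html =~ s/\b($pattern)\b/'*' x length($1)/gie;
    return $html;
}

print mask_words("<p>Well, darn it. Heck!</p>\n", @blocked);
# prints: <p>Well, **** it. ****!</p>
```

In the CGI script this would mean capturing the page into a variable first (for example with backticks) rather than letting curl print straight to the browser.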
Registered User
Join Date: Nov 2001
Location: Jersey
Anyone have an idea how I could use grep (or some other command) to pick out the shortened (relative) links, like "/images/image.gif"?
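One way to do this without grep is a substitution in Perl that finds src= and href= values starting with "/" and puts the server root in front of them. A rough sketch follows. The name absolutize_links is made up, it only handles quoted, root-relative paths, and matching HTML with a regex is approximate at best.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix root-relative src/href values ("/images/image.gif")
# with the server root taken from the page's own URL.
sub absolutize_links {
    my ($html, $page_url) = @_;
    # Server root = scheme plus host, e.g. "http://www.apple.com"
    my ($root) = $page_url =~ m{^(https?://[^/?#]+)}i
        or return $html;
    # Match src= or href= followed by a quoted value that starts with a single "/".
    $html =~ s{\b(src|href)(\s*=\s*)(["'])/(?!/)}{$1$2$3$root/}gi;
    return $html;
}

print absolutize_links('<img src="/images/image.gif">', 'http://www.apple.com/');
print "\n";
# prints: <img src="http://www.apple.com/images/image.gif">
```

Links that are already absolute are left alone. Paths relative to the current directory ("images/image.gif") are not handled here.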
Fresh-Faced Recruit
Join Date: Jun 2001
To make relative images work again you don't need to rewrite all the URLs. Just put <BASE HREF="http://www.foo.com/"> in the <head>, and all relative URLs are resolved against that instead. Note that the HREF has to be a full URL including the http:// part.
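A small sketch of how the proxy script might add that tag to the fetched page, assuming the HTML is already in a variable. The function name add_base_tag is hypothetical, and the base URL is passed with its scheme since the browser needs an absolute URL to resolve against.

```perl
use strict;
use warnings;

# Hypothetical helper: add a <base> tag right after the opening <head> tag,
# unless the page already declares one.
sub add_base_tag {
    my ($html, $base_url) = @_;
    return $html if $html =~ /<base\b/i;
    $html =~ s{(<head\b[^>]*>)}{$1<base href="$base_url">}i;
    return $html;
}

print add_base_tag("<html><head><title>t</title></head></html>", "http://www.foo.com/"), "\n";
# prints: <html><head><base href="http://www.foo.com/"><title>t</title></head></html>
```

If the page has no <head> tag at all, this leaves it unchanged.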