db driven site dump?

Junior Member
Join Date: Sep 2004
Status: Offline
Hi all,
I have an old database driven web site that has about 5,000 records in the database. The site has maybe 10 or 12 pages, each with links to all of the 5,000 records.
I've been asked to dump all of the pages into static format. Rather than visiting each page and doing "Save as", I'd like to find a utility that will follow all internal links and save the html to my local drive.
Any info would be greatly appreciated.
Thanks :-)
-Ben

Moderator Emeritus
Join Date: Dec 2000
Location: College Park, MD
Status: Offline

Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Status: Offline
You could use wget to do it:
wget -m http://your.site.com/
The -m flag tells wget to mirror the site, so it follows the links and grabs everything recursively.
Not too sure if it'll work - can you let me know if it does? Oh, and you'll need to get wget from here if you haven't got it installed already (it doesn't come with OS X by default, I believe).
Edit: bah! Beaten to it. curl comes with OS X (I was going to suggest it, but I prefer wget), so you might want to use that instead.
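If the goal is a static copy that still works offline, a couple of extra flags may help. A sketch (your.site.com is a placeholder, and I'm assuming a reasonably recent wget):

```shell
# Mirror the site and fix it up for offline browsing:
#   -m  mirror: recursive download with timestamping
#   -k  convert links in the saved pages so they point at the local copies
#   -p  also fetch page requisites (images, stylesheets)
#   -E  save server-generated pages (e.g. .php URLs) with an .html extension
wget -m -k -p -E http://your.site.com/
```

The -E flag matters for a database-driven site: without it, the saved files keep their .php names even though they're now plain HTML.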

Just who are Britain? What do they? Who is them? And why?
Formerly Black Book

Mac Elite
Join Date: Mar 2001
Location: CO
Status: Offline
The documentation page at http://www.gnu.org/software/wget was 404 this morning.
Can anyone tell me if this is pretty safe for a 'nix newbie who's never done much but 'top'?
Where will I find the 'mirror'? What does it look like: a folder of all the pages?
Am I right that this will *process* all the php in my web pages, calling on the data in its mysql tables, and produce an html page that's rather like what "View Source" gives me of my php-filled-out site pages?
I notice that http://www.gnu.org/software/wget talks about this "not keeping passwords safe" - so I presume that the php in my pages shows? (with its passwords for accessing the mysql dbs?) Doesn't that mean that *others* could use wget upon my site? (Or does wget ask for a password at some point?) [Sorry, you're dealing with a naive neophyte.]
Finally: since this is *sounding* like such an interesting process, how come I never stumbled across it in my newbie web-design books?

TOMBSTONE: "He's trashed his last preferences"

Grizzled Veteran
Join Date: Jun 2001
Location: Melbourne, Australia
Status: Offline
Originally posted by Love Calm Quiet:
... so I presume that the php in my pages shows? (with its passwords for accessing the mysql dbs?) Doesn't that mean that *others* could use wget upon my site?
Try this: http://www.jim.roberts.net/articles/wget.html
None of your PHP code will show, because all wget does is retrieve what a web browser would see anyway. It's a really simple way to rip a whole site, convert it to static HTML and have it automatically recreate the links between pages. I was pretty chuffed when I found out about it a year ago, though I haven't had much use for it lately. It would be a good tool for archiving a 'snapshot' of your folio sites before your clients wreck 'em or they get forgotten about.
Yep, someone could use wget on your site. Theoretically you could do some user-agent sniffing to stop them, but why bother? Public pages are public pages.

Computer thez nohhh...

Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Status: Offline
Originally posted by Love Calm Quiet:
Where will I find the 'mirror'? What does it look like: a folder of all the pages?
Am I right that this will *process* all the php in my web pages ... and produce an html page that's rather like what "View Source" gives me?
wget will get the pages by requesting them over HTTP, so your server will parse the PHP and spit out the same HTML you'd see when viewing with a normal browser.
The files will be stored in a folder inside your working directory, named after the site, like www.something.com. Your working directory is your home folder by default when you open a terminal window - navigate somewhere else before running wget if you'd like the mirror saved elsewhere.
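In Terminal that might look like this (the URL is just a placeholder):

```shell
cd ~/Desktop                        # choose where the mirror folder should land
pwd                                 # confirm the working directory
wget -m http://www.something.com/   # creates ~/Desktop/www.something.com/
```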

Just who are Britain? What do they? Who is them? And why?
Formerly Black Book

Junior Member
Join Date: Sep 2004
Status: Offline
Originally posted by Black Book:
You could use wget to do it.
can you let me know if it does?
Yes. It did what I was hoping to do. It copied all of the linked pages. Thanks everyone for the advice :-)

Mac Elite
Join Date: Jan 2000
Location: Pittsburgh, PA
Status: Offline
Perhaps I'm just dumb with this, but I can't seem to get wget to install. I found an mpkg on a site with 1.9.1 and ran it, but the command "wget" still can't be found in the Terminal. Is there something I'm missing?

spike[at]avenirex[dot]com | Avenirex
IM - Avenirx | ICQ - 3932806

Dedicated MacNNer
Join Date: Nov 2004
Location: Stockholm, Sweden
Status: Offline
Originally posted by Avenir:
... the command "wget" still can't be found in the terminal? is there something i'm missing?
Sounds like you need the full path. Type which wget to find where it's installed, and then try running it with that.
(I'm not at a Mac for another 10 hours, so I can't be 100% sure that's the problem.)

Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Status: Offline
I believe wget installs to /usr/local/bin/, so you'll need to add that to your $PATH, or type the full path each time you want to use it.
I'd have to check to be sure, though.
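A sketch of checking and fixing that, assuming a bash shell and that the installer really did put wget in /usr/local/bin:

```shell
# See whether the shell can find wget, and where:
which wget || echo "wget is not on the PATH"

# Add /usr/local/bin to the search path for this session:
export PATH="/usr/local/bin:$PATH"

# To make it stick, append the same export line to ~/.bash_profile, e.g.:
# echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bash_profile
```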

Just who are Britain? What do they? Who is them? And why?
Formerly Black Book