
db driven site dump?
benandkelley
Junior Member
Join Date: Sep 2004
Dec 14, 2004, 01:47 PM
 
Hi all,

I have an old database-driven web site with about 5,000 records in the database. The site has maybe 10 or 12 pages, each with links to all of the 5,000 records.

I've been asked to dump all of the pages into static format. Rather than visiting each page and doing "Save as", I'd like to find a utility that will follow all internal links and save the HTML to my local drive.

Any info would be greatly appreciated.

Thanks :-)

-Ben
     
Scotttheking
Moderator Emeritus
Join Date: Dec 2000
Location: College Park, MD
Dec 14, 2004, 02:33 PM
 
wget
curl
     
Chris O'Brien
Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Dec 14, 2004, 02:46 PM
 
You could use wget to do it.

wget -m http://your.site.com/

The -m flag tells wget to mirror the site: it switches on recursive retrieval with unlimited depth and timestamping, so it follows the internal links and grabs everything.

Not too sure if it'll work - can you let me know if it does? Oh, and you'll need to install wget if you haven't got it already (it doesn't come with OS X by default, I believe).

Edit: bah! Beaten to it. curl comes with OS X (I was going to suggest it, but I prefer wget), so you might want to use that instead.
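
For a database-driven site like the original poster's, a few extra flags help the saved copy behave as plain files; here's a hedged sketch (the hostname is a placeholder, and it's worth checking the flag names against your wget's --help output):

# -m  mirror the site (recursive retrieval, unlimited depth, timestamping)
# -k  convert links in the saved pages so they still work when browsed locally
# -E  save dynamically generated pages (e.g. page.php?id=5) with an .html extension
# -p  also fetch the images, CSS and other files needed to render each page
wget -m -k -E -p http://your.site.com/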
Just who are Britain? What do they? Who is them? And why?

Formerly Black Book
     
Love Calm Quiet
Mac Elite
Join Date: Mar 2001
Location: CO
Dec 15, 2004, 10:14 AM
 
The documentation page at http://www.gnu.org/software/wget was 404 this morning.

Can anyone tell me if this is pretty safe for a 'nix newbie who's never done much but 'top'?

Where will I find the 'mirror'? What does it look like: a folder of all the pages?

Am I right that this will *process* all the PHP in my web pages, calling upon the data in its MySQL database, and produce an HTML page rather like what "View Source" gives me of my PHP-generated site pages?

I notice that http://www.gnu.org/software/wget talks about this "not keeping passwords safe" - so I presume the PHP in my pages shows (with its passwords for accessing the MySQL dbs)? Doesn't that mean that *others* could use wget on my site? (Or does wget ask for a password at some point?) [Sorry, you're dealing with a naive neophyte.]

Finally: Since this is *sounding* like such an interesting process, how come I never stumbled across it in my newbie web-design books?
TOMBSTONE: "He's trashed his last preferences"
     
Simon Mundy
Grizzled Veteran
Join Date: Jun 2001
Location: Melbourne, Australia
Dec 15, 2004, 11:26 AM
 
Originally posted by Love Calm Quiet:
Am I right that this will *process* all the PHP in my web pages ... and produce an HTML page rather like what "View Source" gives me?

I presume the PHP in my pages shows (with its passwords for accessing the MySQL dbs)? Doesn't that mean that *others* could use wget on my site?
Try this: http://www.jim.roberts.net/articles/wget.html

None of your PHP code will show, because all wget is doing is retrieving what a web browser would see anyway. It's a really simple way to rip a whole site, convert it to static HTML and have the links between pages recreated automatically. I was pretty chuffed when I found out about it a year ago, though I haven't had much use for it lately. It would be a good tool for archiving a 'snapshot' of your folio sites before your clients wreck 'em or they get forgotten about.

Yep, someone could use wget on your site. Theoretically you could do some user-agent sniffing to stop them, but why bother? Public pages are public pages.
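
If you want to reassure yourself, a quick check from the Terminal (the hostname and page name below are placeholders) shows exactly what wget, or anyone else, receives - the rendered HTML, never your PHP source or database passwords:

# Fetch one page and show the first few lines of what the server actually sends back.
curl -s http://your.site.com/index.php | head -n 20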
Computer thez nohhh...
     
Chris O'Brien
Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Dec 15, 2004, 11:59 AM
 
Originally posted by Love Calm Quiet:
Where will I find the 'mirror'? What does it look like: a folder of all the pages?

Am I right that this will *process* all the PHP in my web pages ... and produce an HTML page rather like what "View Source" gives me?
wget will get the pages by asking for them with an HTTP request, so your server will parse the PHP and spit out the same HTML you'd see if viewing with a normal browser.

The files will be stored in a folder within your working directory, named after the site, like www.something.com. Your working directory is your home folder by default when you open a Terminal window - navigate somewhere else and run wget there if you'd like the mirror to be saved elsewhere.
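
As a sketch of that (the folder and hostname below are placeholders you'd swap for your own): change into whichever folder you want the mirror to live in before running wget, and it creates a directory named after the host:

mkdir -p ~/Sites/static-dump   # somewhere to keep the mirror (any folder will do)
cd ~/Sites/static-dump
wget -m http://your.site.com/
ls your.site.com               # wget saves the pages under a folder named after the host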
     
benandkelley  (op)
Junior Member
Join Date: Sep 2004
Dec 17, 2004, 01:14 PM
 
Originally posted by Black Book:
You could use wget to do it.

can you let me know if it does?
Yes. It did what I was hoping to do. It copied all of the linked pages. Thanks everyone for the advice :-)
     
Avenir
Mac Elite
Join Date: Jan 2000
Location: Pittsburgh, PA
Dec 20, 2004, 09:39 PM
 
Perhaps I'm just dumb with this, but I can't seem to get wget to install. I found an mpkg of 1.9.1 on a site and ran it to install, but the command "wget" still can't be found in the Terminal. Is there something I'm missing?

spike[at]avenirex[dot]com | Avenirex
IM - Avenirx | ICQ - 3932806
     
Phil Sherry
Dedicated MacNNer
Join Date: Nov 2004
Location: Stockholm, Sweden
Dec 21, 2004, 04:21 AM
 
Originally posted by Avenir:
perhaps I'm just dumb with this, but I can't seem to get wget to install. I found a mpkg on a site with 1.9.1 and ran it to install, but the command "wget" still can't be found in the terminal? is there something i'm missing?
Sounds like you need the full path. Type which wget to determine the path, and then try running it with that.

(I'm not at a Mac for another 10 hours, so I can't be 100% sure that's the problem.)
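
A minimal sketch of those checks (the paths below are typical guesses for a third-party package, not confirmed locations for that particular installer):

which wget                      # prints the path if your shell can already see wget
ls /usr/local/bin/wget          # common install location for third-party packages
/usr/local/bin/wget --version   # if it's there, run it by its full path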
     
Chris O'Brien
Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Dec 21, 2004, 10:46 AM
 
I believe wget is installed to /usr/local/bin/, so you'll need to add that to your $PATH, or type in the full path each time you want to use it.

I'd have to check to be sure though.
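
A minimal sketch of both options, assuming the package really did land in /usr/local/bin (the export line is for bash, the default shell on recent OS X; tcsh users would set path instead):

# Option 1: call wget by its full path each time (hostname is a placeholder).
/usr/local/bin/wget -m http://your.site.com/

# Option 2: add the directory to your PATH by putting this line in ~/.bash_profile.
export PATH="/usr/local/bin:$PATH"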
     
   
 