a script to cull data from a site's internal database?
TheJoshu
Mac Elite
Join Date: Oct 1999
Location: Brooklyn, New York, USA
Jun 4, 2004, 03:43 PM
 
I'll try to make this as clear as possible, although this probably isn't the right forum for it. But hey, you all are smart, and it's worth a shot.

I'm building a database in SQL. It contains a list of financial advisers, downloaded from the SEC website. However, some of my data is spotty (especially a seemingly endless number of missing phone numbers), and I'm trying to automate as much of the cleaning process as possible.

The SEC site has a search engine where you can enter all or part of a financial adviser's name, or the adviser's ID number, and it brings up an informational form that every adviser has to fill out. To try it, go to the site at http://www.adviserinfo.sec.gov/IAPD/...SearchInit.asp and enter a generic piece of an adviser name like 'asset management.' Right now I have to collect data by hand from different pages of what's called an adviser's Form ADV. As a small example: the phone number from the front page, the URL from the Schedule D page, and assets under management from the Information About Your Adviser page.

Now to the question:

Is there any way to build a script, in any language you know of (since it is MacNN, you can pretend I'm asking specifically about RealBASIC or AppleScript), that can take a table of ID numbers [I can put the table into Excel if it helps], enter them into the SEC's search engine, and record specific chunks of the data that comes up? If it's helpful, the adviser-specific data is displayed in red.

Now, off to cross-post to some SQL message boards.

Thanks in advance for any possible help!
     
Arkham_c
Mac Elite
Join Date: Dec 2001
Location: Atlanta, GA, USA
Jun 4, 2004, 08:49 PM
 
Originally posted by TheJoshu:
Now to the question:

Is there any way to build a script, in any language you know of (since it is MacNN, you can pretend I'm asking specifically about RealBASIC or AppleScript), that can take a table of ID numbers [I can put the table into Excel if it helps], enter them into the SEC's search engine, and record specific chunks of the data that comes up? If it's helpful, the adviser-specific data is displayed in red.

Yeah, that's easy.

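You just use some sort of socket class to request each results page over HTTP, then pull the values you want out of the HTML that comes back. I'd personally write it in Python, but RealBASIC would work fine too (5.5 has some nice HTTPSocket classes).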
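As a rough illustration of that approach, here is a minimal Python sketch using only the standard library. It is not a working scraper for the SEC site: the thread does not give the full search URL or the form's field names, so `SEARCH_URL` and `ID_PARAM` are hypothetical placeholders, and the assumption that the red text is marked up as `<font color="red">` is a guess that would need to be checked against the real page source.

```python
import csv
import time
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholders: read the real values from the site's search form.
SEARCH_URL = "http://example.invalid/search"
ID_PARAM = "id"


class RedTextParser(HTMLParser):
    """Collect text found inside <font color="red"> elements (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.font_stack = []  # one bool per open <font> tag: is it red?
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            color = (dict(attrs).get("color") or "").lower()
            self.font_stack.append(color in ("red", "#ff0000"))

    def handle_endtag(self, tag):
        if tag == "font" and self.font_stack:
            self.font_stack.pop()

    def handle_data(self, data):
        # Keep non-blank text only while at least one enclosing font is red.
        if any(self.font_stack) and data.strip():
            self.chunks.append(data.strip())


def extract_red_text(html):
    parser = RedTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


def read_ids(csv_lines):
    """Return the first column of each non-empty row, e.g. from an Excel CSV export."""
    return [row[0].strip() for row in csv.reader(csv_lines) if row and row[0].strip()]


def build_url(adviser_id):
    return SEARCH_URL + "?" + urlencode({ID_PARAM: adviser_id})


def fetch(url):
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(ids, delay=1.0, fetch_page=fetch):
    """Fetch one page per ID, pausing between requests to avoid hammering the server."""
    results = {}
    for adviser_id in ids:
        results[adviser_id] = extract_red_text(fetch_page(build_url(adviser_id)))
        time.sleep(delay)
    return results
```

With the IDs exported from Excel as a one-column CSV file, usage would look something like `scrape(read_ids(open("ids.csv", newline="")))`, where `ids.csv` is again a made-up file name. The `fetch_page` parameter exists so the parsing can be tried against saved HTML before any live requests are made.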
Mac Pro 2x 2.66 GHz Dual core, Apple TV 160GB, two Windows XP PCs
     
TheJoshu  (op)
Mac Elite
Join Date: Oct 1999
Location: Brooklyn, New York, USA
Jun 4, 2004, 09:02 PM
 
Hm, looks like it's time to spend the weekend teaching myself something new!

Edit, though - perhaps you can be a little more descriptive? Especially if it's not too complex. I'm not sure I've even got a clue where to start, although now at least I've got a possible lead.
( Last edited by TheJoshu; Jun 4, 2004 at 11:59 PM. )
     
bens1901
Registered User
Join Date: Sep 2002
Location: New York City
Jun 5, 2004, 05:22 AM
 
Is there any way to build a script in any language that you know of that can take a table of ID numbers, input them into the SEC's search engine and record specific chunks of the data that comes up?
Before you proceed with this plan, you may want to review the Terms of Use and any related documents on the SEC site to make sure that automated access is allowed. Although the information is available to a human browsing the web, the site may not permit automated data collection.

I did a few years of technical consulting to some Federal agencies in DC and this issue came up during the development of a few systems.
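One machine-readable check that can sit alongside reading the Terms of Use is the site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to test a URL against robots.txt rules; the sample rules, the `my-script` user agent, and the example URLs are all made up for illustration, and a robots.txt check is not a substitute for reading the actual terms.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration; a real script would
# download the file from the site it intends to query.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.invalid/public/page",
                "http://example.invalid/private/page"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "my-script", url))
```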

Good luck :-)
     