A script to cull data from a site's internal database?
Mac Elite
Join Date: Oct 1999
Location: Brooklyn, New York, USA
I'm going to try to make this as clear as possible, though even so, this probably isn't the right forum. But hey, you all are smart, and it's worth a shot.
I'm building a database in SQL. It contains a list of financial advisers, which can be downloaded from the SEC website. However, some of my data is spotty (especially the endless number of missing phone numbers), and I'm trying to automate as much of the cleaning process as possible.
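Before any scraping, the IDs that actually need cleaning can be pulled straight out of the SQL database. Here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical `advisers` table with `adviser_id` and `phone` columns (the real schema and database engine will differ), and it treats both NULL and blank phone values as missing:

```python
import sqlite3


def ids_missing_phone(conn):
    """Return IDs of advisers whose phone number is NULL or blank.

    Hypothetical schema: a table `advisers` with `adviser_id` and `phone`.
    """
    rows = conn.execute(
        "SELECT adviser_id FROM advisers "
        "WHERE phone IS NULL OR TRIM(phone) = '' "
        "ORDER BY adviser_id"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Small in-memory demo table standing in for the real adviser data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE advisers (adviser_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO advisers VALUES (?, ?, ?)",
        [
            (101, "Alpha Asset Management", "212-555-0100"),
            (102, "Beta Asset Management", None),
            (103, "Gamma Capital", "   "),
        ],
    )
    print(ids_missing_phone(conn))  # [102, 103]
```

The resulting list (or a CSV export of it) is the "table of ID numbers" that an automated lookup would work through.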
The SEC site has a search engine where you can enter a financial adviser's name (or part of it) or ID number, and the informational form that every adviser has to fill out pops up. To test it, go to http://www.adviserinfo.sec.gov/IAPD/...SearchInit.asp and enter a generic piece of an adviser name like 'asset management.' Right now I have to pull data by hand from different pages of what's called an adviser's Form ADV. To give a small example: the phone number from the front page, the URL from the Schedule D page, and assets under management from the Information About Your Adviser page.
Now to the question:
Is there any way to build a script in any language that you know of (since this is MacNN, you can pretend I'm asking specifically about RealBASIC or AppleScript) that can take a table of ID numbers [I can put the table into Excel if it helps], enter them into the SEC's search engine and record specific chunks of the data that comes up? If it helps, the adviser-specific data shows up in red.
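Because the adviser-specific values are shown in red, a parser can pick them out by their markup instead of by their position on the page. Here is a minimal sketch using Python's standard html.parser. It assumes the red text is marked up as `<font color="red">` (or `#ff0000`) or as a `span` with an inline `color: red` style. That markup is a guess, so check the source of a real Form ADV page before relying on it:

```python
from html.parser import HTMLParser

# Tags whose color attributes we inspect (assumed markup for the red values).
TRACKED = {"font", "span"}


def _is_red(attrs):
    """True if a tag's color attribute or inline style makes it red."""
    d = {k: (v or "").lower() for k, v in attrs}
    color = d.get("color", "")
    style = d.get("style", "").replace(" ", "")
    return color in ("red", "#ff0000") or "color:red" in style or "color:#ff0000" in style


class RedTextParser(HTMLParser):
    """Collects each run of red text as one whitespace-normalized string."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one bool per open tracked tag: is it red?
        self.values = []  # finished red strings, in page order
        self._buf = []    # text pieces of the red run being collected

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            self.stack.append(_is_red(attrs))

    def handle_endtag(self, tag):
        if tag in TRACKED and self.stack:
            was_red = self.stack.pop()
            # Flush only when leaving the outermost red element.
            if was_red and not any(self.stack):
                text = " ".join("".join(self._buf).split())
                if text:
                    self.values.append(text)
                self._buf = []

    def handle_data(self, data):
        if any(self.stack):
            self._buf.append(data)


def red_values(html):
    """Return the red text runs found in an HTML page."""
    parser = RedTextParser()
    parser.feed(html)
    parser.close()
    return parser.values
```

Pulling values by color rather than by fixed position means the sketch keeps working if a form page has more or fewer rows than expected. You still have to map each red value to its field (phone, URL, assets) by its order on that page.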
Now to cross-post to some SQL message boards.
Thanks in advance for any possible help!
Mac Elite
Join Date: Dec 2001
Location: Atlanta, GA, USA
Originally posted by TheJoshu:
Now to the question:
Is there any way to build a script in any language that you know of (since it is MacNN you can pretend I'm asking specifically about RealBASIC or AppleScript ) that can take a table of ID numbers [I can put the table into Excel if it helps], input them into the SEC's search engine and record specific chunks of the data that comes up? If it's helpful, the data that is variable to the adviser comes up in red.
Yeah, that's easy.
You just use some sort of socket class. I'd personally write it in Python, but RealBasic would work fine too (5.5 has some nice httpsocket classes).
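Here is a minimal Python sketch of the fetch loop. It uses the standard library's urllib.request, a higher-level HTTP client than a raw socket class. The script reads IDs from a CSV exported from Excel, requests one search result page per ID, and saves the raw HTML for later parsing. `SEARCH_URL` and `ID_PARAM` are hypothetical placeholders, because the real form address is truncated above and the parameter name it expects is unknown. The sketch also assumes the search accepts a GET query, which may not hold if the form submits via POST:

```python
import csv
import time
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical placeholders: read the real form action URL and field name
# from the search page's HTML source.
SEARCH_URL = "https://example.invalid/adviser-search"
ID_PARAM = "id"


def read_ids(csv_path, skip_header=False):
    """Read adviser IDs from the first column of a CSV (e.g. saved from Excel)."""
    # utf-8-sig strips the byte-order mark Excel sometimes writes.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))
    if skip_header:
        rows = rows[1:]
    return [row[0].strip() for row in rows if row and row[0].strip()]


def build_query_url(adviser_id, base=SEARCH_URL, param=ID_PARAM):
    """Build a GET URL for one adviser ID (assumes the form accepts GET)."""
    return base + "?" + urllib.parse.urlencode({param: adviser_id})


def fetch_all(ids, out_dir="pages", delay=2.0):
    """Fetch one page per ID and save it as <out_dir>/<id>.html."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for adviser_id in ids:
        url = build_query_url(adviser_id)
        with urllib.request.urlopen(url, timeout=30) as resp:
            charset = resp.headers.get_content_charset() or "latin-1"
            html = resp.read().decode(charset, errors="replace")
        (out / f"{adviser_id}.html").write_text(html, encoding="utf-8")
        time.sleep(delay)  # pause between requests to go easy on the server
```

For example, if the IDs sit in the first column of `adviser_ids.csv` under a header row, call `fetch_all(read_ids("adviser_ids.csv", skip_header=True))`. Saving the raw pages first means the extraction step can be rerun without fetching everything again.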
Mac Pro 2x 2.66 GHz Dual core, Apple TV 160GB, two Windows XP PCs
Mac Elite
Join Date: Oct 1999
Location: Brooklyn, New York, USA
Hm, looks like it's time to spend the weekend teaching myself something new!
Edit: perhaps you could be a little more descriptive, especially if it's not too complex? I'm not sure I even have a clue where to start, although at least now I have a possible lead.
(Last edited by TheJoshu; Jun 4, 2004 at 11:59 PM.)
Registered User
Join Date: Sep 2002
Location: New York City
Is there any way to build a script in any language that you know of that can take a table of ID numbers, input them into the SEC's search engine and record specific chunks of the data that comes up?
Before you proceed with this plan, you may want to review the Terms of Use and any related documents on the SEC site to make sure the system allows this kind of automated access. Although the information is available to a human browsing the web, the site may not permit automated data collection.
I did a few years of technical consulting to some Federal agencies in DC and this issue came up during the development of a few systems.
Good luck :-)