
dynamic list of search engine user-agents w. Ajax sites
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Nov 11, 2006, 11:03 PM
 
Hello,

I'm looking for a way to consult some sort of dynamic list of search engine robot/crawler user-agents so that I can provide a separate version of my pages that can be crawled by Google. I'm doing a lot of Ajax these days, and according to Google's FAQ it is best not to rely on Javascript for content you would like a search engine to be able to crawl.

In the past, I've just done a quick manual check for some common user-agents (Google and a few others) using PHP's predefined $_SERVER['HTTP_USER_AGENT'] variable, but it would be nice to work with something more dynamic and inclusive of all search engine user-agents.
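
For reference, the manual check I've been doing amounts to roughly the following (the bot strings are just a few common examples, and the two include files are placeholders for whatever crawler-friendly and Ajax versions of a page you serve):

<?php
// Quick-and-dirty crawler check against a hand-maintained list of bot strings.
$bots = array('Googlebot', 'Slurp', 'msnbot');
$ua   = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

$is_crawler = false;
foreach ($bots as $bot) {
    if (stripos($ua, $bot) !== false) {
        $is_crawler = true;
        break;
    }
}

if ($is_crawler) {
    include 'static_version.php';   // plain HTML the crawler can index
} else {
    include 'ajax_version.php';     // the normal Ajax-driven page
}
?>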


Any suggestions on the best technique and/or approach here?

Thanks in advance!
     
registered_user
Dedicated MacNNer
Join Date: Nov 2001
Location: Are Eye
Nov 12, 2006, 09:32 AM
 
I suggest you don't do it.

If Google discovers that you are delivering separate content to Googlebot, they are likely to remove you from their index. Google can check this easily by crawling your page, switching its user-agent to IE, and hitting it again.

You have to be huge like MS to serve up custom content to Google (yes, MS did that for years, and may still, without being removed).

And according to any accessibility FAQ, relying on Javascript to serve your content is a poor idea.

All that having been said, I don't know of any public database of search engine user-agents, and especially not one with an API you can tap into.
     
besson3c  (op)
Clinically Insane
Join Date: Mar 2001
Location: yes
Nov 15, 2006, 02:48 AM
 
Sorry for my slow response...

The user-agent detection would only be there to help Google crawl the content that is already available on the site, not to skew results or improve my page rank.

Is this sort of thing frowned upon? If so, what is your source if you don't mind me asking?

Thanks for your help!
     
registered_user
Dedicated MacNNer
Join Date: Nov 2001
Location: Are Eye
Nov 15, 2006, 09:04 PM
 
The practice is called cloaking. Google for SEO and cloaking and you'll find a ton of sources.

I understand that you want to do it in a benign way, but the penalty for getting caught is steep.

As for a list of user-agents, just check your logs. It isn't dynamic, per se, but it ought to suit your purposes if that's the approach you're going to take.
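
For example, assuming an Apache "combined" format log (where the user-agent is the last quoted field on each line), something like this would pull out the distinct user-agent strings -- the log path is just an example:

<?php
// List every distinct user-agent string seen in an Apache combined-format log.
$agents = array();
$log = fopen('/var/log/apache2/access.log', 'r');
if ($log) {
    while (($line = fgets($log)) !== false) {
        // The user-agent is the final double-quoted field on each line.
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $agents[$m[1]] = true;
        }
    }
    fclose($log);
}
foreach (array_keys($agents) as $ua) {
    echo $ua . "\n";
}
?>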
     
besson3c  (op)
Clinically Insane
Join Date: Mar 2001
Location: yes
Nov 20, 2006, 09:14 PM
 
Thanks man...

Thanks to your helpful information, I researched cloaking further and learned that it is a rather controversial issue. However, the consensus seems to be that there are legitimate uses for it: as long as cloaking isn't used to skew search results in the cloaker's favor, it seems to be an accepted practice.

I wish there were a better way to do what I'm doing, but so far I've yet to come up with an alternative solution...
     
besson3c  (op)
Clinically Insane
Join Date: Mar 2001
Location: yes
May 27, 2007, 01:20 AM
 
Hello,

Those of you who have faced this problem before probably know that the most common solutions are to avoid using AJAX for site navigation in this manner, to provide two separate versions of your pages, or to take the hit of not having your content indexed.

I've come up with what may be a new technique that I hope you might find useful. It involves rewriting URLs on the fly via a Javascript onload event handler. Browsers without Javascript support simply ignore the handler, leaving your regular non-Javascript links intact.
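
As a rough sketch of the idea (the section names, URLs, and fragment scheme below are made up for illustration; the full write-up is linked at the end of this post):

<?php
// The page is served with ordinary, crawlable links that work without Javascript.
$sections = array('news', 'lessons', 'gear');
?>
<ul id="nav">
<?php foreach ($sections as $section): ?>
  <li><a href="page.php?section=<?php echo $section; ?>"><?php echo ucfirst($section); ?></a></li>
<?php endforeach; ?>
</ul>

<script type="text/javascript">
// Onload, rewrite the plain URLs into the fragment-style URLs that the site's
// Ajax navigation code (assumed to exist elsewhere) understands. Browsers
// without Javascript never run this, so the crawlable links stay intact.
window.onload = function () {
    var links = document.getElementById('nav').getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        var match = links[i].href.match(/section=(\w+)/);
        if (match) {
            links[i].href = '#' + match[1];
        }
    }
};
</script>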

I hope you find this technique useful.

NetMusician Labs » Blog Archive » Making AJAX generated pages search engine friendly
     
   
 