Search Engine & Related news
Administrator 
Join Date: Dec 1998
I have finally implemented a new search engine for MacNN news, its forums, and the Website. As you all know, the search engine on the old site was excellent and very precise, and though it was a Perl-based, flat-file text system, it worked (well). Many thanks to the author. Several months ago, we finally moved to a database system, and since then our current search engine has just been bad. It never really works or finds the correct news, etc. (I think this has to do with full-text searching problems in MySQL.)
Over the past few days, I've implemented a new, faster, more precise search engine. I haven't yet tested it fully, but was hoping you guys would help me. The neat thing is that it offers searching of the news, website, reviews, and forums at the same time. Take a look here:
http://www.macnn.com/search/mnogo/search.php
I haven't put the GUI touches on it (hoping that Misha will do that soon), but I'm looking for bugs. Here are the ones that I know of:
-- The "Website" option doesn't currently search the Website, but instead pulls up news items (via URLs rather than a direct database index). I'm working on this.
-- The page summary displays HTML tags. I'll need to strip these out while indexing, but don't know how to do it yet. I'm also considering making the summary longer.
-- Forum summaries are useless. I'm looking for a way to get the first 100 characters of the first post, but I'm not sure it's possible.
-- Apostrophes and quotes are still displayed as escaped characters. I'm not sure I can fix this, as the database stores the info in this manner. I'm lookin' though.
-- Items on OS X and other sites will show up in results, but the link will not display the page. This appears to be a problem with our PHP code. I looked into it and thought I fixed it, but it still seems to present itself all the time. I guess I'll look again.
-- News comments (in our database) and the extended news items (with "read more...") are not being indexed. I'm not sure if I should index them or not, as it would require more code to index another table and link the results. The page should display the associated news extension and blurbs, but the engine will report results that only appear there. Comments and feedback on implementing this would be welcome.
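Editor's sketch for the summary bugs listed above (HTML tags in summaries, escaped quotes, and a 100-character forum summary). This is a minimal Python illustration, not the site's actual PHP or indexer code; the function name `clean_summary` and its `limit` parameter are made up for the example, and it assumes the stored text escapes quotes with a backslash.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (tags are dropped)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_summary(raw, limit=100):
    """Return a plain-text summary of at most `limit` characters.

    Hypothetical helper: assumes quotes are stored as \\' and \\".
    """
    # Undo backslash escaping: \' -> '   \" -> "   \\ -> \
    text = re.sub(r"\\(['\"\\])", r"\1", raw)

    # Keep the text between tags and discard the tags themselves.
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()

    # Join the pieces and collapse runs of whitespace.
    text = " ".join(" ".join(parser.parts).split())
    return text[:limit]


if __name__ == "__main__":
    print(clean_summary("<p>Apple\\'s <b>new</b> iMac</p>"))
```

Doing this once at index time, rather than on every results page, would keep the search itself fast.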
I'm sure there are many more, so please help. Post bugs here only. Thanks.
Also, any feature suggestions would be appreciated.
best,
monish
[This message has been edited by mkbhatia (edited 04-13-2001).]
Administrator 
Join Date: Dec 1998
Ok, one bug down: I fixed it so all the results now work--even if they were originally posted on osx.macnn.com, games.macnn.com, etc.
still waiting for feedback, though =)
m.
Clinically Insane
Join Date: Apr 2000
Ok, I just checked it out: First thing I noticed was that the results will refer to the same page several times if it found more than one occurrence of the query in it.
Try searching for a username and you can come up with 10 results for the one thread/page.
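Editor's sketch of one way to collapse repeated hits for the same page, in Python. It assumes each result is a dict with a `url` key, which is a made-up shape for illustration and not how the real engine returns rows.

```python
def dedupe_by_url(results):
    """Keep only the first (highest-ranked) hit for each URL.

    `results` is assumed to be a list of dicts with a "url" key,
    already sorted best-first.
    """
    seen = set()
    unique = []
    for result in results:
        url = result["url"]
        if url not in seen:
            seen.add(url)
            unique.append(result)
    return unique


if __name__ == "__main__":
    hits = [
        {"url": "/forums/thread1", "snippet": "first match"},
        {"url": "/forums/thread1", "snippet": "second match"},
        {"url": "/news/item9", "snippet": "other page"},
    ]
    for hit in dedupe_by_url(hits):
        print(hit["url"], "-", hit["snippet"])
```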
More to come as I use it more...
EDIT: Very fast too BTW - nice
------------------
[This message has been edited by Cipher13 (edited 04-14-2001).]
Mac Elite
Join Date: Jun 2000
First off, YAY!!!!!!!!!!!!!! The new search is kick-ass! Now I can finally really tell people to search the fora again. I especially like that it searches different pages of a thread separately, and gives links to a particular page of a thread.
-- in addition to the summary being useless for forum threads, on reviews this appears at the beginning of every summary "[an error occurred while processing this directive]".
-- in a couple test searches, there several times came up entries like this one:
1. No title [1114116]
...
() undefined, bytes
where "no title" is a link to http://www.macnn.com/search/mnogo/ . It looks like the search engine is finding things which don't exist. And yes, on one of my test searches a "no title" came up as the first result. Not good for newbies.
-- how are the results ordered? There is nothing in the displayed results which indicates whether there is some relevancy attached, or anything. It would be really nice if there were a feature to view results by date modified. One other particular thing: with separate pages of a thread being searched separately, it would be very useful if all pages of a thread that came up were listed together (e.g. search "word association" with "query type"=all. Almost every page of the 35+ page thread "More Word Association Fun" comes up, but they are in apparently random order, and the other results are interspersed, which really doesn't make sense.)
--it would be nice if forum searches could be restricted to a specific forum. I know there is nothing vaguely resembling the need the old search engine had, as this thing is massively speedy, but in some circumstances, I can imagine doing a search and only being interested in the results in a particular forum. for instance, if I search "expansion," that returns lots of results in PowerMac, iMac, and in PB; chances are I'm only interested in one of those categories. and entering the name of the target forum as a search term doesn't work perfectly in an "any" search, and won't work at all under a "any" search.
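Editor's sketch of the "list all pages of a thread together" suggestion above, in Python. The field names (`thread_id`, `page`, `score`) are hypothetical; the real index may not store anything like them. Threads are ranked by their best-scoring page, and pages inside a thread are shown in page order.

```python
def group_thread_pages(results):
    """Reorder results so pages of the same thread sit next to each other.

    Each result is assumed to be a dict with "score", and for forum hits
    "thread_id" and "page". Results with no thread_id stay on their own.
    """
    groups = {}
    for index, result in enumerate(results):
        key = result.get("thread_id")
        if key is None:
            # Non-forum hit: give it a unique key so it forms its own group.
            key = ("single", index)
        groups.setdefault(key, []).append(result)

    # Rank each group by its best page, highest score first.
    ranked = sorted(
        groups.values(),
        key=lambda group: max(r["score"] for r in group),
        reverse=True,
    )

    ordered = []
    for group in ranked:
        ordered.extend(sorted(group, key=lambda r: r.get("page", 1)))
    return ordered


if __name__ == "__main__":
    hits = [
        {"title": "Word Association p3", "thread_id": 7, "page": 3, "score": 0.5},
        {"title": "News item", "thread_id": None, "score": 0.8},
        {"title": "Word Association p1", "thread_id": 7, "page": 1, "score": 0.9},
    ]
    for hit in group_thread_pages(hits):
        print(hit["title"])
```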
well, I don't feel like doing any more now, maybe if I get inspired again tomorrow I'll post more problems/suggestions.
Again, thank you so much. The new search ROCKS!!!!!!!!!!!!!
------------------
be happy!
-mac freak
georgius
I typed in "How many are using OS X full time" and the top result was that very topic from the Lounge. Cool. A real good search engine. Nice work.
Play it cool

Mac Elite
Join Date: Jun 2000
another issue I just randomly came across: when looking at a user's profile, there is a "Display all posts by this user" button. This is a link still to the old search tool.
Which made it occur to me that in the new search, the only way to search for posts by a specific user would be to enter that user's name as one of the search criteria. This is sub-optimal.
As I think about it, it seems there are a number of forum-search-specific features, and it might be nice to have a special set of features for a "forum-only search mode," including which forum(s) to search, a user name, etc.: the features that the old search engine had but was just too slow to do anything at all with.
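Editor's sketch of what a "forum-only search mode" filter could look like, in Python. The `forum` and `author` fields are assumptions for the example; whether the real index records them per post is not stated in this thread.

```python
def filter_forum_hits(hits, forums=None, author=None):
    """Narrow forum hits to selected forums and/or one user name.

    `hits` is assumed to be a list of dicts with "forum" and "author" keys.
    Passing None for a filter means "do not filter on this field".
    """
    wanted_forums = {f.lower() for f in forums} if forums else None
    wanted_author = author.lower() if author else None

    selected = []
    for hit in hits:
        if wanted_forums is not None and hit["forum"].lower() not in wanted_forums:
            continue
        if wanted_author is not None and hit["author"].lower() != wanted_author:
            continue
        selected.append(hit)
    return selected


if __name__ == "__main__":
    hits = [
        {"forum": "PowerMac", "author": "krove", "title": "Expansion slots"},
        {"forum": "iMac", "author": "Cipher13", "title": "RAM expansion"},
    ]
    print(filter_forum_hits(hits, forums=["imac"]))
```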
I still love the new search
------------------
be happy!
-mac freak
Mac Elite
Join Date: Jul 2000
Location: Washington, DC
Wow, that's way faster! Cool. So how does it order the results? I searched for "Myst III" and it brought up a couple of forum threads and news items as expected, but the number one find didn't really seem like a great number one (I was expecting one of the games.macnn.com news items). Perhaps you can play around with the ordering of the results. If it goes by the number of times the search criteria is mentioned on the page, then forum threads will invariably come to the top... probably not the best solution.
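Editor's sketch of the ordering concern above, in Python. It is not how the engine actually ranks (the thread never says); it only shows why a raw occurrence count favors long forum pages, and how dividing by page length is one simple way to offset that. All function names are made up for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def raw_count(query, text):
    """Number of times the query phrase appears in the text."""
    words = tokenize(text)
    terms = tokenize(query)
    if not terms:
        return 0
    n = len(terms)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == terms)


def normalized_score(query, text):
    """Occurrences per word, so a long page gets no automatic advantage."""
    words = tokenize(text)
    if not words:
        return 0.0
    return raw_count(query, text) / len(words)


if __name__ == "__main__":
    news = "Myst III ships today"
    thread = "Myst III " * 3 + "filler " * 200
    print(raw_count("Myst III", news), raw_count("Myst III", thread))
    print(normalized_score("Myst III", news) > normalized_score("Myst III", thread))
```

With the raw count the long thread wins; with the normalized score the short news item does.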
Once again, very cool!
krove
------------------
*The next sentence is entirely true...
*The previous sentence is most decidedly false...
Clinically Insane
Join Date: Apr 2000
Um... when's that search gonna be linked to by the "search" link at the top of the page?
------------------
Grizzled Veteran
Join Date: Mar 2000
Location: Upstate, NY
Here I am
------------------
Ti/500/512/20
G4/533/512/40
Clinically Insane
Join Date: Apr 2000
Hello?
Admins?
Link - not a hard thing to do
*sigh*
Why do I get the feeling they don't read this topic area?
------------------
Mac Elite
Join Date: Jun 2000
maybe they're still waiting for more people to test it and post bugs/feature suggestions, or maybe they're still working on fixing specific bugs, like the useless summaries of Forum items, before making it the main thing.
although an update would be nice. even if it's just to say that if we want the new search engine, we need to do more testing and feedback.
------------------
be happy!
-mac freak
Administrator 
Join Date: Dec 1998
I basically moved on to another project, the AI forums. After I finished that, I went on to the newsfeed, which was giving us trouble. I'm looking to fix a few more bugs before I go "live" with the search. The indexing of news/forums is not yet automatic. I may just make the page more public once I automate the indexing procedures. There are other bugs, which I don't yet have a solution for, like the forum summaries.
m.
Clinically Insane
Join Date: Apr 2000
It's still a thousand times better than what is there now... you can't argue with that
------------------
Mac Elite
Join Date: Jun 2000
thanks for the update monish 
tell us when it's ready for the next stage of testing.
------------------
be happy!
-mac freak
Administrator 
Join Date: Dec 1998
Ok, the search engine is officially up, although I haven't really solved any more of the outstanding bugs. I have, however, been able to automate the indexing. The news is archived daily, while the forums are indexed continuously. I've been a little out of touch, so we're a bit behind. It should be caught up by tomorrow morning.
Anybody who wants to help troubleshoot and has a bit of PHP/SQL knowledge is free to email me. I'd love some help in fixing the "bugs"--though for the most part it works.
best,
monish
Administrator 
Join Date: Dec 1998
ok, i've fixed two visual bugs:
1) the '\' escape character no longer appears in the results
2) the HTML tags are no longer part of the description. i've also changed the index to include a bit more of the news story. we'll see if we can display some more text.
the results look a bit cleaner. I also cleaned up the search page itself and it looks much better.
best,
monish
Administrator 
Join Date: Dec 1998
Well, i don't know if anybody actually reads this, but here's an update.
I cleared the database and began reindexing everything. Obviously, the news and reviews went smoothly, but 'Website' and 'forums' are more tricky. There are about 50,0000 URLs in the forums, so that will take a while to index (i have 4-5 processes running simultaneously). There is a way to do it directly via ftp, but I've been unable to get that to work.
It'll be another day or so before the index is caught up for the forums. I'll start a few more processes during the night hours to speed things up. It's crawled about 10% of the forums.
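Editor's sketch of spreading an index run across several workers, in Python. The post describes running 4-5 separate indexer processes; this example swaps in a thread pool for brevity and uses a placeholder `index_url` that does no real fetching, so every name here is an assumption and not the site's actual setup.

```python
from concurrent.futures import ThreadPoolExecutor


def index_url(url):
    """Placeholder for the real work of fetching and indexing one page.

    Here it just returns the URL with a dummy value (its length).
    """
    return url, len(url)


def index_all(urls, workers=5, indexer=index_url):
    """Run `indexer` over every URL using a pool of `workers` threads.

    Returns a dict mapping each URL to whatever the indexer reported.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(indexer, urls))


if __name__ == "__main__":
    # example.invalid is a reserved name, used so nothing real is contacted.
    urls = [f"http://example.invalid/forums/thread{i}" for i in range(20)]
    done = index_all(urls, workers=5)
    print(len(done), "URLs indexed")
```

Raising `workers` at night, as the post suggests, would be a one-argument change in a setup like this.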
best,
monish