Help with searching of Spanish words
Mac Elite
Join Date: Jan 2001
Hi,
Does anyone know how I could accomplish this?
Basically, I have a MySQL database of names, some of which contain Spanish accents. I want to build a web interface in PHP to search this database. However, I want names with accents to show up in the search results regardless of whether the search term was typed with or without them.
For example:
The database has "Niño" in it. I want it to be returned whether the user searches for "nino" or "niño".
Any ideas?
Thanks!
Moderator 
Join Date: Mar 2004
Location: Copenhagen
(Warning: I'm not in any way a PHP wiz, so everything I say might be complete and utter bullsh*t)
You should be able to make some sort of function to convert all diacritic letters to non-diacritic letters in the search somehow... how you do that, however, I have no idea (which means I'm not much help here really, bah...)
Mac Elite
Join Date: Jan 2001
No, that helps. Do you know where to find a list of all the accented characters found in Spanish?
But, now that I think about it, that still wouldn't work - to change every character in either the search string or in the database would be like decrypting a code - I would have to swap each character for each of its equivalent characters and search every combination. This would take a supercomputer. Or am I missing something?
Thanks!
Moderator 
Join Date: Mar 2004
Location: Copenhagen
Hmm... well, I don't exactly know how the system works, but in the search function (or the database? somewhere!), you must be able to do something like set a variable (say, X) and then do something like "if X == ñ { X = n; }" or something... (I think that was more JavaScript-ish than PHP-ish, but I'm too tired to care)...
As for a list of the accented letters in Spanish, to my knowledge, there are only á, é, í, ó, ú, (ý? Maybe in a few special cases), ñ and ü, unless you count ch and ll as special letters (they are counted as special letters in the Spanish alphabet, but I don't think they are on a computer)...
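The "if X == ñ then X = n" idea boils down to a per-character lookup table, applied once to the search string. A minimal sketch in Python rather than PHP (PHP's `strtr` with an array of replacements could play the same role). The name `fold_spanish` and the table are illustrative assumptions, and the table covers only the letters listed above:

```python
import unicodedata

# Lookup table for the accented letters listed above, lower and upper case.
# Assumption: these eight letters are all that needs folding.
_ACCENTED = "áéíóúýñüÁÉÍÓÚÝÑÜ"
_PLAIN = "aeiouynuAEIOUYNU"
_FOLD_TABLE = str.maketrans(_ACCENTED, _PLAIN)


def fold_spanish(text: str) -> str:
    """Replace Spanish accented letters with plain ones and lowercase the result."""
    # Compose first, so "n" + combining tilde becomes the single character "ñ"
    # that the table knows about.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_FOLD_TABLE).lower()
```

This is a single pass over the string, so the cost grows with the length of the search term, not with the number of possible accent combinations. Two strings match without regard to accents when `fold_spanish(a) == fold_spanish(b)`.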
Mac Elite
Join Date: Oct 2000
Originally posted by Oisín:
Hmm... well, I don't exactly know how the system works, but in the search function (or the database? somewhere!), you must be able to do something like set a variable (say, X) and then do something like "if X == ñ { X = n; }" or something... (I think that was more JavaScript-ish than PHP-ish, but I'm too tired to care)...
As for a list of the accented letters in Spanish, to my knowledge, there are only á, é, í, ó, ú, (ý? Maybe in a few special cases), ñ and ü, unless you count ch and ll as special letters (they are counted as special letters in the Spanish alphabet, but I don't think they are on a computer)...
I believe his problem is that the data is stored using accents. If it weren't, then it'd be dead easy to simply strip out the accents and then do a search. However doing a search with Nino (no accents) would have to search for every possible permutation of accented characters. Ñino, Ñiño, Niño... etc, you get the idea.
My only suggestion is that whenever data is entered into the database, by whatever method you use, a second copy without accents is stored alongside it. Perhaps someone more knowledgeable with databases has a more efficient solution? Is there a way to perform a search in a database while running each database entry through a function that strips out the accents during the comparison?
If you can get that to work, you could switch between a normal search, if the user's entry includes any accents, and a modified search that searches through the database stripped of accents.
I might be missing something though 
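Both ideas in this post can be sketched briefly. The example below uses Python with an in-memory SQLite database as a stand-in for MySQL. The table `people`, the column `name_folded`, and the helper names are all hypothetical. `search` uses the stored accent-free copy, while `search_scan` runs every row through the stripping function at query time:

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Strip combining accents (via NFD decomposition) and lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def build_db(names):
    """Create a table holding each name plus an accent-free shadow copy."""
    conn = sqlite3.connect(":memory:")
    # Make fold() callable from SQL for the scan-based variant below.
    conn.create_function("fold", 1, fold)
    conn.execute("CREATE TABLE people (name TEXT, name_folded TEXT)")
    conn.executemany(
        "INSERT INTO people (name, name_folded) VALUES (?, ?)",
        [(n, fold(n)) for n in names],
    )
    # The shadow column can be indexed like any other column.
    conn.execute("CREATE INDEX idx_people_folded ON people (name_folded)")
    return conn


def search(conn, term):
    """Compare the folded search term against the stored folded copy."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name_folded = ? ORDER BY name",
        (fold(term),),
    )
    return [row[0] for row in rows]


def search_scan(conn, term):
    """Same result, but folds every row at query time (no shadow column needed)."""
    rows = conn.execute(
        "SELECT name FROM people WHERE fold(name) = ? ORDER BY name",
        (fold(term),),
    )
    return [row[0] for row in rows]
```

In this sketch the search term is always folded, so there is no need to switch between two kinds of search: "nino" and "niño" both find "Niño", and the original spelling is what gets returned. The trade-off is that a search for "niño" would also return a stored "Nino". The shadow column costs extra storage but can use an index, while the scan variant has to touch every row.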
Grizzled Veteran
Join Date: Jun 2001
Location: Melbourne, Australia
Originally posted by Synotic:
I believe his problem is that the data is stored using accents. If it weren't, then it'd be dead easy to simply strip out the accents and then do a search. However doing a search with Nino (no accents) would have to search for every possible permutation of accented characters. Ñino, Ñiño, Niño... etc, you get the idea.
My only suggestion is that whenever data is entered into the database, by whatever method you use, a second copy without accents is stored alongside it. Perhaps someone more knowledgeable with databases has a more efficient solution? Is there a way to perform a search in a database while running each database entry through a function that strips out the accents during the comparison?
If you can get that to work, you could switch between a normal search, if the user's entry includes any accents, and a modified search that searches through the database stripped of accents.
I might be missing something though
PhpDig is a good example of this - when it spiders, it translates all 'foreign' characters before storing the meta data in its database. Then it performs the same translations on search queries. It's pretty quick for a php-based solution - give it a whirl at http://www.phpdig.net/ to see for yerself.
It achieves this by storing each keyword in a 'stripped' form, plus a lookup that links the keyword to an abbreviated meta data / URL abstract - that way, when you retrieve the search results, they still show the original unaltered phrase.
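To make the stripped-keyword-plus-lookup idea concrete, here is a minimal in-memory sketch in Python. It is not PhpDig's actual code or schema. The class `KeywordIndex` and the helper `strip_accents` are hypothetical names, and splitting on whitespace is a simplifying assumption:

```python
import unicodedata
from collections import defaultdict


def strip_accents(word: str) -> str:
    """Remove combining accents (via NFD decomposition) and lowercase."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


class KeywordIndex:
    """Maps stripped keywords to the original, unaltered records."""

    def __init__(self):
        self._records = []               # original text, stored untouched
        self._index = defaultdict(set)   # stripped keyword -> record ids

    def add(self, text: str) -> int:
        """Store the original text and index each of its words in stripped form."""
        record_id = len(self._records)
        self._records.append(text)
        for word in text.split():
            self._index[strip_accents(word)].add(record_id)
        return record_id

    def search(self, query: str):
        """Strip the query the same way, then return the matching originals."""
        ids = self._index.get(strip_accents(query), set())
        return [self._records[i] for i in sorted(ids)]
```

After `add("El Niño llegó")`, a search for either "nino" or "niño" returns the record with its original accents intact, because the same stripping is applied on both the indexing side and the query side.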
The one thing PhpDig misses is the ability to 'stem' words, but having added this functionality to another tailor-made solution myself, I don't think it would be hard to add it to PhpDig either. Drop me a line if you'd like the code I used for that.
Cheerio!
Mac Elite
Join Date: Jan 2001
Thanks - I sent you a Private msg asking for the code. My email is [timmerk] at [comcast.] NET
Thanks!