Welcome to the MacNN Forums.

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

You are here: MacNN Forums > Software - Troubleshooting and Discussion > Developer Center > "Fuzzy" MySQL Searching

"Fuzzy" MySQL Searching
Thread Tools
macgyvr64
Grizzled Veteran
Join Date: Jun 2001
Status: Offline
Reply With Quote
Jan 22, 2007, 11:37 PM
 
I've got a MySQL question that I haven't been able to figure out for a few hours, so I thought I'd ask!

In a table of mine, I've got serial numbers in various formats, which may include hyphens and spaces. I'd like to match these fields even if a user enters the numbers/letters only, essentially asing MySQL to ignore the non-word characters throughout the field. Is this even possible? I've done some experimenting with the MySQL SELECT command with REGEXP at the end, but haven't quite been able to come up with a universal regex string. I'm still not sure yet if I'm even on the right path, or should be using a different MySQL command altogether.

Any suggestions would be greatly appreciated!
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Reply With Quote
Jan 23, 2007, 01:10 AM
 
Originally Posted by macgyvr64 View Post
I've got a MySQL question that I haven't been able to figure out for a few hours, so I thought I'd ask!

In a table of mine, I've got serial numbers in various formats, which may include hyphens and spaces. I'd like to match these fields even if a user enters the numbers/letters only, essentially asing MySQL to ignore the non-word characters throughout the field. Is this even possible? I've done some experimenting with the MySQL SELECT command with REGEXP at the end, but haven't quite been able to come up with a universal regex string. I'm still not sure yet if I'm even on the right path, or should be using a different MySQL command altogether.

Any suggestions would be greatly appreciated!

Sure...

Code:
<?php // create array of characters to replace $replace = array (' ', '-'); $query = "select ID, serial_numbers from yourtable"; $result = mysql_query($result); while ($row = mysql_fetch_row($result)) { // strip extra spaces and hypens $this_serial = str_replace($replace, '', $row[1]); $uquery = "update yourtable set serial_numbers = '$this_serial' where ID = '$row[0]'"; $uresult = mysql_query($uquery); } ?>
This function will clean up your data in that field, permitting you now to do the searches you need.

Are you hip to the MySQL "LIKE" clause? Might be useful in a case like this, although the data still needs to be predictable, which is what my little script will help with. If the data is already predictable, stringing together a few like causes would work just fine too.
( Last edited by besson3c; Jan 23, 2007 at 01:26 AM. )
     
macgyvr64  (op)
Grizzled Veteran
Join Date: Jun 2001
Status: Offline
Reply With Quote
Jan 23, 2007, 10:14 AM
 
Thanks for the help! I think that will work, but is there a way to let MySQL do the heavy lifting? i.e. Give MySQL some expression and have it return only the needed rows, instead of asking it for *all* the rows and filtering through them with PHP?
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Reply With Quote
Jan 23, 2007, 12:18 PM
 
Originally Posted by macgyvr64 View Post
Thanks for the help! I think that will work, but is there a way to let MySQL do the heavy lifting? i.e. Give MySQL some expression and have it return only the needed rows, instead of asking it for *all* the rows and filtering through them with PHP?
There would be, but only if the data was predictable. You could use the like clause in conjunction with a % for a wildcard, but the problem would be that if your serial number was:


0000-0000-0000


and your query was:

Code:
LIKE '%%%%-%%%%-%%%%'
then a serial number such as:

0000- 0000 - 0000

would mess things up. In general, these things become much easier when you have clean and predictable data to work with.
     
macgyvr64  (op)
Grizzled Veteran
Join Date: Jun 2001
Status: Offline
Reply With Quote
Jan 23, 2007, 09:20 PM
 
Hmm, ok. With a table of, say, 10K rows, would doing it the first way (arrays+PHP) incur much of a performance hit?
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Reply With Quote
Jan 23, 2007, 09:36 PM
 
Originally Posted by macgyvr64 View Post
Hmm, ok. With a table of, say, 10K rows, would doing it the first way (arrays+PHP) incur much of a performance hit?

A one time operation? No... We have a database of about 160,000 rows and we perform operations on it all the time, 10k should not be a big deal unless your server is really underpowered.

If this concerns you, you could also mysqldump the table to another machine, perform the operation there, and then restore from this table. Pretty easy to do...
     
macgyvr64  (op)
Grizzled Veteran
Join Date: Jun 2001
Status: Offline
Reply With Quote
Jan 23, 2007, 09:46 PM
 
That's good to know. This would be more than a one-time thing, as the data will be stored and searched in the database for who knows how long. There won't be more than a few searches a day, though, probably.

The server it will be running on is far from underpowered -- a dual 2.4 GHz Xeon with 4 GB of RAM and 15K SCSI drives :-D
     
Mithras
Professional Poster
Join Date: Oct 1999
Location: :ИOITAↃO⅃
Status: Offline
Reply With Quote
Jan 24, 2007, 11:49 AM
 
Can you clean up the data?
     
macgyvr64  (op)
Grizzled Veteran
Join Date: Jun 2001
Status: Offline
Reply With Quote
Jan 24, 2007, 08:59 PM
 
I could easily clean it up before it gets committed to the DB, but sometimes spacing and dashes are important. Ideally, it would be saved with dashes and be searchable with or without dashes.
     
   
Thread Tools
 
Forum Links
Forum Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Top
Privacy Policy
All times are GMT -4. The time now is 05:39 AM.
All contents of these forums © 1995-2017 MacNN. All rights reserved.
Branding + Design: www.gesamtbild.com
vBulletin v.3.8.8 © 2000-2017, Jelsoft Enterprises Ltd.,