Welcome to the MacNN Forums.


UTF-8 Characters
torsoboy
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 4, 2010, 10:25 AM
 
I am trying to figure out why some characters are not showing up on our website, but they show up fine in Wikipedia. According to the HTML source, we both have our charset set to UTF-8, but ours comes out looking like this:


and theirs comes out looking correct.

Here are the two pages:
Signup | Sell More Prints | Instaproofs.com (click on the State/Province droplist and scroll down to the Sweden area)
Counties of Sweden - Wikipedia, the free encyclopedia

Any ideas?
     
andi*pandi
Moderator
Join Date: Jun 2000
Location: inside 128, north of 90
Status: Offline
Oct 4, 2010, 11:00 AM
 
Hmm, this is what the source of your pulldown is showing...

<option value="733" >Esp�rito Santo</option>
<option value="735" >Goi�s</option>
<option value="737" >Maranh�o</option>

And attempts to validate say that

" Sorry, I am unable to validate this document because on line 910 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

The error was: utf8 "\xD6" does not map to Unicode "

Did you lose the characters in copy/paste? Does your local source look ok?
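
To make the validator message concrete, here is a minimal Python sketch (not from the page itself; the sample bytes for "Östergötland" are an assumption chosen for illustration). It shows that the byte 0xD6 is "Ö" in ISO-8859-1 but is not a valid UTF-8 sequence on its own:

```python
# Hypothetical sample: "Östergötland" as ISO-8859-1 bytes (one byte per letter).
raw = b"\xd6sterg\xf6tland"

# Read as ISO-8859-1, every byte maps directly to a character.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8, 0xD6 announces a two-byte sequence, but the next byte ("s")
# is not a continuation byte, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same text encoded as real UTF-8 uses two bytes for each accented letter.
as_utf8_bytes = as_latin1.encode("utf-8")

print(as_latin1, utf8_ok, as_utf8_bytes)
```

So a page that declares UTF-8 but ships ISO-8859-1 bytes would fail validation in roughly this way.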
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 11:33 AM
 
By any chance did your pages end up this way after transferring database contents via a copy/paste? Are you using a CMS of some sort?
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 4, 2010, 12:59 PM
 
I did a copy/paste of the names through phpMyAdmin. I wonder if the characters got lost from there somehow. It looked correct when I pasted them, but maybe it saves them wrong.
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 4, 2010, 01:01 PM
 
Originally Posted by torsoboy View Post
I did a copy/paste of the names through phpMyAdmin. I wonder if the characters got lost from there somehow. It looked correct when I pasted them, but maybe it saves them wrong.
Nope, they display correctly in the database. When I set the charset to ISO-8859-1 they seem to display correctly, but according to the things I have read, UTF-8 should display them as well.
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 01:06 PM
 
Originally Posted by torsoboy View Post
I did a copy/paste of the names through phpMyAdmin. I wonder if the characters got lost from there somehow. It looked correct when I pasted them, but maybe it saves them wrong.
Aha! That's actually *exactly* what I thought you did, cause I've run into that same problem many times myself and spent a lot of time wrestling with this...

I haven't found a workaround for preserving these characters properly in phpMyAdmin while respecting the MySQL character set, which is why I use mysqldump:

mysqldump -u username -p -h yourserver --default-character-set=utf8 yourdb > dumpfile.sql
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 4, 2010, 03:25 PM
 
In phpMyAdmin, make sure that both the individual fields where the county names are stored, as well as the entire table and database (if possible) are set to utf8_unicode_ci as the collation encoding. Basically, set anything you can to utf8_unicode_ci (unless of course you have other stuff in there that you rely on to be ISO-8859-1 or some other encoding!).

That won’t necessarily work, though … it doesn’t on my host, for example. I’ve basically had to just resort to using ISO-8859-1. :/


P.S.: Any particular reason for wanting to have all those counties there? For most European countries, they’re completely unnecessary and rarely used. They’d never be used as part of an address in Sweden, for example—only street name, house number (plus floor/door/room number, of course), postal code, and town name would be used.
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 03:29 PM
 
Originally Posted by Oisín View Post
In phpMyAdmin, make sure that both the individual fields where the county names are stored, as well as the entire table and database (if possible) are set to utf8_unicode_ci as the collation encoding. Basically, set anything you can to utf8_unicode_ci (unless of course you have other stuff in there that you rely on to be ISO-8859-1 or some other encoding!).

That won’t necessarily work, though … it doesn’t on my host, for example. I’ve basically had to just resort to using ISO-8859-1. :/


P.S.: Any particular reason for wanting to have all those counties there? For most European countries, they’re completely unnecessary and rarely used. They’d never be used as part of an address in Sweden, for example—only street name, house number (plus floor/door/room number, of course), postal code, and town name would be used.

It won't work to *convert* fields from one format to another, I don't think. I believe the default MySQL collation is still latin1_swedish_ci, so in most cases these tables won't be in UTF8. mysqldump should handle the conversion to UTF8 though as described above. A lot of web apps (such as WordPress) seem to work best in UTF8, so going UTF8 is a good practice. Using mysqldump rather than cutting and pasting from phpMyAdmin, likewise.
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 4, 2010, 03:54 PM
 
Originally Posted by besson3c View Post
It won't work to *convert* fields from one format to another, I don't think. I believe the default MySQL collation is still latin1_swedish_ci, so in most cases these tables won't be in UTF8. mysqldump should handle the conversion to UTF8 though as described above. A lot of web apps (such as WordPress) seem to work best in UTF8, so going UTF8 is a good practice. Using mysqldump rather than cutting and pasting from phpMyAdmin, likewise.
It doesn’t necessarily have to convert the fields—the database should be created as UTF-8 from the start, preferably. And different hosts have different ways of doing things. Dreamhost (my host), for example, have UTF-8 as the default character set.

In fact, I have absolutely everything set up as UTF-8 (database charset, collation charset, PHP charset, HTML charset, even the PHP file is saved as a UTF-8-encoded file), and it still needs to be ISO-8859-1 to display special characters properly.
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 04:06 PM
 
Originally Posted by Oisín View Post
It doesn’t necessarily have to convert the fields—the database should be created as UTF-8 from the start, preferably. And different hosts have different ways of doing things. Dreamhost (my host), for example, have UTF-8 as the default character set.

In fact, I have absolutely everything set up as UTF-8 (database charset, collation charset, PHP charset, HTML charset, even the PHP file is saved as a UTF-8-encoded file), and it still needs to be ISO-8859-1 to display special characters properly.

This is fine if your DB was created as UTF8, like I said. However, the default (last I checked) is not UTF8, so a lot of web hosts will probably not create UTF8 databases and tables by default.
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 4, 2010, 06:43 PM
 
Looking through this again now, I realise I’m a complete dunderhead, and my previous two posts are completely inaccurate.

In the project I’m currently working on, I set up my MySQL database connection in a config file, which is then included into the scripts that need to query the database. Earlier in the thread, when I was testing for solutions to make this work, I tweaked a lot of little things in this config file.

What I failed to notice was that the script I was using to test this was, apparently, an old one, from before I made the config file. So the config file wasn’t included, and the script made its own database connection from scratch. Which is obviously why nothing I tried earlier on seemed to make one iota of difference.

Once I fixed this, though, and made sure this script worked just like any of the other scripts, UTF-8 suddenly just works here, too.


To the OP:

Once you’re certain everything in your database is UTF-8 (preferably utf8_unicode_ci, though utf8_general_ci will work, too) and everything in your scripts (I’m assuming you’re using PHP here) is also UTF-8, you need to make sure that your database connection is UTF-8, too. Otherwise you get the result you’ve been experiencing here.

Depending on which flavour of MySQL functions you’re using (and whether you’re using the object-oriented or the procedural style), there are different functions to do that. Since I’m using MySQLi, object-oriented style, the solution for me was simply to add the following right after establishing the database connection in the config file ($db being the database connection object):

code:
$db->set_charset('utf8');


Once that was done, everything worked perfectly.

(Note that UTF-8 is usually written with a hyphen, but in MySQL, it’s called utf8, rather than utf-8)
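
For anyone wondering why the connection matters, here is a rough Python sketch of the mismatch. It is only a simulation: the function name simulate_page is made up, and Python's "latin-1" codec stands in for MySQL's latin1, which is an assumption. The idea is that the server converts results to the connection character set, and the browser then decodes those bytes using the charset declared on the page:

```python
def simulate_page(text, connection_charset, page_charset="utf-8"):
    """Simulate text leaving the database and arriving in the browser.

    connection_charset plays the role of character_set_results;
    page_charset is what the HTML declares. Both names are illustrative.
    """
    # The server hands the client bytes in the connection character set.
    wire_bytes = text.encode(connection_charset)
    # The browser decodes those bytes with the declared page charset and
    # shows U+FFFD (the replacement character) for anything invalid.
    return wire_bytes.decode(page_charset, errors="replace")


broken = simulate_page("Goiás", "latin-1")  # connection left at latin1
fixed = simulate_page("Goiás", "utf-8")     # connection set to utf8

print(broken)
print(fixed)
```

With the connection left at latin1, the accented letter arrives as a single byte that is not valid UTF-8, which is the same symptom as in the drop-down source quoted earlier in the thread.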
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 06:54 PM
 
It's not just a matter of "making sure everything is UTF-8" though, Oisin. If it isn't, you can't just set the database and tables to UTF-8 and expect everything to work, I'm pretty sure it won't. The data has to be recreated or somehow converted. phpMyAdmin won't do this, and therefore none of your above instructions are going to work.

So, the first thing is to find out what the database and tables are now, and if they aren't UTF-8, use mysqldump and the CLI MySQL binary to transfer this data from source to destination.
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 4, 2010, 07:10 PM
 
Originally Posted by Oisín View Post
P.S.: Any particular reason for wanting to have all those counties there? For most European countries, they’re completely unnecessary and rarely used. They’d never be used as part of an address in Sweden, for example—only street name, house number (plus floor/door/room number, of course), postal code, and town name would be used.
We actually only add countries to the list when they are requested by photographers. The states/provinces/counties are extra information for us, and since we are primarily a US business we use them in our own documents.

I will switch the db table to UTF-8 and see what happens. Thanks guys!
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 4, 2010, 07:11 PM
 
It's not just a matter of "making sure everything is UTF-8" though, Oisin. If it isn't, you can't just set the database and tables to UTF-8 and expect everything to work, I'm pretty sure it won't.
I don’t have an install of MySQL handy that doesn’t have UTF-8 as the default character set, so I can’t test this. The fact that my almost freshly installed vanilla version of MySQL on my localhost is defaulting to UTF-8 seems to indicate that they may have switched to having UTF-8 as the standard now.
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 07:18 PM
 
Originally Posted by torsoboy View Post
We actually only add countries to the list when they are requested by photographers. The states/provinces/counties is extra information for us, and since we are primarily a US business we use them in our own documents.

I will switch the db table to UTF-8 and see what happens. Thanks guys!

Please let us know what happens!
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 07:24 PM
 
Originally Posted by Oisín View Post
I don’t have an install of MySQL handy that doesn’t have UTF-8 as the default character set, so I can’t test this. The fact that my almost freshly installed vanilla version of MySQL on my localhost is defaulting to UTF-8 seems to indicate that they may have switched to having UTF-8 as the standard now.
You can determine this by issuing the following query (the following are my results):

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | latin1                                 |
| character_set_connection | latin1                                 |
| character_set_database   | utf8                                   |
| character_set_filesystem | binary                                 |
| character_set_results    | latin1                                 |
| character_set_server     | utf8                                   |
| character_set_system     | utf8                                   |
| character_sets_dir       | /mypath/usrlocal/share/mysql/charsets/ |
+--------------------------+----------------------------------------+


In my case the character_set_server is being set in my my.cnf file. This makes me think that the MySQL client in 5.0.90 (the version I'm running) defaults to latin1, and therefore phpMyAdmin will respect this setting.

Maybe I should try putting character_set_results and the other variables into my.cnf under [mysql] as an override.
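
If you want to check this kind of output quickly, a small Python sketch along these lines could flag the connection-side variables that differ from utf8. The helper names parse_variables and mismatched are made up for illustration, and SAMPLE is just a trimmed copy of the results above:

```python
# Trimmed copy of the SHOW VARIABLES output from this post.
SAMPLE = """\
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_results | latin1 |
| character_set_server | utf8 |
"""

# The three variables that describe the client connection.
CONNECTION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def parse_variables(table_text):
    """Turn '| name | value |' rows into a dict, skipping border lines."""
    variables = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0].startswith("character_set"):
            variables[cells[0]] = cells[1]
    return variables


def mismatched(variables, expected="utf8"):
    """List the connection variables whose value is not the expected charset."""
    return [name for name in CONNECTION_VARS if variables.get(name) != expected]


print(mismatched(parse_variables(SAMPLE)))
```

For the results above, all three connection variables come back as mismatched, while the database and server values are already utf8.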
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 4, 2010, 07:35 PM
 
But if those values are the defaults, then there shouldn’t be a problem, should there? Only character_set_database controls what character sets newly created databases and tables will have as default, no? The rest (or the rest of the ones that are relevant) are all alterable when accessing the database with no need to reencode or convert the data already in the tables … aren’t they?
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 4, 2010, 07:42 PM
 
Originally Posted by Oisín View Post
But if those values are the defaults, then there shouldn’t be a problem, should there? Only character_set_database controls what character sets newly created databases and tables will have as default, no? The rest (or the rest of the ones that are relevant) are all alterable when accessing the database with no need to reencode or convert the data already in the tables … aren’t they?

I don't think so. MySQL has a server and client component. I believe the server is just responsible for responding to and processing queries while the client is responsible for communicating with the server. I believe that no matter whether you are using PHP as a web server module or PHP-CGI, all this component does is provide a MySQL client which either uses the actual MySQL client libraries or mimics them via some sort of API.

In the my.cnf file you can see that there are sections for all sorts of MySQL programs, for example (from the file):

[mysqld]
.. settings here

[mysqldump]
.. settings here

[mysql]
.. settings here

[isamchk]
.. settings here

etc.
These settings allow you to override the system defaults that are set by MySQL at runtime - the shipped defaults. There are separate defaults/variables for both the client and server. Therefore, all of the client character related variables must match the server vars.

I'm pretty sure that if you were to access your Dreamhost my.cnf file that you would find some of these variables. Either that, or Dreamhost is running a newer version of MySQL where the defaults have changed. The 5.1.x and 5.5.x branches are all newer than mine, so this is entirely possible.
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 5, 2010, 03:18 AM
 
Originally Posted by besson3c View Post
Please let us know what happens!
No luck. I created a new table for the states and made it utf8, and I also made sure the columns were set to utf8 (utf8_general_ci, to be exact). I then copied the value from Wikipedia into that table, and tested the results. The results were the same as before.

I may try putting the values in through a PHP query instead of doing it through phpMyAdmin and see if that makes a difference...
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 5, 2010, 03:23 AM
 
That had no effect either. I don't get it
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 5, 2010, 05:13 AM
 
I may try putting the values in through a PHP query instead of doing it through phpMyAdmin and see if that makes a difference...
Wait, hang on … how were you getting the values in before “through phpMyAdmin”? I’d been assuming you generated the drop-down menu via a database call in PHP all along—didn’t you?

And how does your table look? Can you post a screendump of the phpMyAdmin view or something like that?
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 5, 2010, 10:32 AM
 
Originally Posted by Oisín View Post
Wait, hang on … how were you getting the values in before “through phpMyAdmin”? I’d been assuming you generated the drop-down menu via a database call in PHP all along—didn’t you?

And how does your table look? Can you post a screendump of the phpMyAdmin view or something like that?
The way I originally did it was I clicked on the "Insert" tab in phpMyAdmin, and filled in the field values from there. The new way I tried was I wrote a php/mysql script on our server and did a normal INSERT INTO xxx query.

The dropdown is generated via a database call through PHP.

Here is how the table contents look directly in phpMyAdmin (which also has its charset set to UTF-8):


I may just need to set the page to ISO-8859-1 and be done with it. I just don't get it.
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 5, 2010, 10:36 AM
 
The way I originally did it was I clicked on the "Insert" tab in phpMyAdmin, and filled in the field values from there. The new way I tried was I wrote a php/mysql script on our server and did a normal INSERT INTO xxx query.
Ah, I misunderstood you. You were talking about inserting the values into the database—I thought you were talking about getting them into the script, hence the puzzlement.

Did you try setting the database connection character set? That’s what made the difference for me. What’s the code snippet (minus your real database host/name/user/password, of course!) where you initially connect to the database, and where you execute the SELECT query?
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 5, 2010, 11:16 AM
 
I just found something that seems to work.

After establishing the connection, run this query first:

SET NAMES 'utf8'

Even though you may be saving the information in utf8, the values returned are not always in that same character set. Using SET NAMES fixes that. Here is some more info on it (from here: http://www.adviesenzo.nl/examples/ph...charset_fix/):

A SET NAMES 'x' statement is equivalent to these three statements:
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

Setting character_set_connection to x also sets collation_connection to the default collation for x. To specify one of the character set's collations explicitly, use the optional COLLATE clause:

SET NAMES 'charset_name' COLLATE 'collation_name'
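
To spell out that equivalence, here is a small Python sketch that only builds the SQL strings and does not talk to a database. The function names expand_set_names and set_names are hypothetical, and the identifier check is a simple precaution of my own, not something MySQL requires:

```python
import re

# Only allow plain charset/collation names such as utf8 or utf8_unicode_ci.
_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def _check(name):
    if not _IDENTIFIER.match(name):
        raise ValueError("unexpected characters in %r" % name)
    return name


def expand_set_names(charset):
    """Return the three statements that SET NAMES 'charset' stands for."""
    charset = _check(charset)
    return [
        "SET character_set_client = %s;" % charset,
        "SET character_set_results = %s;" % charset,
        "SET character_set_connection = %s;" % charset,
    ]


def set_names(charset, collation=None):
    """Build a SET NAMES statement, with the optional COLLATE clause."""
    statement = "SET NAMES '%s'" % _check(charset)
    if collation is not None:
        statement += " COLLATE '%s'" % _check(collation)
    return statement


print(set_names("utf8", "utf8_unicode_ci"))
for line in expand_set_names("utf8"):
    print(line)
```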

The connection string I use is pretty basic:
mysql_connect($server['host'], $server['username'], $server['password']);

Is there a way to set the character set directly from there?
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 5, 2010, 11:28 AM
 
Yes, that’s exactly what SET NAMES does, except that doing it in a MySQL query is discouraged (according to php.net).

Since you’re using the old mysql flavour functions in PHP, procedurally, the more updated and future-proof way of doing it would be this, directly in PHP:
code:
mysql_connect($server['host'], $server['username'], $server['password']);
mysql_set_charset('utf8');

Only caveat is that this requires that you have MySQL 5.0.7 or newer—otherwise, you have no choice but to use SET NAMES.
     
andi*pandi
Moderator
Join Date: Jun 2000
Location: inside 128, north of 90
Status: Offline
Oct 5, 2010, 11:51 AM
 
This thread is nominated for awesome working togetherness problem-solving.
     
torsoboy  (op)
Mac Elite
Join Date: Mar 2003
Status: Offline
Oct 5, 2010, 12:04 PM
 
Originally Posted by Oisín View Post
Yes, that’s exactly what SET NAMES does, except that doing it in a MySQL query is discouraged (according to php.net).

Since you’re using the old mysql flavour functions in PHP, procedurally, the more updated and future-proof way of doing it would be this, directly in PHP:
code:
mysql_connect($server['host'], $server['username'], $server['password']);
mysql_set_charset('utf8');

Only caveat is that this requires that you have MySQL 5.0.7 or newer—otherwise, you have no choice but to use SET NAMES.
Super; I will give that a shot. Thanks!
     
Oisín
Moderator Emeritus
Join Date: Mar 2004
Location: Copenhagen
Status: Offline
Oct 5, 2010, 12:09 PM
 
Forgot to mention:

If you’ve got more than this one MySQL connection open at the same time here, you’ll have to specify which one you want to have set to UTF-8, like so:
code:
$connection = mysql_connect($server['host'], $server['username'], $server['password']);
mysql_set_charset('utf8', $connection);
     
besson3c
Clinically Insane
Join Date: Mar 2001
Location: yes
Status: Offline
Oct 5, 2010, 12:30 PM
 
torsoboy: if you ever decide to delve into a PHP framework (CakePHP, CodeIgniter, etc.) you should find that most, if not all of these set everything to UTF8 for you so that you don't have to worry about this kind of stuff.

That being said, I may not have my server configured correctly, and I haven't had much time to fuss around with it, but you may still have to use mysqldump if you ever want to transfer data from one server to another while retaining your UTF-8 characters. This is what I originally thought you were doing before I realized that you were grabbing this data from Wikipedia...
     
   