I'm working on some XSLT transforms with PHP. Sablotron doesn't seem to compile well on OSX, so I can't test my transforms directly with PHP. Instead I use TestXSLT, a small program that also uses Sablotron, though for the life of me I haven't a clue how its author got it working. You can find it by searching on VersionTracker.
Anyway, one of my pages contains some Japanese text. Here's a sample string: 夢の中得 (in HTML/XML entities that's &#22818;&#12398;&#20013;&#24471;, just in case UBB doesn't let the Japanese text through).
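For anyone who wants to reproduce the entity form of that string, here's a minimal sketch in Python (not PHP, and nothing to do with Sablotron itself). It relies only on the standard library's `xmlcharrefreplace` error handler. The variable names are mine.

```python
import html

sample = "夢の中得"

# Replace every non-ASCII character with a decimal numeric character reference.
encoded = sample.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(encoded)

# Decoding the references should give back the original string.
decoded = html.unescape(encoded)
print(decoded == sample)
```

The entity form is plain ASCII, so it should survive any forum software or editor that mangles the raw Japanese characters.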
I need to get this text through the XSLT transform intact. Sablotron can't handle it when the text is encoded "natively" in UTF-8. But when I use HTML entities, Sablotron's parser processes them and turns them into garbage (and it's UTF-8 encoded garbage, so no amount of switching encodings in the browser's View menu will fix it). If I try to get cute and encode the ampersands themselves as entities (&amp;amp;, so Sablotron will see plain text and pass it through), it writes each ampersand back out as an HTML entity instead, so the output is still wrong.
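To show what I'd expect a conforming XML parser to do with the three variants, here's a small sketch using Python's `xml.etree.ElementTree`. This is an assumption about "normal" parser behaviour, not a reproduction of what Sablotron does internally. The `<p>` wrapper element is just for illustration.

```python
import xml.etree.ElementTree as ET

# Variant 1: raw UTF-8 bytes.
raw = "<p>夢の中得</p>".encode("utf-8")
# Variant 2: numeric character references.
refs = b"<p>&#22818;&#12398;&#20013;&#24471;</p>"
# Variant 3: the ampersand itself escaped.
escaped = b"<p>&amp;#22818;</p>"

raw_text = ET.fromstring(raw).text
refs_text = ET.fromstring(refs).text
escaped_el = ET.fromstring(escaped)

# Variants 1 and 2 should parse to the same four characters.
print(raw_text == refs_text)

# Variant 3 parses to the literal text "&#22818;", not to a character...
print(escaped_el.text)

# ...and the serializer escapes the ampersand again on the way out.
print(ET.tostring(escaped_el, encoding="unicode"))
```

So the escaped-ampersand result looks like expected behaviour for any XML toolchain. The garbage from the first two variants is the part that seems specific to this Sablotron setup.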
This is driving me nuts. All I want to do is get some Japanese text into the parser and have it come out the other side unchanged. Several other XSLT processors can do this, but because I'm using PHP for this I have to use Sablotron.
Alternatively, if someone can tell me how to get Sablotron to compile on OSX, I may be able to dig into this properly; it's entirely possible that something is wrong with the Sablotron build that TestXSLT ships with.