Little forum problem. (Fnordia?)

Discuss the source code and development of Spring Engine in general from a technical point of view. Patches go here too.

Moderator: Moderators

Tim Blokdijk
Posts: 1242
Joined: 29 May 2005, 11:18

Post by Tim Blokdijk »

This site uses ISO-8859-1; the new version in development uses UTF-8.
The characters åöäéèë cause problems with that; they are used occasionally on this message board. The problem seems to be limited to existing posts; new posts made with UTF-8 encoding seem to work fine. But I would like to know whether the problem can be fixed for the existing posts, and whether there are any other problems with this encoding that I'm not aware of at the moment.

Do the people who need to understand this follow me, or praat ik poep (am I talking crap)?
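
The symptom described in this post can be reproduced in a few lines. This is only a minimal sketch: it assumes the old posts are stored as plain ISO-8859-1 bytes, which the thread has not confirmed, and the variable names are made up for illustration.

```python
# The characters named in the post.
text = "åöäéèë"

# What an ISO-8859-1 site stores and sends: one byte per character.
latin1_bytes = text.encode("iso-8859-1")

# What a UTF-8 page makes of those same bytes: none of them forms a valid
# UTF-8 sequence, so each one becomes a U+FFFD replacement character.
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# A post written on the new site is stored as UTF-8 and reads back fine.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(latin1_bytes)
print(shown_as_utf8)
print(round_trip)
```

The bytes themselves are never damaged in this scenario; only the label used to interpret them changes, which is why new UTF-8 posts are unaffected.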
Zenka
Posts: 1235
Joined: 05 Oct 2005, 15:29

Post by Zenka »

I'm not one of those who need to understand it, but at least I do, even the last remark ;)

Is there a way to just convert the characters to something that UTF-8 gladly accepts?
Tim Blokdijk
Posts: 1242
Joined: 29 May 2005, 11:18

Post by Tim Blokdijk »

Well, as I understand it at this moment:
The Apache <-> browser encoding is also used as the encoding for the database. The database is not Unicode, but I think MySQL just accepts the data anyhow, so values retrieved from the database are exactly as they were put in.
Posts stored as ISO-8859-1 don't show up correctly on the new site because it now uses UTF-8, a different encoding.
That would be the simplest explanation, and it would mean the problem is relatively minor, but I'm far from sure about this and would love input from someone who *knows* what's going on.
heze
Posts: 38
Joined: 28 Apr 2005, 23:32

Post by heze »

If you have ISO-8859-1 in the database and you read it and display the data as UTF-8, obviously the characters won't be displayed correctly.

Now, how do you fix this? Convert the database data to UTF-8.

If you have mixed content in the database, it might be a bit trickier to convert, but I guess there are tools for that anyway (aptitude install recode).
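
The conversion suggested here, including the mixed-content case, might look roughly like the sketch below. The helper name `to_utf8` and the sample strings are hypothetical, and the fallback rule (anything that is not valid UTF-8 is treated as ISO-8859-1) is an assumption, not something established in this thread.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8, guessing the source encoding."""
    try:
        # Bytes that are already valid UTF-8 are returned unchanged.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: everything else is ISO-8859-1. Every byte maps to
        # exactly one character there, so this decode cannot fail.
        return raw.decode("iso-8859-1").encode("utf-8")


# Two hypothetical rows: one from the old site, one from the new site.
old_post = "smörgåsbord".encode("iso-8859-1")
new_post = "smörgåsbord".encode("utf-8")

print(to_utf8(old_post) == new_post)  # the old row is converted
print(to_utf8(new_post) == new_post)  # the new row is left alone
```

The guess can be wrong for short strings: some ISO-8859-1 byte pairs happen to be valid UTF-8, so a real migration would need a backup and a spot check of the converted posts.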
Tim Blokdijk
Posts: 1242
Joined: 29 May 2005, 11:18

Post by Tim Blokdijk »

heze wrote: If you have ISO-8859-1 in the database and you read it and display the data as UTF-8, obviously the characters won't be displayed correctly.

Now, how do you fix this? Convert the database data to UTF-8.

If you have mixed content in the database, it might be a bit trickier to convert, but I guess there are tools for that anyway (aptitude install recode).

Yeah, but it's a rather big *if* whether we have ISO-8859-1 in the database.
The phpBB tables are set to "latin1_swedish_ci" (a collation of MySQL's latin1 character set, which is roughly ISO-8859-1), but the board works with both ISO-8859-1 and UTF-8. That means *either* MySQL just doesn't care about the shit you send to it (quite possible, but I don't know for sure) *or* phpBB uses its own technique to store Unicode in a "latin1_swedish_ci" table (which also seems to depend on the Apache <-> browser encoding).
Another point is that the board actually supports Unicode (you can use characters outside the Latin-1 range), but I have no idea how exactly, as everything I can see is set to ISO-8859-1. Anyway, the real problem is that I'm guessing and don't know for sure.

I can convert it all to UTF-8 in the database; I just don't know if phpBB supports that.
Right now we don't have mixed encodings in the database, as the current (old) site uses only ISO-8859-1. That changes the moment we switch to the new site, because things then go to UTF-8. So I'd like to understand the issue and its implications before the change.

I never really had to deal with legacy data like this; I always use UTF-8 for everything.
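
One possible explanation for a board that is set to ISO-8859-1 everywhere and still shows characters outside Latin-1 is that those characters arrive as HTML numeric character references, so only ASCII ever reaches the database. Nothing in this thread confirms that this is what happens here; the sketch below only shows the mechanism, using Python's `xmlcharrefreplace` error handler as a stand-in for the browser.

```python
import html

# A character that ISO-8859-1 cannot represent.
typed = "Ω"

# Assumption: the character is replaced by a numeric character reference
# before it is stored, so the stored bytes are plain ASCII.
stored = typed.encode("iso-8859-1", errors="xmlcharrefreplace")

# When the page is rendered, the reference turns back into the character.
rendered = html.unescape(stored.decode("iso-8859-1"))

print(stored)
print(rendered)
```

If the old posts do contain such references, they would pass through a byte-level conversion to UTF-8 unchanged, because they consist of ASCII only.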
PauloMorfeo
Posts: 2004
Joined: 15 Dec 2004, 20:53

Post by PauloMorfeo »

I think you can try using the encoding ISO-8859-15 instead of ISO-8859-1. The "15" is a revision of the "1" and is better suited to what people need now, since it has, for example, an encoding for €, which is not present in the "1".

At least in IIS under .NET, UTF-8 gives the same results as ISO-8859-1.
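
The difference between the two ISO encodings for the euro sign can be checked directly. This is a small sketch with made-up variable names; it also shows that the UTF-8 bytes for the same character are different again, so switching to ISO-8859-15 would not by itself address the UTF-8 question raised earlier in the thread.

```python
euro = "€"

# ISO-8859-15 assigns the euro sign to the single byte 0xA4.
in_latin9 = euro.encode("iso-8859-15")

# ISO-8859-1 has no euro sign at all, so encoding it fails.
try:
    euro.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# The same byte read as ISO-8859-1 is the generic currency sign.
misread = in_latin9.decode("iso-8859-1")

# UTF-8 uses three bytes for the euro sign.
in_utf8 = euro.encode("utf-8")

print(in_latin9, latin1_has_euro, misread, in_utf8)
```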
Tim Blokdijk
Posts: 1242
Joined: 29 May 2005, 11:18

Post by Tim Blokdijk »

I have been reading up on the issue, and some practical info can be found here: http://www.phpbb.com/community/viewtopic.php?t=246070
I still don't understand it 100%, but it's a common issue. We have to convert the db content to Unicode when we switch to the new site if we want to preserve the existing posts with characters like åöäéèë.