This site uses ISO-8859-1; the new version in development uses UTF-8.
The characters åöäéèë cause problems with that; they are used occasionally on this message board. The problem seems to be limited to existing posts; new posts made with UTF-8 encoding seem to work fine. But I would like to know whether the problem can be fixed for the existing posts, and whether there are any other problems with this encoding that I'm not aware of at the moment.
Do the people who need to understand this follow, or am I talking crap?
Little forum problem. (Fnordia?)
Moderator: Moderators
- Tim Blokdijk
- Posts: 1242
- Joined: 29 May 2005, 11:18
Well, as I understand it at this moment:
The Apache <-> browser encoding is used as the encoding for the database. The database is not Unicode, but I think MySQL just accepts it anyhow; values retrieved from the database are exactly as they were put into the database.
Posts in ISO-8859-1 don't show up correctly (where the encodings differ) on the new site, as we now use UTF-8.
That would be the simplest explanation, and it would mean the problem is relatively minor, but I'm far from sure about this and would love input from someone who *knows* what's going on.
If you have ISO-8859-1 in the database and you read it and display the data as UTF-8, obviously the characters won't be displayed correctly.
Now, how do you fix this? Convert the database data to UTF-8.
If you have mixed content in the database, it might be a bit trickier to convert, but I guess there are tools for that anyway. (aptitude install recode)
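The mismatch and the fix described above can be sketched in a few lines of Python. This is only an illustration, assuming the old posts are stored as raw ISO-8859-1 bytes; the variable names are made up for the example and have nothing to do with phpBB itself.

```python
# Assumption: old posts sit in the database as raw ISO-8859-1 bytes.
text = "åöäéèë"
stored = text.encode("iso-8859-1")  # one byte per character

# Reading those bytes as UTF-8 goes wrong: they are not valid UTF-8
# sequences, so the decoder substitutes replacement characters.
misread = stored.decode("utf-8", errors="replace")

# The fix: decode with the encoding the bytes were written in,
# then re-encode as UTF-8.
converted = stored.decode("iso-8859-1").encode("utf-8")

print(repr(misread))
print(converted.decode("utf-8"))
```

The same decode-then-encode step is what a tool like recode does for whole files.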
- Tim Blokdijk
- Posts: 1242
- Joined: 29 May 2005, 11:18
heze wrote:
If you have ISO-8859-1 in the database and you read it and display the data as UTF-8, obviously the characters won't be displayed correctly.
Now, how do you fix this? Convert the database data to UTF-8.
If you have mixed content in the database, it might be a bit trickier to convert, but I guess there are tools for that anyway. (aptitude install recode)

Yeah, but it's a rather big *if* that we have ISO-8859-1 in the database.
The phpBB tables are set to "latin1_swedish_ci" (a MySQL collation for the latin1 character set, which corresponds roughly to ISO 8859-1), but the board works with both ISO-8859-1 and UTF-8. That means that *either* MySQL just doesn't care about the shit you send to it (quite possible, but I don't know for sure) *or* phpBB uses its own technique to store Unicode in "latin1_swedish_ci" tables (which also seems to depend on the Apache <-> browser encoding).
Another point is that the board actually supports Unicode (you can use characters outside the Latin-1 range), but I have no idea how exactly, as everything I can see is set to ISO-8859-1. Anyway, the real problem is that I'm guessing and don't know for sure.
I can convert everything in the database to UTF-8; I just don't know whether phpBB supports that.
Right now we don't have mixed encodings in the database, as the current (old) site uses only ISO-8859-1. That changes the moment we switch to the new site, as things then go to UTF-8. So I would like to understand the issue and its implications before the change.
I have never really had to deal with legacy data like this; I always use UTF-8 for everything.
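The first guess above (a latin1 table that simply passes bytes through) can at least be illustrated outside the database. This is a sketch, not MySQL's actual code path: Python's "latin-1" codec stands in for a storage layer that maps every byte to one character and back, which is an assumption.

```python
# Assumption: the storage layer treats each byte as one latin-1 character
# and hands it back unchanged. Python's codec stands in for that here.
post = "åöäéèë €"
sent = post.encode("utf-8")              # what a UTF-8 page would submit

as_latin1 = sent.decode("latin-1")       # how a latin1 column would "see" it
returned = as_latin1.encode("latin-1")   # what comes back out on a read

print(returned == sent)                  # the bytes survive the round trip
print(returned.decode("utf-8") == post)  # and still decode as UTF-8
```

Under that assumption the table stores whatever bytes arrive, and only the encoding used to display them decides whether the post looks right.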
- PauloMorfeo
- Posts: 2004
- Joined: 15 Dec 2004, 20:53
I think you could try using ISO-8859-15 instead of ISO-8859-1. The "15" is an update of the "1" and is closer to what people need now, since it has, for example, an encoding for €, which is not present in the "1".
At least in IIS under .NET, UTF-8 gives the same results as ISO-8859-1.
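The difference between the two ISO encodings for the euro sign is easy to check with Python's standard codecs. A small sketch; the variable names are only for the example. Note that ISO-8859-15 is still a single-byte encoding, so it does not by itself remove the mismatch with UTF-8.

```python
euro = "€"

# ISO-8859-15 assigns the euro sign to byte 0xA4.
euro_byte = euro.encode("iso-8859-15")

# ISO-8859-1 has no euro sign at all, so encoding fails.
try:
    euro.encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# In ISO-8859-1 the same byte is the generic currency sign.
same_byte_in_latin1 = euro_byte.decode("iso-8859-1")

print(euro_byte, in_latin1, same_byte_in_latin1)
```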
- Tim Blokdijk
- Posts: 1242
- Joined: 29 May 2005, 11:18
I have been reading up on the issue, and some practical info can be found here: http://www.phpbb.com/community/viewtopic.php?t=246070
I still don't understand it 100%, but it's a common issue. We have to convert the db content to Unicode when we switch to the new site if we want to preserve the existing posts with characters like åöäéèë.
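For the conversion itself, here is a rough sketch of how a single post body could be handled, including the case where some rows are already UTF-8. The helper `to_utf8` is hypothetical and not part of phpBB or MySQL; it assumes the legacy rows are ISO-8859-1.

```python
def to_utf8(raw: bytes, legacy: str = "iso-8859-1") -> bytes:
    """Return raw re-encoded as UTF-8 (hypothetical helper, heuristic only)."""
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave it untouched
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume it is in the legacy encoding.
        return raw.decode(legacy).encode("utf-8")


old_post = "åöäéèë".encode("iso-8859-1")  # written by the old site
new_post = "åöäéèë".encode("utf-8")       # written by the new site

print(to_utf8(old_post) == new_post)
print(to_utf8(new_post) == new_post)
```

The check is a heuristic: plain ASCII is identical in both encodings, and a few ISO-8859-1 byte pairs also happen to be valid UTF-8, so it is worth testing on a copy of the database first.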