Hacker News

A while back, Firefox removed the ability to override the text encoding of pages. If Firefox gets the encoding wrong, you are screwed. The thing is, according to the developers themselves, detecting the various single-byte Latin encodings is very unreliable, and they have indicated that making Firefox detect one encoding more reliably nearly always introduces failures at detecting another, so there will be no progress in this area. Despite this, they have seen fit to remove the encoding menu, because telemetry shows that most users don’t use it, and because when they do, it often takes them multiple tries to guess the correct encoding. Their solution? To replace it with a “guess again” button, completely removing the ability to choose manually. Henri, one of the developers responsible for this change, has argued that as long as both encodings are Latin-script encodings with a common ASCII subset, it is not catastrophic for the user to be stuck with the wrong encoding, because the text is likely “still legible”. To give an example of what he is calling an “acceptable” level of mojibake, consider this text from the Polish Wikipedia, encoded as Latin-2 and decoded as Latin-1:

> Hamidiye turecki kr±¿ownik pancernopok³adowy z pocz±tku XX wieku, wodowany w 1903 roku, zbudowany w brytyjskiej stoczni Armstronga. Wyporno¶æ normalna okrêtu wynosi³a 3904 tony, a d³ugo¶æ siêga³a 112 metrów. Napêd stanowi³y maszyny parowe o mocy 12 000 KM, pozwalaj±ce na osi±ganie maksymalnej prêdko¶ci 22 wêz³y. Artyleria g³ówna sk³ada³a siê z dwóch pojedynczych dzia³ kalibru 152 mm i o¶miu dzia³ kalibru 120 mm. S³u¿y³ w marynarce Imperium Osmañskiego podczas wojen ba³kañskich oraz I wojny ¶wiatowej, a nastêpnie w marynarce Republiki Turcji do 1947 roku.
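The garbling above is easy to reproduce. A minimal Python sketch using the standard library's `iso8859_2` and `latin-1` codecs (`garble` and `repair` are hypothetical names for illustration): encoding Polish text as Latin-2 and decoding the bytes as Latin-1 yields exactly this kind of mojibake, and since Latin-1 maps every byte 0x00 to 0xFF to a character, reversing the steps recovers the original losslessly. (Browsers actually treat the `latin1` label as windows-1252, which agrees with Latin-1 for bytes 0xA0 to 0xFF.)

```python
def garble(text: str) -> str:
    """Simulate a Latin-2 page being rendered as Latin-1."""
    raw = text.encode("iso8859_2")  # bytes as the server sent them
    return raw.decode("latin-1")    # the wrong decoder picked by the browser


def repair(mojibake: str) -> str:
    """Undo the mistake: recover the original bytes, then decode them correctly."""
    raw = mojibake.encode("latin-1")  # lossless: Latin-1 covers all 256 byte values
    return raw.decode("iso8859_2")


original = "Hamidiye turecki krążownik pancernopokładowy"
broken = garble(original)
print(broken)  # Hamidiye turecki kr±¿ownik pancernopok³adowy
assert repair(broken) == original
```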

Personally, I think this change is ridiculous. It has been a few versions since it was rolled out, but people are still complaining in the issue tracker. I know one Russian guy who has resorted to using an extension that replaces arbitrary strings, just to correct common mojibake sequences and work around the regression brought about by this change.

If you feel as strongly about this as I do, I encourage you to comment in the Bugzilla thread. Yes, the web should be using Unicode these days, but if it isn’t, that is not your fault as a user, and making the experience miserable for the end user is not justifiable.



> Hamidiye turecki kr±¿ownik pancernopok³adowy z pocz±tku XX wieku, wodowany w 1903 roku, zbudowany w brytyjskiej stoczni Armstronga. Wyporno¶æ normalna okrêtu wynosi³a 3904 tony, a d³ugo¶æ siêga³a 112 metrów. Napêd stanowi³y maszyny parowe o mocy 12 000 KM, pozwalaj±ce na osi±ganie maksymalnej prêdko¶ci 22 wêz³y. Artyleria g³ówna sk³ada³a siê z dwóch pojedynczych dzia³ kalibru 152 mm i o¶miu dzia³ kalibru 120 mm. S³u¿y³ w marynarce Imperium Osmañskiego podczas wojen ba³kañskich oraz I wojny ¶wiatowej, a nastêpnie w marynarce Republiki Turcji do 1947 roku.

As a Polish person who has been seeing this sort of mis-encoded Polish text for over two decades now, my gut instinct is to immediately reach for the encoding menu. That menu is gone now.

We live in the era of almost omnipresent UTF-8, but it simply feels wrong to drop backwards compatibility with older documents on the Polish web that are mis-encoded like that, and there are still some of them out there.


Would it be impossible to put this in an extension?

What is the compatibility worth if people have to manually figure out how to switch it on?


I made one after seeing this discussion: https://addons.mozilla.org/en-US/firefox/addon/override-text...

Feedback & bug reports welcome!


Yep, as someone from the Balkans dealing with old sites in ISO-8859-2 vs. CP1250, I used this feature sometimes too. Well... I guess not anymore.
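The ISO-8859-2 vs. CP1250 mix-up is subtler than the Latin-1 case because the two Central European code pages agree on many letters and differ on a handful of byte values. A small Python sketch with the standard `iso8859_2` and `cp1250` codecs (`compare` is a hypothetical helper) shows the same byte decoding to different characters:

```python
def compare(byte: int) -> tuple[str, str]:
    """Decode one byte with both code pages: (ISO-8859-2, CP1250)."""
    raw = bytes([byte])
    return raw.decode("iso8859_2"), raw.decode("cp1250")


for b in (0xB1, 0xB3, 0xB6, 0xB9, 0xBF):
    latin2, win1250 = compare(b)
    status = "same" if latin2 == win1250 else "DIFFERENT"
    print(f"0x{b:02X}: ISO-8859-2={latin2!r} CP1250={win1250!r} {status}")
```

For example, byte 0xB9 is "š" in ISO-8859-2 but "ą" in CP1250, so a wrong guess can still produce real letters rather than obvious symbols.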


I wonder how feasible it would be to make an extension.


It turned out to be pretty easy; I hacked one up in a few hours and got it published on AMO: https://addons.mozilla.org/en-US/firefox/addon/override-text...

It's open-sourced on GitHub in case you're curious. It just overrides the Content-Type header on pages where you've turned it on, setting the `charset` parameter to whatever you select.
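The approach described above boils down to rewriting one response header. A rough Python sketch of that string manipulation (not the extension's actual code, which would run as a WebExtension; `override_charset` is a hypothetical name, and the parsing is simplified to assume no quoted `;` inside parameters): drop any existing `charset` parameter and append the user's choice.

```python
def override_charset(content_type: str, charset: str) -> str:
    """Return a Content-Type value whose charset parameter is forced to `charset`.

    Simplified: assumes parameters are separated by ';' with no quoted ';'.
    """
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0] or "text/html"  # fall back if the header was empty
    # Keep every parameter except an existing charset (names are case-insensitive).
    params = [
        p for p in parts[1:]
        if p and p.split("=", 1)[0].strip().lower() != "charset"
    ]
    params.append(f"charset={charset}")
    return "; ".join([media_type, *params])


print(override_charset("text/html; charset=UTF-8", "ISO-8859-2"))
```

Per the HTML spec's encoding sniffing algorithm, a charset supplied by the transport layer outranks a `<meta charset>` inside the document (only a byte order mark or an explicit user override beats it), which is why editing the header is enough.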


After the submenu was replaced with a single item in August, how many times have you 1) encountered Polish mojibake and 2) found that the single item wasn't able to fix it?


I seldom use this feature, but when I do need it and can't find the encoding menu, it will be a very frustrating experience. Their logic is like saying that dialing emergency services is a rarely used feature according to telemetry, so we will just remove it. No discussion is allowed; if you challenge it, they ask whether you have data on the use case for dialing emergency services. People can still send a text to emergency services, so it's fine.

Because of this post, I just checked whether Chromium supports an encoding menu. Yep, they removed it long ago. Basically, when you need to read some very old Japanese site where their encoding is pretty non standard, you are screwed.


Per https://hsivonen.fi/encoding-telemetry/, it looks like they run a special encoding detector on .jp sites, which is presumably designed specifically to choose between the various Japanese encodings (Shift-JIS, EUC-JP, etc).

I think your example of emergency services is a bit hyperbolic. This is a feature that is really not often used, whose omission is not fatal, which often requires several tries to get right, and which is increasingly less useful thanks to gradual changes on the Web. Much more widely used features like FTP and Flash have been deprecated; people howl and yell every time, yet things still seem to work.


Of course, it is not about fatality. The example is meant to point out that a rarely used feature is not necessarily an unimportant one.

Another example: Google Search removed the ability to perform exact-phrase searches such as "some phrase I found important to search". Maybe the statistics showed that 20% of users used it and 80% did not rely on the feature. The logic is that messing things up for 20% of users is not a problem, because we serve the other 80% pretty well.

The current UX design paradigm of providing "good defaults" for 80% of users while removing customization and screwing the 20% of advanced users makes me feel pretty helpless using modern apps. That's why I prefer CLI tools nowadays, for their flexibility.


So now you have to rely on hardcoded heuristics based on the website’s domain name instead of just being able to choose. There are so many complicated heuristics for determining this hard-to-predict factor, and they still get it wrong often. Removing the user’s control over this seems like a dreadful move.

> This is a feature that is really not often used, whose omission is not fatal, which often requires several tries to get right, and which is increasingly less useful thanks to gradual changes on the Web.

How often do you visit small indie websites in non-English languages? Because in my experience, as soon as you do, this is not a rare occurrence.


It breaks things without possibility of repair for a certain group of users. How is this not an issue?


> Basically, when you need to read some very old Japanese site where their encoding is pretty non standard, you are screwed.

As it happens, the primary reason why the single menu item remains instead of an override being totally gone is usage in Japan. (Many people in this thread comment as if Firefox had done what Chrome did: complete removal of override as you note.) The remaining menu item is pretty accurate for its primary use case, which is dealing with Japanese legacy sites that misdeclare their encoding. (There is one exception: If the page declares UTF-8 but is actually ISO-2022-JP, then you don't have recourse.)
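One property of ISO-2022-JP is relevant to that exception: it is a 7-bit encoding that switches character sets with escape sequences, so every byte it produces is also ASCII, and therefore valid UTF-8. A short Python check with the standard `iso2022_jp` codec shows that such bytes decode as UTF-8 without any error, so there is no invalid-byte signal hinting that a UTF-8 label is wrong:

```python
text = "日本語のページ"
raw = text.encode("iso2022_jp")

# Every byte is below 0x80: ISO-2022-JP uses ESC sequences instead of high bytes.
assert all(b < 0x80 for b in raw)

# Strict UTF-8 decoding therefore succeeds without raising...
as_utf8 = raw.decode("utf-8")
# ...but yields escape codes and ASCII gibberish instead of Japanese.
print(repr(as_utf8))
assert as_utf8 != text
assert raw.decode("iso2022_jp") == text
```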


> when you need to read some very old Japanese site where their encoding is pretty non standard, you are screwed

If you need to read them frequently, yes. Otherwise, if it is just a one-off, maybe the user can download the page source, extract the text content, paste it into a text editor and select the encoding? Do you have an example CJK site I could try?
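That manual workaround can also be scripted. A minimal Python sketch, assuming you already have the page's original bytes on disk (for example fetched with `curl -o`) and know or suspect the real encoding; `reencode_file` and the file names are hypothetical:

```python
from pathlib import Path


def reencode_file(src: Path, dst: Path, source_encoding: str) -> str:
    """Decode a saved page with an explicitly chosen encoding and store it as UTF-8."""
    raw = src.read_bytes()
    # errors="replace" keeps going if the guess is slightly off; inspect the result.
    text = raw.decode(source_encoding, errors="replace")
    dst.write_text(text, encoding="utf-8")
    return text
```

Usage would look like `reencode_file(Path("page.html"), Path("page.utf8.html"), "shift_jis")`; if the output still looks wrong, try another candidate such as `"euc_jp"`.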


I rarely need to do that, so I can't find an example at the moment. It is usually a one-off thing, so yes, I could do the conversion manually, but it is obviously a degraded user experience.


It seems he did extensive background research to investigate and find an optimal solution for this issue: https://hsivonen.fi/chardetng/

The goal was apparently to bring FF to feature parity with Chrome. Sad to see it's not working as well as intended. (Even sadder to see him being personally attacked by some commenters below.)


The problem is that it's impossible to guess correctly 100% of the time. You can improve the guessing algorithm as much as you want to (not that I'm saying it's useless - it's very good for regular users who only rarely encounter such issues), but you can never achieve total accuracy, which is why some sort of manual override is absolutely necessary in cases when it fails.
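Part of the reason is that single-byte encodings give a detector no "invalid input" signal to lean on. A small Python illustration using standard codecs (`decodings` and `CANDIDATES` are hypothetical names): the Latin-2 bytes of a Polish word decode without error under every candidate, and each result consists of real characters, so a detector can only weigh how plausible each one looks.

```python
CANDIDATES = ["iso8859_2", "cp1250", "cp1252", "latin-1"]


def decodings(raw: bytes) -> dict[str, str]:
    """Return every candidate decoding that succeeds without errors."""
    results = {}
    for enc in CANDIDATES:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # this candidate can be ruled out outright
    return results


for enc, text in decodings("krążownik".encode("iso8859_2")).items():
    print(f"{enc:10} {text}")
```

Only bytes that a code page leaves undefined, such as 0x81 in Python's `cp1252` codec, let a candidate be eliminated for certain; everything else comes down to statistics.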


I agree that good detection is important and I am glad that he is working on it, but I don’t think that the level of detection which is possible is a good replacement for the menu, based on what he himself has indicated he thinks is possible to achieve in Firefox.


Extensive maybe, correct apparently not so much.


Too late, code merged. Bugzilla locked. You, the user, don't really matter to Firefox anymore.

If the encoding is broken, it cannot be fixed by Henri Sivonen's auto-detection code, but he can claim he "simplified the menus" as a bullet point for project managers.

This is the gradual but inevitable backslide of Firefox into Chrome.


Chrome has marketshare.


Simplifying a submenu by replacing it with a quick option that works in most cases is a great improvement, but removing the underlying feature that lets users select the text encoding manually is bad. There must be a good way to keep the menu bar simple while keeping the option.

Firefox already shows page-specific info on the left side of the address bar. There's usually a shield icon and a lock icon, and permission requests appear there too. Why not add an "encoding icon" that appears whenever the encoding repair feature is active? Such a marker would also be a great place to put an "override with custom encoding" menu.


The quick option was already there before though - it was labeled "automatic" before. The only thing he did was remove all other options.


Good luck, but I am not hopeful Firefox's behavior will be corrected. See also this 13-year-old pearl courtesy of Bugzilla:

"Set screen coordinates during HTML5 drag event"

> The current HTML5 spec describes that all DragEvent properties should be available during all the events - according to editor Ian Hickson.

>> Note though that it doesn't specify what the properties should be set to, just that they should be set and we currently set them to 0.

https://bugzilla.mozilla.org/show_bug.cgi?id=505521


Seems like they restricted the comments, because, you know, head in the sand is the standard Mozilla approach.


Or, maybe, hear me out here: deliberately posting a link to an issue to a widely-visited community with an inflammatory title, with the explicit intent of dogpiling the maintainer led to them locking the thread? You know, something like what HN itself does to prevent vote manipulation?


Your example is not an example of actual Firefox failure mode: If I encode your text as ISO-8859-1 and run chardetng on the bytes, it says ISO-8859-2. It’s generally unlikely that chardetng would misdetect ISO-8859-2 Polish as windows-1252.

When you claim that something isn’t working for you (and it’s completely plausible that you have encountered a case where chardetng doesn’t work for you), it would be polite to post the actual failure and not something that’s not an actual example of failure.
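To get a feel for how a statistical detector can get the Polish example right, here is a deliberately tiny scoring heuristic in Python. It is not chardetng's algorithm, which uses much richer statistics; it simply decodes the bytes with each candidate and counts how many characters land in a hypothetical set of Polish letters with diacritics:

```python
POLISH_LETTERS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")
CANDIDATES = ["iso8859_2", "cp1250", "latin-1"]


def score(text: str) -> int:
    """Count decoded characters that are Polish letters with diacritics."""
    return sum(ch in POLISH_LETTERS for ch in text)


def guess_encoding(raw: bytes) -> str:
    """Pick the candidate whose decoding contains the most Polish letters."""
    best, best_score = CANDIDATES[0], -1
    for enc in CANDIDATES:
        try:
            s = score(raw.decode(enc))
        except UnicodeDecodeError:
            continue  # undecodable under this candidate
        if s > best_score:
            best, best_score = enc, s
    return best


print(guess_encoding("Wyporność normalna okrętu".encode("iso8859_2")))
```

Real pages are harder: short snippets, mixed languages, and code pages whose letters overlap can still mislead heuristics like this one.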


[flagged]


This person makes a decision supported by reasonable evidence and argumentation, confirmed through discussion with other members of the organization and wider community, and shipped after a thorough review (all happening in the open on a bug tracker!); and you're calling for them to be fired?

I sure hope you don't make any mistakes at your company!


> all happening in the open on a bug tracker!

I find it suspicious that there was not a single comment from a user before the change shipped, but immediately after, there were various complaints. It's not the first time I've seen this pattern in a bug.

This looks more like the "Hitchhiker's Guide to the Galaxy" definition of "open" - where the discussion is technically public, but you heavily rely on the fact that no one knows it exists.


That's a good point. The removal of such user-critical features should at the very least be announced well in advance via a clear deprecation notice, to allow for the collection of relevant feedback.


> user-critical features

They didn't remove JavaScript or the tabs. Very few users will know it's gone - I wouldn't have noticed.


If the page becomes unreadable I would say that it's critical.


How could Mozilla publicize every change they make? And then discuss them all?


By submitting every new Bugzilla ticket as a new HN post, of course!


Do other browsers offer character encoding menus or are they from improper companies? (Or were the developers fired but their changes kept?)


He "simplified the menus", remaining broken old web pages be damned. Mozilla's goal is to be as user hostile as Chrome.



