When I worked at Mozilla I was on the MDN (developer.mozilla.org) team, and we had this inexplicable bug. Some background: articles can be categorized with tags, and both articles and tags are localizable for all the languages MDN supports. So, for example, English reference articles on CSS properties were tagged "CSS Reference", while French reference articles on CSS properties were tagged "CSS Référence".
And... sometimes an English article's page would show it as having the French ("Référence") tag, and sometimes a French article's page would show it as having the English ("Reference") tag.
Turns out, MySQL's case-insensitive UTF-8 collation is also accent-insensitive: it treated "e" and "é" as the same character. We didn't know about that, and hadn't noticed because the tagging library we used worked around it. Then one day a new version of the library stopped working around it, and tags from one language started showing up on another language's articles (whenever the words were the same apart from diacritics/accents on certain characters). Which led to this:
https://github.com/mozilla/kuma/blob/00fc05b101658f863f58d7f...
That's a custom MySQL collation, which MDN defines and installs, to work around MySQL's default inability to tell "e" and "é" apart.
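To make the collision concrete, here's a minimal Python sketch. It is not MySQL's actual collation algorithm; it just approximates an accent- and case-insensitive comparison by stripping combining marks, and the function names are made up for illustration.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Hypothetical stand-in for an accent- and case-insensitive collation.

    Assumption: decompose to NFD, drop combining marks, then casefold.
    Real MySQL collations use their own weight tables.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_sensitive_key(text: str) -> str:
    """Case-insensitive only: accents still distinguish strings."""
    return unicodedata.normalize("NFC", text).casefold()


tags = ["CSS Reference", "CSS Référence"]

# Under the accent-insensitive key both tags collapse into one entry,
# which is the kind of lookup collision described above.
insensitive_keys = {accent_insensitive_key(tag) for tag in tags}

# Under the accent-sensitive key they stay distinct.
sensitive_keys = {accent_sensitive_key(tag) for tag in tags}

print(len(insensitive_keys), len(sensitive_keys))
```

With a key like the first one, a lookup for the English tag can hand back the French row (or vice versa), which matches the symptom we saw.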