Found in 1 comment on Hacker News
nanis · 2022-02-19 · Original thread
> "one code point in unicode does not necessarily map to one character on the screen."

Also, importantly, some characters are not represented at all. For example, there is no codepoint or combination of characters that distinguishes a capital Turkish dotless I from its identical-looking but conceptually different sibling, the Latin capital I (both are U+0049). The same holds for the lowercase Turkish dotted i, which shares U+0069 with the Latin lowercase i.

I find it extremely weird that two codepoints couldn't have been spared, yet we have gradations of skin tone in emoji. One can use composition to deal with the round-trip from i -> İ -> i (within a closed ecosystem), but even that fails when it comes to ı -> I -> ı.
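To make the round-trip problem concrete, here is a minimal Python sketch. It relies on Python's built-in, locale-independent `str.upper()` and `str.lower()`; the `turkish_upper` and `turkish_lower` helpers are hypothetical names made up for illustration, and they only work if you already know the text is Turkish, which is exactly the information the codepoints do not carry.

```python
import unicodedata

def default_round_trip(text: str) -> str:
    """Upper-case, then lower-case, with Python's locale-independent mappings."""
    return text.upper().lower()

def turkish_upper(text: str) -> str:
    """Hypothetical helper: upper-case text that is known to be Turkish."""
    # Map the two Turkish-specific pairs first, then let str.upper() do the rest.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()

def turkish_lower(text: str) -> str:
    """Hypothetical helper: lower-case text that is known to be Turkish."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()

# Dotless ı (U+0131) upper-cases to plain I (U+0049), and plain I lower-cases
# to dotted i (U+0069), so the dotless letter is lost on the way back.
word = "s\u0131n\u0131r"                      # "sınır"
print(default_round_trip(word))               # sinir

# Capital İ (U+0130) survives a lower/upper round trip only through
# composition: it lower-cases to "i" + COMBINING DOT ABOVE (U+0307),
# and NFC recombines "I" + U+0307 into U+0130 afterwards.
lowered = "\u0130".lower()
print(len(lowered))                           # 2
print(unicodedata.normalize("NFC", lowered.upper()) == "\u0130")  # True

# With out-of-band knowledge that the text is Turkish, both round trips work.
print(turkish_lower(turkish_upper(word)))     # sınır
```

The last line only succeeds because the caller asserts the language; nothing in the string itself says whether an `I` is Turkish or Latin.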

A case in point is the pair of product pages for my friend's book on Amazon. Compare "Sınır Ötesi" to "Sinir Ötesi". The former means "beyond borders" whereas the latter means "exceedingly irritating".

Yet, because there is no unambiguous representation of the Turkish I's, the rendering makes an assumption on the basis of the domain. Note that all the other non-ASCII letters involved are rendered correctly.

\c[PERSON FROWNING, ZERO WIDTH JOINER, PROGRAMMER]

[tr]: https://www.amazon.com.tr/Gezging%C3%B6z-S%C4%B1n%C4%B1r-T%C...

[us]: https://www.amazon.com/Gezging%C3%B6z-Sinir-T%C3%BCrkiye-Mir...
