The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!) - eviltoast
  • neutron@thelemmy.club
    1 year ago

    If we’re being really pedantic, the last part in Korean is counted with different units:

    • 각 as precomposed character: 1자 (unit ja for CJK characters)
    • 각 (ㄱㅏㄱ) as decomposable components: 3자모 (unit jamo for Hangul components)

    So we could have separate implementations of length() that count such cases by different criteria… But I wouldn’t expect non-speakers of Korean to know all of this.
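
    A rough sketch of what two such length() variants might look like in Python, assuming Unicode normalization is an acceptable stand-in for the two counting rules. The names count_ja and count_jamo are made up for illustration; only unicodedata.normalize is a real standard-library call.

```python
import unicodedata

def count_ja(text: str) -> int:
    """Count precomposed units: one per code point after NFC composition."""
    return len(unicodedata.normalize("NFC", text))

def count_jamo(text: str) -> int:
    """Count components: NFD splits each Hangul syllable into conjoining jamo."""
    return len(unicodedata.normalize("NFD", text))

gak = "\uac01"  # 각 as a single precomposed syllable
print(count_ja(gak), count_jamo(gak))  # prints "1 3"
```

    This only behaves as described for Hangul syllables. NFD also splits accented Latin letters, and the standalone compatibility jamo (ㄱ, ㅏ) are separate code points that NFC does not combine into a syllable.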

    Plus, what about Chinese characters? Are we supposed to count 人 as one, but 仁 as one (character) or two (radicals)? It only gets more complicated.