Anki 2.1 lost kanji compatibility

KKS

28 Apr, 2019 10:48 PM

I am not 100% sure when, but at some point after 2.0.52 an update to Anki's code broke Japanese compatibility for some kanji.
The problem seems to affect kanji that have been simplified: the program treats the old and new forms as the same character, even though they are different Unicode characters and are even listed separately in the government's jinmeiyou list.
This is not an exhaustive list, but here are some kanji which had the issue, along with their old forms:
- 欄廊朗虜類猪神祥福諸
- 欄廊朗虜類猪神祥福諸
They are perfectly discernible and treated as different characters up to 2.0.52, but the newer version just converts them into the same symbol.
If this post also gets converted because of how the website is configured, you can check this website for the character differences:
https://kanji.jitenon.jp/kanjim/6364.html

I don't think it's a font issue, as they display fine when making the card. The conversion only takes place after tabbing to another card and going back to it.

Hope this is enough to work with. I've since reinstalled an older version of Anki so I can keep studying.

  1. Support Staff 1 Posted by Damien Elmes on 29 Apr, 2019 09:54 AM

    I'm afraid 2.1 stores characters in normalized unicode format, which has the unwanted side effect of replacing archaic variants with their canonical form.
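The effect described here can be reproduced outside Anki with Python's standard `unicodedata` module. This is only a sketch: the thread does not say which normalization form Anki 2.1 applies, so all four are tried, and U+FA1A (one of the old forms reported above) is used as the sample character.

```python
import unicodedata

# U+FA1A is the old-form (compatibility) ideograph; U+7965 is the modern form.
OLD_FORM = "\ufa1a"
NEW_FORM = "\u7965"

def normalized_forms(ch):
    """Return ch under each of the four Unicode normalization forms."""
    return {
        form: unicodedata.normalize(form, ch)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

forms = normalized_forms(OLD_FORM)
for form, result in forms.items():
    # Print the code point each form produces for the old-form character.
    print(form, f"U+{ord(result):04X}")
```

Because the old form has a canonical mapping to the modern one, every normalization form replaces it, which is why there is no "lighter" form that keeps the variant.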

  2. 2 Posted by KKS on 30 Apr, 2019 10:24 AM

    Despite being older, these kanji are still part of the official kanji list, so they are not obsolete - in fact, they are required learning for tests like Kanji Kentei.

    As a student of Japanese, I have learned these characters (with Anki), and not being able to display them is an issue.
    I'm also not knowledgeable enough about Unicode to determine which characters have been changed, so I can't upgrade Anki anymore.

    Is there any solution to this?
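For anyone wanting to find out which characters in their own notes would be changed, a rough check is possible with Python's standard library. This is a minimal sketch, not something Anki provides: `find_changed_characters` is a hypothetical helper, NFC is assumed as the normalization form, and characters are checked one at a time, so it only detects single characters that get replaced (not combining sequences that get merged).

```python
import unicodedata

def find_changed_characters(text, form="NFC"):
    """List (index, original code point, normalized code points) for each
    character that normalization would replace."""
    changed = []
    for index, ch in enumerate(text):
        normalized = unicodedata.normalize(form, ch)
        if normalized != ch:
            original = f"U+{ord(ch):04X}"
            replacement = "+".join(f"U+{ord(c):04X}" for c in normalized)
            changed.append((index, original, replacement))
    return changed

# Sample text: an old form (U+FA1A) between ASCII letters, then the modern form.
sample = "a\ufa1ab\u7965"
for entry in find_changed_characters(sample):
    print(entry)
```

Running this over text exported from a collection would give a list of the kanji to store elsewhere before upgrading.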

  3. Support Staff 3 Posted by Damien Elmes on 01 May, 2019 06:28 AM

    Unfortunately I don't have a better solution to offer you than sticking with 2.0 at the moment. Normalization solves a bunch of other problems that affect more users, but as far as I can see there is no way to work around these variants being converted to their canonical representation.

  4. 4 Posted by KKS on 02 May, 2019 09:09 AM

    For future people reading this discussion, here's some useful information:

    Upon further inspection, all the kanji being replaced seem to belong to the JIS-3 (JIS第3水準) character set, so all 6355 kanji of the JIS-1 and JIS-2 sets should be safe (for now).

    Keep that in mind when studying Japanese, and store your JIS-3 (and beyond) kanji somewhere UTF-8-friendly if using an older version of Anki is not an option.
    Notepad++ and LibreOffice are compatible. I didn't test any others, but you can use the characters I originally submitted to test for yourself.

    Happy forgetting!

  5. 5 Posted by lovac42 on 07 Dec, 2019 09:21 PM

    What do you think about this?
    https://ankiweb.net/shared/info/1654528592

    Media, tag, and deck names are still being normalized. We could change that as well to match 2.0, but I wanted to keep it that way for searching.

  6. Support Staff 6 Posted by Damien Elmes on 07 Dec, 2019 09:49 PM

    Please add a disclaimer making people aware of the tradeoffs - users may find they can't search for non-Latin text added on one platform that uses a different encoding to the platform they are currently on.
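The search tradeoff can be illustrated with a small sketch. It assumes that "a different encoding" here means composed versus decomposed Unicode (for example, the kana が stored as one code point on one platform and as か plus a combining mark on another); `contains` is a hypothetical helper, not Anki's search code.

```python
import unicodedata

COMPOSED = "\u304c"          # が as a single code point
DECOMPOSED = "\u304b\u3099"  # か followed by the combining voiced sound mark

def contains(haystack, needle, normalize=False):
    """Substring search, optionally normalizing both sides to NFC first."""
    if normalize:
        haystack = unicodedata.normalize("NFC", haystack)
        needle = unicodedata.normalize("NFC", needle)
    return needle in haystack

# Text stored in decomposed form, searched for with composed input.
print(contains(DECOMPOSED, COMPOSED))
print(contains(DECOMPOSED, COMPOSED, normalize=True))
```

Without normalization the two spellings look identical on screen but do not match each other, which is the situation the disclaimer is meant to warn about.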

  7. 7 Posted by KKS on 13 Dec, 2019 06:49 AM

    lovac42:

    What do you think about this?

    Bravo! Works like a charm, I can open 2.1 with renewed confidence that it won't cause a mess.
    I doubt anyone would suffer from tags or deck names being limited, but if filenames get normalized, that'd cause missing media, no? Unless Anki alters the filename saved on your HDD as well, not just the card field.

    Please add a disclaimer making people aware of the tradeoffs

    The same should have been done for Anki 2.1.

  8. 8 Posted by lovac42 on 13 Dec, 2019 08:50 AM

    It normalizes at the time of recording. So it won't change the filename, but if you edit the note, it will change the file reference <img src="XXXXX">. Without this addon, XXXXX is changed. So any notes that you added before Anki 2.0.32? or 42? will end up with blank images.

    Anki 2.0.52 normalizes media filenames, so any notes created by 2.0.52 will not be affected. This isn't in the changelogs, so I am not sure of the exact version, but I am still using an old version, 2.0.6, that is subject to this error. So upgrading to 2.1 requires this addon to ensure data integrity if you have old notes made with an old Anki version.
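A rough simulation of the blank-image case, using an in-memory set in place of the media folder. The filename, the `resolve` helper, and the choice of NFC are all assumptions made for illustration; none of this is Anki's actual media code.

```python
import unicodedata

# Hypothetical media folder: the file was saved under the old-form kanji U+FA1A
# by an old Anki version that did not normalize filenames.
files_on_disk = {"\ufa1a.png"}

def resolve(reference, normalize):
    """Return True if the <img src> reference matches a file on disk."""
    if normalize:
        # Editing the note rewrites the reference, but not the file itself.
        reference = unicodedata.normalize("NFC", reference)
    return reference in files_on_disk

original_reference = "\ufa1a.png"
print(resolve(original_reference, normalize=False))
print(resolve(original_reference, normalize=True))
```

The file on disk keeps its old name, so once the reference is rewritten the two no longer match and the image comes up blank.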

  9. 9 Posted by wtnelson on 17 Jan, 2020 03:08 PM

    Is there any solution to this?

    Two easy solutions:

    1. Use HTML entities, for example &#xfa1a; → 祥 U+FA1A
    2. Use variation sequences, for example <U+7965,U+E0100> → 祥󠄀

    ※ For option 2, you'll need to ensure the variation sequence is supported by the selected font. There's no automatic font fallback for variation sequences.
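Both options can be checked against normalization with a short sketch. NFC is assumed as the form applied to note text, and `survives_normalization` is a hypothetical helper; whether a given font renders the variation sequence is a separate question that this does not test.

```python
import html
import unicodedata

ENTITY = "&#xfa1a;"                      # option 1: plain ASCII in the field
VARIATION_SEQUENCE = "\u7965\U000E0100"  # option 2: base kanji + variation selector
RAW_OLD_FORM = "\ufa1a"                  # the character typed directly

def survives_normalization(text, form="NFC"):
    """Return True if normalizing text leaves it unchanged."""
    return unicodedata.normalize(form, text) == text

print(survives_normalization(ENTITY))
print(survives_normalization(VARIATION_SEQUENCE))
print(survives_normalization(RAW_OLD_FORM))

# The entity is only turned into U+FA1A when the HTML is rendered.
print(f"U+{ord(html.unescape(ENTITY)):04X}")
```

The entity survives because it is stored as ASCII text, and the variation sequence survives because the selector has no normalization mapping, whereas the directly typed old form does not.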

  10. 10 Posted by Emil on 20 Jun, 2020 09:08 AM

    I noticed that this was "fixed" in 2.1.28alpha1.

    Unicode normalization:

    If you are studying rare CJK characters and wish to prevent them from being converted into modern equivalents, the following in the debug console will stop Anki from normalizing note text.

    mw.col.conf["normalize_note_text"] = False

    Is this setting persistent between restarts?

  11. 11 Posted by lovac42 on 21 Jun, 2020 07:39 AM

    My guess is that it is, otherwise there's no point in adding it.

    But I don't think any of us have bothered to update yet, due to the constant disruption to our schedules from these micro changes between versions. If you've already updated, testing it yourself would be the fastest way to confirm.
