Hey folks!
I made a free Chinese reading tool after trying to get through The Hunger Games almost drove me crazy.
Any Chinese (or, I assume, Japanese) learner will tell you there's a frustrating uncanny valley where you're good enough to start pursuing "real" reading material, but there are just too many unknown characters to actually make it through.
What I'm really working toward is a tool where you upload an EPUB and get back a modified EPUB with exactly, and only, the language help you need, added as text annotations (called "rubies" in typography).
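Ruby annotations correspond directly to the HTML `<ruby>` and `<rt>` elements, and EPUB 3 content documents are XHTML, so the output step can be plain markup generation. Below is a minimal sketch of that step only. It assumes a hypothetical per-character `PINYIN` table and a hypothetical `KNOWN` set of characters the reader already knows. A real tool would work on segmented words and choose readings from context, and as the feedback further down shows, that is the hard part.

```python
from html import escape

# Hypothetical inputs: a real tool would get readings from a dictionary and
# the KNOWN set from the reader's own vocabulary list.
PINYIN = {"几": "jǐ", "百": "bǎi", "苧": "níng"}
KNOWN = {"百"}

def annotate(text: str) -> str:
    """Wrap each unknown character that has a reading in an HTML <ruby> element."""
    out = []
    for ch in text:
        reading = PINYIN.get(ch)
        if reading is not None and ch not in KNOWN:
            # <rt> holds the annotation shown above (or beside) the base text.
            out.append(f"<ruby>{escape(ch)}<rt>{escape(reading)}</rt></ruby>")
        else:
            # Known or unlisted characters pass through, escaped for XHTML.
            out.append(escape(ch))
    return "".join(out)

print(annotate("几百"))
# <ruby>几<rt>jǐ</rt></ruby>百
```

Only 几 gets an annotation here because 百 is in the reader's known set, which is the "exactly, and only" behavior in miniature.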
Probably the world's most niche web app, but hey -- for those in this specific situation, it's a real struggle.
I hacked it together pretty quickly, so if this gets any sizable load from HN I'm sure it will crash, but I would LOVE feedback from Chinese learners out there!
I tried to construct an example sentence demonstrating multiple issues: 几百个人中没有一个愿意帮李苧。 https://www.mandopando.com/file/ffa240af-2713-4d75-b1a9-b62e... (I used the Simplified Chinese view; having to explicitly select it when the input was already Simplified Chinese was a bit weird.)
几 jī "small table" should be jǐ "a few"
个人 gě rén "individual" should be 个 gè "measure word" + 人 rén "human"
中 zhōng "China" should be "among"
个 gě "used in 自個兒|自个儿" should be gè "measure word"
李 lǐ "plum" should probably be "surname Li"
苎 zhù "Boehmeria nivea" should be 苧 níng "tangled" (苧 is an edge case I like to use for testing, see https://en.wikipedia.org/wiki/Ambiguities_in_Chinese_charact... for more)
(Not being able to select ruby text was a bit annoying when making this list.)
The gloss "used in 自個兒|自个儿" makes me think you're probably using a dictionary based on CC-CEDICT. The third tone on the 个 in 个人, though, suggests it's an older version with many errors, so you should be able to improve quality somewhat by switching to the latest release: https://www.mdbg.net/chinese/dictionary?page=cc-cedict
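Swapping in a newer CC-CEDICT release is mostly a matter of re-parsing the file. Each entry is one line of the form `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, and header lines start with `#`. A minimal parser sketch follows. The sample lines are abbreviated illustrations, not verbatim dictionary content, and `parse_cedict` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# One CC-CEDICT entry per line: "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
LINE_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

def parse_cedict(lines):
    """Map each simplified headword to a list of (pinyin, glosses) entries, in file order."""
    entries = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        m = LINE_RE.match(line)
        if m is None:
            continue  # ignore malformed lines rather than crash
        trad, simp, pinyin, glosses = m.groups()
        entries[simp].append((pinyin, glosses.split("/")))
    return entries

sample = [
    "# abbreviated sample, not real dictionary content",
    "個 个 [ge4] /individual/classifier for people or objects in general/",
    "幾 几 [ji3] /how much/how many/several/a few/",
]
d = parse_cedict(sample)
print(d["个"])
# [('ge4', ['individual', 'classifier for people or objects in general'])]
```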
Based on the pinyin for 几 and 个, I suspect you're sorting possible candidates lexicographically and picking the first (ji1 < ji3, ge3 < ge4). If you have a lot of free time, you can just make a big list of the best choice for common words, or crib my work here: https://github.com/Tatoeba/sinoparserd/pull/2
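The suspected bug and the suggested fix can be compared side by side. This is a minimal sketch, assuming candidates arrive as numbered-tone pinyin strings. `PREFERRED` is a hypothetical hand-curated table standing in for "a big list of the best choice for common words"; it is not the contents of the linked pull request.

```python
# Candidate readings as they might come out of a dictionary lookup.
CANDIDATES = {"几": ["ji3", "ji1"], "个": ["ge4", "ge3"]}

# Hypothetical hand-curated table of the usual reading for common words.
PREFERRED = {"几": "ji3", "个": "ge4"}

def pick_naive(word):
    """Sort candidates and take the first, so tone 1 always beats tone 3."""
    return sorted(CANDIDATES[word])[0]

def pick_preferred(word):
    """Use the curated choice if it is a valid candidate, else the first candidate as given."""
    choice = PREFERRED.get(word)
    if choice in CANDIDATES[word]:
        return choice
    return CANDIDATES[word][0]

for w in CANDIDATES:
    print(w, pick_naive(w), pick_preferred(w))
# 几 ji1 ji3
# 个 ge3 ge4
```

Checking that the curated choice is actually among the candidates keeps a stale table entry from producing a reading the dictionary doesn't list.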
But there will always be some instances where your choice of dictionary entry will turn out to be incorrect, so I think it would be nice to have a way to see alternative possible interpretations.
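Alternative interpretations apply to word boundaries as well as readings: the 个人 vs. 个 + 人 case above is a segmentation choice. Below is a minimal sketch, assuming a tiny hypothetical word list, that enumerates every dictionary-consistent segmentation instead of committing to a single one.

```python
from functools import lru_cache

# Hypothetical mini-dictionary of headwords (a real one would come from CC-CEDICT).
WORDS = {"一", "个", "人", "一个", "个人"}
MAX_LEN = max(len(w) for w in WORDS)

def segmentations(text):
    """Return every way to split text into dictionary words, longer first words first."""
    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(text):
            return [[]]  # one way to segment the empty remainder
        results = []
        # Try the longest possible word starting at i, then shorter ones.
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            word = text[i:j]
            if word in WORDS:
                for rest in split_from(j):
                    results.append([word] + rest)
        return results
    return split_from(0)

for seg in segmentations("一个人"):
    print(" | ".join(seg))
# 一个 | 人
# 一 | 个人
# 一 | 个 | 人
```

The number of segmentations can grow quickly on long sentences, so a UI would probably show the top choice and list alternatives for one word only when the reader asks.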