Show HN: An adaptive Chinese reader. Only helps where you need it (mandopando.com)
2 points by eob on Nov 28, 2022 | 1 comment
Hey folks!

I made a free Chinese reading tool after trying to get through The Hunger Games almost drove me crazy.

Any Chinese (or, I assume, Japanese) learner will tell you there's a frustrating uncanny valley where you're good enough to start pursuing "real" reading material, but there are just too many unknown characters to actually make it through.

What I'm really working toward with this is something where you can upload an EPUB and it will give you a modified EPUB with exactly, and only, the language help that you need added as text annotations (called "rubies" typographically).
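For anyone unfamiliar with rubies, here is a rough sketch of the "exactly, and only, the help you need" idea in Python. It is not the site's actual code: the `annotate` function, the `(text, pinyin)` word pairs, and the `known` set are all hypothetical names made up for illustration. The only real thing is the HTML `<ruby>`/`<rt>` markup, which EPUB content documents can use as well.

```python
from html import escape

def annotate(words, known):
    """Wrap only unfamiliar words in <ruby> markup.

    words: list of (text, pinyin) pairs, already segmented (assumed input).
    known: set of words the reader no longer needs help with (assumed input).
    """
    parts = []
    for text, pinyin in words:
        if text in known or not pinyin:
            # The reader knows this word (or there is no reading): plain text.
            parts.append(escape(text))
        else:
            # Unknown word: attach the reading as a ruby annotation.
            parts.append(f"<ruby>{escape(text)}<rt>{escape(pinyin)}</rt></ruby>")
    return "".join(parts)

print(annotate([("几", "jǐ"), ("百", "bǎi")], known={"百"}))
```

Clicking an annotation to hide it everywhere would then amount to adding that word to `known` and re-rendering.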

Probably the world's most niche web app, but hey -- for those in this specific situation, it's a real struggle.

I hacked it together pretty quickly, so if this gets any sizable load from HN I'm sure it will crash, but I would LOVE feedback from Chinese learners out there!



The mechanic of clicking the pinyin or translation above a word to hide it for every occurrence is great and feels satisfying to use somehow, but I think the NLP quality leaves something to be desired.

I tried to construct an example sentence demonstrating multiple issues: 几百个人中没有一个愿意帮李苧。 https://www.mandopando.com/file/ffa240af-2713-4d75-b1a9-b62e... (I used the Simplified Chinese view; having to explicitly select it when the input was already Simplified Chinese was a bit weird.)

几 jī "small table" should be jǐ "a few"

个人 gě rén "individual" should be 个 gè "measure word" + 人 rén "human"

中 zhōng "China" should be "among"

个 gě "used in 自個兒|自个儿" should be gè "measure word"

李 lǐ "plum" should probably be "surname Li"

苎 zhù "Boehmeria nivea" should be 苧 níng "tangled" (苧 is an edge case I like to use for testing, see https://en.wikipedia.org/wiki/Ambiguities_in_Chinese_charact... for more)

(Not being able to select ruby text was a bit annoying when making this list.)

The translation "used in 自個兒|自个儿" makes me think you're probably using a dictionary based on CC-CEDICT, but the third tone on the 个 in 个人 suggests that it's probably an older version with many errors. You should be able to improve the quality a bit by using the latest release: https://www.mdbg.net/chinese/dictionary?page=cc-cedict

Based on the pinyin for 几 and 个 I suspect you're sorting possible candidates lexicographically and picking the first (ji1 < ji3, ge3 < ge4). If you have a lot of free time you can just make a big list of the best choice for common words, or crib my work here: https://github.com/Tatoeba/sinoparserd/pull/2
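To make the guess concrete, here is a minimal Python sketch of the suspected behaviour next to the "big list of best choices" fix. This is only a guess at what the site does, not its real code: `CANDIDATES`, `PREFERRED`, `pick_naive`, and `pick` are hypothetical names, and the two tables hold just the readings mentioned above.

```python
# Candidate readings per word, with tone numbers (illustrative data only).
CANDIDATES = {
    "几": ["ji1", "ji3"],
    "个": ["ge3", "ge4"],
}

# Hand-maintained list of the best reading for common words (assumed fix).
PREFERRED = {"几": "ji3", "个": "ge4"}

def pick_naive(word):
    """Suspected current behaviour: sort the candidates, take the first."""
    return sorted(CANDIDATES[word])[0]

def pick(word):
    """Use the preferred reading when one is listed, else fall back to sorting."""
    candidates = CANDIDATES[word]
    preferred = PREFERRED.get(word)
    if preferred in candidates:
        return preferred
    return sorted(candidates)[0]

for word in ("几", "个"):
    print(word, pick_naive(word), pick(word))
```

The fallback keeps the existing behaviour for words not on the list, so the list can grow gradually.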

But there will always be some instances where your choice of dictionary entry will turn out to be incorrect, so I think it would be nice to have a way to see alternative possible interpretations.



