
CedPane: Chinese-English Dictionary Public-domain Additions for Names Etc

Last update: 2020-10-11   Table rows: 62,801

People learning Chinese as a foreign language sometimes use software to help them read a text. But when Western names are written using Chinese characters, the result is not always something an average dictionary can help with: the software might give you an inappropriate "analysis" like "irrigate this/now irrigate thought" instead of 沃兹沃思 (Wòzīwòsī, Wordsworth). So I found it useful to compile a list of names (focusing on, but not limited to, Western names) and a few other potentially-useful phrases not always found in learners' software, with examples of how these have been written in Chinese, which we can add to our software to help with our reading.

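We Westerners learning Chinese sometimes use computer software to add pinyin and definitions, but the dictionaries in that software often lack proper names. For example, 沃兹沃思 is the surname of a famous poet (Wordsworth in English), but some Chinese-learning software sees 沃兹沃思 and says: 沃 means "irrigate", 兹 means "this", 沃 means "irrigate", 思 means "thought". The computer does not know that 沃兹沃思 is a name and should not be split up like this. So I compiled a Chinese-English dictionary supplement of names etc, so that our software can spot where there are transliterations of English names and similar phrases that should not be split.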
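
The difference a name entry makes can be sketched with a toy greedy longest-match segmenter. This is only an illustration, not how any particular learners' software works: the function `segment`, its `max_len` parameter and the two tiny lexicons are hypothetical, and the glosses are just the ones from the Wordsworth example above.

```python
def segment(text, lexicon, max_len=8):
    """Greedy longest-match segmentation.

    At each position, take the longest lexicon entry that matches;
    if nothing matches, fall back to a single character.
    Returns a list of (piece, gloss) pairs; gloss is None if unknown.
    """
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                pieces.append((candidate, lexicon.get(candidate)))
                i += length
                break
    return pieces

# Hypothetical "average dictionary": single characters only.
base = {"沃": "irrigate", "兹": "this/now", "思": "thought"}

# The same dictionary with one name entry added.
with_names = dict(base)
with_names["沃兹沃思"] = "Wordsworth"

print(segment("沃兹沃思", base))        # four separate single-character glosses
print(segment("沃兹沃思", with_names))  # one piece, glossed as Wordsworth
```

With only the single-character entries, the name is split into four unrelated glosses; once the name is in the lexicon, the longest match wins and the name is kept whole.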

While the primary purpose of this list is to help software recognise a name when it sees one, it's understandable that some people will also want to use it to 'look up' how a specific name "should" be translated. However:

  1. There is sometimes more than one way that a particular non-Chinese name has been written in Chinese.
    • Sometimes it "doesn't really matter"---you can pick any of the existing translations, or even invent a new one (within reason), and nobody will mind.
    • But occasionally it does matter---the translation you choose might imply you are of a certain age, persuasion or background (which you might or might not want to identify with), and in extreme cases you could offend someone and suffer the consequences. So I have to disclaim all legal liability for your use of my data!
  2. Sometimes several different Western names can be written the same way in Chinese, and are therefore indistinguishable in back-translations.
So please don't take my list as an "authority", and definitely don't use it to criticise other translations (it's not exhaustive). The lexicography here is descriptive (what I have seen done), not necessarily prescriptive (saying what "should" be done).
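
Both points above amount to saying that the mapping between names and their Chinese forms is many-to-many, so any lookup should return a list of candidates rather than a single "answer". A minimal sketch, assuming the data has been reduced to (English, Chinese) pairs: the function `build_indexes` is hypothetical, and the sample pairs are well-known variants chosen for illustration rather than quoted from CedPane.

```python
from collections import defaultdict

def build_indexes(pairs):
    """Build two indexes from (english, chinese) pairs.

    by_english: English name -> every Chinese form seen for it
    by_chinese: Chinese form -> every English name it may stand for
    """
    by_english = defaultdict(list)
    by_chinese = defaultdict(list)
    for english, chinese in pairs:
        if chinese not in by_english[english]:
            by_english[english].append(chinese)
        if english not in by_chinese[chinese]:
            by_chinese[chinese].append(english)
    return by_english, by_chinese

# Illustrative pairs only (not taken from the CedPane table).
pairs = [
    ("Trump", "特朗普"),
    ("Trump", "川普"),
    ("John", "约翰"),
    ("Johann", "约翰"),
]

by_english, by_chinese = build_indexes(pairs)
print(by_english["Trump"])  # more than one existing way to write the name
print(by_chinese["约翰"])    # more than one possible back-translation
```

Returning every candidate leaves the choice (and the responsibility for it) with the person using the list, which fits the descriptive approach taken here.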

I've been wanting to put as much as possible into the public domain, so that commercial software like Wenlin, Pleco, Hanping and ChinaScribe as well as community projects like CC-CEDICT and online services can all help learners to read by incorporating these words by default instead of as an "after-market" addition. But I was held back by possible 'intellectual property' considerations. If I (as a learner) saw a word in a text, and wanted my software to recognise it next time, I'd add it to my personal dictionary (with extra notes on where and when I saw it, and maybe other thoughts too), but that by itself doesn't mean I can share it: how do I know my source doesn't have some kind of "trademark rights" to their particular way of writing it?

I now understand that most countries' copyright laws do include a provision for third-party indexing, so you can say "I saw this word on page 234 of that book" and not be held liable for copyright infringement of favourite books that feature too often in your list: at worst, your list is an index of your books, which is (in countries that have those provisions) allowed. But you still run the risk of accidentally defaming a book by writing wrong notes---there are "free speech" laws protecting reviews (up to a point), but I quite like the books I read and didn't want to cast them in a bad light by publishing all my misunderstandings.

So I tried querying a large Chinese Internet search engine for each of my words, to get some measure of which words were common enough to warrant disregarding my reading notes and just saying "here's a translation that's 'out there' and worth recognising". I had to be careful to ensure the search results really showed the word in common use (not just illegal copies of the source I read), and I also had to beware of having documented a rare different use-case of an otherwise common word.

After subsetting and editing my database, I can now present 78% of the 'specialist' words I collected between 2009 and 2020 as confirmed "public domain" words you can do what you want with (i.e. please do add them to products to help learners, and email me if you'd like me to mention here that you've added them to your product). The other 22% (and my reading notes) have not been added to CedPane, but I hope it's already useful.

CedPane is a table that you should be able to copy into the spreadsheet software of your choice (use Select All, Copy, Paste). The columns are:

  1. Word as it might be written in an English text (in the case of a non-English name this is usually a transcription), or a brief definition
  2. Simple-form ("Simplified") Chinese
  3. Full-form ("Traditional") Chinese
  4. Mandarin pronunciation in Hanyu Pinyin
  5. Cantonese pronunciation in Yale (provisional---my Cantonese is much worse than my Mandarin, so I haven't been able to proof-read this column to the same standard)
  6. English pronunciation in IPA (for words where I wanted to correct my English speech synthesizer; other pronunciations may be equally correct)

Of course it goes without saying that, despite my best efforts, mistakes are possible anywhere (as is true of every dictionary) and I'm happy to receive corrections.
There is an SVN repository thanks to Cameron Wong: svn co http://svn.code.sf.net/p/e-guidedog/code/ssb22/CedPane
and there is a Git repository on GitHub: git clone https://github.com/ssb22/CedPane.git
and GitLab: git clone https://gitlab.com/ssb22/CedPane.git
and BitBucket: git clone https://bitbucket.org/ssb22/CedPane.git
and two experimental mirrors in East Asia: Gitea (git clone https://gitea.com/ssb22/CedPane.git) and Gitee: git clone https://gitee.com/ssb22/CedPane.git

(I also have a separate collection of Chinese words that are in typical dictionaries, with short English definitions that have either been confirmed by multiple independent sources to the extent that it is reasonable to believe they are public domain, or that I've written myself. This separate collection is not likely to help with software that already has a good normal dictionary, but it might be useful for developers to prototype interlinear annotators etc. It is in the Git repositories as PD-English-Definitions.txt but has not been included in the main CedPane files.)

Warning: the CedPane table has 62,801 rows. If you are on a mobile device, viewing it might slow down or crash your device. You should use a capable desktop computer to view it.
Take that risk and load the version of this page with the table included
Alternatively you can have it as a tab-delimited text file, or if you're using Pleco you can add the first four columns via these Pleco user dictionaries: CPN-CE.pqb and CPN-EC.pqb. You might wish to update these periodically (sorry there's not yet a notification system for third-party Pleco dictionary updates).
Please do NOT write programs that download a new copy of CedPane for every word. My data is free but my server is NOT. Your repeated downloads at 4M+ per hit are rude (especially with your fake Windows browser strings and fake Referer so I don't know who you are; if some programming book told you to do that, please tell me what book it was so I can write a rotten review). I've already had to block or redirect several of your browser strings, but you keep changing them. I may have to start limiting everybody to one download per day. What are you trying to DO? Can I help? Email me and let's work out a solution that does not involve my server being hammered. Thanks.
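
One way to avoid this is to keep a single local copy (for example from one of the repositories above) and do every lookup against that copy. The sketch below is only a suggestion under assumptions: the helper names `needs_refresh` and `load_lookup`, the one-week default and the tab-delimited layout (English in the first column, Simplified Chinese in the second) are all hypothetical, and it deliberately contains no download code.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds

def needs_refresh(path, max_age_seconds=7 * ONE_DAY, now=None):
    """Return True if there is no local copy or it is older than max_age_seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age_seconds

def load_lookup(path):
    """Read the local copy ONCE into a dict: Simplified Chinese -> list of English."""
    lookup = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and fields[1]:
                lookup.setdefault(fields[1], []).append(fields[0])
    return lookup
```

A program would call `load_lookup` once at startup and then answer every word from the in-memory dict; only when `needs_refresh` returns True would it fetch a fresh copy, at most once per refresh interval.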

All material © Silas S. Brown unless otherwise stated.