[Python] Conversion of Traditional and Simplified Chinese


OpenCC is a tool (available both online and offline) for converting between Traditional and Simplified Chinese. In this post, we will write a Python program that uses OpenCC to convert Simplified Chinese to Traditional Chinese.

Install OpenCC

See the OpenCC repository on GitHub for installation instructions. If you use Ubuntu Linux 15.10, you can install OpenCC with:

$ sudo apt-get install opencc

OpenCC binding for Python

I found several OpenCC bindings for Python ([3], [5], [6]). We will use pyOpenCC from [3] to convert Chinese. If you use Ubuntu Linux 15.10, you can install pyOpenCC with:

# Install necessary header files for compilation
$ sudo apt-get install python-dev libopencc-dev
# Install pip. Ignore this step if you already installed pip
$ sudo apt-get install python-pip
# Install pyOpenCC
$ sudo pip install pyopencc
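
To confirm that both pieces are in place before running the demo, a small check along the following lines may help. This is only a sketch under two assumptions: that the Ubuntu package provides a command named opencc on the PATH, and that the binding is importable as pyopencc by the same interpreter that runs the check. The helper names has_command and has_module are hypothetical and not part of OpenCC or pyOpenCC.

```python
import importlib
import os


def has_command(name):
    """Return True if an executable called `name` is found on the PATH."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return True
    return False


def has_module(name):
    """Return True if the module `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Both names are assumptions: the command name and the module name
    print("opencc command found: %s" % has_command("opencc"))
    print("pyopencc importable: %s" % has_module("pyopencc"))
```

If the second line reports False even though pip finished without errors, the package was probably installed for a different Python interpreter than the one running the check.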

Source Code for Demo

convert-chinese.py
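#!/usr/bin/env python
# -*- coding:utf-8 -*-

import pyopencc

# Build the converter once; zhs2zhtw_vp.ini is the configuration used in this demo
CN2TW = pyopencc.OpenCC('zhs2zhtw_vp.ini').convert

if __name__ == '__main__':
    # Convert a Simplified Chinese sample string and print the result
    print(CN2TW("中国鼠标软件打印机"))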
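
The same converter can be applied to a whole text file rather than a single string. The following is a minimal sketch, not part of the original demo: convert_file is a hypothetical helper that takes any callable as its convert argument, and it assumes that callable accepts and returns text strings. If the binding works on UTF-8 byte strings instead, the callable would need to be wrapped with the matching encode and decode steps.

```python
import io


def convert_file(src_path, dst_path, convert):
    """Read src_path as UTF-8, pass each line through `convert`,
    and write the converted lines to dst_path as UTF-8."""
    with io.open(src_path, "r", encoding="utf-8") as src, \
            io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(convert(line))


# Possible usage (assumes pyopencc is installed, that its convert method
# handles text strings, and that input.txt exists; file names are placeholders):
#   import pyopencc
#   convert_file("input.txt", "output.txt",
#                pyopencc.OpenCC("zhs2zhtw_vp.ini").convert)
```

Passing the converter in as an argument keeps the file handling independent of the binding, so the function can be tried with any stand-in callable first.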

You can replace zhs2zhtw_vp.ini with another configuration according to your needs. The configurations I found by running locate opencc are:

mix2zhs.ini
mix2zht.ini
zhs2zht.ini
zhs2zhtw_p.ini
zhs2zhtw_v.ini
zhs2zhtw_vp.ini
zht2zhs.ini
zht2zhtw_p.ini
zht2zhtw_v.ini
zht2zhtw_vp.ini
zhtw2zhcn_s.ini
zhtw2zhcn_t.ini
zhtw2zhs.ini
zhtw2zht.ini
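
The file names in this list appear to follow the pattern source2target, optionally followed by an underscore and a short options suffix. Assuming that pattern holds (it is inferred from the names, not taken from the OpenCC documentation), a short sketch like the one below can split a name into its parts and pick out the configurations that start from a given source. The function names parse_config_name and configs_from are hypothetical helpers for illustration.

```python
def parse_config_name(filename):
    """Split e.g. 'zhs2zhtw_vp.ini' into ('zhs', 'zhtw', 'vp').

    Assumes the naming pattern <source>2<target>[_<options>].ini,
    which is inferred from the file names listed above.
    """
    stem = filename[:-len(".ini")] if filename.endswith(".ini") else filename
    pair, _, options = stem.partition("_")
    source, _, target = pair.partition("2")
    return source, target, options


def configs_from(source, filenames):
    """Return the configuration files whose source part equals `source`."""
    return [f for f in filenames if parse_config_name(f)[0] == source]


CONFIGS = [
    "mix2zhs.ini", "mix2zht.ini", "zhs2zht.ini", "zhs2zhtw_p.ini",
    "zhs2zhtw_v.ini", "zhs2zhtw_vp.ini", "zht2zhs.ini", "zht2zhtw_p.ini",
    "zht2zhtw_v.ini", "zht2zhtw_vp.ini", "zhtw2zhcn_s.ini",
    "zhtw2zhcn_t.ini", "zhtw2zhs.ini", "zhtw2zht.ini",
]

if __name__ == "__main__":
    # List every configuration whose source part is "zhs"
    for name in configs_from("zhs", CONFIGS):
        print(name)
```

Grouping the names this way makes it easier to see which alternatives exist for the direction used in the demo before swapping in a different file.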