[Python] Convert PO file to JSON Format


Introduction

This post shows a Python program that converts PO files to JSON. A web server can pass the JSON data to the front end, where it is used to translate text strings into the user's native language, for example by implementing a gettext function in the browser.
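
As a rough illustration of how the converted data might be consumed, the sketch below does a gettext-style lookup in Python (a browser implementation would do the same thing in JavaScript). The catalog string, the gettext helper, and its fall-back-to-msgid behaviour are assumptions made for this example; they are not part of the converter itself.

```python
import json

# Hypothetical catalog in the shape the converter produces:
# {locale: {msgid: msgstr}}
CATALOG_JSON = r'{"zh_TW": {"Home": "\u9996\u9801"}, "vi_VN": {"Home": "Trang ch\u00ednh"}}'


def gettext(catalog, locale, msgid):
    """Return the translation of msgid for locale, or msgid itself if none exists."""
    return catalog.get(locale, {}).get(msgid, msgid)


catalog = json.loads(CATALOG_JSON)

translated = gettext(catalog, "zh_TW", "Home")    # translation found in the catalog
no_locale = gettext(catalog, "en_US", "Home")     # unknown locale: falls back to "Home"
no_entry = gettext(catalog, "vi_VN", "Missing")   # unknown msgid: falls back to "Missing"
```

Falling back to the msgid mirrors how gettext usually behaves when no translation is available, so untranslated strings still show up in the default language.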

Sample PO files

This example supports two locales, zh_TW (Traditional Chinese) and vi_VN (Vietnamese). The zh_TW PO file is located at locale/zh_TW/LC_MESSAGES/messages.po, and the vi_VN PO file is located at locale/vi_VN/LC_MESSAGES/messages.po.

zh_TW PO file locale/zh_TW/LC_MESSAGES/messages.po:

# Chinese translations for PACKAGE package.
# Copyright (C) 2013 THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# Automatically generated, 2013.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2013-06-04 10:20+0800\n"
"PO-Revision-Date: 2013-03-10 05:19+0800\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: zh_TW\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

msgid "Home"
msgstr "首頁"

msgid "Canon"
msgstr "經典"

msgid "About"
msgstr "關於"

msgid "Setting"
msgstr "設定"

msgid "Translation"
msgstr "翻譯"

vi_VN PO file locale/vi_VN/LC_MESSAGES/messages.po:

# Vietnamese translations for PACKAGE package.
# Copyright (C) 2013 THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# Automatically generated, 2013.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2013-06-06 23:05+0800\n"
"PO-Revision-Date: 2013-06-06 22:50+0800\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: vi\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"

msgid "Home"
msgstr "Trang chính"

msgid "Canon"
msgstr "Kinh điển"

msgid "About"
msgstr "Giới thiệu"

msgid "Setting"
msgstr "Thiết lập"

msgid "Translation"
msgstr "Dịch"

Source Code

Convert PO files to JSON format:

po2json.py:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import io
import json
import re

def getPOPath(locale, domain, localeDir):
    # e.g. locale/zh_TW/LC_MESSAGES/messages.po
    return localeDir + "/" + locale + "/LC_MESSAGES/" + domain + ".po"

def extractFromPOFile(poPath):
    # io.open decodes the file as UTF-8 on both Python 2 and Python 3
    with io.open(poPath, 'r', encoding='utf-8') as f:
        # matches single-line msgid/msgstr pairs; the header entry (msgid "") is skipped
        return re.findall(r'msgid "(.+)"\nmsgstr "(.+)"', f.read())

def PO2JSON(locales, domain, localeDir):
    # create PO-like json data for i18n
    obj = {}
    for locale in locales:
        # English is default language
        if locale == "en_US":
            continue

        obj[locale] = {}
        poPath = getPOPath(locale, domain, localeDir)
        for msgid, msgstr in extractFromPOFile(poPath):
            obj[locale][msgid] = msgstr

    return json.dumps(obj)

if __name__ == '__main__':
    locales = ["zh_TW", "vi_VN"]
    domain = "messages"
    localeDir = "locale"
    print(PO2JSON(locales, domain, localeDir))
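
To see what the regular expression in extractFromPOFile actually picks up, here is a small sketch that runs the same pattern against an inline PO snippet. The snippet and its placeholder translations are made up for illustration. Note that the pattern only handles single-line msgid/msgstr pairs; multi-line strings, plural forms, and entries with an empty translation are not matched.

```python
import re

# A shortened, made-up PO snippet in the same layout as the sample files.
PO_TEXT = '''msgid ""
msgstr ""
"Language: zh_TW\\n"

msgid "Home"
msgstr "Home (translated)"

msgid "About"
msgstr "About (translated)"
'''

# Same pattern as in extractFromPOFile.
PATTERN = r'msgid "(.+)"\nmsgstr "(.+)"'

pairs = re.findall(PATTERN, PO_TEXT)
print(pairs)
# [('Home', 'Home (translated)'), ('About', 'About (translated)')]
```

The header entry is left out because `.+` needs at least one character between the quotes, and the header's msgid is the empty string.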

Output of Demo

{"zh_TW": {"Home": "\u9996\u9801", "About": "\u95dc\u65bc", "Setting": "\u8a2d\u5b9a", "Canon": "\u7d93\u5178", "Translation": "\u7ffb\u8b6f"}, "vi_VN": {"Home": "Trang ch\u00ednh", "About": "Gi\u1edbi thi\u1ec7u", "Setting": "Thi\u1ebft l\u1eadp", "Canon": "Kinh \u0111i\u1ec3n", "Translation": "D\u1ecbch"}}
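
The \uXXXX sequences in the output are ordinary JSON escapes: json.dumps escapes non-ASCII characters by default, and any JSON parser turns them back into the original text. If a human-readable file is preferred, passing ensure_ascii=False is one option. The sketch below uses a one-entry excerpt of the output above as sample input.

```python
import json

# One entry taken from the demo output, with the escape kept as literal text.
escaped = r'{"vi_VN": {"Home": "Trang ch\u00ednh"}}'

# Parsing restores the original characters.
data = json.loads(escaped)

# ensure_ascii=False keeps non-ASCII characters as-is instead of escaping them.
readable = json.dumps(data, ensure_ascii=False)
```

Both forms describe the same data, so the choice only affects how the file looks, not how a front end reads it.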

Tested on: Ubuntu Linux 15.10, Python 2.7.10.

