[Python] Insert Line With Matched Pattern


Extract the URL from a reStructuredText link and insert it back into the same file as an rst metadata field, using Python.

We extract the URL from the following link in an rst file (the link text 舊網頁 means "old web page"):

`舊網頁 <http://nanda.online-dhamma.net/Tipitaka/Post-Canon/Visuddhimagga/Visuddhimagga.htm>`_
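
To try the extraction step on its own, here is a minimal sketch that is not part of the original script. The names LINK_URL and extract_url are hypothetical, and the pattern is a slightly stricter variant that stops at the first ">" and captures only the URL:

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical pattern: capture the URL between "<" and the first ">".
LINK_URL = re.compile(r'<(https?://[^>]+)>')


def extract_url(line):
    """Return the first URL wrapped in <...> on the line, or None if there is none."""
    match = LINK_URL.search(line)
    return match.group(1) if match else None


sample = "`舊網頁 <http://nanda.online-dhamma.net/Tipitaka/Post-Canon/Visuddhimagga/Visuddhimagga.htm>`_"
print(extract_url(sample))
```

Using a capture group avoids slicing off the angle brackets afterwards, and returning None lets the caller decide what to do with a line that has no link.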

Then we insert the URL back into the rst file as an :oldurl: metadata field, placed just before the :tags: line.

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import os
import re

# match "<http...>" up to the first ">" so two links on one line are not merged
oldlink = re.compile(r'<http[^>]+>')

def processFile(path):
  print("process " + path + " ...")
  tmpfilepath = os.path.join("/tmp", os.path.basename(path))
  with open(path, "r") as f, open(tmpfilepath, "w") as fo:
    # http://www.tutorialspoint.com/python/file_readlines.htm
    lines = f.readlines()
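    insertAt = None  # index of the ":tags:" line; None until found
    oldurl = ""
    for index, line in enumerate(lines):
      if ":tags:" in line:
        insertAt = index
      if "舊網頁" in line:  # link text, "old web page"
        result = oldlink.findall(line)
        oldurl = result[0][1:-1]  # strip the surrounding "<" and ">"

    # both the ":tags:" line and the old link must exist in the file
    assert insertAt is not None
    assert oldurl != ""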
    # http://www.tutorialspoint.com/python/list_insert.htm
    # http://stackoverflow.com/questions/10507230/insert-line-at-middle-of-file-with-python
    # http://stackoverflow.com/questions/11968998/remove-lines-that-contain-certain-string
    lines.insert(insertAt, ":oldurl: " + oldurl + "\n")
    fo.writelines(lines)

  with open(tmpfilepath, "r") as f, open(path, "w") as fo:
    fo.write(f.read())


def processDir(rootDir):
  # http://www.tutorialspoint.com/python/os_walk.htm
  for root, dirs, files in os.walk(rootDir):
    for name in files:
      path = os.path.join(root, name)
      processFile(path)


if __name__ == '__main__':
  # http://stackoverflow.com/questions/50499/how-do-i-get-the-path-and-name-of-the-file-that-is-currently-executing
  processDir(os.path.join(os.path.dirname(__file__), "../content/articles"))

Tested on: Ubuntu Linux 15.10, Python 2.7.10.
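
The insertion step of processFile can also be exercised without touching any files. The following is a minimal sketch under a few assumptions: the helper name insert_oldurl and the sample document are hypothetical, it raises ValueError instead of using assert, and it is written for Python 3 rather than the Python 2.7 the script was tested on.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical pattern: capture the URL between "<" and the first ">".
LINK_URL = re.compile(r'<(https?://[^>]+)>')


def insert_oldurl(lines, marker="舊網頁", before=":tags:"):
    """Return a copy of lines with an ':oldurl:' field inserted before the ':tags:' line.

    Raises ValueError if the ':tags:' line or the old-page link is missing.
    """
    insert_at = None
    oldurl = None
    for index, line in enumerate(lines):
        if before in line:
            insert_at = index
        if marker in line:
            match = LINK_URL.search(line)
            if match:
                oldurl = match.group(1)
    if insert_at is None or oldurl is None:
        raise ValueError("need both a ':tags:' line and an old-page link")
    # Build a new list so the caller's list is left untouched.
    return lines[:insert_at] + [":oldurl: " + oldurl + "\n"] + lines[insert_at:]


# Hypothetical rst document used only for illustration.
doc = [
    "Title\n",
    "#####\n",
    "\n",
    ":date: 2016-01-01\n",
    ":tags: example\n",
    "\n",
    "`舊網頁 <http://example.com/old.htm>`_\n",
]
result = insert_oldurl(doc)
print("".join(result))
```

Keeping the list manipulation in a function that takes and returns lines makes it easy to check the result before any file is overwritten.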


References:

[1] twnanda/linkold.py at master · twnanda/twnanda · GitHub