[Golang] Convert All HTML Links to reStructuredText via goquery


Introduction

Convert all HTML links in a webpage to reStructuredText via goquery in Go (the Go programming language). To do the same thing with the net/html package instead of goquery, see [2].

Install goquery Package

$ go get -u github.com/PuerkitoBio/goquery

Source Code

Use the goquery Find() and Each() methods to iterate over all HTML links, and the Text() and Attr() methods to retrieve the text and href of each link. Each link is printed to the screen through os.Stdout with the text/template package.

link2rst.go
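package main

import (
	"os"
	"strings"
	"text/template"

	"github.com/PuerkitoBio/goquery"
)

// rstLink is the reStructuredText hyperlink format: `text <href>`_
const rstLink = "`{{.Text}} <{{.Href}}>`_\n"

// htmlLink holds the data passed to the template for one <a> element.
type htmlLink struct {
	Text string
	Href string
}

func main() {
	url := "https://siongui.github.io/2016/04/09/js-copy-to-clipboard/"

	// Fetch and parse the page. Newer goquery releases deprecate
	// NewDocument in favor of net/http plus NewDocumentFromReader.
	doc, err := goquery.NewDocument(url)
	if err != nil {
		panic(err)
	}

	tmpl := template.Must(template.New("link2rst").Parse(rstLink))
	doc.Find("a").Each(func(_ int, link *goquery.Selection) {
		text := strings.TrimSpace(link.Text())
		// Skip anchors that have no href attribute.
		href, ok := link.Attr("href")
		if !ok {
			return
		}
		if err := tmpl.Execute(os.Stdout, &htmlLink{text, href}); err != nil {
			panic(err)
		}
	})
}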
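
The template step can be tried on its own, without fetching a page. The following is a minimal sketch that reuses the rstLink template and htmlLink struct from the listing above and needs only the standard library. The helper name renderLink and the sample link are made up for illustration and are not part of the original program.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Same template and struct as in link2rst.go.
const rstLink = "`{{.Text}} <{{.Href}}>`_\n"

type htmlLink struct {
	Text string
	Href string
}

var tmpl = template.Must(template.New("link2rst").Parse(rstLink))

// renderLink is a hypothetical helper: it renders one link into a string
// instead of writing directly to os.Stdout.
func renderLink(text, href string) (string, error) {
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, htmlLink{Text: text, Href: href}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	s, err := renderLink("Go", "https://golang.org/")
	if err != nil {
		panic(err)
	}
	fmt.Print(s) // prints: `Go <https://golang.org/>`_
}
```

Because text/template (unlike html/template) does not escape its input, the angle brackets around the href come through unchanged, which is what the reStructuredText link syntax expects.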

Usage

$ go run link2rst.go

Tested on: Ubuntu Linux 15.10, Go 1.6.


References:

[1] github.com/PuerkitoBio/goquery - GoDoc
[2] [Golang] Iterate over All DOM Elements in HTML