[Golang] Convert All HTML Links to reStructuredText via goquery
Introduction
Convert all HTML links in a webpage to reStructuredText via goquery in Golang (Go programming language). To do the same thing with the net/html package instead of goquery, see [2].
Install goquery Package
$ go get -u github.com/PuerkitoBio/goquery
Source Code
Use goquery's Find() and Each() methods to iterate over all HTML links, and the Text() and Attr() methods to retrieve the text and href of each link. Links without an href attribute are skipped. Each remaining link is printed to os.Stdout in reStructuredText format via the text/template package.
    package main

    import (
    	"os"
    	"strings"
    	"text/template"

    	"github.com/PuerkitoBio/goquery"
    )

    // rstLink is the template for one reStructuredText hyperlink.
    const rstLink = "`{{.Text}} <{{.Href}}>`_\n"

    // htmlLink holds the text and href of one HTML link.
    type htmlLink struct {
    	Text string
    	Href string
    }

    func main() {
    	url := "https://siongui.github.io/2016/04/09/js-copy-to-clipboard/"
    	// Fetch and parse the webpage.
    	doc, err := goquery.NewDocument(url)
    	if err != nil {
    		panic(err)
    	}

    	tmpl := template.Must(template.New("link2rst").Parse(rstLink))
    	// Visit every <a> element and print those that have an href attribute.
    	doc.Find("a").Each(func(_ int, link *goquery.Selection) {
    		text := strings.TrimSpace(link.Text())
    		href, ok := link.Attr("href")
    		if ok {
    			if err := tmpl.Execute(os.Stdout, &htmlLink{text, href}); err != nil {
    				panic(err)
    			}
    		}
    	})
    }
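The template step can be tried on its own, without goquery or network access. The following is a minimal sketch, assuming the same rstLink template and htmlLink struct as the program above. The renderLinks helper and the sample links are hypothetical, introduced only for illustration, and it uses only the standard library.

```go
package main

import (
	"bytes"
	"os"
	"text/template"
)

// rstLink mirrors the template used in the article's program.
const rstLink = "`{{.Text}} <{{.Href}}>`_\n"

// htmlLink mirrors the struct used in the article's program.
type htmlLink struct {
	Text string
	Href string
}

// renderLinks is a hypothetical helper (not part of the original program).
// It renders each link through the rstLink template and returns the
// concatenated reStructuredText.
func renderLinks(links []htmlLink) (string, error) {
	tmpl, err := template.New("link2rst").Parse(rstLink)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	for _, l := range links {
		if err := tmpl.Execute(&buf, l); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	// Made-up sample data standing in for links scraped from a page.
	links := []htmlLink{
		{Text: "Go", Href: "https://golang.org/"},
		{Text: "goquery", Href: "https://github.com/PuerkitoBio/goquery"},
	}
	out, err := renderLinks(links)
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

Because text/template (unlike html/template) does not escape its input, the URLs are written out unchanged, one reStructuredText link per line.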
Usage
$ go run link2rst.go
Tested on: Ubuntu Linux 15.10, Go 1.6.
References:
[1] github.com/PuerkitoBio/goquery - GoDoc
[2] [Golang] Iterate over All DOM Elements in HTML