goquery - Iterate over All DOM Elements in HTML


Introduction

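This post shows how to iterate over all DOM elements of an HTML document with goquery in Go (Golang).

The trick is to call Find("*"): the universal selector matches every element in the DOM tree. Selectors match elements only, so text and comment nodes are not visited directly.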

Install goquery Package

$ go get -u github.com/PuerkitoBio/goquery

Source Code

node.go
package main

import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

const html = `<html>
  <head>
    <title>traverse</title>
  </head>
  <body>
    <div>
      Hello
      <span>World</span>
      <!-- Goquery -->
    </div>
  </body>
</html>`

func main() {
	// Parse the HTML string into a goquery document.
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
	if err != nil {
		panic(err)
	}

	// "*" matches every element. Text() returns the combined text of the
	// element and all of its descendants.
	doc.Find("*").Each(func(_ int, node *goquery.Selection) {
		fmt.Println(node.Text())
	})
}

Tested on: Ubuntu Linux 16.04, Go 1.6.2.

