[Golang] XML Parsing Example (1)


Assume we have an XML file as follows:

example-1.xml
<?xml version="1.0" encoding="UTF-8"?><div>Example</div>

We would like to parse the XML file and extract its content. Here is how to do it in Go:

parse-1.go
// http://golang.org/pkg/io/ioutil/
// http://golang.org/pkg/encoding/xml/
package main

import (
	"encoding/xml"
	"fmt"
	"io/ioutil"
)

type div struct {
	// XMLName records the element name; the tag requires it to be "div".
	XMLName xml.Name `xml:"div"`
	// The field must be exported (capitalized); `content` would be skipped.
	Content string `xml:",chardata"`
}

func main() {
	d := div{}
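	// Do not discard the read error: a missing file would otherwise
	// surface as a confusing XML error (EOF) from Unmarshal.
	xmlContent, err := ioutil.ReadFile("example-1.xml")
	if err != nil {
		panic(err)
	}
	if err := xml.Unmarshal(xmlContent, &d); err != nil {
		panic(err)
	}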
	fmt.Println("XMLName:", d.XMLName)
	fmt.Println("Content:", d.Content)
}

Put the above two files in the same directory and run the code. The output will be:

XMLName: { div}
Content: Example
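
If you want to try this without creating example-1.xml (for instance on the Go Playground, where there is no such file), the same parsing can be sketched against a string held in memory. The helper name parseDiv is an assumption made for this illustration and is not part of the original parse-1.go.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// div mirrors the struct used in parse-1.go.
type div struct {
	XMLName xml.Name `xml:"div"`
	Content string   `xml:",chardata"`
}

// parseDiv is a hypothetical helper: it unmarshals a single <div>
// document held in memory and returns the populated struct.
func parseDiv(data []byte) (div, error) {
	var d div
	err := xml.Unmarshal(data, &d)
	return d, err
}

func main() {
	// Same content as example-1.xml, embedded as a raw string.
	src := []byte(`<?xml version="1.0" encoding="UTF-8"?><div>Example</div>`)
	d, err := parseDiv(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("XMLName:", d.XMLName)
	fmt.Println("Content:", d.Content)
}
```

Since the input is identical, this should print the same two lines as the file-based version.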

Note

In the line:

XMLName     xml.Name        `xml:"div"`

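The field name XMLName and its type xml.Name have a special meaning to encoding/xml: Unmarshal records the element name in that field, and because of the `xml:"div"` tag it returns an error if the element is not named div. If you rename the field or change its type, this behavior is lost.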
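
To see what the `xml:"div"` tag on XMLName does, here is a minimal sketch that feeds the same struct a matching and a non-matching root element. The helper tryParse and the <span> input are assumptions made for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

type div struct {
	XMLName xml.Name `xml:"div"`
	Content string   `xml:",chardata"`
}

// tryParse is a hypothetical helper that returns the error (if any)
// from unmarshaling data into a div.
func tryParse(data string) error {
	var d div
	return xml.Unmarshal([]byte(data), &d)
}

func main() {
	// The root element matches the tag on XMLName, so no error is expected.
	fmt.Println("div: ", tryParse(`<div>Example</div>`))
	// The root element is <span>, so Unmarshal should report a mismatch.
	fmt.Println("span:", tryParse(`<span>Example</span>`))
}
```

The first call should return nil, while the second should return a non-nil error because the element name does not match the tag.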

Note

Also in the line:

Content     string          `xml:",chardata"`

The field name Content must start with a capital letter so that the field is exported. encoding/xml can only set exported fields; if you name it content, Unmarshal silently skips the field and it stays empty. You can use another name for the field, as long as the first letter is capital.
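
The effect of capitalization can be checked with a small sketch that unmarshals the same input into two structs differing only in the case of the field name. The type and function names here are assumptions made for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// exportedDiv has an exported field, which encoding/xml can set.
type exportedDiv struct {
	Content string `xml:",chardata"`
}

// unexportedDiv has an unexported field, which encoding/xml skips.
type unexportedDiv struct {
	content string `xml:",chardata"`
}

// readExported returns the character data captured by the exported field.
func readExported(data string) string {
	var d exportedDiv
	if err := xml.Unmarshal([]byte(data), &d); err != nil {
		panic(err)
	}
	return d.Content
}

// readUnexported returns whatever ended up in the unexported field.
func readUnexported(data string) string {
	var d unexportedDiv
	if err := xml.Unmarshal([]byte(data), &d); err != nil {
		panic(err)
	}
	return d.content
}

func main() {
	const src = `<div>Example</div>`
	fmt.Printf("exported:   %q\n", readExported(src))
	fmt.Printf("unexported: %q\n", readUnexported(src))
}
```

Note that no error is reported in the unexported case; the field is simply left at its zero value, which is why this mistake is easy to miss.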

Note

If you replace the line:

Content     string          `xml:",chardata"`

with

Content     string          `xml:",innerxml"`

i.e., replace ,chardata with ,innerxml, the output will be the same, because in this case the raw XML nested inside the div element is identical to the character data of the div element.
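
The two options only differ once the element contains nested markup. The sketch below captures both in one struct; the field names Text and Raw, the helper parse, and the nested <b> input are assumptions made for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// div captures the same element in two ways.
type div struct {
	Text string `xml:",chardata"` // character data directly inside <div>
	Raw  string `xml:",innerxml"` // raw XML between <div> and </div>
}

// parse is a hypothetical helper that unmarshals data into a div.
func parse(data string) (div, error) {
	var d div
	err := xml.Unmarshal([]byte(data), &d)
	return d, err
}

func main() {
	for _, src := range []string{
		`<div>Example</div>`,
		`<div>Hello <b>World</b>!</div>`,
	} {
		d, err := parse(src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("chardata: %q\ninnerxml: %q\n\n", d.Text, d.Raw)
	}
}
```

For the first input both fields hold "Example". For the second, the chardata field holds only the text that sits directly inside div ("Hello !"), while the innerxml field keeps the nested markup ("Hello <b>World</b>!").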


Makefile for automating the development:

# GOROOT must be an absolute path; with a relative path the 6g compiler
# is not found. For example,
#   export GOROOT=../go  (=> 6g not found)
# GOPATH must not be a relative path either.
export GOROOT=$(realpath ../../../../../go)
export GOPATH=$(realpath .)
export PATH := $(GOROOT)/bin:$(PATH)

all: parseFeed

example1:
	@# http://stackoverflow.com/questions/9967105/suppress-echo-of-command-invocation-in-makefile
	@go run parse-1.go

example2:
	@go run parse-2.go

example3:
	@go run parse-3.go

example4:
	@go run parse-4.go

example5:
	@go run parse-5.go

example5_2:
	@go run parse-5_2.go

example6:
	@go run parse-6.go

example7:
	@go run parse-7.go

atom2rss:
	@go run atom2rss.go

parseFeed:
	@go run parseFeed.go

help:
	go help

Tested on: Ubuntu Linux 14.10, Go 1.4.


[Golang] XML Parsing Example series:

[1] [Golang] XML Parsing Example (1)
[2] [Golang] XML Parsing Example (2)
[3] [Golang] XML Parsing Example (3)
[4] [Golang] XML Parsing Example (4)
[5] [Golang] XML Parsing Example (5) - Parse OPML
[6] [Golang] XML Parsing Example (6) - Parse OPML Concisely
[7] [Golang] XML Parsing Example (7) - Parse RSS 2.0
[8] [Golang] XML Parsing Example (8) - Parse Atom 1.0
[9] [Golang] Convert Atom to RSS
[10] [Golang] Parse Web Feed - RSS and Atom
