How do you use regex to parse html?

Attached: regex_examples.png (700x496, 146K)

>How do you use regex to parse html?
don't do that

H̸̡̪̯ͨ͊̽̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̝͖ͭ̏ͮ͟O̮̪̝͍ͮM̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

stackoverflow.com/a/1732454/2378146

If you absolutely need to parse HTML, use something like goquery:
github.com/PuerkitoBio/goquery
Example:
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
	// Request the HTML page. http.Get needs an absolute URL, scheme included.
	res, err := http.Get("http://metalsucks.net")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
	}

	// Load the HTML document from the response body.
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Find the review items with a CSS selector.
	doc.Find(".sidebar-reviews article .content-block").Each(func(i int, s *goquery.Selection) {
		// For each item found, get the band and title.
		band := s.Find("a").Text()
		title := s.Find("i").Text()
		fmt.Printf("Review %d: %s - %s\n", i, band, title)
	})
}

func main() {
	ExampleScrape()
}

haha le EPIC zalgo meme, upvoted my r/stackoverflow friends :^)

Writing an HTML parser (and validator) is a great exercise for anyone trying to learn programming. No regex required.
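
To give a rough idea of what "no regex required" can look like, here is a minimal sketch of a hand-written scanner in Go that splits markup into tags and text. The `Token` type and `Tokenize` function are made-up names for illustration, and it assumes a heavily simplified input: no comments, doctype, script/style contents, entities, or `>` inside attribute values. A real tokenizer has to deal with all of those.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a hypothetical type for this sketch: a tag or a run of text.
type Token struct {
	Kind string // "open", "close", or "text"
	Data string // lowercased tag name, or the raw text
}

// Tokenize walks src once and emits tags and text. Attributes are dropped.
func Tokenize(src string) []Token {
	var toks []Token
	for len(src) > 0 {
		if src[0] != '<' {
			// Text runs until the next '<' or the end of input.
			end := strings.IndexByte(src, '<')
			if end < 0 {
				end = len(src)
			}
			toks = append(toks, Token{Kind: "text", Data: src[:end]})
			src = src[end:]
			continue
		}
		end := strings.IndexByte(src, '>')
		if end < 0 {
			// Unterminated tag: keep the remainder as text.
			toks = append(toks, Token{Kind: "text", Data: src})
			break
		}
		inner := src[1:end]
		src = src[end+1:]
		kind := "open"
		if strings.HasPrefix(inner, "/") {
			kind = "close"
			inner = inner[1:]
		}
		// Drop a self-closing slash, then keep only the name before any attributes.
		inner = strings.TrimSuffix(inner, "/")
		name, _, _ := strings.Cut(strings.TrimSpace(inner), " ")
		toks = append(toks, Token{Kind: kind, Data: strings.ToLower(name)})
	}
	return toks
}

func main() {
	for _, t := range Tokenize(`<p class="x">Hi <B>there</b></p>`) {
		fmt.Printf("%s %q\n", t.Kind, t.Data)
	}
}
```

This only covers tokenizing. Building a tree and recovering from broken nesting is where most of the work is.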

>The <center> cannot hold
gets me every time

Just use an HTML parser. Regex sucks for this task.
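
A small sketch of why, using only Go's standard `regexp` package. The helper `firstDivBody` and both patterns are made up for illustration. The point is that neither a lazy nor a greedy match can keep track of nesting depth.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstDivBody returns what a naive pattern believes is the body of the
// first <div> in src, or "" if nothing matches. Illustrative helper only.
func firstDivBody(pattern, src string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(src)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	nested := `<div>outer <div>inner</div> tail</div>`
	siblings := `<div>a</div><div>b</div>`

	// Lazy match stops at the first </div> and cuts the nested element in half.
	fmt.Printf("%q\n", firstDivBody(`<div>(.*?)</div>`, nested))

	// Greedy match runs to the last </div> and swallows both siblings.
	fmt.Printf("%q\n", firstDivBody(`<div>(.*)</div>`, siblings))
}
```

The first call yields `outer <div>inner` and the second yields `a</div><div>b`, and neither is the content of a single element.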

Good luck with that - the spec is ridiculously complex.
HTML is not XML. It's a very fault-tolerant language, and that's what makes it so difficult to parse properly.
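
A quick way to see the difference, sketched with Go's standard `encoding/xml` decoder in its default strict mode. The helper name `strictParse` is made up for this example. An unclosed `<br>` is perfectly normal HTML, but a strict XML parser rejects it.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// strictParse reads src token by token with encoding/xml and returns the
// first error it hits, or nil if the input is well-formed XML.
func strictParse(src string) error {
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		_, err := dec.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	// Normal HTML: <br> is a void element with no closing tag.
	fmt.Println("html-style:", strictParse(`<p>one<br>two</p>`))

	// The same content written as well-formed XML.
	fmt.Println("xml-style: ", strictParse(`<p>one<br/>two</p>`))
}
```

The first input comes back with a syntax error and the second with nil. An HTML parser has to accept both, plus misnested and unclosed tags, and still build the same tree a browser would.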

Look at it: html.spec.whatwg.org/multipage/parsing.html#parsing
Do you think that is "a great exercise for anyone trying to learn programming"? That's a great exercise for a whole fucking team of experienced programmers.