How do you use regex to parse html?
>How do you use regex to parse html?
don't do that
H̸̡̪̯ͨ͊̽̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̝͖ͭ̏ͮ͟O̮̪̝͍ͮM̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
If you absolutely need to parse HTML, use something like goquery:
github.com
example:
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
	// Request the HTML page. http.Get needs a full URL including the scheme.
	res, err := http.Get("http://metalsucks.net")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
	}

	// Load the HTML document from the response body.
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Find the review items with a CSS selector.
	doc.Find(".sidebar-reviews article .content-block").Each(func(i int, s *goquery.Selection) {
		// For each item found, get the band and title.
		band := s.Find("a").Text()
		title := s.Find("i").Text()
		fmt.Printf("Review %d: %s - %s\n", i, band, title)
	})
}

func main() {
	ExampleScrape()
}
haha le EPIC zalgo meme, upvoted my r/stackoverflow friends :^)
Writing an HTML parser (and validator) is a great exercise for anyone trying to learn programming. No regex required.
>The <center> cannot hold
gets me every time
Just use an HTML parser. Regex sucks for this task.
Good luck with that - the spec is ridiculously complex.
HTML is not XML. It's a really fault-tolerant language, and that's what makes it so difficult to parse properly.
Look at it: html.spec.whatwg.org