HTML Scraping

Take a look at this webpage. Should be easy enough to scrape the prices, right?
goatbots.com/prices-boosters
Wrong. After loading, the page makes an AJAX request that returns a giant image containing the prices (pic related) and style information that specifies the proper offset for each little div in the price table. I assume the intent is specifically to prevent HTML scraping.
Is this next level autism? I can still automate collection of the prices, but with the additional work of using OCR on the image. Meanwhile they've turned a few characters into an additional 40 KB request. Is the goal here to waste CPU cycles?
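For anyone curious what undoing the offsets looks like, here is a minimal sketch. It assumes the style information arrives as CSS rules with a background-position, width, and height per cell. The selector names, property order, and sample values are made up for illustration and are not the site's real output.

```python
import re

# Hypothetical CSS in the shape described above: one rule per price cell,
# each pointing at a different region of the shared image.
SAMPLE_CSS = """
#price-0 { background-position: 0px 0px; width: 48px; height: 16px; }
#price-1 { background-position: -48px 0px; width: 48px; height: 16px; }
#price-2 { background-position: 0px -16px; width: 56px; height: 16px; }
"""

# Assumes the properties appear in this order inside each rule.
RULE = re.compile(
    r"#(?P<id>[\w-]+)\s*\{[^}]*?"
    r"background-position:\s*(?P<x>-?\d+)px\s+(?P<y>-?\d+)px;[^}]*?"
    r"width:\s*(?P<w>\d+)px;[^}]*?"
    r"height:\s*(?P<h>\d+)px;"
)


def crop_boxes(css):
    """Map each cell id to the (left, top, right, bottom) region it shows."""
    boxes = {}
    for m in RULE.finditer(css):
        # A negative background offset shifts the image left/up, so the
        # visible region starts at the negated offset.
        left, top = -int(m["x"]), -int(m["y"])
        boxes[m["id"]] = (left, top, left + int(m["w"]), top + int(m["h"]))
    return boxes


boxes = crop_boxes(SAMPLE_CSS)
```

Each box is in the (left, upper, right, lower) order that Pillow's Image.crop expects, so the cells could be cut out one at a time and handed to OCR individually instead of OCRing the whole image.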

Attached: autism.png (1368x720, 20K)

>Is this next level autism?
It's pretty high up there.

Why not just scrape one of the major sites? TCGPlayer, Goldfish, etc.? Do you specifically want this site?

I am a lurker, but I am curious: what are you doing, and for what purpose?

That's hilarious, dude. I love it.

Great way to deter script kiddies from automatically scraping your site

This is a specific vendor on Magic Online, so I don't think TCGPlayer is relevant, and Goldfish prices are often out of date because online prices change rapidly (and they don't list Goatbots prices at all).

I was just looking for a way to keep track of the Treasure Chest buy price, because I win them and want to know when I should sell them, and then I wanted to solve the puzzle of "why don't the prices appear in the HTML?".

>posts his own site
fuck off shill

That would be pretty dumb, considering the intersection of "people who would care about Goatbots" (Magic Online players) and "people who haven't already heard of Goatbots" is minuscule.

OP, just get the HTML, render it, convert it to an image, and pass it through an OCR program to turn the image into text

For a while my website had completely randomized divs to prevent scraping.

But eventually I figured the extra maintenance wasn't worth it.

jesus fucking christ.

tl;dr ya. this guy. you could use puppeteer, take a screen grab of the output, run it through a sufficiently good column-aware ocr, and pray I guess.
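rough sketch of the "column aware" part, assuming the ocr engine hands back word boxes as (text, x, y) tuples. the sample words and coordinates are invented, and real ocr output will be messier. it only shows how to turn loose boxes back into table rows.

```python
def rows_from_words(words, y_tolerance=5):
    """Group (text, x, y) word boxes into rows, each sorted left to right."""
    rows = []
    for text, x, y in sorted(words, key=lambda w: w[2]):
        # A word joins the current row if its y is close to the row's
        # first word; otherwise it starts a new row.
        if rows and abs(y - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    # Sorting the (x, text) pairs orders each row's cells by column.
    return [[text for _, text in sorted(r["cells"])] for r in rows]


# Hypothetical OCR output for a two-row, two-column price table.
words = [
    ("1.25", 210, 41),
    ("Booster A", 10, 40),
    ("Booster B", 10, 62),
    ("0.80", 210, 60),
]
table = rows_from_words(words)
```

the tolerance is in pixels and would need tuning to the font size in the screen grab.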

this website is literally garbage though. they went out of their way to make such a poorly designed webapp. lmao

>I assume the intent is specifically to prevent HTML scraping.
don't attribute to malice what can be explained by ignorance

nah, this isn't ignorant. this is deliberate bullshit.
who the fuck is generating a raster server side and some css to lay out a fucking table?

This is masterful web dev

>Is this a trap?
>No, they must be stooopid
lel you'd make a great strategist wouldn't you...

I also had random text that you have to filter through.

I still do that when I post my email address, using overly complex CSS.
Because the CSS is so random, there is no way to filter out the text in a consistent manner.
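A minimal sketch of the decoy-text idea, assuming the decoys are hidden with display:none under class names that are randomized on every render. This is a simplification for illustration, not my actual CSS, and every name in it is hypothetical.

```python
import html
import random
import re
import string


def random_class(rng, length=8):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def obfuscate(text, rng):
    """Return (css, markup, hidden_classes) with one decoy per real character."""
    names = set()
    while len(names) < 6:  # six distinct class names per render
        names.add(random_class(rng))
    names = sorted(names)
    rng.shuffle(names)
    hidden, visible = names[:3], names[3:]
    css = "".join(f".{c}{{display:none}}" for c in hidden)
    parts = []
    for ch in text:
        decoy = rng.choice(string.ascii_lowercase)
        parts.append(f'<span class="{rng.choice(hidden)}">{decoy}</span>')
        parts.append(f'<span class="{rng.choice(visible)}">{html.escape(ch)}</span>')
    return css, "".join(parts), set(hidden)


def visible_text(markup, hidden):
    """What a scraper recovers once it knows which classes are hidden."""
    spans = re.findall(r'<span class="(\w+)">(.*?)</span>', markup)
    return html.unescape("".join(body for cls, body in spans if cls not in hidden))


css, markup, hidden = obfuscate("me@example.com", random.Random(0))
recovered = visible_text(markup, hidden)
```

The second function shows the weak point: a scraper that reads the stylesheet, or renders the page, can drop the hidden spans and get the text back. Heavier CSS only raises the effort required.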

I'm almost compelled to actually make a webscraper for this site just to put this fag into an arms race of pure autism.

Now I'm thinking about it. I'm considering making infinite text traps for shits and giggles.

Why not get the prices from the card pages directly?

goatbots.com/card/treasure-chest-booster
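A minimal sketch of pulling prices out of a page like that with only the standard library. The markup below is a made-up stand-in with invented numbers, since the thread never shows the real page source; the class names "label" and "price" are assumptions and would need to be adjusted to whatever the card page actually uses.

```python
from html.parser import HTMLParser

# Hypothetical markup and values, for illustration only.
SAMPLE_HTML = """
<div class="prices">
  <span class="label">Buy</span><span class="price">1.85</span>
  <span class="label">Sell</span><span class="price">2.10</span>
</div>
"""


class PriceParser(HTMLParser):
    """Collect label/price span pairs into a dict of floats."""

    def __init__(self):
        super().__init__()
        self._field = None   # which kind of span we are inside, if any
        self._label = None   # most recent label awaiting its price
        self.prices = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("label", "price"):
            self._field = cls

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "label":
            self._label = text.lower()
        elif self._label is not None:
            self.prices[self._label] = float(text)
            self._label = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = PriceParser()
parser.feed(SAMPLE_HTML)
prices = parser.prices
```

In a real run the sample string would be replaced by the fetched page body, and the buy price could be appended to a log on each run to track it over time.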

that wouldn't stop even the most basic scraper

It would make sorting through the data much harder.

TCGPlayer is protected by Incapsula

That's what I'm doing now. I was poking around looking for some kind of API instead of scraping that page and saw that the OP page was making an AJAX request, and then uncovered this autism.