Google scrapes every page on Amazon every second of every minute of every hour of every day

Google scrapes every page on Amazon every second of every minute of every hour of every day.
I wrote a Python script (admittedly in an amateurish fashion) to scrape an Amazon URL for ONE of my company’s (my company as in the company I work for) products and got my entire office building blacklisted for running the script against Amazon. Why is it that Google’s web crawling doesn’t overload their servers, but me taking literally less than 1 KB of text data from Amazon is enough for them to block me for “overloading” their site?
Shouldn’t I be allowed to scrape our own products off Amazon? Is it just because “fuck you, little guy”?
Does anyone know if there is a way around this? I tried adding a line to the code to disguise the traffic as coming from a browser, but I still got the same 503 error.
Any scraping experts here who know best practices for what I’m doing?
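
For reference, a minimal sketch of what that "disguise it as a browser" line usually amounts to, assuming the script uses the `requests` library; the URL/ASIN is a placeholder, not the actual product page:

```python
# Rough sketch of spoofing a browser User-Agent with requests.
# On its own this is often not enough: the 503 can also be triggered by
# request rate, missing cookies, and other bot-detection signals.
import requests

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

resp = requests.get("https://www.amazon.com/dp/EXAMPLEASIN", headers=headers)
print(resp.status_code)  # 503 here generally means the request was flagged as automated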


VPN or Tor before every script run?

learn to use the official api

or

learn to impersonate a browser better

and

stop sending requests every 5ms
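
On that last point, a loose sketch of what "not every 5 ms" might look like, again assuming `requests`; the ASINs and the 10-second delay are placeholders, not a known safe rate:

```python
# Space requests out by whole seconds instead of milliseconds, and reuse one
# session so headers and cookies persist. Values here are illustrative only.
import time
import requests

session = requests.Session()
session.headers.update({
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
})

for asin in ["EXAMPLEASIN1", "EXAMPLEASIN2"]:
    resp = session.get(f"https://www.amazon.com/dp/{asin}")
    print(asin, resp.status_code)
    time.sleep(10)  # seconds between requests, not 5 ms
```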

>Is it just because “fuck you, little guy”?
Absolutely. Google scraping them is fine - they need Amazon products to show up in Google search results - but you? What the fuck do you have to offer them?

It's impossible to overstate how much of a difference good code and bad code make in processes like this.

>why won't they let me trash their site?? wtf??? google does it and they only bring massive amounts of traffic to their site
>this is EXACTLY the same

Hey Jeff, how’s it going?

maybe start with headless chrome

Try using a real browser with Selenium; it usually works for me.
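
If anyone wants the concrete version of the headless Chrome / Selenium suggestion, a hedged sketch assuming Selenium 4+ and a local Chrome install; the URL is a placeholder:

```python
# Drive a real (headless) Chrome so the request comes from an actual browser.
# Selenium 4.6+ can fetch a matching chromedriver on its own.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")         # no visible window
options.add_argument("--window-size=1920,1080")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.amazon.com/dp/EXAMPLEASIN")
    print(driver.title)                        # e.g. the product title
finally:
    driver.quit()
```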

Try using a rotating proxy, so that requests go out from a different IP each time. That way Amazon won't be able to track you.
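
A minimal sketch of the rotating-proxy idea, assuming `requests` and a pool of proxies you actually have access to; the proxy addresses are made up:

```python
# Cycle each request through a different proxy so they come from different IPs.
# Proxy URLs are placeholders; you still need to keep the request rate sane.
import itertools
import requests

PROXIES = itertools.cycle([
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
])

def fetch(url):
    proxy = next(PROXIES)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)

resp = fetch("https://www.amazon.com/dp/EXAMPLEASIN")
print(resp.status_code)
```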

Why?
• The traffic Google will send them is obviously more valuable to them than whatever you're doing.
• Google isn't scraping 'every page, every second, every minute'. Amazon is high-priority, so a change won't go unnoticed by Google for more than a day or so, but at least a few hours should be expected between re-scrapes.
• Google and Amazon have most likely come to a formal agreement regarding this data traffic, allowing Google to get its info without eating all the bandwidth.
• Amazon has an API for getting almost all the information you could possibly want from the webpage, and *really* wants you to use it rather than screwing with their page-view metrics.

You should probably disclose in the User-Agent that you're actually a bot and follow whatever rules they have in their robots.txt.
You are not good enough to pass yourself off as a normal user, and you will get fucked by them if that's what you're trying to do.
Also, stop sending a request every cycle, you dingus.
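
A sketch of that honest-bot approach: a User-Agent that identifies the script, a robots.txt check, and a hard throttle. The bot name, contact address, and delay are placeholders:

```python
# Identify yourself, respect robots.txt, and throttle hard.
import time
import urllib.robotparser
import requests

BOT_UA = "MyCompanyPriceChecker/0.1 (contact: someone@example.com)"

rp = urllib.robotparser.RobotFileParser("https://www.amazon.com/robots.txt")
rp.read()

url = "https://www.amazon.com/dp/EXAMPLEASIN"
if rp.can_fetch(BOT_UA, url):
    resp = requests.get(url, headers={"User-Agent": BOT_UA})
    print(resp.status_code)
    time.sleep(30)  # wait a long time before the next request
else:
    print("robots.txt disallows this path for this user agent")
```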

>got my entire office building blacklisted for running the script against Amazon
Run your script from AWS

This is why these cunts need to be broken up.

Google will most definitely NOT scrape data from Amazon. Amazon will give them a live pipe to their data instead.

a: literally do this from a browser (as a userscript or add-on or something, so your requests look like they're from a browser)
b: space your requests out properly (importantly, don't do things exactly every ___ interval; see the sketch below)
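
The sketch for point (b): jitter the interval so requests never land on an exact timer. The base delay is a guess, not a documented limit:

```python
# Randomize the wait so requests don't fire on a perfectly regular schedule.
import random
import time

BASE_DELAY = 60  # seconds; illustrative, not a known safe value

def polite_sleep():
    # +/- 50% jitter around the base delay
    time.sleep(BASE_DELAY * random.uniform(0.5, 1.5))
```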

>scrapes every page on Amazon every second of every minute of every hour of every day
They don't have that kind of power, or the need.

They probably don't scrape more than once per month on most sites.

>my company as in the company I work for
It's obvious you don't own the company...


At least half of the people here claim to be self employed

>claim to be self employed
While claiming unemployment, NEET-bucks, or both.

Impossible because there is none?

ahhh ha...

This is retarded.
Probably this.