I want to scrape a website and display that info (with some calculations and modifications) on a mobile app

What are the best methods/tools/language to do this?

I'm doing this exact thing atm.

The scraper is written in PHP and runs every 4 hours via a cron job on my webserver. Works well.

I have a C++ scraper that scrapes 4chinz, but I can't share it.

Depends. If the site relies on special snowflake transport protocols (glorious and useless WebSocket wrappers), then Node.js is your only option (well, the only semi-sane option if you don't want to reimplement special snowflake protocols). Otherwise you can use Python or whatever the fuck you want. Write the data to a database and have separate applications (written in a language of your choice) process and display it.

Lots of threaded workers communicating with each other through queues. Seriously, this simplifies a shitton of issues at the design level.

Hey, please stop posting pictures of me on this forum, without my consent! :(

If it's Android, use Jsoup.

OP who is that girl

Hey, thanks for the replies. So if I use Python and Beautiful Soup, can I just dump that data into some database and then grab it with some mobile dev language (maybe C# if I'd be using Xamarin)?

>scrape entire video game wikis
>sell the data in an app
>add a semi-hidden small-font creative commons copyright notice
>rake in the bux

depends on what the site is. If it has an API, everything is easier. If it doesn't, search the page source for what you want and point your code at that part in your app.

Python + BSoup
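A minimal sketch of the Python + Beautiful Soup approach. The URL and the CSS selectors (`div.item`, `.title`, `.price`) are placeholders you'd swap for whatever the target site actually uses:

```python
import requests
from bs4 import BeautifulSoup

def parse_items(html):
    """Pull item titles/prices out of a page.

    Assumes each item lives in a <div class="item"> with .title/.price
    children -- placeholder selectors, adapt to the real site.
    """
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for div in soup.select("div.item"):
        title = div.select_one(".title")
        price = div.select_one(".price")
        if title and price:
            items.append({"title": title.get_text(strip=True),
                          "price": price.get_text(strip=True)})
    return items

def scrape(url):
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # bail out on 4xx/5xx instead of parsing an error page
    return parse_items(resp.text)
```

Splitting fetching from parsing like this also makes the parser easy to test against saved HTML.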

just dont add anything lel, no video game fanboy cucks are gonna take you to court over it

I did something like that:
>get a popular linux icon set
>convert them in bulk into png
>get an open source template for android icon sets
>semi-hidden copyright notice
>to the playstore it goes
I'm getting like 500 USD/year; it's not much, but it's something for virtually no work done.

any language you want, fucko. why do you think that only one or two languages could be capable of doing this?

Can anyone answer this?
Is it basically as easy as scraping data off the web, putting it into a database, then grabbing that data with other tools/languages?

Bamp

Why php?

speaking of scraping websites
say I did that and displayed the content in my app, is that legal?

you are lucky. this isn't even a forum.

depends on what you do with that content.

dunno but I got a feeling you gonna tell us

>mobile app.
Can you expand on that?
iOS and Android? Do you have JavaScript exp.? If so React Native would be fastest.

If only one or the other just use ObjC/Swift or Java/Kotlin.

Or maybe you just need a website or "pwa." If that's the case you have tons of options.

>methods/tools/language to do this?
I'd make the API in Go. Goquery is a great lib for scraping. Used it this weekend again and went great.

Ok fatty

I just display it in the app for users to see
app has no ads, in case that's relevant

That's called a fucking browser and yes it's legal.

I want to scrape her butthole if you know what I mean

Both ios and android would be best. But first maybe android.

Bamp

Finally.
I've been thinking of a scrape joke all day.
Mine was
>I'd like to scrape the dirt off her feet with my teeth
>I'd like to scrape

P much yeah

I've done this for a personal project and could accomplish it with bash & curl. Post-processing was done with html2text, sed, awk & column.

it's me :)

Hand coded assembly.

>two sticks in one ice pole

wasteful

because not all can, fuckwad. Try scraping a React-generated site using PHP. You'll get diddly-squat, since PHP is not rendering the DOM, so no actual live data is available.

many sites have clauses in their T&Cs that prohibit automated harvesting of data. so, if you violate those, then, no, it is not legal.

it's only a browser if OP lets the user enter the URL and if it's general-purpose enough to work on any website. If, however, it is built to target only one (or a handful of) site(s), then it is NOT a browser, it is a scraper, and it could indeed be illegal.

what if you want to break it half and give half to a friend? :)

Exactly

Scrape data. Save as text. Do calculations and shit. Save as JSON. Parse JSON in app.
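The middle steps of that pipeline (calculations, then JSON for the app) could look roughly like this; the input shape and the tax-rate calculation are made-up examples:

```python
import json

def build_feed(items):
    """Take scraped rows, add a derived field, and emit JSON for the app.

    'items' is assumed to be a list of {"title": ..., "price": ...} dicts
    with prices formatted like "$5.00" -- a placeholder shape.
    """
    out = []
    for it in items:
        price = float(it["price"].lstrip("$"))
        out.append({"title": it["title"],
                    "price": price,
                    "price_with_tax": round(price * 1.21, 2)})  # example calculation
    return json.dumps(out)
```

Write the result to a file your server hosts (or serve it from an endpoint) and the mobile app just GETs and parses it.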

I'm sure the Newpipe developers will get arrested any day now.

I have a little scraper running on Heroku.
It uses Puppeteer for scraping, MongoDB for the db, and Express + Pug to display the data as HTML

She has amazing feet. Wow.

>friend
>Jow Forums
good one user.

lewd

This may be the better option for legal reasons. Just do everything on the client side without scraping and saving it in a database. That way it's basically a browser and you are not redistributing the copyrighted work yourself.

Ok, with just a few lines of Python code I can now scrape the site and display item titles and prices. Now I need to put everything into a database. Tips/advice?

PERL

literal cuck. i like your style ;)

Is Python + BS + Selenium hands down the best way to achieve this?

Bamp

prove it

Connect to the db.
Put stuff in it.
What advice do you need with that? What kind of db software? PostgreSQL. Now start scraping.
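If you'd rather skip running a database server while prototyping, SQLite ships with Python. A minimal sketch (the `items` table shape matches the titles-and-prices scrape discussed above):

```python
import sqlite3

def save_items(con, items):
    """Store scraped rows; INSERT OR REPLACE makes re-runs idempotent,
    so scraping the same titles again just updates their prices."""
    con.execute("""CREATE TABLE IF NOT EXISTS items (
                       title TEXT PRIMARY KEY,
                       price TEXT)""")
    con.executemany(
        "INSERT OR REPLACE INTO items (title, price) VALUES (?, ?)",
        [(i["title"], i["price"]) for i in items])
    con.commit()
```

Usage: `con = sqlite3.connect("scrape.db")`, then `save_items(con, scraped_rows)` after each scrape run.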

do websites ever flag your scraper for abuse? is there a way to get around it if so?

99% of the time you'll just need Python + the requests package with spoofed browser headers. I'm doing it for a living and I've never found a website that would require anything more.
If you're dealing with the pathological 1%, chances are they have measures to block Selenium bots (Selenium sends some data that can help differentiate it from a normal browser). In that case try using Puppeteer or the unofficial pyppeteer (a Python implementation, if you don't want to use Node.js stuff).
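"Spoofed browser headers" just means replacing requests' default `python-requests/x.y` User-Agent (which many sites 403 on sight) with browser-looking values. The exact header strings below are only examples:

```python
import requests

# A browser-ish header set; values are examples, copy fresher ones from
# your own browser's network inspector if a site is picky.
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/71.0.3578.98 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch(url):
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.text
```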

this. God almighty, this. Time and again. Didn't know Puppeteer, might check it out another time.

Also, perl, for great cringe.

Usually they just flag your IP/IP range, or use external services like Cloudflare for that. That doesn't mean you shouldn't use a proxy, especially when scraping a shitton of subpages. At least use Tor, or better yet some rotating proxy (relatively easy to set up your own with HAProxy, Privoxy and loads of Tor connections on different ports).
Always use proxies when scraping, kids.
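With requests, a proxy is just a `proxies` dict per call, so rotation is a matter of cycling through a pool. A sketch, assuming Tor's default local SOCKS port (9050) for the example addresses; SOCKS support needs `pip install requests[socks]`:

```python
import itertools
import requests

def rotating_proxy_pool(addresses):
    """Yield requests-style 'proxies' dicts, cycling through the pool forever.

    socks5h:// (rather than socks5://) also resolves DNS through the proxy,
    so lookups don't leak your real IP.
    """
    for addr in itertools.cycle(addresses):
        yield {"http": addr, "https": addr}

def fetch(url, pool):
    # Each request goes out through the next proxy in the rotation.
    return requests.get(url, proxies=next(pool), timeout=30)
```

Usage: `pool = rotating_proxy_pool(["socks5h://127.0.0.1:9050", "socks5h://127.0.0.1:9052"])`, then call `fetch(url, pool)` per page.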

Only way to go. I've used it in production systems while building ML data sets quickly

BCNF or bust
>implying /g/ actually knows anything other than fizzbuzz

Doing scraping on a daily basis on 2 sites, regularly on several others, for the better part of almost 2 years now. Proxies are in rotation in only one case. Banned on only one site, and only after the job was done, by the way. But with a rotating-proxy layer on the scraper you won't deal with a lot of issues.

pls don't scrape 4plebs

please do, don't hold the brakes. In fact, do so with company. Don't let them hold their brakes. Or yours. And don't touch mine.

Link to website? What calculations?

You wouldn't be trying to rip people off now, would you?

> Want to scrape a website and display that info (with some calculations and modifications) on a mobile app.

Why must you develop a native mobile application? Develop a web application that can be accessed on any device through a browser, unless you need interoperability with the device.

> What are the best methods/tools/language to do this?

This depends on many factors. If the website provides an API, then you may not even need to scrape anything at all. Here are other factors that you must sort:

> How often do you need to scrape data?
> Is the website plain HTML and CSS, or is there lots of dynamic content?

Here is how I would do it on the back-end: a Golang application would periodically scrape/fetch data, do the necessary modifications, and store it in an SQLite database.

Here is how I would do it on the front-end: HTML and CSS for the design, plus JavaScript and XHR, where XHR requests periodically fetch data from your Golang + SQLite solution. I have not developed a web application in a long time; people may rather use SSE or something else other than XHR? Please correct me if I am wrong, any experts here.

ty for the info

Do any cloud services have reliable rotating proxy services? I imagine it should be pretty simple.

I have a scraper on an EC2 micro instance but it's not using a proxy right now.

this and do it with python

Thanks for the great info. I have run into a fundamental problem.

The fucking website is dynamic, and to load all the items I want to scrape I need to click Load More a bunch of times. The page URL stays the same the whole time.
Checked inspect element's network tab, and it appears the button is sending some AJAX POST request to some URL.

Can I replicate that with Python?
Or is Selenium my only option?

bamp for interest

Not OP, but how to avoid getting IP banned while testing scrapers? knowyourmeme banned me ;(

Prove it and I'll help you with everything you need cutie.

Of course you can, why not?
See what AJAX calls are made, look at what data is sent, and try to replicate that as a series of requests.get and requests.post calls.
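Concretely, you replay the "Load More" endpoint yourself, bumping the offset until the server returns nothing. A sketch; the endpoint URL, the `offset`/`limit` payload fields, and the `items` key in the response are all guesses you'd replace with what the network inspector actually shows:

```python
import requests

def fetch_all_pages(post_page):
    """Keep requesting the next batch until the endpoint runs dry.

    post_page(offset) -> list of items at that offset (empty list when done).
    Kept separate from the HTTP call so the paging logic is testable offline.
    """
    items, offset = [], 0
    while True:
        batch = post_page(offset)
        if not batch:
            break
        items.extend(batch)
        offset += len(batch)
    return items

def make_poster(endpoint, page_size=20):
    """Wrap the site's (hypothetical) AJAX endpoint as a post_page callable."""
    def post_page(offset):
        resp = requests.post(endpoint,
                             data={"offset": offset, "limit": page_size},
                             headers={"X-Requested-With": "XMLHttpRequest"},
                             timeout=10)
        resp.raise_for_status()
        return resp.json().get("items", [])
    return post_page
```

Usage: `items = fetch_all_pages(make_poster("https://example.com/ajax/load_more"))` — no Selenium or browser needed, as long as the endpoint doesn't demand cookies/tokens (if it does, copy those from the inspector too).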

Use proxies, free if possible.
First option: Tor; however, Tor exit nodes are public, so many websites just block them by default.
Second might be some free proxies from ProxyRotator.com; they are constantly refreshed (every minute, I believe) and are less likely to be blocked by default (unless they are from China; I think you can only access Chinese websites through them).

>Parse JSON in app.
You can use Chainlink for this.
