I want to scrape a website and display that info (with some calculations and modifications) on a mobile app

What are the best methods/tools/language to do this?

I'm doing this exact thing atm.

The scraper is written in PHP and runs every 4 hours via a cron job on my webserver. Works well.

I have a C++ scraper that scrapes 4chinz, but I can't share it.

Depends. If the site relies on special snowflake transport protocols (glorious and useless WebSocket wrappers), then Node.js is your only option (well, the only semi-sane option if you don't want to reimplement special snowflake protocols). Otherwise you can use Python or whatever the fuck you want. Write the data to a database and have separate applications (written in a language of your choice) process and display it.

Lots of threaded workers communicating with each other through queues. Seriously, this simplifies a shitton of issues at the design level.

Hey, please stop posting pictures of me on this forum, without my consent! :(

If it's Android, use Jsoup.

OP who is that girl

Hey, thanks for the replies. So if I use Python and Beautiful Soup, can I just dump that data into some database and then grab it with some mobile dev language (maybe C# if I'd be using Xamarin)?

>scrape entire video game wikis
>sell the data in an app
>add a semi-hidden small-font creative commons copyright notice
>rake in the bux

depends on what the site is. If it has an API, everything is easier. If it doesn't, search the page source for what you want and point your code at that part in your app.

Python + BSoup
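A minimal sketch of the Python + Beautiful Soup approach. The URL and the CSS selectors (`div.item`, `.title`, `.price`) are placeholders you'd swap for whatever the target site actually uses:

```python
import requests
from bs4 import BeautifulSoup

def parse_items(html):
    """Pull item titles/prices out of a page.

    Assumes each item lives in a <div class="item"> with .title/.price
    children -- placeholder selectors, adapt to the real site.
    """
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for div in soup.select("div.item"):
        title = div.select_one(".title")
        price = div.select_one(".price")
        if title and price:
            items.append({"title": title.get_text(strip=True),
                          "price": price.get_text(strip=True)})
    return items

def scrape(url):
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # bail out on 4xx/5xx instead of parsing an error page
    return parse_items(resp.text)
```

Splitting fetching from parsing like this also makes the parser easy to test against saved HTML.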

just dont add anything lel, no video game fanboy cucks are gonna take you to court over it

I did something like that:
>get a popular linux icon set
>convert them in bulk into png
>get an open source template for android icon sets
>semi-hidden copyright notice
>to the playstore it goes
I'm getting like 500 USD/year; it's not much, but it's something for virtually no work done.

any language you want, fucko. why do you think that only one or two languages could be capable of doing this?

Can anyone answer this?
Is it basically as easy as scraping data off the web, putting it into a database, then grabbing that data with other tools/languages?

Bamp

Why php?

speaking of scraping websites
say I did that and displayed the content in my app, is that legal?

you are lucky. this isn't even a forum.

depends on what you do with that content.

dunno but I got a feeling you gonna tell us

>mobile app.
Can you expand on that?
iOS and Android? Do you have JavaScript exp.? If so React Native would be fastest.

If only one or the other just use ObjC/Swift or Java/Kotlin.

Or maybe you just need a website or "pwa." If that's the case you have tons of options.

>methods/tools/language to do this?
I'd make the API in Go. Goquery is a great lib for scraping. Used it this weekend again and went great.

Ok fatty

I just display it in the app for users to see
app has no ads, in case that's relevant

That's called a fucking browser and yes it's legal.

I want to scrape her butthole if you know what I mean

Both ios and android would be best. But first maybe android.

Bamp

Finally.
I've been thinking of a scrape joke all day.
Mine was
>I'd like to scrape the dirt off her feet with my teeth
>I'd like to scrape

P much yeah

I've done this for a personal project and could accomplish it with bash & curl. Post-processing was done with html2text, sed, awk & column.

it's me :)

Hand coded assembly.

>two sticks in one ice pole

wasteful

because not all can, fuckwad. Try scraping a React-generated site using PHP. You'll get diddly-squat, since PHP is not rendering the DOM, so no actual live data is available.

many sites have clauses in their T&Cs that prohibit automated harvesting of data. so, if you violate those, then, no, it is not legal.

it's only a browser if OP lets the user enter the URL and if it's general-purpose enough to work on any website. If, however, it is built to target only one (or a handful of) site(s), then it is NOT a browser, it is a scraper, and it could indeed be illegal.

what if you want to break it half and give half to a friend? :)

Exactly

Scrape data. Save as text. Do calculations and shit. Save as JSON. Parse JSON in app.
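The middle steps of that pipeline (calculations, then JSON for the app) could look roughly like this; the input shape and the tax-rate calculation are made-up examples:

```python
import json

def build_feed(items):
    """Take scraped rows, add a derived field, and emit JSON for the app.

    'items' is assumed to be a list of {"title": ..., "price": ...} dicts
    with prices formatted like "$5.00" -- a placeholder shape.
    """
    out = []
    for it in items:
        price = float(it["price"].lstrip("$"))
        out.append({"title": it["title"],
                    "price": price,
                    "price_with_tax": round(price * 1.21, 2)})  # example calculation
    return json.dumps(out)
```

Write the result to a file your server hosts (or serve it from an endpoint) and the mobile app just GETs and parses it.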

I'm sure the Newpipe developers will get arrested any day now.

I have a little scraper running on Heroku.
It uses Puppeteer for scraping, MongoDB for the db, and Express + Pug to display the data as HTML

She has amazing feet. Wow.

>friend
>Jow Forums
good one user.

lewd

This may be the better option for legal reasons. Just do everything on the client side without scraping and saving it in a database. That way it's basically a browser and you are not redistributing the copyrighted work yourself.

Ok, with just a few lines of Python code I can now scrape the site and display item titles and prices. Now I need to put everything into a database. Tips/advice?

PERL

literal cuck. i like your style ;)

Is Python + BS + Selenium hands down the best way to achieve this?

Bamp

prove it

Connect to the db.
Put stuff in it.
What advice do you need with that? What kind of db software? PostgreSQL. Now start scraping.
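If you'd rather skip running a database server while prototyping, SQLite ships with Python. A minimal sketch (the `items` table shape matches the titles-and-prices scrape discussed above):

```python
import sqlite3

def save_items(con, items):
    """Store scraped rows; INSERT OR REPLACE makes re-runs idempotent,
    so scraping the same titles again just updates their prices."""
    con.execute("""CREATE TABLE IF NOT EXISTS items (
                       title TEXT PRIMARY KEY,
                       price TEXT)""")
    con.executemany(
        "INSERT OR REPLACE INTO items (title, price) VALUES (?, ?)",
        [(i["title"], i["price"]) for i in items])
    con.commit()
```

Usage: `con = sqlite3.connect("scrape.db")`, then `save_items(con, scraped_rows)` after each scrape run.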

do websites ever flag your scraper for abuse? is there a way to get around it if so?

99% of the time you'll just need Python + the requests package with spoofed browser headers. I'm doing it for a living and I've never found a website that would require anything more.
If you're dealing with the pathological 1%, chances are they have measures to block Selenium bots (Selenium sends some data that can help differentiate it from a normal browser). In that case try using Puppeteer or the unofficial pyppeteer (a Python implementation, if you don't want to use Node.js stuff).
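"Spoofed browser headers" just means replacing requests' default `python-requests/x.y` User-Agent (which many sites 403 on sight) with browser-looking values. The exact header strings below are only examples:

```python
import requests

# A browser-ish header set; values are examples, copy fresher ones from
# your own browser's network inspector if a site is picky.
HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/71.0.3578.98 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch(url):
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.text
```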

this. God almighty, this. Time and again. Didn't know Puppeteer, might check it out another time.

Also, perl, for great cringe.

Usually they just flag your IP/IP range, or use external services like Cloudflare for that. That doesn't mean you shouldn't use a proxy, especially when scraping a shitton of subpages. At least use Tor, or better yet some rotating proxy (relatively easy to set up your own with HAProxy, Privoxy and loads of Tor connections on different ports).
Always use proxies when scraping, kids.
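With requests, a proxy is just a `proxies` dict per call, so rotation is a matter of cycling through a pool. A sketch, assuming Tor's default local SOCKS port (9050) for the example addresses; SOCKS support needs `pip install requests[socks]`:

```python
import itertools
import requests

def rotating_proxy_pool(addresses):
    """Yield requests-style 'proxies' dicts, cycling through the pool forever.

    socks5h:// (rather than socks5://) also resolves DNS through the proxy,
    so lookups don't leak your real IP.
    """
    for addr in itertools.cycle(addresses):
        yield {"http": addr, "https": addr}

def fetch(url, pool):
    # Each request goes out through the next proxy in the rotation.
    return requests.get(url, proxies=next(pool), timeout=30)
```

Usage: `pool = rotating_proxy_pool(["socks5h://127.0.0.1:9050", "socks5h://127.0.0.1:9052"])`, then call `fetch(url, pool)` per page.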

Only way to go. I've used it in production systems while building ML data sets quickly

BCNF or bust
>implying /g/ actually knows anything other than fizzbuzz

Doing scraping on a daily basis on 2 sites, regularly on several others, for the better part of almost 2 years now. Proxies are in rotation in only one case. Banned on only one site, and only after the job was done, by the way. But with a rotating-proxy layer on the scraper you won't deal with a lot of issues.

pls don't scrape 4plebs

please do, don't hold the brakes. In fact, do so with company. Don't let them hold their brakes. Or yours. And don't touch mine.

Link to website? What calculations?

You wouldn't be trying to rip people off now, would you?

> Want to scrape a website and display that info (with some calculations and modifications) on a mobile app.

Why must you develop a native mobile application? Develop a web application that can be accessed on any device through a browser, unless you need interoperability with the device.

> What are the best methods/tools/language to do this?

This depends on many factors. If the website provides an API, then you may not even need to scrape anything at all. Here are other factors that you must sort:

> How often do you need to scrape data?
> Is the website plain HTML and CSS, or is there lots of dynamic content?

Here is how I would do it on the back-end: a Golang application would periodically scrape/fetch data, do the necessary modifications, and store it in an SQLite database.

Here is how I would do it on the front-end: HTML and CSS for the design, plus JavaScript and XHR, where XHR requests periodically fetch data from your Golang + SQLite solution. I have not developed a web application in a long time; people may rather use SSE or something else other than XHR? Please correct me if I am wrong, any experts here.

ty for the info

Do any cloud services have reliable rotating proxy services? I imagine it should be pretty simple.

I have a scraper on an EC2 micro instance but it's not using a proxy right now.

this and do it with python

Thanks for the great info. I have run into a fundamental problem.

The fucking website is dynamic, and to load all the items I want to scrape I need to click Load More a bunch of times. The page URL stays the same the whole time.
Checked inspect element's network tab, and it appears the button is sending some AJAX POST request to some URL.

Can I replicate that with Python?
Or is Selenium my only option?

bamp for interest

Not OP, but how to avoid getting IP banned while testing scrapers? knowyourmeme banned me ;(

Prove it and I'll help you with everything you need cutie.

Of course you can, why not?
See what AJAX calls are made, look at what data is sent, and try to replicate that as a series of requests.get and requests.post calls.
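Concretely, you replay the "Load More" endpoint yourself, bumping the offset until the server returns nothing. A sketch; the endpoint URL, the `offset`/`limit` payload fields, and the `items` key in the response are all guesses you'd replace with what the network inspector actually shows:

```python
import requests

def fetch_all_pages(post_page):
    """Keep requesting the next batch until the endpoint runs dry.

    post_page(offset) -> list of items at that offset (empty list when done).
    Kept separate from the HTTP call so the paging logic is testable offline.
    """
    items, offset = [], 0
    while True:
        batch = post_page(offset)
        if not batch:
            break
        items.extend(batch)
        offset += len(batch)
    return items

def make_poster(endpoint, page_size=20):
    """Wrap the site's (hypothetical) AJAX endpoint as a post_page callable."""
    def post_page(offset):
        resp = requests.post(endpoint,
                             data={"offset": offset, "limit": page_size},
                             headers={"X-Requested-With": "XMLHttpRequest"},
                             timeout=10)
        resp.raise_for_status()
        return resp.json().get("items", [])
    return post_page
```

Usage: `items = fetch_all_pages(make_poster("https://example.com/ajax/load_more"))` — no Selenium or browser needed, as long as the endpoint doesn't demand cookies/tokens (if it does, copy those from the inspector too).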

Use proxies, free if possible.
First option: Tor; however, Tor exit nodes are public, so many websites just block them by default.
Second might be some free proxies from ProxyRotator.com; they are constantly refreshed (every minute, I believe) and are less likely to be blocked by default (unless they are from China; I think you can only access Chinese websites through them).

>Parse JSON in app.
You can use Chainlink for this.
