seriously? all there is to it is html parsers and regular expressions
Daniel Powell
If you teach me how I'll give you 3 cisco courses of content for free
Parker Collins
I scrape latest manga chapters because I like using comicrack
Ryder Clark
if you know a bit of Java and CSS selectors, you're there with jsoup.
Leo Davis
do you know how to do selenium with docker containers? I want to break the few web scrapers I have out into containers to make the logging cleaner
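From what I've read, the usual route is the stock selenium images, something like this (untested sketch, the service name is made up):

```yaml
# docker-compose.yml (untested sketch; service name is made up)
services:
  chrome:
    image: selenium/standalone-chrome
    ports:
      - "4444:4444"
    shm_size: "2gb"   # Chrome needs more than the default 64 MB /dev/shm
```

and then the scraper connects with `webdriver.Remote("http://localhost:4444/wd/hub", options=...)`, so `docker compose logs chrome` gives you per-container logs. But I haven't actually gotten it working yet, hence the question.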
Luis Richardson
Can I ask why you guys bother? Is there some profit in this? Are you scraping for content that interests you personally? What's the deal?
Juan White
Me personally I've been writing my own crawler to self host a search engine. All part of my quest to de-google myself. I don't technically scrape, just crawl, scan for keywords in the doc, and index them.
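The scan-and-index part is honestly tiny; a minimal stdlib-only sketch of what I mean (the URL, HTML, and keywords here are made up, and a real crawler would fetch pages and follow the returned links):

```python
from html.parser import HTMLParser

# Sketch: pull visible text and links out of one fetched page, then index
# the page URL under every keyword that appears in its text.

class PageScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)  # frontier for the next crawl round

    def handle_data(self, data):
        self.text.append(data)

def index_page(url, html, keywords, index):
    """Add url to the posting list of every keyword found in the page text."""
    scanner = PageScanner()
    scanner.feed(html)
    body = " ".join(scanner.text).lower()
    for kw in keywords:
        if kw in body:
            index.setdefault(kw, []).append(url)
    return scanner.links  # caller enqueues these for the next round

index = {}
html = '<html><body><p>Self-hosting a search engine</p><a href="/about">about</a></body></html>'
links = index_page("http://example.org/", html, ["search", "docker"], index)
```

The index is just keyword -> list of URLs; good enough for personal search without any of the ranking machinery.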
Zachary Green
So I'm a student with a netacad account. I want to be able to review my course material 5 years from now but don't have faith I'll have perpetual access to the content. If I scrape the course content it's like I own 3 Cisco textbooks I can keep and share. I downloaded a website scraper program, but it didn't retrieve any of the actual course content.
Bentley Clark
>scrape content
>use content to train algorithm
>???
Jayden Moore
you could download the HTML manually
Nathaniel Jenkins
Yeah, I could. It will take a long time but since I can't scrape I guess I should just get started.
Andrew Barnes
You should consider making a torrent of it. free sharing of information and all.
Juan Diaz
I would do that. But it's just more likely to happen if I don't have to spend hours and hours manually downloading it and stitching it together.
Webscrapers are great man, but web robots in general are cool too. Most ideas require the ability to automate HTTP GET and POST requests and parse web data. I've been working on a couple things in C recently. Having used jsoup, I really prefer the elegance and power of curl.
Michael Gutierrez
Use REST Assured if you want to do that
Justin Hill
I had an idea to build a web interface for a web scraper and sell it as a service
probably want to build it either containerized or serverless
Jaxson Thompson
I wrote an automated scraper to download images from a few certain popular social media image platforms, for content that is of a sexual interest to me.
Bentley Rivera
lo and behold a startup is born
Justin Davis
So an upgraded Selenium IDE?
Justin Scott
I don't think it's a particularly revolutionary idea.
I first manually gave it a few usernames of accounts to scrape, and then from those accounts it also finds any references to other accounts. These are treated as "relations".
If for example User A and User B are related, then there's a chance that User B contains images of User A, which is... useful.
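The expansion is basically a breadth-first walk over accounts; roughly like this (a sketch, where extract_relations is a stand-in for the actual profile scrape and the usernames are made up):

```python
from collections import deque

# Sketch of the "relations" expansion: start from seed usernames, and every
# account mentioned on a scraped profile becomes an edge in a directed graph.

def extract_relations(user, pages):
    # Stand-in for scraping a profile page for mentioned accounts.
    return pages.get(user, [])

def build_relation_graph(seeds, pages, max_depth=2):
    graph = {}                                # user -> list of related users
    queue = deque((s, 0) for s in seeds)
    seen = set(seeds)
    while queue:
        user, depth = queue.popleft()
        related = extract_relations(user, pages)
        graph[user] = related
        if depth < max_depth:                 # cap how far the crawl fans out
            for r in related:
                if r not in seen:
                    seen.add(r)
                    queue.append((r, depth + 1))
    return graph

pages = {"user_a": ["user_b"], "user_b": ["user_a", "user_c"], "user_c": []}
graph = build_relation_graph(["user_a"], pages)
```

The depth cap matters, otherwise one well-connected account drags in half the platform.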
Asher Rogers
By default I use Python, specifically lxml's xpath and requests. It's enough for 90% of online content.
I've written a handful of RSS aggregators but for the most part I just use it to build image scrapers from time to time.
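The whole loop is a few lines; a sketch of the lxml side (an inline snippet stands in for the page here, and the div class and image paths are made up; in the real thing `requests.get(url).content` supplies the bytes):

```python
import lxml.html

# Inline HTML stands in for requests.get(url).content so nothing is fetched.
html = b"""<html><body>
  <div class="gallery">
    <img src="/img/001.jpg"><img src="/img/002.jpg">
  </div>
</body></html>"""

doc = lxml.html.fromstring(html)
# XPath pulls attribute values straight out, no per-element loop needed.
srcs = doc.xpath('//div[@class="gallery"]/img/@src')
```

From there it's just iterating over `srcs` and downloading each one.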
Alexander Barnes
yeah, web-based for businesses trying to integrate old systems or something
i'm sure it already exists, but either way it's good practice. I figure if I could work out how to build an Office or G Suite plugin it might actually sell in decent numbers
Carter Reyes
Idk man, I'm trying to avoid Java as much as possible. But we'll see. Thanks for the resource. This isn't something that normies need access to, so I prefer the simplicity of just having it as one .c file with one library that I could probably run on a toaster from tty
Blake Reyes
Anyone work with headless automation? I really want to get into it but need a good idea where to learn/start
Xavier Edwards
you're building a directed graph, check out Gephi if you want to visualize it.
I was referring to downloading ass pics off instagram as a decent idea
Cameron Baker
scraper for what exactly?
Easton Barnes
>Gephi
I'll check it out, thanks
Yeah, it's pretty nice. I'm pretty much a data hoarder, and I used to spend hours every day just manually downloading this shit, so I decided to just spend some time automating it, and it's definitely been worth it.
Bentley Carter
Yeah, basically. Scrape sites of interest and related sites for content, then sort it for later or just consume and delete. To illustrate the former case, I like to pile up useful reference material on a file server at home; for the latter, I pull daily from a collection of podcasts, vidcasts, and articles about my hobbies to my phone via termux, which I usually delete after watching or listening over my lunch hour.
Obviously, you can simply use search engine(s) and a few other pieces of software to manually accomplish the same thing, but this saves a fairly significant chunk of time when you regularly check more than a few sites for content.
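The daily pull is mostly just feed parsing; a stdlib-only sketch (the inline feed stands in for what `urllib.request` would fetch, and the titles/URLs are made up):

```python
import xml.etree.ElementTree as ET

# Inline RSS stands in for a fetched feed; a real run loops over feed URLs.
feed_xml = """<rss version="2.0"><channel>
  <title>Some Podcast</title>
  <item><title>Episode 12</title><enclosure url="http://example.org/ep12.mp3" type="audio/mpeg"/></item>
  <item><title>Episode 11</title><enclosure url="http://example.org/ep11.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

def new_items(feed_xml, seen_urls):
    """Return (title, url) pairs not yet downloaded, newest first in feed order."""
    root = ET.fromstring(feed_xml)
    out = []
    for item in root.iter("item"):
        title = item.findtext("title")
        enc = item.find("enclosure")
        # Podcasts put the media in <enclosure>; articles just have <link>.
        url = enc.get("url") if enc is not None else item.findtext("link")
        if url and url not in seen_urls:
            out.append((title, url))
    return out

todo = new_items(feed_xml, seen_urls={"http://example.org/ep11.mp3"})
```

Keep the `seen_urls` set on disk between runs and the same script works from cron or termux.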
Robert Wood
see
Caleb Sullivan
I used to use pure PHP, and then moved to PhantomJS.
Been a while since I did some menial work though.
This is one of the easiest tasks anyone can do if you're simply "automating", but it's a hassle if you're doing it to test your own website; in that case just fucking do it manually in an installed browser.
Parker Richardson
Can't they just migrate the backend?
Aaron Wright
you have no idea how many companies don't want to spend the money on that, but they'll spend maybe 1/3 or 1/4 of it on something like this just to "make it work".
There's more money in middleware software than anything, that's why SAP is such a giant.