I made this neat python script that downloads all files from any given Jow Forums thread.
It's pretty awesome so I wanted to share it with youse.
github.com
how long did it take you?
wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm [URL]
bloated as fuck. I wrote this one for downloading all webms; adjust it a bit, use it, and throw yours in the garbage
#!/bin/bash -e
# Reads thread URLs on stdin and downloads every webm in each thread.
# Reconstructed: the archived copy of this script was truncated, so this
# keeps the original shape (thread JSON -> generated curl config -> curl -K -).
mkdir -p ~/moar
cd ~/moar
while read -r url; do
    board=$(printf '%s' "$url" | cut -d '/' -f4)
    no=$(printf '%s' "$url" | cut -d '/' -f6)
    curl -sS "https://a.4cdn.org/$board/thread/$no.json" \
    | jq -r --arg b "$board" --arg dir "$no" '
        "create-dirs",
        (.posts[]
         | select(.ext == ".webm")
         | "output=\"\($dir)/\(.filename).webm\"",
           "url=https://i.4cdn.org/\($b)/\(.tim).webm")' \
    | curl -sSK -
done
>using windows
L M A O
M
A
O
I got the first working version up and running in about 1.5-2 hours, and I've been adding more functionality over the last 2 days
I use Debian on my work laptop, but I use my desktop for gaming, so it kinda needs to be winblows
Go be 12 years old somewhere else
Mine downloads not only webms, but images as well, and you can download multiple threads at a time, and it has a neat progress bar. Does yours?
It's great. Thanks
this thread is peak nu/g/
I made shit.
#!/usr/bin/env bash
[ $# -ge 1 -a -f "$1" ] && input="$1" || input="-"
while IFS= read -r url; do
if [ -z "$url" ]; then
continue
fi
board="$(printf -- '%s' "${url:?}" | cut -d '/' -f4)"
thread="$(printf -- '%s' "${url:?}" | cut -d '/' -f6)"
workdir="Jow Forums/$board/$thread"
mkdir -p "$workdir" "$workdir/media" "$workdir/thumbnails" "$workdir/snapshots" > /dev/null 2>&1
pushd "$workdir" > /dev/null 2>&1
echo "$url" > url.txt
curl --silent --time-cond "$thread.json" --user-agent 'Mozilla/5.0' \
"a.4cdn.org
if [ -s ".$thread" ]; then
isodate=$(date -u +'%Y%m%dT%H%M%SZ')
echo "$url"
"$thread.json"
cp ".$thread" "snapshots/$thread.$isodate.json"
mv ".$thread" "$thread.json"
else
echo "$url" >&2
fi
jq -r '
.posts[]
| select(.tim != null)
| .ext as $ext
| "
wget called, this shit already exists
Good job though, user
Thanks, lads
that's pretty niggerlicious.
You know, all threads have a json file..
u can do this w/wget
but w/e well done u made somethin
Your point being?
You can press x to .json
OP could also download the JSON file as an additional feature.
Or he could stop wasting his life downloading anything off this website.
Many games will now work on Linux with proton, I'm thinking of switching on my desktop soon
parsing json is 10x easier than fucking around with bs4?
>parsing json is 10x easier than fucking around with bs4?
TIL.
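For anyone who missed it: the thread-to-API mapping is just string surgery on the URL, no scraping needed. A quick sketch, assuming the usual boards / a.4cdn.org host pair and a made-up thread number:

```shell
#!/bin/sh
# Derive the read-only JSON API URL from a thread URL (no network needed).
# Assumes the usual path layout: https://boards.../<board>/thread/<no>
url='https://boards.4chan.org/g/thread/12345678'   # made-up example thread
board=$(printf '%s' "$url" | cut -d '/' -f4)
no=$(printf '%s' "$url" | cut -d '/' -f6)
printf 'https://a.4cdn.org/%s/thread/%s.json\n' "$board" "$no"
# → https://a.4cdn.org/g/thread/12345678.json
```

Same `cut -d '/' -f4` / `-f6` trick the shell scripts in this thread use.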
I was just putting the shit i learned in school intro practice
>this whole thread
Will OP take down his repo in shame?
Fucking neat, defnly gonna go full Linux then
It's a throwaway account, so I'll nuke it when the thread dies
Why is this more than 200 lines long? I wrote a Jow Forums scraper for the entire website, one that also generated static HTML pages with the images and messages, in fewer lines than that.
You could try reading the code and find the answer for yourself
this happens so often where someone overengineers something so simple because they don't know there's an easier way to do it
and it's funny every time
Only OP could have written a defensive comment like this.
Decided to improve this by discarding thumbnails. It works, but I never learned how to regex so this is probably retarded. Someone make it more efficient.
wget -P durr -nd -nc -r --regex-type pcre --reject-regex "^https:\/\/i\.4cdn\.org\/\D\/[^s]*s" -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm [URL]
t. brainlet
Is that Windows cmd you're using? It doesn't have UTF support, so it won't download this pic, because you're printing the filename in the same try block that saves it. :(
op btfo
works perfectly fine
Normally CMD throws an error, have you changed anything?
beat me to it. here's how I did it:
wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' [URL]
I have not
>'s.jpg'
Wait, omg yeah thumbnails are always jpg, obviously. Fucking hell I'm an idiot.
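Since thumbnails are always `<tim>s.jpg` and `tim` is all digits, a plain suffix check is enough; no regex needed. A sketch with made-up filenames:

```shell
#!/bin/sh
# Thumbnails are <tim>s.jpg; full-size files are <tim>.<ext>, and tim is
# purely numeric, so only thumbnails can end in "s.jpg".
is_thumb() {
    case "$1" in
        *s.jpg) return 0 ;;
        *) return 1 ;;
    esac
}
is_thumb '1546300800000s.jpg' && echo thumb   # → thumb
is_thumb '1546300800000.webm' || echo full    # → full
```

The `--reject-regex` versions above do the same thing, just inside wget.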
what an irony
Just use jdownloader 2 faggot
My slower, verbose version from a few years ago.
wget \
--recursive \
--no-directories \
--directory-prefix="${1}" \
--accept '.jpg,.jpeg,.png,.gif,.webm' \
--reject '*s.jpg' \
--span-hosts \
--domains=i.4cdn.org \
--execute robots=off \
--wait=2 \
--random-wait \
--limit-rate=800k \
    "https://boards.4chan.org${2}"
Just use jdownloader 2 man
>using some shitty program you haven't programmed yourself
gtfo
That's cool, though you could just use DownThemAll, a Firefox addon, with some filter settings and do the same thing with a right click of the mouse.
Better than spending all day in the basement you fucking son of a bitch
Why would I install new software to mimic the functionality of software that is already installed?
I can't be bothered to read the entire program, but just the first function you wrote, get_html, is completely useless
Because of a nice gui hurr durr
Gui masterrace
>random-wait and limit-rate
Was there a reason for those? Because atm it doesn't seem like there's anything in place to kick you off.
DTA is dead.
That's a shame. I haven't used it in some time; before that I used it with Firefox ESR.
Cat
Only if you're scraping whole boards in parallel. It isn't needed for boutique outfits like this.
There really isn't any reason for "random-wait" or "limit-rate". I think (not 100% sure) that years ago there was a request limit for Jow Forums (at least for the API). So that was the reason I included the "wait" option. The rate-limit was something I used because I was on a slow computer with slow internet when I was learning about wget.
I'm pretty sure I added random-wait because of something I read in a general webscraping tutorial. So that certainly isn't needed as Jow Forums doesn't care that you are downloading.
What I did probably shouldn't be used as a guide. I was just showing my shitty bash script from back in the day.
eow i live above ground you homo
-R *s.jpg
Now make one that downloads images from a Pinterest board
>python
No thanks.
Nope you motherfucking motherfucker, i am the homo prince and you shall all kneel to my arrival
There's a limit, but honestly, they don't seem to care.
>Pinterest
Don't wanna make something i'm never gonna use
Yeah, figured it out since.
I wrote one 3 years ago that I still use. Better solutions exist but it's mine and I'll keep it.
github.com
I honestly don't know how anyone could use Pinterest, seems like a giant mess.
I respect that
Only soccer moms use it
>not implementing thread watching to get new replies without running it multiple times like a mongoloid
>not running it on a timer because who gives a shit
4dl() {
    # Reconstructed: the archived copy was truncated mid-function.
    board="$(printf -- '%s' "${1:?}" | cut -d '/' -f4)"
    thread="$(printf -- '%s' "${1:?}" | cut -d '/' -f6)"
    echo "4dl from board: $board / thread: $thread"
    echo "downloading json: https://a.4cdn.org/$board/thread/$thread.json"
    wget -qO- "https://a.4cdn.org/$board/thread/$thread.json" \
    | jq -r --arg b "$board" '
        .posts
        | map(select(.tim != null))
        | map((.tim | tostring) + .ext)
        | map("https://i.4cdn.org/" + $b + "/" + .)
        | .[]' \
    | wget -nc -i -
}
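The thread watching mentioned above is just a poll loop around whichever downloader you already have. A minimal sketch; `download_thread` is a stub standing in for any of the scripts in this thread (it pretends the thread 404s on the third poll so the loop runs standalone):

```shell
#!/bin/sh
# Thread watching: re-run the downloader until the thread is gone.
passes=0
download_thread() {           # stub; a real one would fetch the JSON and
    passes=$((passes + 1))    # return nonzero once the thread 404s
    [ "$passes" -lt 3 ]       # pretend the thread dies on poll 3
}
while download_thread; do
    sleep 0                   # use something like 60 in real use
done
echo "thread gone after $passes polls"
# → thread gone after 3 polls
```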
cant you already do this with 4ChanX?
How would you change this into a script that pauses and asks for the URL as input?
>half the thread is just "cant you do this with X program?"
Newfags like you are the reason half of Jow Forums is shitposting about phones and browsers.
Can confirm, my mom uses it and she's as tech savvy as your average WWII veteran. I really dislike the site because they force you to log in to see their shit, and they advertise on Google with their fake bloated resolutions. But you have to admit it's a huge resource of images, and you will stumble on it more often than not when you search for particular images. It's also rather well organised thanks to the tagging system they use. If there were a script or third-party alternative that let me use their database without accessing the site directly, I would use it.
What's a good free text editor you'd recommend? I'm using Sublime, but I just realized it's an "evaluation" and they want me to buy a copy. It doesn't really matter, I guess, but I dislike having the (UNREGISTERED) at the top.
Emacs
vim
----- BEGIN LICENSE -----
Affinity Computer Technology
10 User License
EA7E-909026
D64472BD FA040F1B 20F23C0D 114D57E4
AF4DDFDC A3FDDA29 00319FA1 91EE46D2
B3210738 54154723 F12511D6 950F839D
C5A83395 76EAEC5B FC25B644 9802A931
28A62A8C 9483EC49 E28E1A3B 997FA0FA
678ED4D3 2F4C2645 8E88274C 8AC599C2
F2D578D3 DF19037B 544F5304 18F3F196
6F1AC83E 2E1FCE1D BA74F528 1340A09F
------ END LICENSE ------
ed
>OP actually made something useful
>non-contributing faggots on Jow Forums start screeching
VIIIIIIIIIIIM
xsel
can you share it?
>python
this
Thanks, but it's no longer valid now that it's been shared.
I was just shitposting, this is the leaked NSA sublime license. It did work for a while though, despite being "10 user".
No matter, if you actually want a working one, there's tons of them on github. google.com
Ah, thanks. I appreciate it.
What sort of syntax would I be looking at with xsel? I haven't heard of it before
wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' $(xsel)
>P durr
kek
oh wait, you want it to wait and ask you
#!/bin/bash
echo 'give me cum nigga'
read SEX
wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' "$(SEX)"
Doesn't seem to work. is i.4cdn.org necessary if you're taking an input url to pull images from?
>anything written in python is bad/malware/written by a bad programmer
wake up, cfag
I figured it out
#!/bin/bash
echo 'URL?'
read url
wget -P Images -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' $url
> line 4: syntax error near unexpected token `newline'
???
>all these low quality shell scripts
we get it faggots, you all think you know how to shell.