Jow Forums Download Script

I made this neat Python script that downloads all files from any given Jow Forums thread.
It's pretty awesome so I wanted to share it with youse.

github.com/BabiYagi/4chan-Downloader

Attached: sample.webm (1540x1080, 965K)

how long did it take you?

wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm [URL]

bloated as fuck. I wrote this one for downloading all webms; adjust it a bit, use it, and throw yours in the garbage
#!/bin/bash -e
# reads thread URLs (no scheme) on stdin and downloads every webm into
# ~/moar/<OP post no>/. the middle of this post got eaten by the HTML
# filter, so the awk/jq bodies are a reconstruction around what survived.
mkdir -p ~/moar
cd ~/moar

awk '{
    # boards.4chan(nel).org/g/thread/123 -> a.4cdn.org/g/thread/123.json
    gsub(/boards\.4chan(nel)?/, "a.4cdn")
    gsub(/ |$/, ".json&")
    for (i = 1; i <= NF; i++) {
        split($i, p, "/")
        printf "%s ", p[2]      # prefix each line with the board, e.g. "g"
        fflush()
        system("curl -sS \"https://" $i "\"")  # thread JSON is one line
        print ""
    }
}' | jq -Rr '
    select(length > 0)
    | .[0:index(" ")] as $board
    | (.[index(" ")+1:] | fromjson) as $thread
    | ($thread.posts[0].no | tostring) as $dir
    | $thread.posts | map(select(.ext == ".webm")) | to_entries[]
    | .value as $post
    | (if .key > 0 then "next" else "create-dirs" end),
      ($post
      | (.filename
        | if . != ""
          then "output=\"\($dir)/\(.).webm\""
          else "remote-name" end
        ),
      "url=https://i.4cdn.org/\($board)/\(.tim).webm"
      )
' | curl -sSK-
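
Usage is just piping schemeless thread URLs into it, e.g. (moar.sh being whatever you saved it as):

echo boards.4channel.org/g/thread/70447342 | ./moar.sh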

>using windows
L M A O
M
A
O

I got the first working version up and running in about 1.5-2 hours, and then I've been adding more functionality over the last 2 days

I use Debian on my work laptop, but I use my desktop for gaming, so it kinda needs to be winblows

Go be 12 years old somewhere else

Mine downloads not only webms, but images as well, and you can download multiple threads at a time, and it has a neat progress bar. Does yours?

It's great. Thanks

this thread is peak nu/g/

I made shit.

#!/usr/bin/env bash

[ $# -ge 1 -a -f "$1" ] && input="$1" || input="-"

while IFS= read -r url; do
    if [ -z "$url" ]; then
        continue
    fi

    board="$(printf -- '%s' "${url:?}" | cut -d '/' -f4)"
    thread="$(printf -- '%s' "${url:?}" | cut -d '/' -f6)"
    workdir="4chan/$board/$thread"

    mkdir -p "$workdir" "$workdir/media" "$workdir/thumbnails" "$workdir/snapshots" > /dev/null 2>&1
    pushd "$workdir" > /dev/null 2>&1

    echo "$url" > url.txt

    # --time-cond: only re-fetch the JSON if it changed since the last run
    curl --silent --time-cond "$thread.json" --user-agent 'Mozilla/5.0' \
        "https://a.4cdn.org/$board/thread/$thread.json" --output ".$thread" > /dev/null 2>&1

    if [ -s ".$thread" ]; then
        isodate=$(date -u +'%Y%m%dT%H%M%SZ')
        echo "$url"

        cp ".$thread" "snapshots/$thread.$isodate.json"
        mv ".$thread" "$thread.json"
    else
        echo "$url" >&2
    fi

    # the post was cut off here; the rest is a guess at the obvious ending:
    # turn the thread JSON into a curl config and fetch media + thumbnails
    if [ -s "$thread.json" ]; then
        jq -r --arg board "$board" '
            .posts[]
            | select(.tim != null)
            | .ext as $ext
            | "url=https://i.4cdn.org/\($board)/\(.tim)\($ext)",
              "output=media/\(.tim)\($ext)",
              "url=https://i.4cdn.org/\($board)/\(.tim)s.jpg",
              "output=thumbnails/\(.tim)s.jpg"
        ' "$thread.json" | curl -sSK-
    fi

    popd > /dev/null 2>&1
done < <(cat -- "$input")

wget called, this shit already exists
Good job though, user

Thanks, lads

that's pretty niggerlicious.

a.4cdn.org/g/threads.json
a.4cdn.org/g/thread/51971506.json

You know, all threads have a JSON file...

a.4cdn.org/g/thread/70447342.json
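
e.g. listing every file in that thread is a one-liner (rough sketch, not OP's code):

curl -s https://a.4cdn.org/g/thread/70447342.json |
    jq -r '.posts[] | select(.tim != null) | "\(.tim)\(.ext)  \(.filename)\(.ext)"'

That prints the server-side name next to the original filename for every attachment.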

u can do this w/wget
but w/e well done u made somethin

Your point being?

You can press x to .json

OP could also download the JSON file as an additional feature.
Or he could stop wasting his life downloading anything off this website.

Many games work on Linux now with Proton; I'm thinking of switching on my desktop soon

parsing json is 10x easier than fucking around with bs4?

Attached: (you).jpg (188x264, 12K)

>parsing json is 10x easier than fucking around with bs4?
TIL.
I was just putting the shit I learned in school into practice

>this whole thread
Will OP take down his repo in shame?

Fucking neat, definitely gonna go full Linux then

It's a throwaway account, so I'll nuke it when the thread dies

Why is this more than 200 lines long? I wrote a Jow Forums scraper for the entire website that also generated static html pages with the images and messages in less than that.

You could try reading the code and find the answer for yourself

this happens so often where someone overengineers something so simple because they don't know there's an easier way to do it
and it's funny every time

Only OP could have written a defensive comment like this.

github.com/mikf/gallery-dl

Attached: ok j.jpg (296x274, 11K)

Decided to improve this by discarding thumbnails. It works, but I never learned how to regex so this is probably retarded. Someone make it more efficient.
wget -P durr -nd -nc -r --regex-type pcre --reject-regex "^https:\/\/i\.4cdn\.org\/\D\/[^s]*s" -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm [URL]

t. brainlet

Is that Windows cmd you're using? It doesn't have UTF-8 support, so it won't download this pic, because you're trying to print it out in the same try block as you're saving it in. :(

Attached: 김태연.jpg (1372x2048, 573K)

op btfo

Attached: 💯💯💯😂ệ㊗汉김.jpg (640x853, 68K)

works perfectly fine

Attached: 2019-04-06 15_18_17-C__Windows_System32_cmd.exe.png (1612x699, 151K)

Normally CMD throws an error, have you changed anything?

beat me to it. here's how I did it:
wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' [URL]

I have not

>'s.jpg'
Wait, omg yeah thumbnails are always jpg, obviously. Fucking hell I'm an idiot.
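
Since tim is all digits and thumbnails are always <tim>s.jpg, anchoring the regex is cleaner than rejecting anything with an s in it. Untested sketch:

wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex '/[0-9]+s\.jpg$' [URL]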

oh the irony

Just use jdownloader 2 faggot

My slower, verbose version from a few years ago.

wget \
--recursive \
--no-directories \
--directory-prefix="${1}" \
--accept '.jpg,.jpeg,.png,.gif,.webm' \
--reject '*s.jpg' \
--span-hosts \
--domains=i.4cdn.org \
--execute robots=off \
--wait=2 \
--random-wait \
--limit-rate=800k \
"boards.Jow Forums.org${2}"

Just use jdownloader 2 man

>using some shitty program you haven't programmed yourself
gtfo

That's cool, though you could just use DownThemAll, a Firefox addon, with some filter settings and do the same thing with a right click of the mouse.

Better than spending all day in the basement you fucking son of a bitch

Why would I install new software to mimic the functionality of software that is already installed?

I can't be bothered to read the entire program, but just the first function you wrote, get_html, is completely useless

Because of a nice gui hurr durr
Gui masterrace

>random-wait and limit-rate

Was there a reason for those? Because atm it doesn't seem like there's anything in place to kick you off.

DTA is dead.

Attached: 1503740758158.jpg (640x640, 60K)

Attached: InfinicatisLove.InifinicatisLife..Strawhatguild.com_0c60de_4598775.jpg (500x702, 72K)

That's a shame. I haven't used it for some time now, and before that I used it with Firefox ESR.

Attached: cat_trap.jpg (1173x1151, 214K)

Cat

Attached: D3D206DD3310428C946D9FDD76BD37C5.jpg (1200x1458, 207K)

Only if you're scraping whole boards in parallel. It isn't needed for boutique outfits like this.

There really isn't any reason for "random-wait" or "limit-rate". I think (not 100% sure) that years ago there was a request limit for Jow Forums (at least for the API). So that was the reason I included the "wait" option. The rate-limit was something I used because I was on a slow computer with slow internet when I was learning about wget.

I'm pretty sure I added random-wait because of something I read in a general webscraping tutorial. So that certainly isn't needed as Jow Forums doesn't care that you are downloading.

What I did probably shouldn't be used as a guide. I was just showing my shitty bash script from back in the day.

eow i live above ground you homo

-R *s.jpg

Now make one that downloads images from a Pinterest board

>python
No thanks.

Nope you motherfucking motherfucker, I am the homo prince and you shall all kneel to my arrival

Attached: 00501378745566733.gif (500x500, 206K)

There's a limit, but honestly, they don't seem to care.

>Pinterest
Don't wanna make something I'm never gonna use

Yeah, figured it out since.

I wrote one 3 years ago that I still use. Better solutions exist but it's mine and I'll keep it.
github.com/birdmanravo/grabber/blob/master/grabber.py

I honestly don't know how anyone could use Pinterest, seems like a giant mess.

I respect that

Only soccer moms use it

>not implementing thread watching to get new replies without running it multiple times like a mongoloid

>not running it on a timer because who gives a shit
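
fwiw a watcher is just a loop around whatever downloader you already have. Rough sketch; grab.sh is a stand-in for any script in this thread, and it expects the full https:// thread URL:

#!/bin/bash
# re-run the downloader every minute until the thread 404s
url="${1:?thread url}"
json="https://a.4cdn.org/$(cut -d/ -f4 <<<"$url")/thread/$(cut -d/ -f6 <<<"$url").json"
while curl -sf -o /dev/null "$json"; do    # -f: exit nonzero on 404
    ./grab.sh "$url"
    sleep 60    # polite polling interval
done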

4dl() {
    board="$(printf -- '%s' "${1:?}" | cut -d '/' -f4)"
    thread="$(printf -- '%s' "${1:?}" | cut -d '/' -f6)"
    echo "4dl from board: $board / thread: $thread"
    echo "downloading json: https://a.4cdn.org/${board}/thread/${thread}.json"
    # the last jq line got eaten by the HTML filter; this ending is a guess
    wget -qO- "https://a.4cdn.org/${board}/thread/${thread}.json" | jq -r --arg b "$board" '
        .posts
        | map(select(.tim != null))
        | map((.tim | tostring) + .ext)
        | map("https://i.4cdn.org/" + $b + "/" + .)
        | .[]
    ' | wget -qi -
}

can't you already do this with 4ChanX?

How would you change this so it can be set as a script that pauses and asks for the url to be input?

>half the thread is just "cant you do this with X program?"
Newfags like you are the reason half of Jow Forums is shitposting about phones and browsers.

Can confirm, my mom uses it and she's as tech savvy as your average WWII veteran. I really dislike the site because they force you to log in to see their shit, and they advertise their site on Google with their fake bloated resolutions. But you have to admit it's a huge resource of images, and you will stumble on it, more often than not, when you search for particular images. It's also rather well organised with the tagging system they use. If there's a script or third-party alternative that lets me use their database without needing to access the site directly, I would use it.

What is a good text editor that is free? I'm using Sublime but I just realized it's an "evaluation" and they want me to buy a copy. It doesn't really matter I guess, but I dislike having the (UNREGISTERED) at the top.

Emacs

vim

----- BEGIN LICENSE -----
Affinity Computer Technology
10 User License
EA7E-909026
D64472BD FA040F1B 20F23C0D 114D57E4
AF4DDFDC A3FDDA29 00319FA1 91EE46D2
B3210738 54154723 F12511D6 950F839D
C5A83395 76EAEC5B FC25B644 9802A931
28A62A8C 9483EC49 E28E1A3B 997FA0FA
678ED4D3 2F4C2645 8E88274C 8AC599C2
F2D578D3 DF19037B 544F5304 18F3F196
6F1AC83E 2E1FCE1D BA74F528 1340A09F
------ END LICENSE ------

ed

>OP actually made something useful
>non-contributing faggots on Jow Forums start screeching

VIIIIIIIIIIIM

xsel

can you share it?

>python

this

Thanks but it's no longer valid as it's been shared.

I was just shitposting, this is the leaked NSA sublime license. It did work for a while though, despite being "10 user".
No matter, if you actually want a working one, there's tons of them on github. google.com/search?client=firefox-b-d&q=sublime licence github

Ah, thanks. I appreciate it.

What sort of syntax would I be looking at with xsel? I haven't heard of it before

wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' $(xsel)

>P durr
kek

oh wait, you want it to wait and ask you

#!/bin/bash
echo 'give me cum nigga'
read SEX
wget -P durr -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' "$(SEX)"

Doesn't seem to work. is i.4cdn.org necessary if you're taking an input url to pull images from?

>anything written in python is bad/malware/written by a bad programmer
wake up, cfag

I figured it out

#!/bin/bash
echo 'URL?'
read url
wget -P Images -nd -nc -r -l 1 -H -D i.4cdn.org -A png,gif,jpg,jpeg,webm --reject-regex='s.jpg' $url

> line 4: syntax error near unexpected token `newline'

???

>all these low quality shell scripts

we get it faggots, you all think you know how to shell.