What's the best way to download a thread on Jow Forums?

When I use "Save page as" in Mozilla Firefox I download the whole thread, but the images seems to be saved as thumbnails only in the folder. When I open the .html in Firefox I can click on the images such that they expand, but to get the original pictures then I have to manually save each one of them in the browser. What I am looking for is a "Save as page" analogue that saves the pictures in original form.

Attached: 109.png (3200x2400, 227K)

You should just fetch the JSON for the thread from http(s)://a.4cdn.org/board/thread/threadnumber.json and then fetch the images it lists

There is no need to save redundant html code
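
If it helps, here's a rough Ruby sketch of that approach. The board, thread number, and the i.4cdn.org image host are my own placeholders/assumptions, so swap in whatever the thread you want actually uses:

#!/usr/bin/env ruby
# Rough sketch: pull a thread's JSON and save every full-size image.
# 'g', the thread number, and the i.4cdn.org host are placeholders/assumptions.
require 'json'
require 'open-uri'

board    = 'g'          # placeholder board
threadno = '123456789'  # placeholder thread number

json  = URI.open("https://a.4cdn.org/#{board}/thread/#{threadno}.json").read
posts = JSON.load(json)['posts']

# Posts with an attachment carry a 'tim' (timestamp filename) key.
posts.select { |post| post.key?('tim') }.each do |post|
  name = "#{post['tim']}#{post['ext']}"
  data = URI.open("https://i.4cdn.org/#{board}/#{name}").read
  File.binwrite(name, data)
  puts "saved #{name}"
end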

you could use wget
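
Something along these lines usually does it (the thread URL is a placeholder; -H is needed because the full-size images live on a different host than the thread page, and the -A list keeps wget from wandering off into the rest of the site):

wget -r -l 1 -H -nd -e robots=off \
     -A jpg,jpeg,png,gif,webm \
     -P thread_images \
     "https://boards.example.org/g/thread/123456789"

Fair warning: it will pull the thumbnails along with the full-size files, since they share extensions.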

ChanThreadWatch

it gets the text, images, and videos. Doesn't get YouTube vids yet though, and also doesn't have a Linux port.

Is this bait? I am afraid to try that link :p

I am trying out wget right now. Trying to get it to download the images as well as the text.

I am on Linux, but hopefully this helps somebody else.

Thanks guy!

Why do you want to download a fucking 4chin thread? Let it die

basc-archiver

I know you do it sometimes :)

> basc-archiver
Very nice! Works like a charm straight outta the box.

submit a freedom of information act form to the FBI

Wtf?

Some generals actually keep their own archives, like that one farming general that has something like 170+ threads in it, but no full-sized images.

With the Quantum version of Firefox, downloading with an image grabber was mostly killed. Use a web crawler/web scraper to download the page plus one link layer deep, which should get all the full-sized images. Some of them have options for recreating everything so it can be used offline. I think maybe SurfOffline does that, but I don't really remember.

nice link, user. /saved and bookmarked.

I have made a scraper before; I didn't know about this JSON feature, it would have made things much easier for me.
Do archives have this as well?

Most of them.

I've wanted to mess around with wget but most fully-featured site-grabbing scripts seem overly complex. Is there any good resource for learning this kind of stuff, or is it a learn-as-you-go kind of thing?

On how to use wget, or on how to use the JSON feature to retrieve the images? Because the latter can be done with a simple Python or even a bash script.

This is a Ruby script I wrote really quickly that searches for all threads whose OP matches a regular expression passed on the command line, then outputs a list of every image link from those threads.

#!/usr/bin/env ruby
# Search a board's catalog for threads whose OP text matches the regex
# given on the command line, then print every image URL from those threads.
# Usage: ruby script.rb 'some regex'

require 'json'

board   = ?b                           # ?b is the character literal 'b'
pattern = Regexp.new(ARGV.shift.to_s)  # build the regex once, not once per thread

data = `curl -s a.4cdn.org/#{board}/catalog.json`
j    = JSON.load(data)

# The catalog is an array of pages, each holding a 'threads' array.
threads = j.map { |page| page['threads'] }.flatten

# Keep only threads whose OP comment matches, then walk each one's posts.
threads.select { |thread| thread['com'].to_s =~ pattern }.each do |thread|
  posts = JSON.load(`curl -s a.4cdn.org/#{board}/thread/#{thread['no']}.json`)['posts']
  # Posts with an attachment carry a 'filename' key; build the full-size URL.
  posts.select { |post| post.key?('filename') }.each do |post|
    puts "is2.Jow Forums.org/#{board}/#{post['tim']}#{post['ext']}"
  end
end


I'm not going to figure out how to make it work for what you want to do, but maybe it's a starting point.

Use XPath, OP.
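
If you go that route, something like this Nokogiri sketch works on a saved thread page. I haven't checked the exact markup, so instead of relying on specific class names it just grabs every link that ends in a media extension ('thread.html' is a placeholder filename):

#!/usr/bin/env ruby
# Rough XPath sketch: list every link in a saved thread page that points
# at a full-size media file.
require 'nokogiri'

doc  = Nokogiri::HTML(File.read('thread.html'))
urls = doc.xpath('//a/@href')
          .map(&:value)
          .select { |href| href =~ /\.(jpe?g|png|gif|webm)\z/i }
          .uniq

urls.each { |u| puts u }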

You can use HTTrack; it's open source and really easy to use
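
Something like this should do it (placeholder URL, and I'm going from memory on the options, so check the man page):

httrack "https://boards.example.org/g/thread/123456789" \
        -O ./thread-mirror -r2 \
        "+*.jpg" "+*.png" "+*.gif" "+*.webm"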

It's funny that Netscape and IE used to have a feature to save websites X link levels deep, and now I guess that's just not a thing?

Optimum theory is wrong

How does he simulate his thing in Excel?

Does Chanu still work? I remember using it a few years ago and it being decent