Archiving websites

I'm creating a script to archive websites, but I can't decide on something.
I want to recursively download every part of a site, not just a single page and whatever it needs to display correctly. The problem is that some sites, like Jow Forums, use different domain names for different things (images, for example), but if I let wget span hosts I'll end up trying to download the whole internet.
What should I do?

Attached: 82.jpg (500x375, 159K)

Other urls found in this thread:

github.com/JonasCz/save-for-offline
example.com/ass
example.com/tits
example.com/cock
gnu.org/software/wget/manual/html_node/Types-of-Files.html
twitter.com/SFWRedditVideos

wget -r -H -D4chan.org,4cdn.org,boards.Jow Forums.org
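Here -r turns on recursive retrieval, -H lets wget follow links onto other hosts, and -D limits that spanning to the comma-separated domain whitelist, so the "whole internet" problem only goes away if you already know which asset domains the site uses. You still append the thread URL you want to start from.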

Yeah, that's what I use for my specific Jow Forums thread script, but I want a generic one for other websites.

wget -r example.com

you are welcome
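For a single site, a fuller version of that (a sketch; example.com stands in for whatever you're archiving) would be something like:

wget --mirror --page-requisites --convert-links --adjust-extension --no-parent example.com

--page-requisites pulls in the images, CSS, and JS each page needs, --convert-links rewrites them for offline viewing, and --adjust-extension saves pages with .html extensions, but none of it crosses onto another host unless you also pass -H.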

Those Google Fonts, WordPress plugins, and Gravatars are known as "the botnet". Be like RMS and only browse webpages emailed to you as a PDF.

Another option: github.com/JonasCz/save-for-offline

That doesn't even span hosts,
so on Jow Forums, for example, you would download only a single HTML file with no images.

You have to work it out for each site, since every site is different. Or you could write your own logic to parse each HTML file, then feed the pages individually into wget using --page-requisites.
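A rough sketch of that approach (all of this is assumption rather than anything from the thread: Python's stdlib HTMLParser, shelling out to wget -p -H -k -E for each page, and example.com as the start URL):

#!/usr/bin/env python3
# Sketch: crawl same-host pages ourselves, and let wget fetch each page
# plus its requisites (images, CSS, JS), spanning hosts only for those requisites.
import subprocess
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects <a href> targets so we can queue further same-host pages."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url):
    start_host = urlparse(start_url).netloc
    seen, queue = set(), [start_url]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        # wget fetches the page and everything it needs to render, following
        # requisites onto other hosts (-H) but not recursing any further.
        subprocess.run(["wget", "-p", "-H", "-k", "-E", url])
        # Parse the page ourselves to find more same-host pages to visit.
        try:
            with urllib.request.urlopen(url) as resp:
                if "html" not in resp.headers.get("Content-Type", ""):
                    continue
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            absolute = urljoin(url, link).split("#")[0]
            if urlparse(absolute).netloc == start_host:
                queue.append(absolute)

if __name__ == "__main__":
    crawl("https://example.com/")

The recursion across pages stays under your control (same-host only), while wget's -p -H handles the cross-host requisites for each individual page, which is exactly the part that otherwise snowballs.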

The same problem exists: if I parse it myself, how do I know which hosts contain parts of the website and which are completely unrelated?
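One heuristic you could try (my assumption, not a settled rule): hosts that only ever show up in <a href> links are probably other websites, while hosts that show up in <img src>, <script src>, or <link href> tags are probably CDNs serving pieces of the page, so you build wget's -D whitelist from those. A sketch, in the same stdlib-only style as above:

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class HostClassifier(HTMLParser):
    """Sorts referenced hosts into likely asset hosts vs. likely unrelated sites."""
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resource_hosts = set()  # candidates for the -D whitelist
        self.link_hosts = set()      # probably other sites entirely
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.RESOURCE_ATTRS and attrs.get(self.RESOURCE_ATTRS[tag]):
            self._add(attrs[self.RESOURCE_ATTRS[tag]], self.resource_hosts)
        elif tag == "a" and attrs.get("href"):
            self._add(attrs["href"], self.link_hosts)
    def _add(self, url, bucket):
        host = urlparse(urljoin(self.base_url, url)).netloc
        if host:
            bucket.add(host)

Feed it the front page, then start wget with -r -H and -D set to the start host plus whatever ends up in resource_hosts; anything that only appears in link_hosts gets left out.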

So, like, that is how you hack Jow Forums. Let's hack Jow Forums then, and make Jow Forums great again :)

Attached: bYOn88w.png (1200x1074, 973K)

That's completely irrelevant.
>script to archive websites
>download every part of the site, not just a single page