jfitzpatrick at October 3rd, 2013 16:01 — #1
Originally published at: http://www.howtogeek.com/171948/how-can-i-download-an-entire-web-site/
You don’t just want an article or an individual image, you want the whole web site. What’s the easiest way to siphon it all?
michaeltunnell at October 3rd, 2013 17:02 — #2
this is kind of misleading...
"How Can I Download an Entire Web Site?"
The short answer...these days that is impossible.
You can download the HTML, CSS, JS, images, and any media files...but that is NOT the entire website.
It is not possible to download the PHP, XML, and other server-side files, because most systems never tell you which files exist. If you had a direct link to each PHP file you could probably download it, except for those that call die() when accessed directly (see the illustration at the end of this post).
Databases are also impossible to download without permission to do so.
So with HTTrack, Wget, or any other kind of downloader...what the question asks is not possible.
If the word "Entire" wasn't in the question then it would be a totally different conversation.
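A rough illustration, assuming you have curl (example.com and the IN_APP constant are placeholders, not from any particular site):

# Requesting a .php URL returns the script's rendered HTML output,
# never the PHP source itself:
curl -s http://example.com/index.php | head -n 5
# Include-only files commonly open with a guard along the lines of
#   if (!defined('IN_APP')) die('No direct access');
# so fetching one of those directly just returns an empty or error page.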
meoow at October 4th, 2013 07:20 — #3
# -r recurse through links, -p grab page requisites (images, CSS),
# -k convert links for offline viewing, -c resume partial downloads,
# -np never ascend to the parent directory, -E save pages as .html,
# -e robots=off ignore robots.txt
wget \
  --header="Accept-Language: en-us,en;q=0.5" \
  --header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \
  --header="Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \
  --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \
  -r -p -k -c -np -E --tries=1 --timeout=5 -e robots=off \
  http://the.web.site
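For comparison, HTTrack (mentioned above) can be driven from the command line too; a minimal sketch, with the URL, output directory, and filter pattern as placeholders:

# Mirror a site into ./mirror, staying within its own domain (-v verbose)
httrack "http://the.web.site/" -O ./mirror "+*.the.web.site/*" -v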
nsdcars5 at October 4th, 2013 08:39 — #4
You can do a base install of Arch Linux with less text.
musicman1601 at October 4th, 2013 09:26 — #5
SiteSucker has been my go-to app on the Mac for a number of years.
michaeltunnell at October 4th, 2013 16:14 — #6
That will bypass htaccess, but it will not allow proper downloading of PHP files, and it absolutely will not allow anyone to download databases.
Downloading an "entire" website is simply impossible now...regardless of any legitimate tools.
You could black-hat attack a website to get the contents, but other than that it is not possible...and in some cases even that isn't.
tad at October 4th, 2013 23:38 — #8
I've always wanted a browser feature (e.g., an extension) that would automatically do this for every website I visit. Especially now, with huge HDDs, space wouldn't be a problem (it could be set to clear the saves every 3-7 days, for example). It wouldn't need to be a "functioning" website, more like a screenshot of the page as you currently see it. Sometimes when traveling I may be without any connection for a while, so it would be great if that Wikipedia article I opened before I left (but didn't remember to manually save an offline copy of) was saved and ready for me to use as a reference for my homework, for example. Something like the wget sketch below could approximate it in the meantime.
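A crude sketch of that idea, assuming wget and a Unix shell (the folder name and the Wikipedia URL are placeholders):

# Save a self-contained snapshot of one page into a timestamped folder
mkdir -p ~/page-saves
wget -p -k -E -P ~/page-saves/"$(date +%Y%m%d-%H%M%S)" "https://en.wikipedia.org/wiki/Example"
# Clear out snapshots older than 7 days
find ~/page-saves -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +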