The Crawling Scope is a general rule for defining which pages are to be downloaded. When there are no pages left within the scope, the session is considered done.
By default, this is set to Same Directory, which means SiteCrawler will only download resources (pages, images, etc.) that are located in the same server directory as the starting URL, or in directories below it.
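As a rough illustration of the Same Directory rule, the following Python sketch checks whether a candidate URL lies in the starting URL's directory or below it. The function names are hypothetical, and the assumption that the host name must also match is ours; SiteCrawler's actual implementation is not documented here and may differ.

```python
from urllib.parse import urlsplit
import posixpath


def directory_of(url):
    """Return (host, directory path ending in '/') for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # A path ending in '/' is already a directory; otherwise drop the file name.
    if not path.endswith("/"):
        path = posixpath.dirname(path)
        if not path.endswith("/"):
            path += "/"
    return (parts.hostname or "").lower(), path


def in_same_directory(start_url, candidate_url):
    """True if candidate_url is in the starting URL's directory or below it.

    Assumption: "same server directory" also implies the same host name.
    """
    start_host, start_dir = directory_of(start_url)
    cand = urlsplit(candidate_url)
    cand_host = (cand.hostname or "").lower()
    cand_path = cand.path or "/"
    return cand_host == start_host and cand_path.startswith(start_dir)
```

With http://www.example.com/foo/index.html as the starting URL, this check accepts http://www.example.com/foo/bar/page.html but rejects http://www.example.com/other/ and http://www.example.com/foobar/.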
The Same Sub-Domain scope downloads all resources that are located on the exact same host name as the starting URL. Using http://www.example.com/foo/ as the starting URL would match all pages under www.example.com, but not those under bar.example.com.
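The host name comparison behind the Same Sub-Domain scope can be sketched in a few lines of Python. This is only an illustration under our own assumptions (the function name same_host is hypothetical, and the comparison is case-insensitive); it is not taken from SiteCrawler itself.

```python
from urllib.parse import urlsplit


def same_host(start_url, candidate_url):
    """True if both URLs have exactly the same host name (case-insensitive)."""
    start = (urlsplit(start_url).hostname or "").lower()
    cand = (urlsplit(candidate_url).hostname or "").lower()
    # An empty host (e.g. a relative URL) never matches.
    return start != "" and start == cand
```

Here same_host("http://www.example.com/foo/", "http://www.example.com/other/page.html") is true, while a candidate on bar.example.com is rejected.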
Same domain matches everything under the same second-level domain, such as everything under example.com, including www.example.com and bar.example.com. Note, however, that second-level domains are not always specific to one web site. Some top-level domains require domains to be arranged under specific second-level names. For example, commercial British sites reside under .co.uk. Crawling one of these sites with the Same domain setting would accept all domains under .co.uk.
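The .co.uk caveat is easy to see if the second-level domain is taken to be simply the last two labels of the host name. The Python sketch below makes that assumption explicitly; the function names and the hosts www.shop.co.uk and unrelated.co.uk are hypothetical, and SiteCrawler may derive the domain differently.

```python
from urllib.parse import urlsplit


def naive_domain(url):
    """Return the last two labels of the host name, e.g. 'example.com'.

    Assumption: the second-level domain is just the final two labels,
    which is what makes every .co.uk site look like the same domain.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:])


def same_domain(start_url, candidate_url):
    """True if both URLs share the same (naively computed) second-level domain."""
    return naive_domain(start_url) == naive_domain(candidate_url)
```

Under this rule, www.example.com and bar.example.com both reduce to example.com and match each other. However, www.shop.co.uk and unrelated.co.uk both reduce to co.uk, so two unrelated sites would be treated as being in the same scope.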
Any domain does not limit the scope at all, allowing all addresses. Be careful with this option, as the session might run indefinitely while trying to crawl the entire web.