Crawling a URL
The next step is to crawl the URL.
- Click on the Crawl button to crawl all URLs listed.
The Site Maps tool then prompts for any exclusion options. Excluded URLs are left out of the site map.
To exclude specific URLs, or a group of URLs matching a wildcard, enter them in the "Exclude URLs" section. For example, if you don't want to include the page 404.htm, you would enter:
If you wanted to exclude all shtml files, you would enter:
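The effect of an exclusion list can be illustrated with a minimal Python sketch. It is not the tool's own code: the pattern syntax accepted by the "Exclude URLs" section is not shown here, so the sketch assumes shell-style wildcards, and the names EXCLUDE_PATTERNS and is_excluded, as well as the example.com URLs, are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical exclusion patterns (assumed shell-style wildcards;
# the tool's real syntax may differ).
EXCLUDE_PATTERNS = ["404.htm", "*.shtml"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the file name at the end of the URL path matches a pattern."""
    path = urlparse(url).path
    filename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


urls = [
    "http://www.example.com/index.htm",
    "http://www.example.com/404.htm",
    "http://www.example.com/news/archive.shtml",
]

# Only URLs that match no exclusion pattern stay in the site map.
site_map_urls = [u for u in urls if not is_excluded(u)]
```

With these sample URLs, only index.htm remains in site_map_urls, because 404.htm matches the first pattern and archive.shtml matches the wildcard.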
Ignore dynamic links
This option ignores dynamic links, i.e. links containing the ? character.
Strip URL parameters
Strips parameters from URLs.
Do not Cache files locally
If checked, files are not cached locally and are fetched from the Internet as required. If left unchecked, the Toolkit downloads each file once, and all tools use the cached copy.
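As a rough illustration of the two URL options above, the following Python sketch shows one way such filtering could work. It rests on assumptions: "parameters" is taken to mean the query string after the ? character, duplicates that appear after stripping are dropped, and the function names are hypothetical rather than part of the Toolkit.

```python
from urllib.parse import urlsplit


def is_dynamic(url):
    """A link counts as dynamic if it contains the ? character."""
    return "?" in url


def strip_parameters(url):
    """Remove the query string (assumed meaning of 'parameters')."""
    return urlsplit(url)._replace(query="").geturl()


def apply_options(urls, ignore_dynamic=False, strip_params=False):
    """Filter and normalise URLs according to the two options."""
    result = []
    for url in urls:
        if ignore_dynamic and is_dynamic(url):
            continue  # "Ignore dynamic links": skip the URL entirely
        if strip_params:
            url = strip_parameters(url)  # "Strip URL parameters"
        if url not in result:  # assumption: drop duplicates after stripping
            result.append(url)
    return result


urls = [
    "http://www.example.com/a.htm",
    "http://www.example.com/a.htm?x=1",
    "http://www.example.com/b.php?id=2",
]
only_static = apply_options(urls, ignore_dynamic=True)
stripped = apply_options(urls, strip_params=True)
```

In this sketch the two options behave differently on the same input: ignoring dynamic links drops b.php altogether, while stripping parameters keeps it as a plain b.php entry.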
- Click on the Start button to begin the crawl. When the crawl finishes, the URLs found on the site are listed with their last modified date and title.
To view a URL, right-click the URL and click on the Open button. The URL will be opened in a browser window. Click the Refresh button to refresh the URL information.
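The last modified date and title listed after a crawl could be obtained along the following lines. This is only a sketch under stated assumptions: it supposes the title comes from the page's title element and the date from an HTTP Last-Modified header, which this page does not confirm. TitleParser and page_info are hypothetical names.

```python
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_info(html, last_modified_header=None):
    """Return (title, last modified datetime or None) for one fetched page."""
    parser = TitleParser()
    parser.feed(html)
    modified = None
    if last_modified_header:
        # Assumption: the date is taken from an HTTP Last-Modified header.
        modified = parsedate_to_datetime(last_modified_header)
    return parser.title.strip(), modified


title, modified = page_info(
    "<html><head><title> Home Page </title></head><body></body></html>",
    "Tue, 15 Nov 1994 12:45:26 GMT",
)
```

A page served without a Last-Modified header simply yields None for the date in this sketch.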