Fast Sitemap Maker and Broken link detector settings
Here are explanations and examples of usage for all parameters and settings.
Keep in mind that each setting is saved within the project file, so each project (site) has its own parameters.
The application lists all URLs that can be accessed from a browser, but not all of them are web pages (for example, downloads or links to image and video files).
Only regular web pages are included in the sitemap: those that return status code 200 OK and whose type is textual (html, htm, php…). URLs are color coded: blue means the URL is not a textual page, and red means there was an error and the page couldn’t be loaded (missing page, broken link, some other error…).
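The rule above can be sketched in a few lines of Python. This is only an illustration, not the application's actual code: the function name `classify_url` is hypothetical, and it assumes that "textual" means a Content-Type starting with `text/` and that every status other than 200 counts as an error.

```python
def classify_url(status_code, content_type):
    """Return 'red', 'blue', or 'sitemap' for a crawled URL.

    Assumptions (not confirmed by the application):
    - any status other than 200 is treated as an error (red)
    - a page is "textual" when its Content-Type starts with "text/"
    """
    if status_code != 200:
        return "red"
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if not media_type.startswith("text/"):
        return "blue"
    return "sitemap"


print(classify_url(200, "text/html; charset=utf-8"))
print(classify_url(200, "image/png"))
print(classify_url(404, "text/html"))
```

Under these assumptions the three calls classify an HTML page as a sitemap entry, an image as blue, and a missing page as red.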
Start URL: Paste in (or type) the starting URL of the site to be crawled (e.g. http://www.example.com)
Stay in path: Check this to scan only pages within the starting path. For example, if the start URL is www.example.com/pages, then only documents belonging to the folder ‘pages’ and its subfolders will be added to the sitemap and checked for broken links.
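As a rough illustration of what a "stay in path" check involves, here is a minimal Python sketch. The function name `is_within_start_path` is hypothetical and the application may decide this differently; the sketch assumes full URLs that include the scheme.

```python
from urllib.parse import urlparse


def is_within_start_path(start_url, candidate_url):
    """Return True if candidate_url lies in the start URL's folder or below."""
    start = urlparse(start_url)
    candidate = urlparse(candidate_url)
    # A different host is always outside the starting path.
    if candidate.netloc.lower() != start.netloc.lower():
        return False
    base = start.path.rstrip("/")
    # Require an exact match or a "/" boundary so that "/pagesold"
    # is not mistaken for something inside "/pages".
    return candidate.path == base or candidate.path.startswith(base + "/")


start = "http://www.example.com/pages"
print(is_within_start_path(start, "http://www.example.com/pages/sub/a.html"))
print(is_within_start_path(start, "http://www.example.com/other/b.html"))
```

The first call returns True and the second returns False, matching the ‘pages’ folder example.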
Set URL tags
Each URL in the sitemap can have three optional tags:
lastmod: The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use YYYY-MM-DD.
Note that this tag is separate from the If-Modified-Since (304) header the server can return, and search engines may use the information from both sources differently.
changefreq: How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are: always, hourly, daily, weekly, monthly, yearly, never.
The value “always” should be used to describe documents that change each time they are accessed. The value “never” should be used to describe archived URLs.
Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked “hourly” less frequently than that, and they may crawl pages marked “yearly” more frequently than that. Crawlers may periodically crawl pages marked “never” so that they can handle unexpected changes to those pages.
priority: The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites; it only lets the search engines know which pages you deem most important for the crawlers.
The default priority of a page is 0.5.
Please note that the priority you assign to a page is not likely to influence the position of your URLs in a search engine’s result pages. Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index.
Also, please note that assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site.
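To make the three optional tags concrete, here is a small Python sketch that builds one sitemap entry. The element names (`loc`, `lastmod`, `changefreq`, `priority`) and the list of change frequencies come from the sitemap protocol; the function name `build_url_entry` and the sample values are made up for illustration and do not show how the application writes its files.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Change frequencies allowed by the sitemap protocol.
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def build_url_entry(loc, last_modified=None, changefreq=None, priority=None):
    """Build a <url> element; every tag except <loc> is optional."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if last_modified is not None:
        # date.isoformat() yields YYYY-MM-DD, the date-only W3C Datetime form.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    if changefreq is not None:
        if changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"invalid changefreq: {changefreq!r}")
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0.0 and 1.0")
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


entry = build_url_entry(
    "http://www.example.com/", date(2024, 1, 5), "weekly", 0.8
)
print(ET.tostring(entry, encoding="unicode"))
```

Leaving a tag out simply omits the element; an entry with no priority is treated as having the default of 0.5.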
Speed (active connections)
How many parallel connections are used. Every server limits this number: some tolerate a large number of parallel connections at once, while others are stricter (e.g. in order to conserve their resources).
The default setting is 10. If you get many 500 errors or timeout errors, try lowering this number.
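The effect of this setting can be sketched with a thread pool whose size caps the number of simultaneous requests. This is an illustration only: `crawl_with_limit` and `FakeFetcher` are hypothetical names, and the fetcher merely sleeps instead of contacting a server so the example runs offline.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_with_limit(urls, fetch, max_connections=10):
    """Call fetch(url) for every URL using at most max_connections threads."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


class FakeFetcher:
    """Stand-in for a real HTTP request; records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, url):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # pretend the server takes a moment to answer
        with self._lock:
            self.active -= 1
        return (url, 200)


fetcher = FakeFetcher()
urls = [f"http://www.example.com/page{i}" for i in range(20)]
results = crawl_with_limit(urls, fetcher, max_connections=3)
print(len(results), "pages fetched, peak connections:", fetcher.peak)
```

Lowering `max_connections` reduces the load placed on the server at the cost of a slower crawl.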
Under some conditions two pages may redirect to each other, and the redirection then goes on for a long time. To prevent this from happening, set the maximum number of redirects to follow.
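A redirect loop and the effect of a redirect limit can be illustrated with a small offline sketch. The function name `follow_redirects`, the `redirects` table standing in for server responses, and the limit of 5 are all hypothetical choices made for the example.

```python
def follow_redirects(url, redirects, max_redirects=5):
    """Follow entries in the redirects table, giving up after max_redirects hops."""
    hops = 0
    while url in redirects:
        if hops >= max_redirects:
            raise RuntimeError(f"gave up after {max_redirects} redirects")
        url = redirects[url]
        hops += 1
    return url


# A normal redirect resolves to its final target.
print(follow_redirects("/old", {"/old": "/new"}))

# Two pages that redirect to each other would loop forever without a limit.
loop = {"/a": "/b", "/b": "/a"}
try:
    follow_redirects("/a", loop)
except RuntimeError as error:
    print("stopped:", error)
```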
A page can be slow for some reason or at a particular time. To avoid long waits for such pages to load, set this timeout value (although most servers will send an error after a fairly short time).
This is how you identify yourself to the web server (the User-Agent string sent with each request).
Don’t crawl links longer
Sometimes there are very long URLs (because of bad structure, negligence or even a bug in PHP code). These pages are usually irrelevant (and can even hurt your SEO). Sometimes they are generated automatically and prevent the crawl from finishing. Limit URL length to some reasonable amount (e.g. ~100 characters).
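A length limit of this kind amounts to a simple filter applied before a link is queued for crawling. The sketch below assumes the limit counts characters of the full URL; the function name `within_length_limit` and the sample URLs are hypothetical.

```python
def within_length_limit(url, max_length=100):
    """Return True if the URL is short enough to be crawled."""
    return len(url) <= max_length


normal = "http://www.example.com/pages/about.html"
# A runaway link of the kind a buggy script can generate endlessly.
runaway = "http://www.example.com/calendar" + "/next" * 30

for url in (normal, runaway):
    print(len(url), "crawl" if within_length_limit(url) else "skip")
```

Because the runaway link grows with every generated page, a fixed limit lets the crawl terminate.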
Don’t crawl links containing these strings
You may not want particular pages to be included in your sitemap. For example, you may not want to include pages generated from a database by search parameters (there can be any number of such pages, and the crawler will never finish). Or maybe you don’t want to include pages from a particular directory. List such exclusions line by line, for example: