Panscient operates a large-scale web crawler that visits millions of websites on a regular basis. Like the crawlers used by the major search engines, it crawls public websites, looking for specific types of information to include in vertical search engines.
Panscient primarily crawls the web looking for corporate information, such as company names, addresses, executive biographies, job openings and product information. We also crawl the web to locate genealogy pages, such as birth, marriage and death records, obituaries and census records.
Our web crawler only accesses publicly available information published on websites. We respect the rights of website owners to control what content our crawler analyzes. Our crawler obeys the Robot Exclusion Standard, and will not collect content from any pages that are off-limits to robots.
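Before requesting a page, a crawler that follows the Robot Exclusion Standard checks the site's robots.txt rules for its own user-agent. The sketch below shows that check using Python's standard `urllib.robotparser` module. It is a minimal illustration, not Panscient's own implementation. The robots.txt rules and the example.com URLs are hypothetical. Only the user-agent string "panscient.com" comes from this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps the crawler out of one directory
# and one individual file; everything else stays crawlable.
ROBOTS_TXT = """\
User-agent: panscient.com
Disallow: /private/
Disallow: /drafts/old-prices.html
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a hypothetical site.
    for url in ("https://example.com/about.html",          # no rule matches: allowed
                "https://example.com/private/staff.html",  # under /private/: blocked
                "https://example.com/drafts/old-prices.html"):  # listed file: blocked
        print(url, allowed(ROBOTS_TXT, "panscient.com", url))
```

In this parser, rules are matched by path prefix, so `Disallow: /private/` covers every URL under that directory. Robots whose user-agent matches no group (and for which there is no `*` group) are unaffected.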
We crawl the entire list of registered .com domain names, which is publicly available through Verisign. Once you register a .com domain name, our crawler will periodically check the site for business information.
The Panscient web crawler identifies itself with the user-agent string "panscient.com". To exclude it from portions of your site, list the directories and files it should not request in your website's robots.txt file. Our web crawler also obeys the "noindex" and "nofollow" directives of the robots meta tag, which can be placed in the <head> section of individual web pages.
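A crawler that honors these directives has to read them from each page's <head> section. The following sketch uses Python's standard `html.parser` module to collect the directives from a robots meta tag. It assumes the plain form `<meta name="robots" content="...">`. It is an illustration only, not Panscient's code, and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, nofollow".
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page that opts out of both indexing and link following.
PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Internal price list</title>
</head><body><a href="/next.html">Next</a></body></html>"""

directives = robots_directives(PAGE)
may_index = "noindex" not in directives    # False for PAGE: do not store its content
may_follow = "nofollow" not in directives  # False for PAGE: do not follow its links
```

"noindex" asks that the page's content not be indexed, and "nofollow" asks that the links on the page not be followed; a page can use either directive on its own.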
To completely exclude our web crawler from your site, add the following entry to your robots.txt file:

User-agent: panscient.com
Disallow: /
Contact us at email@example.com and we'll do our best to respond to your query promptly.