Panscient - powering vertical search

FAQ

Web crawler questions

Panscient operates a large-scale web crawler that visits millions of websites on a regular basis. Like the crawlers used by the large search engines, it reads public websites looking for specific types of information to include in vertical search engines.

What kind of information are you crawling for?

Panscient primarily crawls the web looking for corporate information, such as company names, addresses, executive biographies, job openings and product information. We also crawl the web to locate genealogy pages, such as birth, marriage and death records, obituaries and census records.

Are you violating privacy?

Our web crawler accesses only publicly available information published on websites. We respect the rights of website owners to control what content our crawler analyzes. Our crawler obeys the Robots Exclusion Standard and will not collect content from any pages that are off-limits to robots.

How did your crawler find my website?

We crawl the entire list of registered .com domain names, which is publicly available through Verisign. Once you register a domain name, our crawler will periodically check it for business information.

How can I control the information your crawler collects from my website?

The Panscient web crawler identifies itself using the user-agent "panscient.com" and obeys the Robots Exclusion Standard. To exclude the Panscient web crawler from portions of your site, modify your website's robots.txt file to list the directories and files that the crawler should not request. Our web crawler also obeys the "noindex" and "nofollow" directives of the robots meta tag, which can be placed in the head section of individual web pages.
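
As an illustration of the page-level directives, the Python sketch below reads the robots meta tag from a page's head section and reports which directives it contains. It is a minimal example built on the standard library, not Panscient's own parser; the sample page and the names RobotsMetaParser and robots_directives are hypothetical.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> Set[str]:
    """Return the set of robots meta tag directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks robots not to index it or follow its links.
PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body>Private notes</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

A page without the tag yields an empty set, which a crawler would treat as permission to index the page and follow its links, subject to robots.txt.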

To completely exclude our web crawler from your site, add the following entry to your robots.txt file:

User-Agent: panscient.com
Disallow: /
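
To check that a robots.txt entry has the intended effect before deploying it, you can parse it with Python's standard urllib.robotparser module. This is a minimal sketch, not the parser our crawler uses; the example.com URLs, the "otherbot" user-agent, and the /private/ rule are hypothetical and serve only to illustrate full versus partial exclusion.

```python
from urllib.robotparser import RobotFileParser

# The full-exclusion entry shown above.
FULL_EXCLUSION = """\
User-Agent: panscient.com
Disallow: /
"""

# Hypothetical partial exclusion: only the /private/ directory is off-limits.
PARTIAL_EXCLUSION = """\
User-Agent: panscient.com
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, rules in (("full", FULL_EXCLUSION), ("partial", PARTIAL_EXCLUSION)):
        for path in ("/index.html", "/private/report.html"):
            url = "https://example.com" + path
            print(label, path, is_allowed(rules, "panscient.com", url))
```

Because the entry names only the panscient.com user-agent, other robots are unaffected by it.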

Why is your web crawler trying to access pages that don't exist on my website?

Our web crawler attempts to extract links to valid web pages from JavaScript and other scripting languages. The crawler may misinterpret the information in these scripts and request a page that does not actually exist. These requests are attempts to retrieve valid web content, not attempts to circumvent your web server's security.
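
The Python sketch below shows how this kind of misinterpretation can arise. It uses a deliberately simple heuristic, assumed here for illustration only and not a description of our crawler's actual logic: any quoted string in a script that looks like a page path is treated as a link. The script text, the base URL, and the names PATH_LIKE and guess_links are hypothetical.

```python
import re
from typing import List
from urllib.parse import urljoin

# Hypothetical heuristic: a quoted string ending in a common page
# extension is assumed to be a link.
PATH_LIKE = re.compile(r"""["']([^"'\s]+\.(?:html?|php|aspx?))["']""")


def guess_links(script: str, base_url: str) -> List[str]:
    """Return absolute URLs for every path-like string literal in `script`."""
    return [urljoin(base_url, path) for path in PATH_LIKE.findall(script)]


# The first literal is a template that the script fills in at run time,
# so the literal path itself never exists on the server.
SCRIPT = (
    'var tpl = "/catalog/page-N.html".replace("N", n); '
    'location.href = "/about.html";'
)

if __name__ == "__main__":
    for url in guess_links(SCRIPT, "https://example.com/shop/index.html"):
        print(url)
```

Here the heuristic finds the real page /about.html but also the template /catalog/page-N.html, so a crawler relying on it would request a page that was never published.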

How often will your crawler request a page from my server?

The Panscient web crawler requests at most one page per second from any single domain name or IP address.
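
A limit of this kind can be sketched in Python as follows. This is an illustrative model under stated assumptions, not our crawler's implementation: it records the time of the last request for each domain name and each IP address, and a new request must wait until one interval has passed for both. The class name PolitenessLimiter, its methods, and the example hosts are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PolitenessLimiter:
    """Permit at most one request per `min_interval` seconds
    to any single domain name or IP address."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock
        # Keys are ("domain", name) or ("ip", address) so the two
        # namespaces cannot collide.
        self._last: Dict[Tuple[str, str], float] = {}

    def wait_time(self, domain: str, ip: str) -> float:
        """Seconds to wait before a request to (domain, ip) is permitted."""
        now = self.clock()
        wait = 0.0
        for key in (("domain", domain), ("ip", ip)):
            last = self._last.get(key)
            if last is not None:
                wait = max(wait, self.min_interval - (now - last))
        return wait

    def record(self, domain: str, ip: str) -> None:
        """Note that a request to (domain, ip) was just sent."""
        now = self.clock()
        self._last[("domain", domain)] = now
        self._last[("ip", ip)] = now


if __name__ == "__main__":
    limiter = PolitenessLimiter()
    for _ in range(3):
        time.sleep(limiter.wait_time("a.example", "192.0.2.1"))
        limiter.record("a.example", "192.0.2.1")
        print("request sent at", round(time.monotonic(), 2))
```

Tracking the IP address separately from the domain name means that several sites hosted on one shared server are throttled together rather than each receiving one request per second.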

I still have a question or concern about your web crawler. What should I do?

Contact us at crawler@panscient.com and we'll do our best to respond to your query promptly.