Panscient operates a large-scale web crawler that visits millions of websites on a regular basis. Like the crawlers used by the large search engines, it scans public websites for specific types of information to include in vertical search engines.
Panscient primarily crawls the web looking for corporate information, such as company names, addresses, executive biographies, job openings and product information. We also crawl the web to locate genealogy pages, such as birth, marriage and death records, obituaries and census records.
Our web crawler only accesses publicly available information published on websites. We respect the rights of website owners to control what content our crawler analyzes. Our crawler obeys the Robots Exclusion Standard, and will not collect content from any pages that are off-limits to robots.
We crawl every registered .com domain name, along with the domains registered under several smaller top-level domains. The lists of registered domain names are publicly available through Verisign. Once you register a domain name, our crawler will periodically check it for business information.
The Panscient web crawler identifies itself using the user-agents "panscient.com" or "pantest", and obeys the Robots Exclusion Standard. To exclude the Panscient web crawler from portions of your site, please modify your website's robots.txt file to identify the directories and files which the crawler should not request. Our web crawler also obeys the robots meta-tag directives "noindex" and "nofollow", which can be placed in the head section of individual web pages.
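As an illustration of the meta-tag directives mentioned above, the following is a minimal sketch of how a crawler might read them from a page. It is not Panscient's implementation; the class `RobotsMetaParser`, the function `robots_directives`, and the sample page are hypothetical names chosen for this example, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html_text):
    """Return the set of robots meta directives found in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Hypothetical page that asks robots not to index it or follow its links.
page = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body><a href="/next">next</a></body></html>"""

directives = robots_directives(page)
may_index = "noindex" not in directives
may_follow_links = "nofollow" not in directives
```

A crawler following this sketch would skip indexing the page when `may_index` is false and would not queue its links when `may_follow_links` is false.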
To completely exclude our web crawler from your site, add the following entry to your robots.txt file:
User-Agent: panscient.com
Disallow: /
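To check that an entry like the one above has the intended effect before publishing it, you can evaluate it locally. The sketch below uses Python's standard `urllib.robotparser` module; the helper name `is_allowed`, the URL `www.example.com`, and the agent name "otherbot" are assumptions for illustration. It only shows how a standards-following parser reads the entry for the "panscient.com" user-agent, not how the Panscient crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry shown above.
ROBOTS_TXT = """\
User-Agent: panscient.com
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL on your own site.
url = "https://www.example.com/any/page.html"
blocked_for_panscient = not is_allowed("panscient.com", url)
open_for_others = is_allowed("otherbot", url)
```

With this entry, `blocked_for_panscient` is true while other user-agents that the file does not name remain unaffected.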
Our web crawler attempts to extract links to valid web pages from JavaScript and other scripting languages. The crawler may misinterpret the information in these scripts and request a page that does not actually exist. These requests are attempts to retrieve valid web content, and are not attempts to circumvent your webserver security.
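The following sketch illustrates how this kind of misinterpretation can arise in general. It is a deliberately simple, hypothetical heuristic (treat any quoted string that starts with "/" as a path) and is not a description of how the Panscient crawler actually parses scripts; the names `QUOTED_PATH` and `guess_links` and the sample script are invented for this example.

```python
import re
from urllib.parse import urljoin

# Hypothetical heuristic: any quoted string starting with "/" is a path.
QUOTED_PATH = re.compile(r"""["'](/[^"'\s]*)["']""")


def guess_links(script_text, base_url):
    """Return absolute URLs guessed from quoted paths in script_text."""
    return [urljoin(base_url, path) for path in QUOTED_PATH.findall(script_text)]


# The script builds its URL at run time from several pieces, so the quoted
# fragment "/item-" is not a complete path on its own.
script = 'var prefix = "/item-"; location.href = prefix + itemId + ".html";'

guessed = guess_links(script, "https://example.com/index.html")
```

Here the heuristic guesses `https://example.com/item-`, a URL the site never intended to serve, which is the sort of harmless request for a nonexistent page described above.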
The Panscient web crawler will request a page at most once every second from the same domain name or the same IP address.
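One way to picture this limit is a small bookkeeping structure that remembers when each domain name and each IP address was last contacted. The sketch below is an assumption about how such a rule could be enforced, not Panscient's code; the class `PolitenessLimiter`, its methods, and the sample hosts and addresses are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class PolitenessLimiter:
    """Tracks the last request time per domain name and per IP address."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds between requests
        self._last = {}  # maps ("domain", name) or ("ip", addr) to a time

    def _keys(self, domain, ip):
        return (("domain", domain), ("ip", ip))

    def delay_needed(self, domain, ip, now):
        """Seconds to wait before a request to (domain, ip) is allowed."""
        ready_at = max(
            self._last.get(key, float("-inf")) + self.min_interval
            for key in self._keys(domain, ip)
        )
        return max(0.0, ready_at - now)

    def record(self, domain, ip, now):
        """Note that a request to (domain, ip) was sent at time now."""
        for key in self._keys(domain, ip):
            self._last[key] = now


limiter = PolitenessLimiter()
limiter.record("a.example", "192.0.2.1", now=0.0)

# Same IP address, different domain name: still has to wait.
wait_same_ip = limiter.delay_needed("b.example", "192.0.2.1", now=0.25)
# Unrelated domain name and IP address: no wait.
wait_unrelated = limiter.delay_needed("c.example", "198.51.100.1", now=0.25)
```

Keying on both the domain name and the IP address means that several sites hosted on one shared server are treated as a single target for rate-limiting purposes.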
Contact us at crawler@panscient.com and we will respond to your query promptly.
Category | Examples | Collected, sold and disclosed |
---|---|---|
A. Identifiers. | A real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, Social Security number, driver's license number, passport number, or other similar identifiers. | We may have collected some of these data elements on California Residents. |
B. Personal information categories listed in the California Customer Records statute (Cal. Civ. Code § 1798.80(e)). | A name, signature, Social Security number, physical characteristics or description, address, telephone number, passport number, driver's license or state identification card number, insurance policy number, education, employment, employment history, bank account number, credit card number, debit card number, or any other financial information, medical information, or health insurance information. Some personal information included in this category may overlap with other categories. | We may have collected some of these data elements on California Residents. |
C. Protected classification characteristics under California or federal law. | Age (40 years or older), race, color, ancestry, national origin, citizenship, religion or creed, marital status, medical condition, physical or mental disability, sex (including gender, gender identity, gender expression, pregnancy or childbirth and related medical conditions), sexual orientation, veteran or military status, genetic information (including familial genetic information). | We do not collect this type of information. |
D. Commercial information. | Records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies. | We do not collect this type of information. |
E. Biometric information. | Genetic, physiological, behavioral, and biological characteristics, or activity patterns used to extract a template or other identifier or identifying information, such as fingerprints, faceprints, and voiceprints, iris or retina scans, keystroke, gait, or other physical patterns, and sleep, health, or exercise data. | We do not collect this type of information. |
F. Internet or other similar network activity. | Browsing history, search history, information on a consumer's interaction with a website, application, or advertisement. | We do not collect this type of information. |
G. Geolocation data. | Physical location or movements. | We do not collect this type of information. |
H. Sensory data. | Audio, electronic, visual, thermal, olfactory, or similar information. | We do not collect this type of information. |
I. Professional or employment-related information. | Current or past job history or performance evaluations. | We may have collected some of these data elements on California Residents. |
J. Non-public education information (per the Family Educational Rights and Privacy Act (20 U.S.C. Section 1232g, 34 C.F.R. Part 99)). | Education records directly related to a student maintained by an educational institution or party acting on its behalf, such as grades, transcripts, class lists, student schedules, student identification codes, student financial information, or student disciplinary records. | We do not collect this type of information. |
K. Inferences drawn from other personal information. | Profile reflecting a person's preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes. | We do not collect this type of information. |