Rectifies Citation Research Crawler
Rectifies is an academic-grade citation research platform that tracks how web content is cited in AI-generated answers. Our crawler, rectifies-citation-research, visits publicly accessible web pages to help publishers understand and improve their visibility in AI systems.
What this crawler does not do
It does not collect personal data. It does not train AI foundation models. It does not bypass CAPTCHAs, authentication, or paywalls. It does not submit forms or trigger transactional actions. It respects robots.txt, Crawl-delay, X-Robots-Tag, and meta noindex directives.
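Site owners who want to preview how robots.txt rules for this crawler are likely to be read can test them locally. The sketch below uses Python's standard-library urllib.robotparser, which is an assumption for illustration: it is not the crawler's own parser, and edge-case behaviour may differ. The helper name evaluate_rules and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User-Agent string published for this crawler.
USER_AGENT = "rectifies-citation-research/1.0 (+https://rectifies.io/crawler)"


def evaluate_rules(robots_txt: str, url: str, user_agent: str = USER_AGENT):
    """Return (allowed, crawl_delay) for the given robots.txt text.

    Uses Python's generic parser, so treat the result as an
    approximation of how a compliant crawler reads the rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    block_rules = "User-agent: rectifies-citation-research\nDisallow: /\n"
    throttle_rules = (
        "User-agent: rectifies-citation-research\n"
        "Allow: /\n"
        "Crawl-delay: 10\n"
    )
    # Hypothetical page URL, used only to exercise the rules.
    page = "https://example.com/articles/some-page"
    print(evaluate_rules(block_rules, page))
    print(evaluate_rules(throttle_rules, page))
```

Matching is done on the product token before the slash, so rules written for rectifies-citation-research apply to the full User-Agent string.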
User-Agent
rectifies-citation-research/1.0 (+https://rectifies.io/crawler)
robots.txt directives
To block this crawler, add to your robots.txt:
User-agent: rectifies-citation-research
Disallow: /
To allow with rate limiting:
User-agent: rectifies-citation-research
Allow: /
Crawl-delay: 10
IP address ranges
Machine-readable list: crawler-ips.json
All crawler IPs resolve via reverse DNS to *.crawl.rectifies.io.
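A request claiming to come from this crawler can be checked against the reverse DNS pattern above. The following is a minimal sketch, not official tooling: the function names are illustrative, and the forward-confirmation step (resolving the hostname back to the same IP) is a common anti-spoofing practice assumed here rather than something this page specifies.

```python
import ipaddress
import socket

# Suffix taken from the reverse DNS pattern *.crawl.rectifies.io.
CRAWLER_DNS_SUFFIX = ".crawl.rectifies.io"


def hostname_matches(hostname: str, suffix: str = CRAWLER_DNS_SUFFIX) -> bool:
    """True if hostname is a subdomain of crawl.rectifies.io."""
    return hostname.rstrip(".").lower().endswith(suffix)


def verify_crawler_ip(ip: str) -> bool:
    """Reverse-resolve ip, check the suffix, then forward-confirm.

    Raises ValueError if ip is not a valid IP address string.
    """
    target = ipaddress.ip_address(ip)
    try:
        hostname = socket.gethostbyaddr(ip)[0]  # reverse (PTR) lookup
    except OSError:
        return False
    if not hostname_matches(hostname):
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)  # forward lookup
    except OSError:
        return False
    for info in infos:
        try:
            if ipaddress.ip_address(info[4][0]) == target:
                return True
        except ValueError:
            continue  # skip addresses that cannot be parsed
    return False
```

The suffix check includes the leading dot so that a hostname such as evilcrawl.rectifies.io does not pass.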
What we do with crawled data
- Extract page features (schema markup, freshness signals, content structure)
- Version page snapshots for longitudinal citation research
- Correlate page features with observed AI citation behaviour
What we do not do:
- We do not redistribute crawled content
- We do not use crawled content to train language models
- We do not share raw crawled content with customers
Opt-out
To opt out of crawling entirely, email crawler@rectifies.io with your domain. We honour opt-out requests within 2 business days.
Contact
crawler@rectifies.io — 2-business-day response SLA.