Rectifies Citation Research Crawler

Rectifies is an academic-grade citation research platform that tracks how web content is cited in AI-generated answers. Our crawler, rectifies-citation-research, visits publicly accessible web pages to help publishers understand and improve their visibility in AI systems.

What this crawler does not do

It does not collect personal data. It does not train AI foundation models. It does not bypass CAPTCHAs, authentication, or paywalls. It does not submit forms or trigger transactional actions.

It respects robots.txt rules (including Crawl-delay), X-Robots-Tag headers, and meta noindex directives.

User-Agent

rectifies-citation-research/1.0 (+https://rectifies.io/crawler)
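To spot this crawler in access logs, the User-Agent header can be matched on its product token. The following is a minimal sketch, not an official tool: the function name crawler_version and the version pattern are assumptions made for illustration. A User-Agent header is easy to spoof, so treat a match as a hint only; the reverse DNS naming described under "IP address ranges" is a stronger signal.

```python
import re
from typing import Optional

# Product token taken from the User-Agent string documented above.
CRAWLER_TOKEN = "rectifies-citation-research"

# Matches "<token>/<version>" anywhere in the header and captures the version.
# The dotted-number version format is an assumption based on the "1.0" shown above.
_UA_PATTERN = re.compile(re.escape(CRAWLER_TOKEN) + r"/(\d+(?:\.\d+)*)", re.IGNORECASE)


def crawler_version(user_agent: Optional[str]) -> Optional[str]:
    """Return the crawler version if the header names this crawler, else None."""
    match = _UA_PATTERN.search(user_agent or "")
    return match.group(1) if match else None
```

Calling crawler_version on the documented User-Agent string returns "1.0"; for a header that does not contain the token it returns None.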

robots.txt directives

To block this crawler, add the following to your robots.txt:

User-agent: rectifies-citation-research
Disallow: /

To allow crawling with rate limiting:

User-agent: rectifies-citation-research
Allow: /
Crawl-delay: 10
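Before deploying either rule set, it can help to check how a robots.txt parser reads it. The sketch below uses Python's standard urllib.robotparser to evaluate the two rule sets shown above. The helper name evaluate and the example.com URL are assumptions for illustration, and the result reflects how Python's parser interprets the rules, which is not guaranteed to match this crawler's own parser in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Full User-Agent string as documented in the User-Agent section.
USER_AGENT = "rectifies-citation-research/1.0 (+https://rectifies.io/crawler)"

# The two rule sets from this section, copied verbatim.
BLOCK_RULES = """\
User-agent: rectifies-citation-research
Disallow: /
"""

RATE_LIMIT_RULES = """\
User-agent: rectifies-citation-research
Allow: /
Crawl-delay: 10
"""


def evaluate(robots_txt, url, user_agent=USER_AGENT):
    """Return (allowed, crawl_delay) for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)
```

With these inputs, evaluate(BLOCK_RULES, "https://example.com/page") yields (False, None) and evaluate(RATE_LIMIT_RULES, "https://example.com/page") yields (True, 10). A different user agent is unaffected by either rule set, since both name only this crawler.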

IP address ranges

Machine-readable list: crawler-ips.json

All crawler IPs resolve via reverse DNS to *.crawl.rectifies.io.
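The reverse DNS property above allows a forward-confirmed reverse DNS check: look up the hostname for the client IP, confirm it ends in .crawl.rectifies.io, then resolve that hostname forward and confirm it maps back to the same IP. The following is a minimal sketch under those assumptions. The function names are hypothetical, and the forward-confirmation step is a general anti-spoofing practice rather than something this page specifies.

```python
import ipaddress
import socket

# Suffix derived from the wildcard *.crawl.rectifies.io documented above.
CRAWLER_SUFFIX = ".crawl.rectifies.io"


def hostname_matches(hostname, suffix=CRAWLER_SUFFIX):
    """True if hostname is a subdomain under the crawler suffix (case-insensitive)."""
    return hostname.rstrip(".").lower().endswith(suffix)


def forward_ips(hostname):
    """Resolve hostname to the set of its IPv4/IPv6 addresses."""
    infos = socket.getaddrinfo(hostname, None)
    return {ipaddress.ip_address(info[4][0]) for info in infos}


def is_rectifies_crawler(ip, reverse=socket.gethostbyaddr, forward=forward_ips):
    """Forward-confirmed reverse DNS check for a client IP.

    The reverse and forward resolvers are injectable so the logic can be
    exercised without network access.
    """
    try:
        address = ipaddress.ip_address(ip)
        hostname = reverse(ip)[0]  # (hostname, aliases, addresses)
        if not hostname_matches(hostname):
            return False
        return address in forward(hostname)
    except (OSError, ValueError):
        # DNS failures (socket.herror, socket.gaierror) or a malformed IP.
        return False
```

In production, is_rectifies_crawler(client_ip) performs two DNS lookups per call, so caching results per IP is advisable. Matching on the leading dot of the suffix keeps lookalike hostnames such as evilcrawl.rectifies.io from passing.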

What we do with crawled data

  • Extract page features (schema markup, freshness signals, content structure)
  • Version page snapshots for longitudinal citation research
  • Correlate page features with observed AI citation behaviour

What we do not do:

  • We do not redistribute crawled content
  • We do not use crawled content to train language models
  • We do not share raw crawled content with customers

Opt-out

To opt out of crawling entirely, email crawler@rectifies.io with your domain. We honour opt-out requests within 2 business days.

Contact

crawler@rectifies.io — 2-business-day response SLA.