
Live 30-day case study. Baseline ran on 2026-05-09 across 44 conversations on the four-engine baseline panel. Two cites total. Top competitors named: Buffer, Hootsuite, Sprout Social. The Autopilot run is live. Final score lands June 8, 2026.
RiteKit is a social media toolkit, founded by the same operator who built MentionFox. Years of product, modest organic SEO, no historical GEO work. That makes it a clean test brand: a real product with real users but a low AI-visibility floor, exactly the cohort that needs Autopilot the most.
RiteKit was chosen precisely because we knew the baseline would be low. A case study that started at 60 and ended at 65 would prove almost nothing. One that starts at 5 and ends wherever it ends after 30 days of Autopilot is informative either way.
| Engine | Conversations | Cites | Cite rate |
|---|---|---|---|
| ChatGPT | 11 | 2 | 18.18% |
| Gemini | 11 | 0 | 0% |
| Claude (panel only) | 11 | 0 | 0% |
| Perplexity | 11 | 0 | 0% |
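The cite-rate column is just cites divided by conversations per engine. A minimal sketch of that arithmetic, using the counts from the table above (the dictionary shape and function name are illustrative, not the product's actual code):

```python
# Day-zero baseline counts, copied from the table above.
baseline = {
    "ChatGPT": {"conversations": 11, "cites": 2},
    "Gemini": {"conversations": 11, "cites": 0},
    "Claude": {"conversations": 11, "cites": 0},
    "Perplexity": {"conversations": 11, "cites": 0},
}

def cite_rate(cites: int, conversations: int) -> float:
    """Per-engine cite rate as a percentage; 0.0 when no conversations ran."""
    if conversations == 0:
        return 0.0
    return 100.0 * cites / conversations

for engine, counts in baseline.items():
    rate = cite_rate(counts["cites"], counts["conversations"])
    print(f"{engine}: {rate:.2f}%")  # ChatGPT: 18.18%, all others 0.00%
```

The overall baseline rate works out the same way: 2 cites over 44 conversations is about 4.5%.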
The day-zero baseline used the four-engine validation panel (the simplified set used to seed the case study). The full seven-LLM scoring panel kicks in for the daily Autopilot run starting day 1.
The same 44 conversations surfaced the incumbents by cite count, led by Buffer, then Hootsuite and Sprout Social.
Reading: AI engines are confidently recommending the established players in the social media tool category. RiteKit appears twice; Buffer appears 21 times. The gap is content depth, accumulated citations, and category-leader presence in the training data, not product quality.
The plan below was written and committed on day 0. No mid-run changes. If the result is bad, the plan is bad — we will not retroactively edit it.
Case studies are easy to fake. The most common pattern is to run a measurement, hide the runs that did not move, and publish the one that did. We reject that pattern: the baseline, the plan, and the final score all publish on schedule, regardless of outcome.
For this category (social media tools, with established incumbents like Buffer and Hootsuite), a realistic 30-day target is to move from "Invisible" (0-15) into "Mentioned" (16-35) on the four-engine score. That would mean RiteKit is being named more frequently and on more engines than the day-zero state, but is not yet a default recommendation.
Reaching "Considered" (36-55) on a 30-day run for a brand that started at 5 would be unusually fast. We would investigate whether the measurement is over-counting before celebrating.
Reaching "Recommended" (56-75) in 30 days from a 5-starting baseline would be impossible without something else changing simultaneously (a major news event, a viral product launch, a high-authority listicle landing in the same window). We do not expect that.
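The bands referenced above form a simple threshold lookup. A sketch with the boundaries exactly as stated in the text (the structure and names are illustrative, not the product's actual scoring code):

```python
# Score bands as described above (inclusive ranges on the four-engine score).
BANDS = [
    (0, 15, "Invisible"),
    (16, 35, "Mentioned"),
    (36, 55, "Considered"),
    (56, 75, "Recommended"),
]

def band_for(score: int) -> str:
    """Map a four-engine visibility score to its named band."""
    for low, high, name in BANDS:
        if low <= score <= high:
            return name
    raise ValueError(f"score {score} outside known bands")

print(band_for(5))   # Invisible: the day-zero baseline
print(band_for(20))  # Mentioned: the realistic 30-day target
```

Under this mapping, RiteKit's day-zero score of 5 sits at the bottom of "Invisible", and the 30-day target is any landing inside "Mentioned".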
If the score moves dramatically, we will dig into whether something else changed (a competitor went down, a major listicle landed, a model retrained). We will not let "the score moved a lot" stand alone as proof. The methodology only works if we are honest about confounders.
This case uses the same protocol described on the methodology page. Same query generation. Same definition of what counts as a win. Same engine panel for ongoing measurement (the four-engine baseline panel was used for the day-zero snapshot for speed; the full seven-LLM panel runs nightly during the case). Same scoring math.
If you spot a difference between this case and the protocol, that is a bug to report, not a feature. Email us and we will fix the page.
Five-day free trial. Day-zero baseline runs immediately. The 30-day plan is yours.
Start a baseline