
Live 30-day case study. Baseline ran on 2026-05-09 across 44 conversations on the four-engine baseline panel. Two cites total. Top competitors named: Buffer, Hootsuite, Sprout Social. The Autopilot run is live. Final score lands June 8, 2026.
RiteKit is a social media toolkit, founded by the same operator who built MentionFox. Years of product, modest organic SEO, no historical GEO work. That makes it a clean test brand: a real product with real users but a low AI-visibility floor, exactly the cohort that needs Autopilot the most.
RiteKit was chosen precisely because we knew the baseline would be low. A case study that started at 60 and ended at 65 would prove almost nothing. One that starts at 5 and ends wherever it ends after 30 days of Autopilot is informative either way.
| Engine | Conversations | Cites | Cite rate |
|---|---|---|---|
| ChatGPT | 11 | 2 | 18.18% |
| Gemini | 11 | 0 | 0% |
| Claude (panel only) | 11 | 0 | 0% |
| Perplexity | 11 | 0 | 0% |
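The cite-rate column is just cites divided by conversations per engine. A minimal sketch of that arithmetic, using the counts from the table above (the dictionary shape and function name are illustrative, not the product's actual code):

```python
# Day-zero baseline counts, copied from the table above.
baseline = {
    "ChatGPT": {"conversations": 11, "cites": 2},
    "Gemini": {"conversations": 11, "cites": 0},
    "Claude": {"conversations": 11, "cites": 0},
    "Perplexity": {"conversations": 11, "cites": 0},
}

def cite_rate(cites: int, conversations: int) -> float:
    """Per-engine cite rate as a percentage; 0.0 when no conversations ran."""
    if conversations == 0:
        return 0.0
    return 100.0 * cites / conversations

for engine, counts in baseline.items():
    rate = cite_rate(counts["cites"], counts["conversations"])
    print(f"{engine}: {rate:.2f}%")  # ChatGPT: 18.18%, all others 0.00%
```

The overall baseline rate works out the same way: 2 cites over 44 conversations is about 4.5%.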
The day-zero baseline used the four-engine validation panel (the simplified set used to seed the case study). The full seven-LLM scoring panel kicks in for the daily Autopilot run starting day 1.
The same 44 conversations surfaced the incumbents by cite count, led by Buffer, then Hootsuite and Sprout Social.
Reading: AI engines are confidently recommending the established players in the social media tool category. RiteKit appears twice; Buffer appears 21 times. The gap is content depth, accumulated citations, and category-leader presence in the training data, not product quality.
The plan below was written and committed on day 0. No mid-run changes. If the result is bad, the plan is bad — we will not retroactively edit it.
Case studies are easy to fake. The most common pattern is to run a measurement, hide the runs that did not move, and publish the one that did. We reject that pattern: the baseline, the plan, and the final score all publish on schedule, regardless of outcome.
For this category (social media tools, with established incumbents like Buffer and Hootsuite), a realistic 30-day target is to move from "Invisible" (0-15) into "Mentioned" (16-35) on the four-engine score. That would mean RiteKit is being named more frequently and on more engines than the day-zero state, but is not yet a default recommendation.
Reaching "Considered" (36-55) on a 30-day run for a brand that started at 5 would be unusually fast. We would investigate whether the measurement is over-counting before celebrating.
Reaching "Recommended" (56-75) in 30 days from a 5-starting baseline would be impossible without something else changing simultaneously (a major news event, a viral product launch, a high-authority listicle landing in the same window). We do not expect that.
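The bands referenced above form a simple threshold lookup. A sketch with the boundaries exactly as stated in the text (the structure and names are illustrative, not the product's actual scoring code):

```python
# Score bands as described above (inclusive ranges on the four-engine score).
BANDS = [
    (0, 15, "Invisible"),
    (16, 35, "Mentioned"),
    (36, 55, "Considered"),
    (56, 75, "Recommended"),
]

def band_for(score: int) -> str:
    """Map a four-engine visibility score to its named band."""
    for low, high, name in BANDS:
        if low <= score <= high:
            return name
    raise ValueError(f"score {score} outside known bands")

print(band_for(5))   # Invisible: the day-zero baseline
print(band_for(20))  # Mentioned: the realistic 30-day target
```

Under this mapping, RiteKit's day-zero score of 5 sits at the bottom of "Invisible", and the 30-day target is any landing inside "Mentioned".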
If the score moves dramatically, we will dig into whether something else changed (a competitor went down, a major listicle landed, a model retrained). We will not let "the score moved a lot" stand alone as proof. The methodology only works if we are honest about confounders.
This case uses the same protocol described on the methodology page. Same query generation. Same definition of what counts as a win. Same engine panel for ongoing measurement (the four-engine baseline panel was used for the day-zero snapshot for speed; the full seven-LLM panel runs nightly during the case). Same scoring math.
If you spot a difference between this case and the protocol, that is a bug to report, not a feature. Email us and we will fix the page.
Five-day free trial. Day-zero baseline runs immediately. The 30-day plan is yours.
Start a baseline