Stop Hiring CSMs. Start Building Systems.

You can't hire your way out of CS scaling. A four-layer operating system for what comes next.

May 05, 2026

At GitHub, the team I built was called Customer Outcomes. We grew it from zero to 120+ people, and we were sophisticated about how we did it. We pre-hired CSMs against modeled growth rather than reacting to whatever sales had just closed. We started a few scaling motions on the side, like expanding Community to cover the long tail of customers we couldn’t staff to. We talked about digital playbooks. But the digital piece didn’t get serious investment until several years in, and the operating model was still fundamentally a hiring model.

By the time that chapter ended, I was pretty sure we’d built the wrong thing.

The pattern was the same every quarter. The model would forecast more accounts. We’d staff CSMs ahead of them. We’d promote managers to handle the new CSMs. Someone would call a meeting about span of control. Every conversation about “scaling CS” ended at the same place: hire.

What we built was a service organization with “Customer Success” written on the org chart.

Most CS orgs scale this way. It’s the default, and the math gets ugly fast. At $150K fully loaded per CSM, every 50 new accounts is another headcount you have to fund. There’s no point at which the operation gets cheaper, only points at which it gets less obviously broken. Eventually finance does the unit economics, and CS becomes the line item that gets a “growth pause.”
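To make the math concrete, here is a minimal sketch of the pure hiring model using the two figures above ($150K fully loaded, one CSM per 50 accounts). The function names are mine, and the account counts in the loop are illustrative inputs, not data from any real org.

```python
import math

# Figures taken from the paragraph above.
FULLY_LOADED_COST_PER_CSM = 150_000
ACCOUNTS_PER_CSM = 50


def csms_needed(accounts: int) -> int:
    """Headcount required under a pure hiring model (rounded up: no partial CSMs)."""
    return math.ceil(accounts / ACCOUNTS_PER_CSM)


def annual_cs_cost(accounts: int) -> int:
    """Total yearly CSM cost for a customer base of the given size."""
    return csms_needed(accounts) * FULLY_LOADED_COST_PER_CSM


def cost_per_account(accounts: int) -> float:
    """Yearly CSM cost spread across every account."""
    return annual_cs_cost(accounts) / accounts


# Illustrative customer-base sizes (assumed, not from the post).
for accounts in (500, 1_000, 2_000):
    print(accounts, csms_needed(accounts), annual_cs_cost(accounts), cost_per_account(accounts))
```

Under these assumptions the cost per account stays at $3,000 whether you have 500 customers or 2,000. Nothing in the model bends the curve, which is the whole problem.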

The “revenue era of CS” crowd wants to make this worse. Their fix is to turn CSMs into quota-carrying sellers and let expansion revenue cover the cost of the team. I’ve watched that movie. I know how it ends. Trust erodes, champions stop being candid with you, and your NRR flattens within four quarters.

There’s a different play, and the best CS orgs are already running it.

They aren’t hiring faster. They’ve quietly rebuilt the underlying operation so that one CSM does the work that used to take three. Not by working harder. By moving most of the work somewhere else.

I've written about the operational pieces individually: health scores, QBRs, onboarding, and expansion. This post is the architecture that holds them all together.

The Framework: The CS Operating System

Four layers. Each one absorbs work from the layer above. The job of every CS leader is to push every interaction down the stack to the lowest layer that can still handle it well.
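The routing rule can be sketched in a few lines. This assumes you have already decided which layers can handle a given interaction well; that judgment is the hard part and is not something the code does for you. The interaction names and capability sets below are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum


class Layer(IntEnum):
    """The four layers, numbered from the bottom of the stack up."""
    SELF_SERVICE = 1
    AUTOMATED = 2
    AI_AUGMENTED = 3
    HUMAN_LED = 4


def lowest_capable_layer(capable: set[Layer]) -> Layer:
    """Pick the lowest layer that can still handle the interaction well.

    If nothing has been marked capable, fall back to a human.
    """
    return min(capable) if capable else Layer.HUMAN_LED


# Hypothetical catalog: interaction -> layers judged able to handle it well.
CATALOG = {
    "how do I configure SSO": {Layer.SELF_SERVICE, Layer.AI_AUGMENTED, Layer.HUMAN_LED},
    "deployment velocity dropped": {Layer.AUTOMATED, Layer.HUMAN_LED},
    "meeting prep": {Layer.AI_AUGMENTED, Layer.HUMAN_LED},
    "renewal negotiation": {Layer.HUMAN_LED},
}

for interaction, capable in CATALOG.items():
    print(f"{interaction}: {lowest_capable_layer(capable).name}")
```

A human can handle almost everything in the catalog. The point of the rule is that "can" is not the same as "should".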

Layer 1: Self-Service Infrastructure

Documentation, in-app guidance, quickstart templates, automated onboarding flows, community forums.

In a well-designed CS operation, this layer handles 60 to 70 percent of customer needs without anyone on your team getting involved. Technical buyers prefer it that way. Engineers don’t want to file a support ticket. They want to find the answer at 2am and keep building.

If your CSMs are answering the same five questions in Slack every week, the problem isn’t your customers. It’s that your docs aren’t where your customers expect to find them.

Layer 2: Automated Signal and Response

Event-driven workflows that detect state changes and trigger the right action.

A customer’s deployment velocity drops for two weeks running. Your system flags the state transition, pushes an in-app message with the right resource, and adds the account to a CSM review queue. The CSM only sees the account once a real human decision is required.

This is where the state-based model from the first post stops being a reporting tool and starts running things. The state transitions are the events. The events fire the workflows. CSMs stop spending Mondays manually scanning a portfolio for things that have changed since Friday.
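Here is a rough sketch of that loop for the deployment-velocity example. Everything in it is an assumption for illustration: the two state names, the rule that "two weeks running" means two consecutive week-over-week declines, the resource name, and the in-memory lists standing in for a real messaging system and review queue.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    state: str = "healthy"
    weekly_deploys: list[int] = field(default_factory=list)


# In-memory stand-ins for real systems (hypothetical).
outbox: list[tuple[str, str]] = []   # (account, resource pushed in-app)
review_queue: list[str] = []         # accounts waiting on a CSM decision


def send_in_app_message(account: Account) -> None:
    outbox.append((account.name, "deployment-troubleshooting-guide"))


def add_to_review_queue(account: Account) -> None:
    review_queue.append(account.name)


# Workflows are keyed by the transition, not by the state itself.
WORKFLOWS = {
    ("healthy", "at_risk"): [send_in_app_message, add_to_review_queue],
}


def derive_state(account: Account) -> str:
    """At risk when deploys have declined two weeks running."""
    d = account.weekly_deploys
    if len(d) >= 3 and d[-1] < d[-2] < d[-3]:
        return "at_risk"
    return "healthy"


def record_week(account: Account, deploys: int) -> None:
    """Ingest one week of data and fire workflows only if the state changed."""
    account.weekly_deploys.append(deploys)
    new_state = derive_state(account)
    if new_state != account.state:
        transition = (account.state, new_state)
        account.state = new_state
        for handler in WORKFLOWS.get(transition, []):
            handler(account)


acme = Account("acme")
for deploys in (40, 31, 22, 20):
    record_week(acme, deploys)
print(acme.state, review_queue, outbox)
```

Because workflows hang off the transition, the fourth week of data (still declining) does not queue the account a second time. The CSM sees it once, when the state actually changes.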

Layer 3: AI-Augmented CSM

Agents that handle research, prep, drafting, and analysis.

A CSM should walk into Monday with a briefing already in their inbox: priority accounts for the week, prep notes for every scheduled meeting, accounts that haven’t been touched in a while, product releases mapped to specific customers in their portfolio. They shouldn’t be spending the first three hours of the week gathering context they’ll only use once.

Good AI augmentation doesn’t try to replace what the CSM is best at. It clears out the work that gets in the way of doing it.

Layer 4: Human-Led Strategic

The 20 percent of interactions that drive 80 percent of the outcome.

Executive alignment. Expansion strategy. Real renewal negotiations. Champion development. Architecture reviews where you’re being asked to commit to a customer’s direction. Complex technical problem-solving that no agent is going to handle credibly anytime soon.

This is what CSMs should actually be spending their week on. Everything else belongs in a layer below it.

Two Ways CS Leaders Get This Wrong

Failure Mode 1: Headcount Theater

You scale CS by adding bodies. The KPI on the leadership dashboard is “headcount per X accounts” rather than “outcomes per CSM.” Every operational gap becomes a hiring requisition. The org chart fills out faster than the retention numbers move, and by the time someone in finance models the cost per retained dollar, you’ve already overhired by twenty percent.

Failure Mode 2: AI Sprawl

The opposite trap, and the one I see more often now. Teams scatter prompts and automations across the org with no architecture behind any of it. Five CSMs are running five different ChatGPT prompts. Two managers built competing dashboards. Nobody actually knows what’s automated, what’s manual, and what was supposed to be automated but broke six weeks ago.

The leadership view is “CS is using AI.” The operational reality is that CSMs are doing more work than before, just with more tabs open.

The Layer Cake, the four-layer stack above, fixes both. It tells you what belongs where, and it tells you what work to push down next.

The Prompt Library

Prompt 1: CS Coverage Model Analyzer

Analyze our current CS team structure and identify opportunities to shift
work from human-led to systemized layers.

Current state:
- Number of CSMs: {N}
- Average book of business per CSM: {accounts}
- Current CSM time allocation (estimate %):
  - Onboarding: {%}
  - Recurring check-ins: {%}
  - Renewal management: {%}
  - Escalation handling: {%}
  - Strategic/expansion conversations: {%}
  - Internal coordination: {%}
  - Admin/data entry: {%}

For each activity, recommend:
1. Which layer it belongs in (Self-Service, Automated, AI-Augmented,
   Human-Led Strategic)
2. What it would take to move it down one layer
3. Estimated time savings per CSM per week
4. Implementation complexity (low/medium/high)

Calculate the projected effective book of business per CSM after
systemization, and flag the top 3 highest-ROI moves.
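If you want to sanity-check the number the model gives you, the projection at the end of that prompt can be approximated by hand. This sketch assumes CSM time scales linearly with book size, so freeing a fraction of the week lets the book grow by the inverse of what remains. That assumption is mine and it is a simplification. The allocation and "share moved down" figures are made-up inputs, not benchmarks.

```python
import math


def projected_book(current_book: float,
                   allocation: dict[str, float],
                   share_moved_down: dict[str, float]) -> float:
    """Estimate the book a CSM could carry after work moves to lower layers.

    allocation: fraction of CSM time per activity (must sum to 1.0).
    share_moved_down: fraction of each activity absorbed by a lower layer.
    """
    if not math.isclose(sum(allocation.values()), 1.0, abs_tol=1e-9):
        raise ValueError("time allocation must sum to 1.0")
    remaining = sum(
        share * (1.0 - share_moved_down.get(activity, 0.0))
        for activity, share in allocation.items()
    )
    if remaining <= 0:
        raise ValueError("at least some work must remain with the CSM")
    return current_book / remaining


# Illustrative inputs only.
allocation = {
    "onboarding": 0.15,
    "recurring_check_ins": 0.20,
    "renewal_management": 0.10,
    "escalation_handling": 0.10,
    "strategic_expansion": 0.20,
    "internal_coordination": 0.10,
    "admin_data_entry": 0.15,
}
share_moved_down = {
    "onboarding": 0.6,
    "recurring_check_ins": 0.5,
    "internal_coordination": 0.5,
    "admin_data_entry": 0.8,
}

print(round(projected_book(25, allocation, share_moved_down), 1))
```

With these inputs, 36 percent of the week moves down the stack and a 25-account book becomes roughly 39. Treat it as a ceiling to argue with, not a target.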

Prompt 2: Automation Opportunity Scorer

Review the following list of recurring CS tasks and score each for
automation potential.

Tasks:
{list of tasks}

For each task, evaluate:
- Frequency (how often does this happen?)
- Consistency (is the process the same each time?)
- Judgment required (does this need human intuition?)
- Data availability (do we have the inputs needed to automate?)
- Customer impact if automated poorly (what's the blast radius?)

Score each 1-5 on automation readiness. Recommend the top 5 to automate
first, ranked by highest time savings with lowest risk. For each
recommendation, specify which Layer Cake layer it should move to and
what infrastructure is required.
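The scoring itself can also be done deterministically if you would rather not hand the ranking to a model. A minimal sketch, with assumptions flagged: each criterion is rated 1 to 5, judgment and blast radius are inverted because high values argue against automating, readiness is a simple average, and "highest time savings with lowest risk" is read as sorting by hours saved with blast radius as the tiebreaker. The tasks and ratings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    frequency: int          # 1-5, higher = happens more often
    consistency: int        # 1-5, higher = same process each time
    judgment: int           # 1-5, higher = needs more human intuition
    data_availability: int  # 1-5, higher = inputs already exist
    blast_radius: int       # 1-5, higher = worse customer impact if done poorly
    hours_per_week: float   # estimated CSM hours spent on it today


def readiness(task: Task) -> float:
    """Average the five criteria on a 1-5 scale, inverting the two that count against."""
    favorable = task.frequency + task.consistency + task.data_availability
    inverted = (6 - task.judgment) + (6 - task.blast_radius)
    return (favorable + inverted) / 5


def top_candidates(tasks: list[Task], limit: int = 5, min_readiness: float = 3.0) -> list[Task]:
    """Drop tasks that are not ready, then rank by hours saved, lowest risk first on ties."""
    ready = [t for t in tasks if readiness(t) >= min_readiness]
    return sorted(ready, key=lambda t: (-t.hours_per_week, t.blast_radius))[:limit]


# Hypothetical tasks and ratings.
tasks = [
    Task("Renewal pricing negotiation", 2, 1, 5, 2, 5, 4.0),
    Task("Send onboarding reminder emails", 4, 4, 2, 4, 2, 2.0),
    Task("Log meeting notes in CRM", 5, 5, 1, 5, 1, 3.0),
]

for task in top_candidates(tasks):
    print(task.name, readiness(task))
```

The negotiation task eats the most hours in this made-up list and still gets filtered out. Time spent is not the same as automation potential.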

Agent Workflow: Weekly CSM Briefing Generator

You are an AI agent supporting a CSM. Every Monday morning, generate a
briefing for the week ahead.

Inputs:
- Account portfolio data: {structured data}
- Calendar for the week: {meetings}
- Recent support tickets: {tickets}
- Product changelog: {recent releases}
- Any flagged state transitions from the previous week: {transitions}

Generate a briefing that includes:
1. Top 3 priority accounts this week and why
2. Prep notes for each scheduled meeting (key context, recommended
   talking points, technical anchors per the customer's stack)
3. Accounts with no scheduled touchpoint that need attention
4. New product releases relevant to specific accounts in the portfolio
5. Suggested time blocks for proactive outreach vs. reactive work

Keep it scannable. No more than 1 page. No filler.
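Some of those inputs are cheaper to compute before the agent runs than to ask the agent to infer. Item 3, accounts with no scheduled touchpoint, is one candidate. A sketch, assuming you can export a last-touch date per account and the set of accounts with a meeting this week; the 30-day gap, account names, and dates are placeholders.

```python
from __future__ import annotations

from datetime import date, timedelta


def untouched_accounts(last_touch: dict[str, date],
                       scheduled_this_week: set[str],
                       today: date,
                       max_gap_days: int = 30) -> list[str]:
    """Accounts with no meeting this week and no touch within max_gap_days.

    Returned with the longest-neglected account first.
    """
    cutoff = today - timedelta(days=max_gap_days)
    stale = [
        name for name, touched in last_touch.items()
        if name not in scheduled_this_week and touched < cutoff
    ]
    return sorted(stale, key=lambda name: last_touch[name])


# Placeholder data.
last_touch = {
    "acme": date(2026, 4, 28),
    "globex": date(2026, 3, 1),
    "initech": date(2026, 2, 10),
    "umbrella": date(2026, 3, 20),
}
scheduled = {"initech"}

print(untouched_accounts(last_touch, scheduled, today=date(2026, 5, 4)))
```

The result can be passed into the prompt as a ready-made list, which leaves the agent to explain why each account needs attention rather than work out which ones do.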

The One Metric That Matters

The number I’d track instead of headcount-to-account ratio is what I think of as effective book of business. It’s the number of accounts a single CSM can manage well, measured by the outcome data that actually matters: gross retention, net retention, time-to-architecture, expansion rate.

Most CS orgs run at 20 to 30 accounts per CSM. With a working version of the Layer Cake in place, the same CSM can manage 60 to 80 without a meaningful drop in retention or expansion for non-strategic accounts. That delta is what an AI-native CS operation actually looks like in practice.
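One way to put a number on effective book is to look at outcomes by book size and find where they stop holding. This is a sketch of that idea and not a standard formula: it walks up from the smallest book and stops at the first one that misses a retention floor. The floors and the observations below are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BookOutcome:
    book_size: int          # accounts carried by one CSM
    gross_retention: float  # e.g. 0.93 for 93%
    net_retention: float    # e.g. 1.06 for 106%


def effective_book(outcomes: list[BookOutcome],
                   min_gross_retention: float,
                   min_net_retention: float) -> int:
    """Largest book size reached before any observed book misses a floor."""
    best = 0
    for o in sorted(outcomes, key=lambda o: o.book_size):
        if o.gross_retention < min_gross_retention or o.net_retention < min_net_retention:
            break
        best = o.book_size
    return best


# Invented observations, one per CSM.
observed = [
    BookOutcome(25, 0.95, 1.10),
    BookOutcome(45, 0.94, 1.08),
    BookOutcome(70, 0.93, 1.06),
    BookOutcome(95, 0.86, 0.98),
]

print(effective_book(observed, min_gross_retention=0.92, min_net_retention=1.05))
```

Stopping at the first miss is deliberately conservative. One CSM holding retention at an unusually large book should not set the number for the whole team.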

If your effective book has been flat for two years, your stack isn’t compounding. You’ve added tools, not leverage.

Michael Goetz
