We ran a pricing A/B test that showed a 23% conversion lift, statistically significant at the 95% confidence level. The product team celebrated. The marketing team celebrated. We rolled out the new pricing to 100% of traffic and waited for revenue to explode.
Three months later, revenue was down 8%.
The test was statistically valid. The results were clear. But we'd optimized for the wrong metric. We increased conversion rate by attracting customers with lower willingness to pay, which meant more customers at a lower average revenue per customer. We grew our customer count and shrank our revenue.
I spent the next year learning that pricing experiments are the most dangerous type of A/B test you can run. Get the test design wrong and you'll confidently implement changes that destroy your business model.
Why Pricing A/B Tests Are Different From Feature Tests
Most product teams run A/B tests constantly. Test a new button color, measure click-through rate, roll out the winner. The downside risk is low—worst case, the new button performs slightly worse and you revert.
Pricing tests don't work this way. The metrics you can measure in a two-week test (conversion rate, signup rate, activation rate) are often inversely correlated with the metrics that actually matter (revenue, customer lifetime value, retention).
Our disastrous pricing test proved this. We tested two pricing models:
- Control: $49/month for Pro plan, $149/month for Enterprise
- Treatment: $29/month for Pro plan, $99/month for Enterprise
After two weeks with 5,000 visitors per variant, the results were clear:
- Treatment conversion rate: 8.2%
- Control conversion rate: 6.7%
- Lift: 22.4%, p-value: 0.003 (highly significant)
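If you want to sanity-check results like these, a standard two-proportion z-test on the reported figures gets you to roughly the same place. A minimal sketch (using scipy; the small gap to our reported 0.003 likely comes down to whether the tool ran a one- or two-sided test):

```python
# Two-proportion z-test on the reported results: 5,000 visitors per
# variant, 6.7% control conversion vs. 8.2% treatment conversion.
from math import sqrt
from scipy.stats import norm

n_control, n_treatment = 5000, 5000
conv_control, conv_treatment = 0.067, 0.082

# Pooled conversion rate under the null hypothesis of no difference
pooled = (n_control * conv_control + n_treatment * conv_treatment) / (n_control + n_treatment)
se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))

z = (conv_treatment - conv_control) / se
p_value = 2 * norm.sf(abs(z))            # two-sided p-value
lift = conv_treatment / conv_control - 1

print(f"lift: {lift:.1%}, z: {z:.2f}, p-value: {p_value:.4f}")
# lift: 22.4%, z: 2.86, p-value: 0.0043
```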
We implemented the treatment pricing. Conversion rate increased exactly as predicted. Revenue immediately dropped because we'd cut prices by 40% and only increased conversion 22%.
But it got worse. Over the next three months, we discovered:
- Customers acquired at $29/month churned at 2.5x the rate of $49/month customers
- $29/month customers expanded to Enterprise at 1/4 the rate of $49/month customers
- Customer support costs were identical, which meant per-customer profitability was destroyed
We'd attracted price-sensitive customers who had lower lifetime value, higher churn, and worse expansion rates. The test told us we'd win on conversion rate. It didn't tell us we'd lose on every metric that actually mattered.
The Metrics That Actually Predict Revenue Impact
After running dozens of pricing experiments, I've learned that early-funnel metrics (conversion rate, signup rate) are almost useless for predicting long-term revenue impact. The metrics that actually matter take months to measure.
Customer Lifetime Value (LTV)
This is the only metric that actually predicts whether a pricing change will increase or decrease revenue. But you can't measure it in a two-week test.
LTV = (Average Monthly Revenue Per Customer) × (Average Retention in Months) × (Expansion Multiplier)
To measure this accurately, you need at least 6-12 months of customer behavior data. Which means pricing experiments require patience most companies don't have.
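To make the formula concrete, here's a minimal sketch plugging in the price points from the disastrous test earlier. The retention and expansion inputs are hypothetical placeholders, chosen only to mirror the rough churn and expansion ratios we observed:

```python
# Minimal LTV sketch for the formula above. Retention months and
# expansion multipliers are hypothetical; in practice each input needs
# 6-12 months of cohort data.

def lifetime_value(monthly_revenue: float,
                   retention_months: float,
                   expansion_multiplier: float = 1.0) -> float:
    """LTV = monthly revenue per customer x retention months x expansion."""
    return monthly_revenue * retention_months * expansion_multiplier

# $49/month cohort: assume ~20 months of average retention, modest expansion
ltv_49 = lifetime_value(49, retention_months=20, expansion_multiplier=1.3)

# $29/month cohort: 2.5x the churn means roughly 40% of the retention,
# with far less expansion into higher tiers
ltv_29 = lifetime_value(29, retention_months=8, expansion_multiplier=1.05)

print(f"$49 cohort LTV ~ ${ltv_49:,.0f} | $29 cohort LTV ~ ${ltv_29:,.0f}")
```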
The workaround we developed: use proxies that correlate with LTV but can be measured faster.
Activation rate (measured in 7-14 days): Do customers who sign up under new pricing complete onboarding at the same rate? If activation drops, LTV will drop.
Feature usage depth (measured in 30 days): Do customers use advanced features at the same rate? Low-engagement customers churn faster.
Expansion intent (measured in 30-60 days): Do customers add team members, upgrade tiers, or expand usage at the same rate?
- Net Revenue Retention cohort analysis (measured in 3-6 months): Are customers in the new pricing cohort retaining and expanding at the same rate as the control cohort?
None of these perfectly predict LTV, but collectively they give you directional confidence before you've waited 12 months to know if the pricing experiment destroyed your business.
Customer Acquisition Cost (CAC) Payback
Pricing changes affect how long it takes to recover your customer acquisition costs. If you cut prices but conversion increases enough, CAC payback might improve. If you raise prices and conversion drops, CAC payback might worsen.
The math: CAC Payback (months) = Customer Acquisition Cost per customer (total sales and marketing spend ÷ new customers acquired) ÷ Monthly Revenue Per Customer
We ran a pricing experiment that increased our price from $79 to $99/month. Conversion dropped 15%, but CAC payback improved from 8.2 months to 6.1 months because the higher-priced customers generated more revenue faster.
This meant we could afford to spend more on customer acquisition, which unlocked new marketing channels that were previously unprofitable. The "failed" experiment (lower conversion) actually improved our growth because faster payback increased our marketing budget efficiency.
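The simple formula won't reproduce those exact numbers, because our payback improvement also reflected faster expansion by the higher-priced customers, but a sketch with hypothetical spend and customer counts shows the mechanics:

```python
# CAC payback sketch. The spend and customer counts are hypothetical;
# only the $79 -> $99 price change and the 15% conversion drop come from
# the experiment described above.

def cac_payback_months(sales_marketing_spend: float,
                       new_customers: int,
                       monthly_revenue_per_customer: float) -> float:
    """Months of revenue needed to recover the cost of acquiring one customer."""
    cac = sales_marketing_spend / new_customers
    return cac / monthly_revenue_per_customer

before = cac_payback_months(50_000, new_customers=100, monthly_revenue_per_customer=79)
after = cac_payback_months(50_000, new_customers=85, monthly_revenue_per_customer=99)

print(f"payback before: {before:.1f} months, after: {after:.1f} months")
```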
Customer Quality Indicators
Different price points attract different customer segments. The quality of customers you acquire at $19/month is different from the quality of those you acquire at $99/month.
Quality indicators we track:
- Company size and maturity: Are we attracting enterprise customers or freelancers?
- Team size growth: Do customers add team members?
- Feature adoption depth: Do customers use advanced features or just basic functionality?
- Support ticket volume: Do customers require high-touch support or self-serve?
- Payment method: Credit card customers churn faster than invoice-based enterprise customers
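A sketch of how we pull those indicators side by side per pricing cohort. The column names and values below are made up; in practice they come from your CRM, billing, and support systems:

```python
# Cohort quality readout sketch with a tiny made-up dataset.
import pandas as pd

customers = pd.DataFrame({
    "pricing_cohort":         ["$29"] * 3 + ["$49"] * 3,
    "company_size":           [1, 2, 1, 12, 30, 8],
    "added_team_members":     [0, 0, 1, 3, 6, 2],
    "uses_advanced_features": [0, 0, 0, 1, 1, 1],
    "support_tickets_90d":    [4, 6, 3, 2, 1, 3],
})

# Average each quality indicator within each pricing cohort
quality = customers.groupby("pricing_cohort").mean()
print(quality)
```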
We ran a pricing test that reduced our entry-level price from $49 to $29. Conversion increased 30%, but customer quality plummeted. We acquired mostly solo freelancers who never expanded, churned after 3-4 months, and required disproportionate support.
The test looked like a win on conversion metrics. It was actually a disaster when we analyzed customer quality indicators.
The Test Design That Actually Works
After learning these lessons the hard way, I rebuilt our pricing experimentation framework around long-term revenue impact instead of short-term conversion metrics.
Cohort-based testing with extended measurement periods
Instead of testing for two weeks and measuring conversion, we test for 4-6 weeks and track cohorts for 6-12 months.
We assign customers to pricing cohorts based on signup week. Cohort A gets existing pricing, Cohort B gets new pricing. We measure:
- Week 1: Conversion rate (directional signal)
- Week 2: Activation rate (quality signal)
- Week 4: Feature usage depth (engagement signal)
- Month 3: First expansion rate (LTV signal)
- Month 6: Retention and NRR (LTV confirmation)
- Month 12: Full LTV analysis (ground truth)
This means pricing experiments take a year to fully validate. But it prevents catastrophic mistakes like our first experiment.
The key insight: use early metrics (conversion, activation) to make go/no-go decisions quickly, then validate those decisions with later metrics (retention, expansion, LTV).
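A sketch of that measurement schedule as code, so the later milestones don't get forgotten once the launch excitement fades. The metric names are just labels, not a real instrumentation API:

```python
# Cohort milestone checklist for the schedule above.
from datetime import date, timedelta

MILESTONES = [
    (timedelta(weeks=1),  "conversion_rate",      "directional signal"),
    (timedelta(weeks=2),  "activation_rate",      "quality signal"),
    (timedelta(weeks=4),  "feature_usage_depth",  "engagement signal"),
    (timedelta(days=90),  "first_expansion_rate", "LTV signal"),
    (timedelta(days=180), "retention_and_nrr",    "LTV confirmation"),
    (timedelta(days=365), "full_ltv",             "ground truth"),
]

def due_milestones(cohort_start: date, today: date) -> list[tuple[str, str]]:
    """Return the metrics that are ready to be measured for a cohort."""
    return [(metric, label) for offset, metric, label in MILESTONES
            if today >= cohort_start + offset]

print(due_milestones(cohort_start=date(2024, 1, 8), today=date(2024, 4, 15)))
```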
Stratified sampling by customer segment
Not all customers respond to pricing the same way. Enterprise buyers care about different things than SMB buyers. Stratifying your test ensures you understand segment-specific impacts.
We segment every pricing test by:
- Company size (1-10, 11-50, 51-200, 200+ employees)
- Use case (self-serve vs. sales-assisted)
- Geography (US/EU/APAC—willingness to pay varies significantly)
- Traffic source (organic vs. paid—different intent levels)
We ran a pricing test that showed 15% conversion lift overall. When we stratified by segment, we discovered:
- SMB (1-10 employees): +35% conversion, -20% LTV
- Mid-market (11-50): +8% conversion, +12% LTV
- Enterprise (51+): -5% conversion, +45% LTV
The overall results were misleading. The new pricing was great for mid-market and enterprise, terrible for SMB. We implemented the new pricing for mid-market+ only and kept old pricing for SMB.
Segment-specific pricing strategies outperformed one-size-fits-all pricing by a huge margin.
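A sketch of the per-segment readout that surfaced this. The visitor and conversion counts are made up, chosen only to roughly mirror the lifts above; the point is that the overall number hides the segment-level story:

```python
# Per-segment lift readout with made-up counts.
import pandas as pd

results = pd.DataFrame({
    "segment":     ["smb", "smb", "mid", "mid", "ent", "ent"],
    "variant":     ["control", "treatment"] * 3,
    "visitors":    [2000, 2000, 1500, 1500, 500, 500],
    "conversions": [120, 164, 105, 113, 40, 38],
})

results["conversion_rate"] = results["conversions"] / results["visitors"]

by_segment = results.pivot(index="segment", columns="variant",
                           values="conversion_rate")
by_segment["lift"] = by_segment["treatment"] / by_segment["control"] - 1
print(by_segment.sort_values("lift", ascending=False))
```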
Control for external factors
Pricing tests are vulnerable to external factors that corrupt your results. Seasonality, marketing campaigns, competitor moves, and economic conditions all affect conversion independent of your pricing changes.
We ran a Black Friday pricing test and saw a 40% conversion lift. We were celebrating until someone pointed out that Black Friday traffic converts 35% better than normal traffic regardless of pricing. The actual pricing lift was maybe 5%, but seasonal factors made it look like 40%.
The controls we implement now:
- Run tests for minimum 4-6 weeks to average out weekly seasonality
- Avoid running tests during major campaigns or seasonal events
- Track competitor pricing changes during test period
- Monitor macro conversion rate trends independent of test variants
We also run "A/A tests" periodically—split traffic 50/50 with identical pricing to measure natural variance. This gives us a baseline for how much conversion rate fluctuates without any changes. If your A/A test shows 5% variance, you need pricing tests to show >10% lift to be confident the effect is real.
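You can also estimate that natural variance before running a live A/A test. A quick simulation at your own traffic volume (the baseline rate and sample size below are placeholders) shows how much apparent "lift" pure chance produces between identical variants:

```python
# A/A simulation: apparent lift between two identical variants.
import numpy as np

rng = np.random.default_rng(42)
baseline_conversion = 0.07        # assumed baseline conversion rate
visitors_per_variant = 5000
n_simulations = 10_000

a = rng.binomial(visitors_per_variant, baseline_conversion, n_simulations) / visitors_per_variant
b = rng.binomial(visitors_per_variant, baseline_conversion, n_simulations) / visitors_per_variant
apparent_lift = b / a - 1

# 95% of A/A "lifts" fall inside this band; a real pricing effect needs
# to clear it comfortably before you trust it.
print(np.percentile(apparent_lift, [2.5, 97.5]))
```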
The Experiments That Actually Moved Revenue
Not all pricing tests are created equal. Some test variables have massive revenue impact. Others are noise.
The tests that actually moved the needle:
Test 1: Annual vs. monthly billing defaults
We tested whether defaulting to annual billing (with monthly billing as an option) would increase annual commitments without hurting conversion.
- Control: Monthly billing selected by default, annual as opt-in
- Treatment: Annual billing selected by default, monthly as opt-in
Result:
- Conversion rate: -3% (not significant)
- Annual commitment rate: +47% (massive increase)
- First-year revenue per customer: +38%
- Cash collected upfront: +290%
This single test had more revenue impact than a dozen other pricing experiments combined. We didn't change the price or the product—we just changed the default selection and framing.
The psychological insight: defaults matter enormously. Most customers don't have strong preferences about billing frequency, so they accept whatever you default to.
Test 2: Removing the middle tier
We had three pricing tiers: Basic ($29), Pro ($79), Enterprise ($199). We tested removing Pro to force customers into a binary choice.
- Control: Three tiers (Basic, Pro, Enterprise)
- Treatment: Two tiers (Basic at $39, Enterprise at $199)
Result:
- Conversion rate: -8% (concerning but not catastrophic)
- Average revenue per customer: +52% (most customers chose Enterprise instead of Basic)
- Expansion rate: +35% (customers who started at Enterprise stayed at Enterprise)
We'd been losing revenue to the "good enough" middle tier. Removing it forced customers to choose between cheap and full-featured. Surprisingly, most chose full-featured.
This experiment taught me that more options don't always increase conversion. Sometimes fewer, clearer choices drive better economics.
Test 3: Value metric pricing
We switched from seat-based pricing to usage-based pricing for one customer segment.
- Control: $79/month for up to 10 team members, $149 for unlimited
- Treatment: $49 base + $0.10 per transaction processed
Result:
- Conversion rate: +12% (the lower entry price helped)
- 90-day revenue per customer: +67% (usage-based customers scaled faster)
- Customer satisfaction: +22 NPS points (customers felt pricing was fairer)
The right value metric aligned our pricing with customer value delivered. Customers grew usage as they got more value, and we captured that value through usage-based pricing.
The caveat: this only worked because we had the metering infrastructure to support usage-based billing. Without reliable metering, this experiment would have failed.
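For reference, here's how the two models from this test compare at a few usage levels. The seat counts and transaction volumes are illustrative; only the prices come from the experiment:

```python
# Seat-based vs. usage-based bill at a few illustrative usage levels.

def seat_based_bill(team_members: int) -> float:
    """Control: $79/month up to 10 team members, $149 for unlimited."""
    return 79.0 if team_members <= 10 else 149.0

def usage_based_bill(transactions: int) -> float:
    """Treatment: $49 base plus $0.10 per transaction processed."""
    return 49.0 + 0.10 * transactions

for team, tx in [(3, 200), (8, 1_200), (25, 6_000)]:
    print(f"{team:>2} seats / {tx:>5} tx: "
          f"seat-based ${seat_based_bill(team):>7.2f}, "
          f"usage-based ${usage_based_bill(tx):>7.2f}")
```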
What Most Pricing Tests Get Wrong
The most common mistake I see in pricing experiments: testing prices without testing packaging, messaging, or positioning alongside the price change.
Price doesn't exist in a vacuum. A $99/month product positioned as "affordable solution for small teams" is different from a $99/month product positioned as "enterprise-grade platform."
We ran a test where we increased price from $79 to $119 without changing anything else. Conversion dropped 18%. We ran the same price increase with updated messaging emphasizing enterprise features, security, and compliance. Conversion dropped only 4%.
The price increase itself hurt conversion. But the repositioning around enterprise value mitigated most of the damage and attracted higher-quality customers.
The lesson: always test pricing + positioning together, not pricing in isolation.
Another common mistake: testing too many variables at once. We ran a test that changed:
- Price (from $49 to $79)
- Features (added three new features to justify price increase)
- Messaging (repositioned around new features)
- Trial length (from 14 days to 30 days)
The test showed +15% conversion. But we had no idea which change drove the lift. Was it the new features? The longer trial? The new messaging? Did the higher price hold the lift back? We couldn't isolate the effect of any single change.
Now we test one variable at a time, or use multivariate testing with clean isolation between factors.
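A sketch of what clean isolation looks like in practice: each factor gets its own independent, deterministic bucketing, so price and trial length vary independently and their effects can be separated in analysis. The hashing scheme here is just one common way to do it, not a specific tool's API:

```python
# Independent hash-based bucketing per factor for multivariate tests.
import hashlib

def bucket(user_id: str, factor: str, variants: list[str]) -> str:
    """Deterministically assign a user to one variant of one factor."""
    digest = hashlib.sha256(f"{factor}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

user = "user_1842"
assignment = {
    "price":        bucket(user, "price", ["$49", "$79"]),
    "trial_length": bucket(user, "trial_length", ["14d", "30d"]),
}
print(assignment)
```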
The Uncomfortable Truth About Pricing Experiments
Most pricing experiments fail. Not because the methodology is wrong, but because companies aren't willing to wait long enough to measure what actually matters.
Your CEO wants to know if the pricing change worked within two weeks. But LTV takes 12 months to measure. CAC payback takes 6 months. Retention curves take 9 months to stabilize.
The companies that succeed with pricing experimentation are the ones with patient leadership who understand that optimizing for conversion rate is optimizing for the wrong thing.
If you're running pricing tests and measuring success based on conversion rate after two weeks, you're almost certainly making your business worse while feeling data-driven about it.
The real metrics that matter—LTV, retention, expansion, customer quality—take months to measure and require cohort analysis across multiple customer segments.
If you're not willing to wait six months to validate a pricing change, you shouldn't be running pricing experiments. You're just gambling with your revenue model and using statistics to convince yourself it's science.
The best pricing changes I've ever made looked like failures in the first month and wins by month six. The worst pricing changes I've made looked like wins in the first month and disasters by month six.
Pricing experimentation is about patience and discipline. The data will tell you the truth—but only if you're willing to wait long enough to measure the metrics that actually matter.