Algolia powers nearly 1.8 trillion queries per year. Customers depend on Algolia to deliver the best results: the ones that drive signups and conversions. But what counts as the “best” results depends on your goals and KPIs. Instead of letting our retrieval engine be a “black box”, we give every customer built-in analytics to continually improve and test their setup. A/B testing is a great way to understand how configuration changes impact relevance, but it’s treated like a second-class citizen in most retrieval systems: even the best implementations are a binary checkbox that compares two variations of a single parameter, without making any claims about the statistical significance of the results.
That narrow, shallow approach just doesn’t match the complexity of modern search: a web of ranking signals, semantic models, personalization layers, and merchandising rules all interacting to shape the final result set. Other A/B testing frameworks for search give you a low-resolution view of a high-complexity problem; only with a deeply integrated A/B testing framework like Algolia’s can users actually validate the changes that lead to better KPIs. Algolia’s A/B testing isn’t limited to toggling a single setting either, since it can evaluate any set of adjustments across the entire retrieval stack.
Like baked goods, retrieval performance is the output of a carefully written recipe. If any of the ingredients are off, the final result isn’t going to turn out well. So how do we arrive at the right ingredients? The recipe developer has to test all the different configurations. That’s why A/B testing is the gold standard in retrieval optimization: it’s the semi-automated process that settles on the best settings for your application. Without it, it’s highly unlikely that intuition alone will lead you to the optimal config. Even testing it yourself is going to lead to subpar results, since you won’t be able to replicate all the edge cases of heavy-tail query behaviors, extreme engagement spikes, and interactions between ranking signals, models, personalization, and merchandising. The A/B testing rule of thumb: rigor reduces risk.

Algolia’s A/B testing capabilities can give your team a leg up in optimizing search results that maximize KPIs
Algolia’s A/B testing reintroduces that rigor by producing reliable comparison metrics even when traffic is uneven. Statistical analysis isolates and removes outliers that would otherwise skew the numbers, so you get trustworthy results despite messy production traffic. Every combination of configurable settings is testable too, letting teams run multi-dimensional experiments that actually test realistic hypotheses. Because these experiments are integrated right into Algolia, deployment, traffic routing, and aggregation are all handled for you, right out of the box. Success isn’t measured by arbitrary metrics, but by the KPIs you’re already tracking: conversion rates, click-through behavior, product discovery depth, revenue per session, and overall customer satisfaction.
That’s the value: statistical rigor and unrestricted scope set Algolia’s implementation of A/B testing apart. Other implementations let you jot down notes about individual ingredients; Algolia lets you validate the whole recipe. Our clients and partners consistently report this as one of the biggest differentiators in why they chose Algolia. Here’s what their successful implementations look like:
Let’s walk through three specific ways that production teams use Algolia to move beyond simple feature toggling. For each, we’ll identify the hypothesis, the setup of our A/B test, the lesson learned, and the actionable steps forward.
Hypothesis: Optimizing for conversions alone is cannibalizing our high-ticket revenue.
In this scenario, we’re concerned that we’re conflating success with conversions. Five $100 headphone sales should be valued above twenty $2 sticker sales, even though they technically represent fewer conversions. We’ll set up the test like this:
What we learn: You’ll find the sweet spot between volume and margin. Algolia provides confidence intervals for Revenue per Search, allowing you to see if the revenue lift in Variant B is statistically significant or if the lower conversion volume makes it too volatile to trust. This will entirely depend on your specific product catalog, so the only real way to know is to test it.
Next steps: If you’re seeing a statistically significant result, switch from optimizing for conversions to revenue.
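If you’d rather script this than click through the dashboard, here’s a minimal sketch of what the setup could look like, assuming the v4 JavaScript API client. The index name products, the replica products_revenue_ranked, and the gross_margin attribute are all placeholders; swap in your own names, and double-check the client version you’re on, since method names differ across versions.

```ts
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YourAppID', 'YourAdminAPIKey');

// Variant B lives on a replica (assumed to already be declared as a replica of "products")
// whose custom ranking weights a revenue-related attribute instead of pure conversion signals.
await client.initIndex('products_revenue_ranked').setSettings({
  customRanking: ['desc(gross_margin)'],
});

// Split traffic 50/50 between the current ranking and the revenue-weighted replica.
const analytics = client.initAnalytics();
await analytics.addABTest({
  name: 'Conversion-optimized vs. revenue-weighted ranking',
  variants: [
    { index: 'products', trafficPercentage: 50, description: 'Control: current ranking' },
    { index: 'products_revenue_ranked', trafficPercentage: 50, description: 'Revenue-weighted replica' },
  ],
  endAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000).toISOString(), // run for roughly 30 days
});
```

Once the test is live, the dashboard reports Revenue per Search for each variant, along with the confidence intervals mentioned above.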
Hypothesis: Aggressive personalization is limiting how much users can discover products outside of their typical domain, causing long-term churn.
Personalization is a double-edged sword. Too heavy and the user only sees what they already looked at; too light and the results feel generic. We’ll set up the test like this:
What we learn: If Add-to-Cart events stay roughly even or increase for Variant B, then a decrease in Average Click Position (ACP) is a great thing. People are finding what they want, and they’re finding it faster and higher up in the search results. But if Add-to-Cart events go down, then a decrease in ACP might signify that the personalization is narrowing the options the user sees, leading them to buy less.
Next steps: Set up a more in-depth A/B test to dial in exactly the right level of personalization, then lock it in.
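As a starting point for that follow-up test, here’s a minimal sketch of a single-index experiment that only varies the personalization weight, again assuming the v4 JavaScript client. The impact value of 50 is just a placeholder hypothesis, and Personalization has to be enabled on the index for the parameter to have any effect.

```ts
import algoliasearch from 'algoliasearch';

const analytics = algoliasearch('YourAppID', 'YourAdminAPIKey').initAnalytics();

// Both variants search the same index; only the query-time personalization weight differs.
await analytics.addABTest({
  name: 'Personalization impact: current vs. reduced',
  variants: [
    { index: 'products', trafficPercentage: 50, description: 'Control: current personalization' },
    {
      index: 'products',
      trafficPercentage: 50,
      description: 'Challenger: personalization impact lowered to 50',
      customSearchParameters: { personalizationImpact: 50 },
    },
  ],
  endAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000).toISOString(),
});
```

Re-running the same shape of test with different impact values lets you find where the Add-to-Cart and ACP trade-off described above levels out.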
Hypothesis: Our manual custom ranking based on stock levels and profit margin is more effective than an AI model that optimizes for popular trends.
This is the viewpoint of a lot of Algolia customers who haven’t yet tried Dynamic Re-Ranking. Especially if they’ve been with us for a long time, they might have spent significant effort tuning their custom ranking algorithm. We’ll set up the test like this:
Here, the manual custom ranking sorts on attributes like margin, inventory_count, or a global_popularity_score updated weekly.
What we learn: You discover whether your intuition can keep pace with real-time shopper intent. If Variant B shows a lift in Click-Through Rate (CTR) and Conversion, it proves that while your business logic is a solid foundation, it’s too static to capture “micro-trends” or shifts in user behavior that happen between your manual updates. However, if Variant A holds its ground on Revenue per Session, you’ve validated that your custom ranking is successfully protecting your bottom line against the “noise” of popular-but-cheap trends.
Next steps: If Dynamic Re-Ranking wins, you can use it as a final re-ordering layer on top of your existing custom ranking. Use Re-Ranking to provide the "finishing touch" on the first few results while keeping your inventory-based logic as the foundational tie-breaker. This combines the best of both worlds: business-driven strategy and AI-driven agility.
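For completeness, here’s a minimal sketch of how this head-to-head could be expressed as a single-index test, assuming the v4 JavaScript client and that Dynamic Re-Ranking has already been enabled on the index from the dashboard. The enableReRanking parameter simply switches the re-ranking layer off for the control variant at query time.

```ts
import algoliasearch from 'algoliasearch';

const analytics = algoliasearch('YourAppID', 'YourAdminAPIKey').initAnalytics();

await analytics.addABTest({
  name: 'Manual custom ranking vs. Dynamic Re-Ranking',
  variants: [
    {
      index: 'products',
      trafficPercentage: 50,
      description: 'Control: manual custom ranking only',
      // Turn the re-ranking layer off so only the margin/inventory-based custom ranking applies.
      customSearchParameters: { enableReRanking: false },
    },
    { index: 'products', trafficPercentage: 50, description: 'Challenger: Dynamic Re-Ranking on top' },
  ],
  endAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000).toISOString(),
});
```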
These are just a few test ideas to get you started, but there's so much more you can do. If you can measure it, you can test it. Stop guessing and start proving what works for your unique audience by launching your first experiment today.
If you’re already sending click and conversion events to Algolia, you have everything you need to move beyond static configurations and into a data-driven search strategy. With Algolia’s integrated A/B testing framework, you can validate complex hypotheses with the statistical rigor required to make confident business decisions.
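If you’re not sure whether those events are flowing yet, this is roughly what they look like with the search-insights library; the index and event names here are placeholders, and the queryID comes back on search responses when clickAnalytics is set to true.

```ts
import aa from 'search-insights';

aa('init', { appId: 'YourAppID', apiKey: 'YourSearchOnlyAPIKey' });

// A shopper clicked the first result of a search.
aa('clickedObjectIDsAfterSearch', {
  index: 'products',
  eventName: 'Product Clicked',
  queryID: 'queryIDFromTheSearchResponse',
  objectIDs: ['product-123'],
  positions: [1],
});

// The same product later converted, e.g. was added to the cart.
aa('convertedObjectIDsAfterSearch', {
  index: 'products',
  eventName: 'Product Added to Cart',
  queryID: 'queryIDFromTheSearchResponse',
  objectIDs: ['product-123'],
});
```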
Head to the A/B Testing tab in your dashboard to set up a test in minutes; when the barrier to entry is this low and the potential for compounding ROI is this high, the only real mistake is not testing at all.
Learn more about Algolia’s A/B testing:
Jon Silvers
Director, Digital Marketing