In the last few years, it seems like everything has been about AI, and LLMs in particular. If you’re at a loss as to why, I asked one of the most advanced LLMs out there, and here’s what it had to say:
Here’s the tricky part, though: LLMs are, at their core, very sophisticated pattern matchers. They can learn how grammar works and string together natural sentences that never appeared in their training corpus, but they have trouble with originality. They also have trouble following exact instructions, since their programming is nondeterministic[1]. When we hand a task to a human, sometimes we want creativity and sometimes we want the person to do exactly as they’re told. LLMs, by design, can never fully embody either extreme. This balance is governed by a setting called temperature: a higher temperature makes responses more creative, while a lower one makes them more predictable[2].
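To make that concrete, here’s a minimal sketch using the OpenAI Node SDK (most providers expose something similar; the model name and prompt are placeholders, and none of this is part of Agent Studio): temperature is just a number you pass along with the request.

import OpenAI from "openai";

// Minimal sketch: the model name and prompt are placeholders.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function tagline(temperature: number) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Write a one-line tagline for a trail running shoe." }],
    temperature, // ~0.2 stays close to the most likely wording, ~1.2 wanders a lot more
  });
  return completion.choices[0].message.content;
}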
As LLM usage exploded, some speculated that it might replace search engines. But now, thanks to extensive real-world usage and expert feedback, a consensus has emerged: LLMs can augment search, but they won’t replace it. Why?
RAG, short for Retrieval-Augmented Generation, bridges the gap by combining the precision of a dedicated search engine with the flexibility of an LLM.
Optimized search engines are precise but struggle to be flexible, and LLMs are flexible but struggle to be precise. RAG is the in-between: it lets an LLM retrieve structured data from a precise datasource, and that retrieved data then augments the generated output. Any time you want an LLM to work with real data, this is the tool to reach for, because RAG doesn’t guess.
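In code, the shape of that in-between is simple. Here’s a minimal sketch, where searchIndex and callLLM are hypothetical stand-ins for your search client and LLM provider:

// Hypothetical stand-ins for a real search client and LLM provider.
declare function searchIndex(query: string): Promise<Array<Record<string, unknown>>>;
declare function callLLM(prompt: string): Promise<string>;

async function answerWithRAG(question: string): Promise<string> {
  // 1. Retrieve: run the question through a precise, structured search first.
  const hits = await searchIndex(question);

  // 2. Augment: hand the retrieved records to the LLM as grounding context.
  const prompt = [
    "Answer the question using only the context below.",
    `Context: ${JSON.stringify(hits.slice(0, 5))}`,
    `Question: ${question}`,
  ].join("\n");

  // 3. Generate: the LLM writes the answer, grounded in real data instead of guessing.
  return callLLM(prompt);
}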
In our case, we can let the LLM run real searches using an established and optimized engine like Algolia, then use those results to augment the output in a contextually meaningful way. This is the idea behind one of Algolia’s latest tools: the Agent Studio.
Agent Studio is a turnkey RAG-as-a-Service platform that brings generative AI capabilities directly into your search experience, except without the high cost and latency of making live LLM calls for every user interaction. It lets you store, retrieve, generate, and serve high-quality content at scale while maintaining flexibility and brand control. At its core, the toolkit allows LLMs to query Algolia indices (like product catalogs, blogs, help docs, or reviews) as structured retrieval sources. The LLM’s grounded, context-aware responses are then stored with Algolia.
What does the Agent Studio do exactly?
Your LLM can retrieve from multiple sources, not just product records. You can index structured data (like a product catalog), semi-structured data (like internal help guides), or even unstructured content (like blog posts, user reviews, press releases). Wherever you end up using the generated responses, it can pull from anything that a human customer service agent could. That means it doesn’t matter anymore if all the needed information is scattered through random files and notes — buying guides, dynamic FAQ sections, even real-time context-aware chatbots are all possible because the AI has access to everything it needs to answer any relevant question. See how Algolia uses unstructured review content to drive better relevance.
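As an illustration of what “everything it needs” can look like on the indexing side, here is a sketch using the plain algoliasearch v4 client; the index names, record shapes, and credentials are all made up for the example:

import algoliasearch from "algoliasearch";

// Illustrative only: index names, record shapes, and credentials are placeholders.
const searchClient = algoliasearch("ALGOLIA_APPLICATION_ID", "ALGOLIA_ADMIN_API_KEY");

async function indexRetrievalSources() {
  // Structured data: the product catalog.
  await searchClient.initIndex("products").saveObjects([
    { objectID: "sku-123", name: "Trail Runner 2", price: 89, categories: ["Shoes"] },
  ]);

  // Semi-structured data: internal help guides.
  await searchClient.initIndex("help_guides").saveObjects([
    { objectID: "guide-1", title: "How to choose a running shoe", body: "..." },
  ]);

  // Unstructured content: user reviews.
  await searchClient.initIndex("reviews").saveObjects([
    { objectID: "rev-42", product: "sku-123", text: "Great grip on wet trails." },
  ]);
}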
You can pre-generate and store LLM responses, which lets you reuse them across users and sessions. This is especially useful for personalizing previously-generic content, like article headlines, section summaries, product descriptions, and buying guides. Typically LLM calls are expensive and slow, and since calls that use RAG tools count the RAG boilerplate as tokens too, those end up being especially expensive and slow when used repeatedly. Already having some of those pieces in place makes any on-the-fly dynamic generations more consistent, inexpensive, and quick to load. The LLM calls whose results we can persist are made only once or only periodically so you avoid paying repeatedly for the same answers. See how to save LLM responses in the docs.
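The pattern is simply “generate once, persist, serve many times.” Here’s a rough sketch, where generateDescription and saveGenerated are hypothetical helpers standing in for your LLM call and your storage layer:

// Hypothetical helpers: your LLM call and whatever storage holds the generated content.
declare function generateDescription(product: { objectID: string; name: string }): Promise<string>;
declare function saveGenerated(record: { objectID: string; description: string }): Promise<void>;

// Run once, or on a schedule, in a batch job rather than on every page view.
async function pregenerateDescriptions(products: Array<{ objectID: string; name: string }>) {
  for (const product of products) {
    const description = await generateDescription(product); // the only expensive LLM call
    await saveGenerated({ objectID: product.objectID, description }); // reused by every later visitor
  }
}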
Running an LLM directly from the frontend would expose your secret API key to whatever service you’re using. The only ways around this would be to either (1) run the LLM entirely on the frontend using WebAssembly (something which only became possible recently, and is still rather implausible for production use) or (2) to create a backend proxy that generates the prompts, protects against prompt injection, responds with any data the LLM requests, and keeps your API keys secret. Algolia’s Agent Studio handles the second option for you by running all those secure backend operations on your behalf. See how prompts are managed on the Algolia backend in the docs.
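For a sense of what option (2) involves, here’s a bare-bones sketch with Express and a hypothetical callLLM helper; Agent Studio runs a hardened, production-grade version of this kind of proxy for you.

import express from "express";

// Hypothetical helper: your LLM provider's SDK, with the secret key read from the server environment.
declare function callLLM(prompt: string): Promise<string>;

const app = express();
app.use(express.json());

app.post("/api/answer", async (req, res) => {
  // Treat user text strictly as data, cap its length, and never expose the API key to the client.
  const question = String(req.body.question ?? "").slice(0, 500);
  const prompt = `Answer the customer question below using our product data.\nQuestion: """${question}"""`;
  res.json({ answer: await callLLM(prompt) });
});

app.listen(3000);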
Using the SDK, you can collect user feedback through vote events. Users can upvote or downvote generated results, and that continuous feedback lets you see in your analytics pipeline which pieces of generated content are working well and which need to be regenerated. This feature comes out-of-the-box with the Algolia Agent Studio. Check out the full voting API in the docs.
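The docs cover the real voting API; conceptually, the feedback loop is just an event like this sketch, where recordVote is a hypothetical placeholder rather than the actual SDK method:

// Hypothetical sketch only; see the docs for the actual voting API.
type Vote = { contentID: string; vote: "up" | "down"; userToken: string };

declare function recordVote(vote: Vote): Promise<void>;

async function onThumbsDown(contentID: string, userToken: string) {
  // Downvotes show up in your analytics so low-quality generations can be flagged and regenerated.
  await recordVote({ contentID, vote: "down", userToken });
}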
RAG grounds LLM-generated content in real data… but that only works if you trust the data. Any vulnerability that could compromise the data defeats the purpose of using RAG in the first place, and including user input in your prompt is a big vulnerability. Algolia bakes in multiple safety layers out-of-the-box to mitigate this risk.
Malicious tokens or formatting can be slipped into user input if someone really wants to mess with your responses. Since we’re reusing stored responses, the risk of an infected response spreading to other users’ screens would be high if it weren’t for Algolia’s rigorous sanitization procedures. Agent Studio normalizes user input before it is submitted to the LLM to mitigate that risk.
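Algolia’s actual sanitization is more thorough, but the general idea of normalizing input before it ever reaches a prompt looks roughly like this sketch:

// Illustrative only: a naive normalization pass, not Algolia's actual procedure.
function normalizeUserInput(raw: string): string {
  return raw
    .normalize("NFKC") // fold lookalike Unicode forms into their canonical characters
    .replace(/[\u0000-\u001f\u007f]/g, " ") // strip control characters
    .replace(/\s+/g, " ") // collapse whitespace
    .slice(0, 500) // cap length so user text can't crowd out the instructions
    .trim();
}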
Beyond user input that can break your LLM’s responses, users can also get the LLMs to say inappropriate things. That’s where Algolia’s built-in moderation tools come in, flagging and filtering outputs with harmful, unsafe, or even just off-brand content. Custom rules let you define what types of content are never allowed in responses.
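As a rough illustration of the kind of custom rule you can express (this is not Algolia’s rule format, just the concept):

// Illustrative only: a simple deny-list applied to generated output before it's served.
const bannedPhrases = ["medical advice", "guaranteed results", "competitor pricing"];

function violatesBrandRules(generated: string): boolean {
  const text = generated.toLowerCase();
  return bannedPhrases.some((phrase) => text.includes(phrase));
}
// A flagged response gets filtered out or sent back for regeneration instead of reaching the user.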
We use red team prompting to proactively test our generative AI systems against known prompt injection, jailbreak, and adversarial input scenarios. This makes sure that the Algolia Agent Studio remains resilient to malicious or unintended behaviors before those issues reach production environments.
The toolkit also promotes responsible, scalable LLM usage by keeping track of usage logs. Those logs contain valuable data that helps Algolia protect you against abuse and unexpected traffic spikes. They also give you the transparency, traceability, and compliance support that’s crucial before you can deploy in a production environment confidently.
Using this new technology, let’s see how we can add genuinely useful genAI-driven features to a demo ecommerce website. These instructions are simplified, but you can check out a more complex and customized production version at spencerandwilliams.com.
In the Generative AI > Guides window in the Algolia dashboard, you can choose to create a new shopping guide. This kicks off a process in the dashboard that creates your RAG setup behind the scenes. You get to choose the category of products that the shopping guide describes, as well as the tone and language of the AI’s response.
Once the LLM looks at our products and begins to write a meaningful guide, we can customize it to our specific brand voice.
It’ll offer a few topic ideas and then generate full articles for the ones you choose.
As for the coding, let’s start by building a little wrapper hook to encapsulate the logic. We’ll need the @algolia/generative-experiences-api-client package to create our API client and the @algolia/generative-experiences-react package to fetch the guides. We’ll also need some built-in React hooks to trigger the parent components to re-render when the guides are fully fetched.
import { createClient } from '@algolia/generative-experiences-api-client';
import { useGuidesHeadlines } from '@algolia/generative-experiences-react';
import { useEffect, useState } from 'react';
We can use the createClient function we imported to get a client set up. That client is what every other Algolia hook or component will use to fetch data.
const client = createClient({
  appId: "ALGOLIA_APPLICATION_ID",
  indexName: "ALGOLIA_INDEX_NAME",
  searchOnlyAPIKey: "ALGOLIA_API_KEY",
  region: 'us'
});
Then comes our first custom hook, which just wraps the hook from the Algolia library, except that it passes in the client we made and pulls out just the headlines from the response.
export const useGuides = (category: string) => {
  return useGuidesHeadlines({
    client,
    category,
    showImmediate: true
  }).headlines;
};
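Using it is a one-liner; the category string here is just a hypothetical example value:

// Somewhere inside a component: fetch guide headlines for a given category.
const headlines = useGuides("Appliances > Fans");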
Our next custom hook gets the products from our index. It filters the results to only those with a category that matches the one passed into the hook and asynchronously stores the results in a state variable.
export const useProducts = (category: string) => {
  const [products, setProducts] = useState<any[]>([]);
  useEffect(() => {
    client.searchSingleIndex({
      indexName: 'ecommerce_ns_prod',
      searchParams: {
        filters: `hierarchicalCategories.level${(category.match(/>/g) || []).length}:"${category}"`
      }
    }).then(response => {
      setProducts(response.hits);
    });
  }, [category]);
  return products;
};
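To make that filter string concrete, here’s what it evaluates to for a hypothetical category value:

// Hypothetical category value, just to show the filter string the hook builds.
const category = "Appliances > Fans"; // one ">" separator, so we're one level deep
const level = (category.match(/>/g) || []).length; // 1
const filters = `hierarchicalCategories.level${level}:"${category}"`;
// => 'hierarchicalCategories.level1:"Appliances > Fans"'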
Now that the fetching logic is in place, we just have to design the display components.
const GuideHit = ({ hit }) => (
  <div className="hit guide">
    <img src={`IMAGE_URL/${hit.objectIDs[0]}.jpg`} />
    <h3>{hit.title}</h3>
    <p>{hit.description}</p>
    <button>Read the article</button>
  </div>
);
const ProductHit = ({ hit }) => (
  <div className="hit product">
    <img src={`IMAGE_URL/${hit.image}`} alt={hit.name} />
    <h3>{hit.name}</h3>
    <button className="hit-add-to-cart">Add to cart for ${hit.price.toFixed(2)}</button>
  </div>
);
Throw in a little CSS and those will look great. Inside a new ProductCarousel component, we load in our new data:
import { useGuides, useProducts } from "../hooks/carouselHooks";

export const ProductCarousel = ({ category }) => {
  const guides = useGuides(category);
  const products = useProducts(category);
  ...
};
Then we create a new array called hits that’s mostly the products from this category, with one of the guides inserted near the beginning. To make sure it only gets reevaluated when our guides or products arrays change, I’ll wrap it in the useMemo hook.
import { useMemo } from "react";

export const ProductCarousel = ({ category }) => {
  ...
  const hits = useMemo(
    () => [
      ...products
        .slice(0, 2)
        .map(product => <ProductHit hit={product} key={product.objectID} />),
      ...guides
        .slice(0, 1)
        .map(guide => (
          <GuideHit hit={{ ...guide, isGuide: true }} key={guide.objectID} />
        )),
      ...products
        .slice(2)
        .map(product => <ProductHit hit={product} key={product.objectID} />)
    ],
    [guides, products]
  );
  ...
};
Now to wrap it all up in a carousel so it looks nice. I decided to use the react-multi-carousel library since it seemed fairly point-and-click.
import Carousel from "react-multi-carousel";
import "react-multi-carousel/lib/styles.css";

export const ProductCarousel = ({ category }) => {
  ...
  return (
    <Carousel
      responsive={{
        desktop: {
          breakpoint: { max: 3000, min: 1024 },
          items: 5
        },
        tablet: {
          breakpoint: { max: 1024, min: 464 },
          items: 3
        },
        mobile: {
          breakpoint: { max: 464, min: 0 },
          items: 2
        }
      }}
    >
      { hits }
    </Carousel>
  );
};
All put together, our prototype looks like this:
Right inside the product carousel, we’re inserting generated content that informs the reader’s choices. And that content isn’t random: it directly relates to the products we’re advertising to the customer, because we gave all of that information to the LLM in advance. With a little more styling, in a production setting, here’s what it looks like on spencerandwilliams.com:
[1] “Nondeterministic” just means that given the same inputs, the program will not always produce the same outputs. There is some inherent quasi-randomness in the algorithm. For more information, see this Wikipedia article.
[2] For technical detail about why setting the temperature of an LLM to zero doesn’t do what you’d think it would do, see this thread from OpenAI.
Patrick Loomis
Senior Manager, Product Management