Squeezing search relevance out of everything, even product reviews

Building a good search feature on your ecommerce site is a lot like baking cookies. To make the most delicious and consistent cookies — and also to make as many as possible — you can’t let any dough cling to the edges of the bowl and go unused.

cookie-dough.jpg

In the same way, to make the most useful, consistent, and revenue-generating search tool, we can’t let any morsels of relevance “stick to the bowl”, as it were. We need to capitalize on every opportunity to scrape down some dough by extracting meaningful conclusions from our existing data if our search is to meet its potential.

One opportunity that is often completely ignored is all the useful information in the unstructured part of your product records. Take product reviews; do they contribute to your search relevance? Usually, they’re completely unstructured and hard to parse. If you try to extract keywords from reviews, you won’t usually get what you’re expecting because of the unpredictability of human language. (For example, someone might post “I threw this phone away and got an iPhone instead. One star” and a naive algorithm would think the product is an iPhone.)
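To see that failure mode concretely, here’s a toy frequency-based extractor — the stopword list and review text are purely illustrative:

```javascript
// A naive frequency-based keyword extractor, to show how it misfires
// on real human language. The stopword list here is a tiny stand-in.
const naiveKeywords = (reviews, topN = 3) => {
  const stopwords = new Set([
    "i", "this", "away", "and", "got", "an", "one", "instead",
  ]);
  const counts = {};
  for (const review of reviews) {
    for (const word of review.toLowerCase().match(/[a-z]+/g) ?? []) {
      if (!stopwords.has(word)) counts[word] = (counts[word] ?? 0) + 1;
    }
  }
  // Rank words by frequency and keep the top few as "keywords"
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([word]) => word);
};

console.log(naiveKeywords([
  "I threw this phone away and got an iPhone instead. One star",
]));
// "iphone" surfaces as a top keyword even though the product isn't one
```

The extractor has no notion of sentiment or context, so the competitor’s name it should be discounting ends up looking like a strong signal.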

So how could we take something as finicky as product reviews and get some meaningful, structured data out of it? That’s one thing that generative AI is actually really good at. Imagine if we could just ask ChatGPT to scan over the product reviews, then look at the info we already have about the product, and lastly generate an array of keywords that capture what users said about the product that our current searchable attributes (like title and description) don’t.

Why would this improve relevance? Well, imagine a customer is specifically searching for rugs that work well for families with dogs. Unless your brand specifically caters to this niche, it’s unlikely that your displayed product descriptions are stuffed with keywords like “dog-safe”. However, your users might have made connections like that for you:

user-review.jpg

A four-star review that says “Perfect for Kylo - I was afraid because carpets like this usually trap smells and stains and get shredded easily, and my new pup loves making messes. But its been very durable and he hasn't been able to tear it up or anything. He tracked in mud the other day and it all wiped right off. It's a little expensive for my taste but hopefully it'll last a long time”

NeuralSearch — our hybrid keyword-and-vector search engine — is smart enough to recommend this result for dog-related queries if we can generate some adjacent keywords. Tags like “pet-friendly” are semantically similar enough to our target that NeuralSearch can connect products with those tags to a query for “dog-proof carpets”.

Building a transformation function

Let’s see if it’s tough to build out some logic like this. The first step would be to paste in the typical OpenAI boilerplate. I built this with the OpenAI JavaScript SDK originally, but we’re going to use this code in a special environment in a moment, so we’ll stick to the fetch API. Here’s what that looks like:

const askGPT = async (content, json_schema) => {
  try {
    const response = await fetch(
      'https://api.openai.com/v1/chat/completions', 
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': 'Bearer ' + OPENAI_API_KEY
        },
        body: JSON.stringify({
          model: "gpt-4o-mini",
          messages: [{
            role: "system",
            content
          }],
          response_format: {
            type: "json_schema",
            json_schema
          }
        })
      }
    );

    if (!response.ok) {
      const errorData = await response.json();
      throw new Error(errorData.error?.message);
    }

    const data = await response.json();
    return data.choices[0]?.message?.content;
  } catch (error) {
    console.error('Error calling ChatGPT API:', error);
  }
};

A few things to note:

  • This example assumes we’ve already set a global constant holding our OpenAI API key. More on that in a bit.
  • This function takes in the prompt content and spits out the content of the response — the extra lines are mostly error handling.
  • The other parameter of this function is a JSON schema that the LLM’s response must adhere to. This makes it easy to JSON.parse the output of this function and extract structured data.
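As a quick illustration of why the function returns data.choices[0]?.message?.content, here’s a trimmed sketch of the envelope the Chat Completions API sends back — the keyword values below are made up:

```javascript
// Trimmed Chat Completions response envelope (illustrative values).
// With a json_schema response_format, the structured output arrives
// as a JSON *string* inside message.content, so we still JSON.parse it.
const data = {
  choices: [
    {
      message: {
        role: "assistant",
        content: '{"review_keywords":["pet-friendly","washable"]}',
      },
    },
  ],
};

const keywords = JSON.parse(data.choices[0]?.message?.content).review_keywords;
console.log(keywords); // ["pet-friendly", "washable"]
```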

Let’s design a function that takes in some searchable attributes on our product record, plus the product reviews, and asks the LLM to output an array of keywords that describe the product but aren’t covered by the already-searchable attributes.

const prompt = (title, description, reviews) => (
`This is a product on an ecommerce website:
${title} | ${description}

Look through the reviews for this product and generate a short array of keywords that describe the product in ways not described in the product title or description. The keywords are for a search engine, so they should be somewhat similar to keywords used in search queries. Here are the reviews:

${reviews.join("\n")}`
);

I’m shirking the normal indentation rules so the prompt doesn’t have extra tabs that could throw off the consistency of the output.
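To make the template concrete, here’s a rendered prompt for a hypothetical product (the middle of the instruction text is elided here for brevity):

```javascript
// The prompt template from above (instruction text elided), rendered
// with made-up sample values so you can see each review land on its
// own line thanks to the "\n" join.
const prompt = (title, description, reviews) => (
`This is a product on an ecommerce website:
${title} | ${description}

Look through the reviews for this product ... Here are the reviews:

${reviews.join("\n")}`
);

const text = prompt("Hand-Woven Rug", "Flat-weave, earth tones", [
  "Mud wipes right off",
  "My pup can't tear it up",
]);
console.log(text.split("\n").length); // 7 lines: 5 template + 2 reviews
```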

We also need to define the schema the LLM’s output should follow.

const schema = {
  name: "review_keywords",
  schema: {
    type: "object",
    properties: {
      review_keywords: {
        type: "array",
        description: "An array containing strings, each one containing a keyword that describes the product in a way that the reviews do, but the product title or description doesn't.",
        items: {
          type: "string"
        }
      }
    },
    additionalProperties: false,
    required: [
      "review_keywords"
    ]
  }
};

Or in other words, the output should look like this:

{
	"review_keywords": [
		"keyword 1",
		"keyword 2",
		...
	]
}

Now to put it all together, we’ll create a transform function that takes in a whole product record and returns the same record supplemented with the review keywords.

const transform = async (record, helper) => ({
  ...record,
  review_keywords: JSON.parse(
    await askGPT(
      prompt(
        record.title,
        record.description,
        record.reviews
      ),
      schema
    )
  )
    .review_keywords
    .map(x => x.trim())
});
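Before relying on a live API call, the parsing-and-trimming path is easy to sanity-check with a stubbed askGPT — the stub response and sample record below are invented:

```javascript
// Stub askGPT so transform's JSON parsing and trimming can be verified
// without a network call; the returned keywords are made up.
const askGPT = async (_content, _schema) =>
  JSON.stringify({ review_keywords: ["  pet-friendly", "washable  "] });

const transform = async (record, helper) => ({
  ...record,
  review_keywords: JSON.parse(
    await askGPT("(prompt omitted)", {})
  ).review_keywords.map((x) => x.trim()),
});

transform({
  title: "Hand-Woven Area Rug",
  description: "Flat-weave rug in muted earth tones.",
  reviews: ["Mud wipes right off, even with a messy pup around."],
}).then((out) => console.log(out.review_keywords));
// keywords come out trimmed: ["pet-friendly", "washable"]
```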

Testing in Node.js, this seems to work really well. I first tried with a real rug product and only my three fake reviews (which indirectly implied it was good for dog households without mentioning “dog” directly). It was able to identify “pet-friendly” as a good review-based keyword, as well as several other keywords dog owners would likely search like “washable” and “non-absorbent”. After adding in most of the real reviews for that product (which contained 6 reviews that mentioned dogs positively and directly), the added keyword set definitely reflects the general customer view:

[
  'dog-proof',
  'easy to maintain',
  'comfortable',
  'high-quality',
  'stylish pattern',
  'durable',
  'vibrant colors',
  'perfect for high traffic',
  'beautiful design',
  'indoor-friendly',
  'lightweight',
  'washable',
  'timeless design',
  'great price',
  'soft underfoot',
  'no chemical smell',
  'modern style',
  'breathable fabric',
  'holds its shape',
  'creates a cozy atmosphere'
]

Customers read reviews because they want to know if other similar customers also enjoyed the product. Highlighting products with relevant reviews like this makes your customers feel like they’re being listened to. But how can we actually weave this into our production data pipeline?

Algolia Data Transformations

Many of our customers find themselves wanting to automate the data input pipeline. Usually, it’s because their data is primarily stored elsewhere, with Algolia just being an index of the searchable attributes.

Seeing this need, we built Data Transformations, a fully automated pipeline from your data source to your search engine. Right from within Algolia, you can continuously import data from almost any tool that will either (a) let Algolia know when the data changes, or (b) let Algolia check back on a schedule to see if the data has changed. I personally can’t think of any tool that doesn’t fall into one of those two categories. Often this doesn’t even require any code, because Algolia has partnered with the data source developers to create connectors that keep it all in sync for you. Take a look:

connectors_examples.jpg

Algolia has tons of connectors available out of the box

Wherever your data comes from, once it’s brought into the standard pipeline by a connector, you can modify it consistently at the record level. Just bring up a Transformation and paste in our logic from earlier, passing the helper object between functions so we can pull the OpenAI API key from it à la helper.secrets.get("OPENAI_API_KEY").

transformation_code.jpg

Note that the fetch function only works on Premium plans.

You’ll start seeing the magic happen when those new review_keywords start flowing into the index without any extra babysitting. Once the transformation is saved, every fresh review that lands in your primary database is automatically evaluated, distilled, and appended to the product record before any customer even gets to read it. In other words, your search relevance improves every time a customer clicks “Submit review” with almost zero overhead or maintenance.

How does this scale?

Good question. Here are some points to keep in mind:

  1. The majority of the reviews for any given product just express in natural language the customers’ feelings about the product in simple positive or negative terms. (Think reviews that just say “Loved it! 5 stars” or “terrible”.) Reviews that don’t contain specific information don’t help potential customers beyond the social proof that the average star rating also gives them — they just take up a lot more space. Of course, we could incorporate that star rating into our Custom Ranking equation since it’s far more concise and quantitative. But since only a few out of a hundred reviews might be specific enough for the LLM to extract meaning from, it makes sense that we could scale this up to thousands of reviews without our keyword list becoming too large to index. In fact, in the example we tried above, the 3 reviews out of 160 that mentioned dogs explicitly and the other 3 that hinted at it were enough to make “dog-friendly” or something like it the top keyword extracted.
  2. NeuralSearch is flexible enough to surface results that match “pet-friendly” if we searched for “dog-proof”, so we don’t actually need to identify every single possible keyword. We just need a semantically broad description for each general category of responses, and we can tell the LLM to consolidate if the number of keywords it generates starts to get out of hand. You could test this every once in a while by running the LLM query on a sample set of reviews, once with an added review that mentions something unique and once without.
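The star-rating idea from point 1 is cheap to act on in the same transformation. A sketch — the ratings field name here is hypothetical:

```javascript
// Precompute an average star rating for use in Custom Ranking.
// The `ratings` array of numeric star values is an assumed field name.
const withAvgRating = (record) => ({
  ...record,
  avg_rating: record.ratings.length
    ? record.ratings.reduce((sum, r) => sum + r, 0) / record.ratings.length
    : null, // no reviews yet, so no rating to rank on
});

console.log(withAvgRating({ ratings: [5, 4, 5] }).avg_rating);
```

A single numeric attribute like this is far cheaper to index and rank on than the review text it summarizes.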

With these considerations, we can be confident that our solution will scale nicely. NeuralSearch and the LLM will be able to keep up with whatever our reviewers throw at us.

Search, like baking cookies, rewards those with the foresight to scrape down the bowl. By converting unstructured reviews into structured, search-ready signals and letting Algolia’s Data Transformations pipeline keep the process chugging along, you unlock relevance that would otherwise stay stuck to the sides. The payoff is a richer experience for shoppers, a sneakily-ballooning conversion rate for your business, and a search engine that gets smarter with every customer story it reads.
