
Insights from Haystack US 2025: Scaling Search in an AI World


Earlier this spring, I had the opportunity to attend Haystack US 2025 in Charlottesville, VA—a conference focused entirely on search, relevance, and the expanding role of AI in how we retrieve and rank information. It was two days packed with deep technical content, thoughtful keynotes, and plenty of real-world examples. As in recent years, the conference leaned heavily into AI, but with a healthy dose of skepticism and pragmatism. We left with our heads full and our notebooks (okay, Google Docs) overflowing.

I compared notes with my colleague Tanya Herman, a Senior Product Manager here at Algolia, to share the highlights that stuck with us, with an eye toward what matters for devs building and scaling search today.


Search isn’t solved, it’s just getting interesting

A recurring theme across several talks was that search isn’t a narrow problem anymore. It’s a whole system, from data ingestion to ranking to user intent modeling. For instance, the Women of Search session on How Great Product Managers Build for Impact reminded us that success often feels invisible—when search is working, no one notices. But as developers and PMs, we must build for that invisible magic anyway.

In the keynote, Innovation in the Age of AI, Rick Hamilton (Focused Ultrasound Foundation) set the tone. He walked through the history of AI’s highs and lows, all the way to today’s transformer boom. His message: AI isn’t magic—it’s a general-purpose tool, like electricity. It’s up to us to figure out how to use it creatively and responsibly.

Keynote: Innovation in the Age of AI
Speaker:
Rick Hamilton (Focused Ultrasound Foundation)
YouTube: https://www.youtube.com/watch?v=-iqnok0RnAQ

Lexical is having a moment

It’s easy to get excited about semantic search and embeddings (and we are), but John Berryman, in his presentation Lexical Love: Rediscovering the Power of Lexical Search in RAG, made a strong case for the humble keyword search. Lexical search is fast, explainable, and still the backbone of many high-performing systems.

Speaker after speaker reinforced the same idea: hybrid search—combining lexical with semantic—is where the real power lies. Daniel Wrigley from OpenSearch dove deep into this in From Static to Dynamic: Data Driven Query Understanding to Supercharge Hybrid Search. He showed how weighting or fusing the two types of results can drastically improve relevance, especially for ambiguous queries.
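To make the fusion idea concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion (RRF), one common way to merge a lexical result list with a semantic one. The function name, weights, and toy document ids are illustrative, not from any specific talk or product.

```python
def reciprocal_rank_fusion(lexical_ids, semantic_ids, k=60,
                           lexical_weight=1.0, semantic_weight=1.0):
    """Fuse two ranked lists of doc ids via weighted Reciprocal Rank Fusion.

    Each doc scores weight / (k + rank) per list it appears in; documents
    ranked well by *both* retrievers accumulate the highest totals.
    """
    scores = {}
    for weight, ranking in ((lexical_weight, lexical_ids),
                            (semantic_weight, semantic_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks decently in both lists, so it beats "a", which tops only one.
lexical = ["a", "b", "c"]
semantic = ["b", "d", "a"]
print(reciprocal_rank_fusion(lexical, semantic))  # → ['b', 'a', 'd', 'c']
```

Tuning the per-list weights (or swapping in score normalization instead of ranks) is where the "data driven" part of Wrigley's talk comes in: ambiguous queries can lean more on the semantic side, precise ones on the lexical side.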

Lexical Love: Rediscovering the Power of Lexical Search in RAG
Speaker:
John Berryman
YouTube: https://www.youtube.com/watch?v=hgHdgXbCrTY

From Static to Dynamic: Data Driven Query Understanding to Supercharge Hybrid Search
Speaker: Daniel Wrigley
YouTube: https://www.youtube.com/watch?v=H_3yl5PwAbw

LLMs can judge now (and they're not bad)

Moody’s presented an innovative use of LLMs to automate semantic evaluation in a RAG pipeline. Tanya was particularly interested in how structured prompts and predefined roles made LLM-based evaluations fast, repeatable, and scalable without fully replacing human review.

The idea is simple but powerful: instead of relying on human raters to judge search relevance (slow, expensive, inconsistent), let LLMs do it—with careful prompting, examples, and output formatting.

The results? Near-instant evaluations with impressive quality. Of course, there are still edge cases where human experts disagree (sometimes even with each other), but this approach shows promise—especially when paired with synthetic data or tuned evaluation personas. 
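A minimal sketch of what "careful prompting and output formatting" can look like in practice: a fixed judge prompt with an explicit rating scale, plus strict validation of the model's JSON output so malformed responses fail loudly instead of silently polluting your evaluation set. The prompt text, scale, and function names are assumptions for illustration, not Moody's actual pipeline, and the LLM call itself is omitted.

```python
import json

JUDGE_PROMPT = """You are a search relevance judge.
Rate how well the passage answers the query on a 0-3 scale:
0 = irrelevant, 1 = marginally related, 2 = relevant, 3 = directly answers.
Respond with JSON only: {{"score": <0-3>, "reason": "<one sentence>"}}

Query: {query}
Passage: {passage}
"""

def parse_judgment(raw: str) -> dict:
    """Validate the judge's response before it enters the eval set."""
    judgment = json.loads(raw)
    score = judgment["score"]
    if not isinstance(score, int) or not 0 <= score <= 3:
        raise ValueError(f"score out of range: {score!r}")
    return judgment

# Render the prompt for one (query, passage) pair, then parse a reply.
prompt = JUDGE_PROMPT.format(query="2023 bond default rates",
                             passage="Default rates rose across ratings bands.")
judgment = parse_judgment('{"score": 2, "reason": "Relevant but no 2023 figures."}')
```

Keeping the scale definitions inside the prompt (rather than in the model's "head") is what makes the judgments repeatable across runs and across models.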

Judge Moody’s: Automating Relevance Evaluation with LLMs
Speaker:
Gurian Marks
YouTube: https://www.youtube.com/watch?v=iT3SCMPi8RQ

Speaking of personas…

The talk on Persona-Based Evaluation of Search Systems showed how you can model different user types (like visual thinkers or subject matter experts) using LLMs. Then, you can evaluate how well your search system works for that specific type of user. It’s a clever way to move past generic relevance metrics and focus on the people who really matter—your actual users. This enables focused improvements and better alignment with high-value user needs.
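One way to sketch this: define personas as data, then render them into the evaluation prompt so the LLM rates results *as that user* rather than against generic relevance. The persona definitions, fields, and function below are hypothetical illustrations of the idea, not the talk's implementation.

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    description: str
    success_criteria: str

PERSONAS = [
    Persona(
        name="visual thinker",
        description="Prefers results with diagrams, screenshots, or charts.",
        success_criteria="Top results include image-rich content.",
    ),
    Persona(
        name="subject matter expert",
        description="Knows the jargon; wants depth, not introductions.",
        success_criteria="Top results are technical primary sources.",
    ),
]

def persona_judge_prompt(persona: Persona, query: str, results: list) -> str:
    """Render an eval prompt that asks an LLM to rate results as this persona."""
    listing = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(results))
    return (
        f"You are a {persona.name}: {persona.description}\n"
        f"Success means: {persona.success_criteria}\n"
        f"Query: {query}\nResults:\n{listing}\n"
        "Rate each result 0-3 for this persona. Respond as JSON."
    )
```

Running the same result set through every persona gives you a per-segment relevance profile instead of one averaged score, which is exactly the "focused improvements" angle.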

Persona Based Evaluation of Search Systems
Speaker: Uri Goren
YouTube: https://www.youtube.com/watch?v=44--JTG0aMg

Build for LLMs, don’t just bolt them on

One of the more forward-thinking talks came from Mehul Shah at Aryn. In Beyond RAG — Going from Search to Analytics on Unstructured Data, he argued that RAG (retrieval-augmented generation) is just the first step. Real users aren’t asking isolated questions—they’re doing research, analysis, and synthesis.

Their system lets users ask natural questions, extract structured insights, and even control how the LLM executes its plan. It’s like building a lightweight BI tool on top of a language model.

The idea to build specifically for LLMs came up in several sessions (including the AMA with Doug Turnbull and Trey Grainger, authors of AI-Powered Search): if you’re serious about using LLMs, it’s time to rethink your architecture.

Reindexing content specifically for LLM use (e.g., chunking, metadata tagging, or optimizing for context windows) is no longer optional. An “LLM-first” approach—where the model is central to how you structure data and evaluate relevance—is quickly becoming best practice.
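As a concrete (and deliberately simplified) example of what "reindexing for LLM use" can mean, here is a sketch of an overlapping character-window chunker that attaches metadata to every chunk. Real pipelines usually chunk on semantic boundaries (headings, sentences) and use token counts rather than characters; the function name and field layout are assumptions for illustration.

```python
def chunk_for_llm(doc_id: str, text: str, title: str,
                  chunk_size: int = 800, overlap: int = 100) -> list:
    """Split a document into overlapping character-window chunks.

    Each chunk repeats the document title and records its source span, so a
    retrieved chunk is self-describing inside an LLM context window even
    when the rest of the document is absent.
    """
    chunks = []
    step = chunk_size - overlap
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        body = text[start:start + chunk_size]
        if not body:
            break
        chunks.append({
            "id": f"{doc_id}#chunk-{i}",
            "title": title,  # repeated per chunk on purpose
            "text": body,
            "char_range": (start, start + len(body)),
        })
    return chunks
```

The overlap trades some index size for continuity: a sentence that straddles a chunk boundary still appears whole in at least one chunk, which matters once these chunks become the model's only view of the document.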

AMA with the Authors of AI-Powered Search
Speakers:
Trey Grainger and Doug Turnbull
YouTube: https://www.youtube.com/watch?v=yHEkBlZIlqs

Beyond RAG - from search to analytics on unstructured data with Aryn
Speaker: Mehul Shah
YouTube: https://www.youtube.com/watch?v=K5GqCCMnUn0

Final thoughts

Across every session we attended, the message was clear: search is no longer a single-layer system. It spans infrastructure, machine learning, user experience, and product strategy.

If you’re building search, content discovery, or generative interfaces, a few takeaways stand out:
 - Think hybrid, not binary.
 - Design for LLMs, not around them.
 - Get creative with evaluation—you’ve got more tools than ever.
 - And most of all, stay curious.

We had great hallway chats, learned a ton, and left energized to try out some new ideas. Big thanks to OpenSource Connections for putting together a stellar lineup!

Explore the complete session list at Haystack US 2025.

Interested in talking more about hybrid search, semantic relevance, or LLM-powered evaluation workflows? Feel free to connect!
