The lawyer of tomorrow: supporting Algolia step into the AI era

At DevCon 2025, Laure Carli, former Deputy General Counsel at Algolia, presented on “The Lawyer of Tomorrow: Supporting Algolia Step into the AI Era.” 

The talk described the evolving role of in-house counsel in an AI-driven organization. She framed her mission as preventing value destruction by protecting a company’s core assets—its intellectual property and its commercial ability to distribute products. 

“When I joined Algolia as a legal counsel in 2018, I quickly found myself supporting engineering teams (among other things) with product launches, intellectual property protection, and privacy compliance. In 2024, a shift in technology occurred with the launch of Algolia’s vector search and the spread of coding assistants among R&D teams. The adoption of these innovations at Algolia challenged my previous knowledge and assumptions. As a lawyer, I realized I risked becoming rapidly irrelevant if I kept looking at today’s technology with yesterday’s eyes. From exposing our model on Hugging Face to navigating the legal nightmare of vibe-coding, this is the story of how I partnered with Algolia teams to transform into the lawyer of tomorrow.”

Watch the complete presentation here:


Auto-generated transcript: 

Hi everyone. I'm super happy that I was invited to talk today at DevCon. I chose to tell you the story of becoming what I call the lawyer of tomorrow, as opposed to the lawyer of yesterday, and how that happened as I supported the Algolia teams stepping into the AI era.

My name is Laure Carli. I am Deputy General Counsel at Algolia, which means that I've been working on anything legal-related at the company for the past seven years, from commercial to product, privacy, and of course, AI.

Very well, you may say, but what does a legal counsel really do? In terms of my mission, I like to put it this way: my role is to prevent value destruction in the company. In the SaaS industry, the value of a company mainly lies in two things — first, the company's product and IP, and second, the company's ability to distribute a product on the market, which ultimately materializes when signing deals with customers.

Let me take you through two of my most important journeys recently at Algolia. The first one was about putting an LLM on the Hugging Face platform, and the second one was about guiding engineers through, let’s say, the challenges of coding co-pilots and other vibe coding tools.

For this first story, I will assume that we're all familiar with the notion of a large language model, or LLM, and that we all know that Hugging Face is used for LLM publication the same way GitHub is used for software project publication.

Before we dive into the LLM topic itself, I would like to offer a reminder of the different options a company has to distribute software on the market. We call these licensing models. The most popular licensing model is probably open source, where the code is public and free. When a company goes open source, it generates revenue from associated services like support or consultancy.

At the other end of the spectrum, you have companies who decide to go full proprietary — where the code is confidential, fully owned. These companies generate revenue from licensing fees when customers use the software. In between these two options, you have a whole variety of what I call dual licenses, where parts of the code are free and some others are enterprise-gated, meaning reserved for enterprise customers who pay a fee.

When it comes to Algolia’s search engine, both the keyword and the neural search engines, we went proprietary from the beginning. Until one day, that is, when our AI teams came to me with a specific request. They said, “We’d like to put our vector embedding model on Hugging Face. Is that okay?”

And as much as I tried to show enthusiasm here, I'm not going to lie to you — that generated a slight sense of panic in me. I instantly thought, “What about our IP in the LLM? What about competitive advantage?” because there's a lot of R&D that goes into developing these models. “And what about any trade secrets that may be in the models? And before anything, why is it so important to the teams that we share the model?”

Well, their answers were super interesting. They told me, “Laure, it's all about flexing. It's all about being able to demonstrate the performance of your LLM and to compare it with competitors.” That was my first learning here. While it makes little sense to benchmark classic software algorithms, in the era of LLMs there is inherent value in being able to flex via benchmarks.

So now that I know, I cannot unknow it. If the business wants it, I have to find a way and support them. So I decided to take myself back to school to understand what an LLM is really made of.

Whereas an algorithm is made of code and follows what we call deterministic logic (if A then B, if B then C), an LLM is built entirely differently. An LLM is made of parameters, weights, and a little bit of code, fair enough. But it mostly follows a logic of the most likely output to occur based on a given context and a given training. This logic is called probabilistic because of the likelihood and probability rationale it relies on.
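
To make the contrast concrete, here is a toy sketch in Python; the branching rules and the probability table are invented for illustration and have nothing to do with Algolia's actual model.

```python
import random

# Deterministic logic: the same input always produces the same output.
def step(state: str) -> str:
    if state == "A":
        return "B"
    if state == "B":
        return "C"
    return "unknown"

# Probabilistic logic (toy next-token model): the output is sampled from a
# distribution learned during training, so the same context can yield
# different continuations on different runs.
NEXT_TOKEN_PROBS = {
    "the lawyer of": {"tomorrow": 0.7, "yesterday": 0.2, "today": 0.1},
}

def next_token(context: str) -> str:
    probs = NEXT_TOKEN_PROBS[context]
    return random.choices(list(probs), weights=list(probs.values()))[0]

print(step("A"))                    # always "B"
print(next_token("the lawyer of"))  # usually "tomorrow", sometimes not
```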

Let's dig into the LLM secret sauce for a second. An LLM is built with a specific architecture. The weights are the result of observing and learning from training data. In other words, the value of an LLM lies both in its components and in the data it was trained on. It's like a student, if you will: a student is only as good as the lessons they were taught or the books they read, regardless of their brains or the way their neurons are organized.

So based on that finding, we agreed with the teams that we would only share the model’s architecture and the weights, but keep the training data confidential. That was lesson number two. In the era of LLMs, the value of what you create does not only lie in code — it also lies in all the components that allow you to replicate LLM behavior, including the training data sets.
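
As a sketch of what “share the architecture and weights, keep the training data confidential” can look like in practice, here is a minimal example using the huggingface_hub library; the repo id and the local file layout are hypothetical, not Algolia's actual release process.

```python
from huggingface_hub import HfApi

api = HfApi()  # authenticates via a cached login or the HF_TOKEN environment variable

# Hypothetical repo id and folder layout, for illustration only.
api.upload_folder(
    folder_path="./vector-embedder",
    repo_id="algolia/vector-embedder",
    repo_type="model",
    # Publish only what describes the architecture and the learned weights...
    allow_patterns=["config.json", "*.safetensors", "tokenizer*"],
    # ...and make sure nothing from the training corpus ships with the model.
    ignore_patterns=["data/**", "*.jsonl", "checkpoints/**"],
)
```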

And still I wondered: if we share the model for free on Hugging Face, why would our customers pay for the service? Again, to understand, I decided to take myself back to the fundamentals of our technology.

When you use Algolia neural search, you use various blocks of Algolia technology put one after another in what we call end-to-end AI search. It starts with query understanding, then goes to content retrieval, and ends with relevance ranking of the results. To make a search engine, you need all three blocks. Search cannot happen with only one or two of them.
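
As an illustration of the three blocks chained together, here is a toy pipeline; the function names, signatures, and fake documents are all hypothetical, not Algolia's actual API.

```python
import math

# Toy corpus: document id -> embedding vector.
DOCS = {"doc1": [0.9, 0.1], "doc2": [0.2, 0.8], "doc3": [0.7, 0.3]}

def understand_query(raw_query: str) -> list[float]:
    # Stand-in for the vector-embedding model (the part shared on Hugging Face).
    return [0.8, 0.2] if "shoe" in raw_query else [0.1, 0.9]

def retrieve(qv: list[float], k: int = 2) -> list[str]:
    # Stand-in for the retrieval block: nearest neighbors by cosine similarity.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return sorted(DOCS, key=lambda d: cos(qv, DOCS[d]), reverse=True)[:k]

def rank(qv: list[float], candidates: list[str]) -> list[str]:
    # Stand-in for the ranking block (business rules, personalization, etc.).
    return candidates

def search(raw_query: str) -> list[str]:
    # All three blocks are needed; none of them is a search engine on its own.
    qv = understand_query(raw_query)
    return rank(qv, retrieve(qv))

print(search("red shoe"))  # ['doc1', 'doc3']
```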

So, is Algolia really giving it all away? Let's have a closer look at the search engine components. At the very left, you have the trained LLM, the trained vector embedding component. If you follow, it is the one we would like to be able to flex about. But down the line you have all the other blocks of technology, the retrieval and ranking blocks, that Algolia is also pretty amazing at: quantization, caching, or hashing.

It means that our competitive advantage also lies in these blocks of technology. That's where I figured out we're not giving away the entire engine. We're not giving away the whole motor, if you will. We're only giving away parts of it. As we prepared to publish the vector embedding side on Hugging Face, we decided that we would keep all the blocks down the line confidential to protect our competitive advantage.
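
For a flavor of those downstream blocks, here is a generic int8 scalar-quantization sketch; this is a textbook technique shown for illustration, not Algolia's actual implementation.

```python
import numpy as np

def quantize(vec: np.ndarray) -> tuple[np.ndarray, float]:
    # Map the float range of the vector onto signed 8-bit integers.
    scale = np.abs(vec).max() / 127.0
    return np.round(vec / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Approximate reconstruction: small precision loss, 4x less memory.
    return q.astype(np.float32) * scale

embedding = np.random.randn(768).astype(np.float32)
q, scale = quantize(embedding)
print(embedding.nbytes, "->", q.nbytes)  # 3072 -> 768 bytes
```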

So in the end, it is possible to compare the performance of a model with your competitors' while preserving trade secrets and competitive advantage. But if you want to support this effort, and that's my big learning here, you have to know your tech stack. And that is valid also for lawyers, and maybe primarily for lawyers out there.

The same lesson applies to the other journey I would like to tell you about, which is my experience supporting the teams with the use of vibe coding tools. And for a lawyer, it started out as a little bit of a nightmare.

Let's first agree on a definition of vibe coding. Vibe coding refers to the use of natural language prompting to guide an LLM to generate, refine, or debug code. For the purpose of this presentation, we will also include inline code suggestions offered by tools like GitHub Copilot.

Again, I was living a peaceful, happy life at Algolia until one day I surveyed the company about its use of AI tools. And the question that came up most was about coding assistants. Teams were asking, “Is it fine to use GitHub Copilot, Claude, or vibe coding tools?”

As much as I was willing to support — super cool, let me check — many questions rushed to mind. Do we even own the code that is generated by the tool? What happens if the code is contaminated by open source? Also, what the heck is vibe coding?

As I looked for answers, all I saw was market dissonance. On one side, Y Combinator is going full speed on AI, bragging that a quarter of their cohort companies have code bases generated with AI. On the other side, the highly respected National Venture Capital Association is updating their shareholder template agreement to prohibit companies from using AI to develop core software. That is because these investors are concerned that the companies they put money in won't own the software they create.

So, lesson number four here: no one knows what to do — which is a bit concerning, but also exciting. It means I need to go back and do one of my favorite things — go and figure it out for myself.

So I am back to school learning about software IP ownership. Remember, Algolia has decided to license its solution on a fully proprietary basis. So let's have a closer look at what it really takes to claim copyright on a piece of work.

To claim copyright ownership, you need to gather three main elements. The first one is originality: it needs to be your software, your work, not a copy of someone else's. The second one is tangibility, meaning you have to fix the work on a medium like a piece of paper or a hard drive. Which medium doesn't matter, but we have to be able to seize it.

And the third one, which is the most important here, is authorship. Authorship is the human involvement in the creation of the work. And bad news for us here: as of today, judges around the world consider that an AI is not an author. Thus, AI-generated work cannot benefit from copyright protection.

Let's take a deep dive into the judges’ reasoning with practical examples. What you see here is a painting generated with AI. It actually won an art contest in Colorado. It was created using 147 prompt iterations with Midjourney, and still copyright was denied. The judge found that the user could not sufficiently control or predict the outputs to be the author of the painting.

And to know where it's headed at a global level, I find it useful to refer to the 2025 report by the US Copyright Office, stating that AI-generated content may be eligible for copyright only if the human input in the creative process is significant.

You're probably wondering what it means for human input to be significant. Well, that makes two of us. And there's no guidance on this matter at the moment. But we can still learn one thing here: if you don't have control over the creation process, then you don't have ownership.

Let's take this one step further with the landmark case of a comic book. The Zarya of the Dawn comic book was created using Midjourney for the images, but the text was originally written by its author. And what's super interesting here is that despite the images being denied protection, the entire comic book was granted copyright. Why? Because the author, Kris Kashtanova, selected and controlled the overall arrangement of text and images to create the comic book.

And that's our main takeaway: content arrangement matters, regardless of what you used to create the components of your work. A little bit of AI here, human input there; how you arranged your work overall is what matters for obtaining copyright.

So with that, coming back to my engineers and AI coding tools — where does that leave us? I now know that coding tools are not great for IP protection, but I also know that control and arrangement matter.

So with a group of Algolia engineers who volunteered to help me define best practices, we sat down and tried to decide what we were going to do. And here's where we landed to try and demonstrate control over the code.

First of all, review the outputs. Always perform a human review of your outputs so that you understand your code as if you wrote it yourself. Also, as you write your code, try to document your choices so that if one day you're asked, you can explain why you chose this output as opposed to another.

And last, try to remain in control of your tool settings. Some tools out there offer options to protect yourself from contamination by open source, or to opt out of the tool training on your content.
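
As one possible way to document those choices in the code itself, here is a hypothetical convention; the function and the comment trail are invented for illustration.

```python
def deduplicate(records: list[dict]) -> list[dict]:
    # AI-assisted: first draft suggested by a coding assistant.
    # Human review: rewrote the suggestion to key on "objectID" instead of
    # hashing the whole record, which broke on unhashable nested fields.
    # Decision: kept an insertion-ordered dict rather than set() so the
    # first occurrence of each record wins.
    seen: dict[str, dict] = {}
    for record in records:
        seen.setdefault(record["objectID"], record)
    return list(seen.values())
```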

When it comes to Algolia core projects, the most sensitive ones, we also agreed on the following: it is fine to insert some code suggestions or vibe-coded snippets into the project, but engineers should at all times decide on the overall code logic and the overall architecture of the project. This way, similar to the Zarya of the Dawn comic book, they can benefit from copyright protection for their entire project.

I'm not saying we found the solution, but I am happy that we got to partner to try and figure it out for ourselves.

That's it for me. And what I love most about this journey is that I wasn't alone traveling it. As long as I was humble enough to recognize that I didn't have all the answers to these challenges, I found Algolia engineers and product people every step of the way who really cared, who were eager to share and to learn with me. Together we were able to figure out solutions that made everyone happy in their mission.

Thank you so much for listening to me. I hope you enjoyed the presentation as much as I enjoyed giving it, and I will stick around to answer any of your questions.
