When building AI-powered applications that handle user data, GDPR can feel like a painful legal requirement, and seeing it that way tends to push privacy compliance into the role of an afterthought. We see it differently: privacy is a fundamental design principle that shapes how you architect your entire system. At Algolia, we've always prioritized data privacy in our conversational AI platform, but when we developed the personalized agent memory system in Agent Studio, we knew additional integration work was needed to keep our GDPR-first approach intact in production.
Algolia's conversational AI platform, which includes Agent Studio (our AI agent platform) and Shopping Guides, processes and stores various types of user data to provide personalized experiences. The platform combines large language models with Algolia's search capabilities to create intelligent agents that can answer questions, provide product recommendations, and maintain context across conversations.
However, with great AI capabilities comes great responsibility. Under GDPR, users have the right to:
Access their data (Article 15): Users can request all personal data an organization holds about them
Data portability (Article 20): Users can obtain their data in a structured, machine-readable format
Erasure (Article 17): Users can request deletion of their personal data
The challenge lies in implementing these rights with graceful degradation: delivering a stellar, memory-personalized experience when memory is enabled, data retention is active, and the user hasn't disabled memory for that request, while still offering a great, non-personalized experience otherwise.
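Boiled down, that gating logic is a single predicate. Here is a minimal sketch with illustrative names (this is not our production API):

def should_personalize_from_memory(
    memory_enabled: bool,         # memory features are turned on for the agent
    retention_active: bool,       # the application currently retains user data
    request_allows_memory: bool,  # per-request opt-out flag, defaults to True
) -> bool:
    """Personalize from memory only when every condition holds; otherwise
    fall back to a stateless, still fully functional completion."""
    return memory_enabled and retention_active and request_allows_memory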
Unlike traditional databases where user data might be stored in neat relational tables, AI systems often distribute user information across multiple storage layers: conversation histories, semantic memories, episodic memories, and various indexes.
As a side note, if you're navigating GDPR compliance yourself, check out Algolia's GDPR Explorer—a fantastic project by our colleagues Clément Denoix, Kevin Østerkilde, and Haroen Viaene that helps make sense of complex GDPR requirements through search.
We implemented a two-pronged approach that addresses both the immediate GDPR requirements and provides users with granular control over their AI experience. This journey took us from having working memory features in the lab to shipping them to real users while maintaining the privacy standards we've upheld from day one.
The first major enhancement was extending our existing GDPR endpoints to include the AI system's memory layer. Previously, our user data export only covered conversation histories. Now, it provides a complete picture of all user data stored across the platform.
import asyncio
from typing import List, TypedDict


class UserDataResponse(TypedDict):
    conversations: List[ConversationFullResponse]
    memories: List[MemoryRecord]


@user_data_router.get("/{user_token}")
async def get_data_by_user_token(
    user_token: str,
    user_data_service: UserDataServiceDep,
    memory_service: MemoryServiceManagementDep,
) -> UserDataResponse:
    """Retrieves all memories, conversations and their messages for the given user token."""
    # Fetch conversations and memories concurrently; the memory service is
    # already scoped to this user token via dependency injection.
    conversations, memories = await asyncio.gather(
        user_data_service.get_conversations_by_user_token(user_token),
        memory_service.get_memories_by_user_id(),
    )
    return UserDataResponse(
        conversations=conversations,
        memories=memories,
    )
The implementation uses Python's asyncio.gather() to fetch both conversation data and memory records concurrently, reducing response times for what could be large datasets.
One of the most interesting technical challenges was implementing secure memory retrieval. We chose to store AI memories in a centralized backend using specialized credentials, separate from regular user API access. This architecture provides better security and performance but requires careful handling for GDPR operations.
def get_memory_management_service(application_id: ApplicationIdHeader, user_token: UserTokenDep) -> MemoryService:
"""
Dependency to get the MemoryService instance.
Uses user_token from path param for user data export/deletion endpoints
where user_token is the target user identifier
"""
return MemoryService(
app_id=settings.memory_app_id, # Memory backend app (centralized)
api_key=settings.memory_api_key, # Memory backend key (with admin access)
index_name=index_name,
user_id=user_token,
)
This approach uses elevated privileges specifically for GDPR operations while maintaining strict access controls. The system validates requests through existing authentication mechanisms before granting access to the memory management service.
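For context, the MemoryServiceManagementDep used by the export endpoint earlier is presumably a FastAPI dependency alias bound to this factory. The alias below is a sketch of that wiring rather than a verbatim excerpt:

from typing import Annotated

from fastapi import Depends

# Route handlers declare `memory_service: MemoryServiceManagementDep` and FastAPI
# resolves it through get_memory_management_service once the request has passed
# the existing authentication checks.
MemoryServiceManagementDep = Annotated[MemoryService, Depends(get_memory_management_service)]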
Beyond compliance, we wanted to give users real-time control over their privacy. We implemented an optional query parameter that allows users to disable memory features on a per-request basis:
# A user can disable memory for a specific completion
curl -X POST "https://api.algolia.com/1/agents/{agent_id}/completions?memory=false" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello"}'
One critical aspect of GDPR implementation is ensuring query safety. User IDs in search filters need proper escaping to prevent injection attacks:
import json
from typing import Optional


def _safe_filter_value(value: str) -> str:
    return json.dumps(value)


def build_memory_filter(
    agent_ids: list[str],
    user_id: str,
    memory_type: Optional[MemoryType] = None,
    include_app_wide: bool = True,
) -> str:
    # User filter with proper escaping
    if include_app_wide:
        user_filter = f"(userID:{_safe_filter_value(user_id)} OR userID:*)"
    else:
        user_filter = f"(userID:{_safe_filter_value(user_id)})"
    # ... rest of filter construction
Using json.dumps() for filter value escaping ensures that special characters in user IDs don't break query syntax or create security vulnerabilities.
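To make this concrete, here is what the helper produces for a normal user ID and for a deliberately hostile, made-up one; expected output is shown in the comments:

hostile_id = 'bob" OR userID:"admin'  # a made-up adversarial user ID

print(_safe_filter_value("user-42"))   # "user-42"
print(_safe_filter_value(hostile_id))  # "bob\" OR userID:\"admin"

# The quoted value slots into the filter syntax without terminating it early:
print(f"(userID:{_safe_filter_value(hostile_id)})")
# (userID:"bob\" OR userID:\"admin")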
Memory retrieval for GDPR purposes required robust, paginated processing, since a single user might have thousands of memory records. Our system processes memories in batches of 1,000 records, keeping operations memory-efficient even for power users:
async def get_memories_by_user_id(self) -> list[MemoryRecord]:
"""Retrieve all memories associated with a given user ID."""
memories: list[MemoryRecord] = []
page = 0
while True:
result = await self.storage.search_memories(
queries=["*"], agent_ids=[], limit=1000, page=page, include_app_wide=False
)
hits = result.get("hits", [])
if len(hits) == 0:
break
records = self.engine.transform_existing_memories(hits)
memories.extend([r.to_record() for _, r in records])
page += 1
return memories
Notice the include_app_wide=False parameter. This ensures that GDPR requests only return data specifically associated with the user, not general application-wide memories or templates.
Our deletion implementation follows a two-phase approach: first retrieve all user memories to get their IDs, then perform bulk deletion:
async def delete_memories_by_user_id(self) -> None:
"""Permanently delete all memories associated with a given user ID."""
memories = await self.get_memories_by_user_id()
ids_to_delete = [memory.object_id for memory in memories if memory.object_id]
if ids_to_delete:
await self.delete_memories(ids_to_delete)
This approach ensures that we only delete records we can verify belong to the user, providing an audit trail and preventing accidental data loss.
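At the API surface, the erasure path mirrors the export endpoint shown earlier. Here is a simplified sketch of how it might be wired up; the conversation-deletion method name and the exact route shape are assumptions rather than verbatim production code:

@user_data_router.delete("/{user_token}", status_code=204)
async def delete_data_by_user_token(
    user_token: str,
    user_data_service: UserDataServiceDep,
    memory_service: MemoryServiceManagementDep,
) -> None:
    """Permanently deletes all conversations and memories for the given user token."""
    # Conversations and memories live in separate stores, so both are purged.
    await asyncio.gather(
        user_data_service.delete_conversations_by_user_token(user_token),  # assumed method name
        memory_service.delete_memories_by_user_id(),
    )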
Implementing GDPR compliance in AI systems requires careful attention to performance. Our approach addresses several scaling challenges:
Concurrent Operations: Using asyncio.gather() allows simultaneous retrieval of conversation and memory data, reducing overall response time.
Batch Processing: Memories are handled in manageable chunks of 1,000 records, preventing memory exhaustion even for very large user datasets.
Efficient Filtering: By leveraging Algolia's powerful filtering capabilities, we can efficiently query user-specific data without scanning entire indexes.
Caching Strategy: We recently added native caching to Agent Studio, which covers memories when they are present, keeping response times fast while maintaining data freshness.
Our centralized memory backend architecture, while adding complexity, provides better security and performance for both normal operations and GDPR compliance. The separation of concerns makes it easier to implement proper access controls.
The optional memory disable parameter demonstrates that privacy features don't have to be binary. Users appreciate fine-grained control over how their data is used, even within a single session.
GDPR operations need to handle edge cases like users with thousands of interactions. Always test your implementation with realistic data volumes to identify performance bottlenecks.
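One lightweight way to do that is to seed a synthetic user with thousands of records and time the export and erasure paths end to end. A rough sketch, assuming the MemoryService methods shown above:

import time


async def measure_gdpr_paths(memory_service) -> None:
    """Time the export and erasure paths for a user with a realistic data volume."""
    start = time.perf_counter()
    memories = await memory_service.get_memories_by_user_id()
    export_seconds = time.perf_counter() - start

    start = time.perf_counter()
    await memory_service.delete_memories_by_user_id()
    delete_seconds = time.perf_counter() - start

    print(f"exported {len(memories)} memories in {export_seconds:.2f}s, "
          f"deleted them in {delete_seconds:.2f}s")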
So to sum it up, we shipped a privacy-first memory system that handles GDPR compliance gracefully while maintaining the intelligent experiences users expect. What's next? Now that the foundation is there, we can move to the exciting stuff! Will the next big thing be selective memory deletion, allowing users to forget specific interactions while preserving others? Or perhaps data minimization controls that automatically clean up based on user preferences? Stay tuned for our next blog posts along this engineering journey.
It's cool to research and build state-of-the-art memory systems in the lab. It's even better when they reach real users in the field while respecting their privacy. The gap between prototype and production wasn't as wide as we expected, and the privacy considerations made the journey all the more worthwhile.
The key is to approach privacy not as a constraint, but as a design principle that guides architectural decisions. When privacy is built into the foundation of your AI system, compliance becomes a natural outcome rather than a retrofit.
For teams building similar systems, start with clear data governance policies, implement proper access controls, and always provide users with transparency and control over their data. The investment in privacy-first architecture pays dividends in user trust and regulatory compliance.