Industrializing a sovereign LLMaaS: scalability, security… and an AI that helps create itself

TalkAI in production

2025-12-10 | 09:00 AM - 09:00 AM | Auditorium Niv-2

Information

Can we create a sovereign generative AI service... with the help of AI itself? This is the challenge we took on by industrializing a sovereign LLMaaS, hosted on a SecNumCloud IaaS and intended for sensitive clients (sovereign sectors, healthcare, etc.). This experience report presents the challenges we encountered in putting into production an LLM product that is reliable, scalable, measurable, secure, and partially built with AI.

The concrete challenges we addressed:

• Support for heterogeneous GPUs (A100, L40S, H200, Apple M4…) with dynamic scheduling based on workload
• Token-based billing, with a distributed counting system integrated into the load balancers
• Detailed performance measurements (latency, tokens/s, efficiency per model)
• Intelligent load distribution, with 4 active routers and full monitoring
• Integration into a SecNumCloud environment, with IAM, auditing, strict isolation, and sovereign storage

What makes this project special? AI contributed to its own creation. We used LLMs to:

• Generate pieces of infra-as-code (Kubernetes manifests, adaptive proxies)
• Produce dashboards and monitoring scripts
• Help diagnose production errors
• Automate certain routing and resource-allocation decisions

This project forced us to combine AI, security, observability, governance, and a DevSecOps culture in a real production context. We will share our successes, limitations, tools, and, most importantly, lessons transferable to other enterprise AI projects.
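The dynamic scheduling across heterogeneous GPUs mentioned above can be sketched as a capacity-aware placement rule. This is a minimal illustration, not the production scheduler: the pool names match the GPUs listed in the abstract, but the memory figures and the "smallest sufficient pool" policy are assumptions for the example.

```python
# Hypothetical GPU pools and their usable memory in GB; figures are
# illustrative assumptions, not the production configuration.
GPU_POOLS = {
    "H200": 141,
    "A100": 80,
    "L40S": 48,
    "Apple M4": 32,
}

def pick_pool(model_vram_gb: float) -> str:
    """Route a workload to the smallest GPU pool that can fit it,
    keeping the largest GPUs free for the largest models."""
    candidates = [name for name, cap in GPU_POOLS.items() if cap >= model_vram_gb]
    if not candidates:
        raise ValueError(f"no pool can host a {model_vram_gb} GB workload")
    return min(candidates, key=lambda name: GPU_POOLS[name])
```

A real scheduler would also account for current pool load and per-model throughput, but the bin-fitting decision above is the core of workload-based placement.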
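Token-based billing with counting distributed across the load balancers can be approximated by per-router counters that flush in batches to a shared store. The sketch below is a simplification under stated assumptions: a plain dict stands in for the distributed backend (in practice something like Redis), and the flush threshold is arbitrary.

```python
import threading
from collections import defaultdict

class TokenMeter:
    """Per-router token counter: accumulates locally and flushes batches
    to a shared backend, so routers avoid one network call per request.
    The dict backend is a stand-in for a real distributed store."""

    def __init__(self, backend: dict, flush_threshold: int = 1000):
        self.backend = backend
        self.flush_threshold = flush_threshold
        self.local = defaultdict(int)
        self.lock = threading.Lock()

    def record(self, tenant: str, tokens: int) -> None:
        with self.lock:
            self.local[tenant] += tokens
            if self.local[tenant] >= self.flush_threshold:
                self._flush(tenant)

    def _flush(self, tenant: str) -> None:
        self.backend[tenant] = self.backend.get(tenant, 0) + self.local.pop(tenant, 0)
```

Batching trades a small lag in billing visibility for far less write pressure on the shared store, which matters when 4 routers count every request.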
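The per-model performance measurements (latency, tokens/s) usually separate two quantities for streaming LLM serving: time to first token, and decode throughput after it. A minimal sketch of that computation, with hypothetical timestamps:

```python
def throughput_stats(start: float, first_token: float, end: float,
                     n_tokens: int) -> dict:
    """Split one request's timing into time-to-first-token (prefill
    latency) and tokens/s over the decode phase."""
    ttft = first_token - start
    decode_time = end - first_token
    # n_tokens - 1 tokens are produced during decode (the first token
    # belongs to the prefill phase).
    tok_per_s = (n_tokens - 1) / decode_time if decode_time > 0 else float("inf")
    return {"ttft_s": ttft, "tokens_per_s": tok_per_s}
```

Reporting the two numbers separately matters because a model can have a fast decode rate yet feel slow if prefill dominates, and vice versa.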