Freelancer profile translated to English.

Description

🚀 AI Platforms & LLMOps Architect | From idea to truly operational AI

I help companies transform generative AI into a reliable, secure, and profitable service capable of operating at scale.

My expertise lies in designing and operating production LLM and RAG inference platforms, built for demanding contexts: high volume, strict SLAs, sensitive data, and integration with existing IT systems.

🌟 What I bring

Industrializing AI, not just demonstrating it

Transitioning from PoC to an operational platform: inference performance, high availability, controlled costs, and real operability.

Useful RAG for the business

Reliable, traceable, and explainable retrieval-augmented search, adapted to regulatory, financial, and medical use cases.

A complete LLMOps approach

Model CI/CD, prompt and dataset governance, drift monitoring, quota management, and cost optimization.

Robust architectures

On-premise or cloud multi-GPU infrastructures, Kubernetes/OpenShift, vLLM/Triton, scaling and resilience strategies.

📌 Examples of impact

- Banking group LLM platform: >150k monthly active users, p99 latency < 3s under strict SLAs, secure multi-site operation.

- Clinical AI platform: traceable decision support on health data, compliance, and practitioner adoption.

- Business agents: automation of complex reports and high-value document search.

🎯 My promise

To deliver a production GenAI platform with:

- A solid and scalable architecture

- Operational governance

- Controlled operation

- Managed costs

- Value-generating business applications

Languages

French: Native or bilingual
English: Native or bilingual
German: Conversational

Workplace preferences

Can work on-site

Paris (up to 50 km)

BNPP
AI Platform Architect & Owner
BANKING AND INSURANCE
August 2025 - Present (10 months)
Montreuil, France
Group AI Platform Architecture & Operation
— Design, deployment, and operation of the BNP Paribas group's AI inference platform, providing LLM and ML capabilities to all entities (standardized and custom models).
— Operation of a multi-site on-premise GPU cluster via HyperShift, hosting dedicated, highly available OpenShift clusters for AI with inter-site redundancy.
— Implementation of OpenShift AI clusters integrating Kubernetes, SDN, Service Mesh, Operators, Prometheus, Grafana, Alertmanager, Loki, Jaeger, Pipelines, RBAC, and Network Policies.

Scalability & Performance
— Sizing of multi-GPU nodes for models from 7B to 600B parameters, MIG optimization, scheduling, NUMA, and NVLink topologies.
— Operation under industrial constraints: tens of thousands of concurrent users, >150k MAU, strict SLAs, optimized TTFT, p99 latency < 3s.
— Advanced scaling, batching, and prioritization strategies on shared non-production clusters and dedicated production clusters.

Serving & Critical Workloads
— Serving of LLMs, embeddings, and financial ML models (scoring, forecasting, anomaly detection) on shared infrastructure and isolated, encrypted production environments.
— Design of strong network, compute, storage, and secrets isolation for sensitive contexts.

Storage & Resilience
— Hybrid storage architecture combining HA NAS and shared local storage for performance and fault tolerance.
— Multi-site redundancy, DRP, backups, and service continuity.

Governance & Ecosystem
— Structuring of product governance: roles, committees, service offering lifecycle, service catalog, and internal contracting.
— Vendor and critical dependency management.
— Operation of the Red Hat ecosystem: OpenShift, OpenShift AI, HyperShift, Quay, ACM, ArgoCD, Pipelines, Service Mesh, Keycloak, ODF.
— Alignment with group standards for security, compliance, observability, and operations.
OpenShift, Kubernetes, LLMOps, LLMs, Governance
KPMG (SA)
Lead Data Scientist - LLM
CONSULTING AND AUDITS
October 2024 - August 2025 (10 months)
Courbevoie, France
LLM / RAG Agents
— Design of advanced RAG agents (ReAct, multi-hop, Plan-Search-Respond) for Risk Management, Audit, Business Analysis, and IFRS, built with Python, Haystack, LangGraph, DSPy, LiteLLM, Pydantic, Azure OpenAI, and Mistral.
— Production deployment of a multi-risk report generation agent (climate, geography, human rights) via LangChain, Tavily, GPT-4o, and Llama 3.1.
— Multi-level indexing strategies, peripheral context management, hybrid search (chunk, embeddings, full-text).
— Indexing of images and non-textual content in documents (GPT-4o, YOLO, Azure OCR, ColPali).

Architecture / MLOps
— Industrialization of CI/CD for Data Science projects: build, tests, packaging, deployment, and monitoring of ML/LLM pipelines.
— Co-design of the Azure AI foundation with the IT department: Azure ML, AKS, Blob, Functions, and Durable Functions.
— Inference architectures combining streaming, batch, and event-driven orchestration via queues and message buses.
— Distributed asynchronous pipelines (fan-out/fan-in, retry, idempotence, fault tolerance).
— Azure ML model deployment: autoscaling, versioning, blue/green, canary, rollback.
— SOTA evaluation stack: context relevancy/recall, ATS, nDCG@k with dedicated pipelines.
— Setup of agent store, config store, and dataset store for governance.
— Tracking of LLM costs by user/use case with quotas and alerting.

Lead Data Science
— Technical leadership of a team of 4 Data Scientists.
— Management of the DSLP backlog and Scrum process in Azure DevOps (Kanban, boards per use case).
— Creation of a dedicated AI codebase following Python/DS best practices: uv, pre-commit, Makefile, DevContainer, Ruff.
— Comprehensive documentation of algorithms, metrics, and indexing.
— Unit, integration, and E2E testing strategy.
— Code quality tooling integrated into CI/CD: pylint, black, isort, bandit, safety, ruff, mypy, and coverage.
— Use case qualification with program management.
Tech Lead, LLM, Data Scientist, LLMOps, Production deployment, Team coordination
STEALTH CLINICAL CONTEXT
Lead LLMOps – Platform Architect
BIOTECH
August 2024 - November 2025 (1 year and 3 months)
Paris, France
Clinical AI Platform / GenAI Architecture
— Design and industrialization of a decision support platform for patients with chronic kidney disease, operated in production under health data constraints (security, sovereignty, compliance).
— End-to-end architecture: ingestion, normalization, pseudonymization, RAG engine, LLM stack, inference layer, business API, and user interfaces.
— Multi-source medical RAG engine leveraging patient records, biology, and clinical repositories (FAISS/Qdrant, biomedical embeddings, hybrid retrieval, reranking, longitudinal context management).
— Chat-style decision support interface for clinicians, with context visualization, response justification, and feedback collection (Gradio).
— Product management: roadmap, iterations, user workshops, and impact measurement on decision quality.

LLM Engineering & Governance
— Fine-tuning of Llama-3 8B, Mistral 7B, Qwen on medical corpora (Transformers, PEFT, QLoRA/LoRA, TRL).
— Supervised alignment and RLHF pipelines with human-in-the-loop.
— Comprehensive governance: dataset/model/prompt versioning, metrics, audits, and traceability of clinical decisions.
— Responsibility framework: confidence thresholds, human fallback, controlled refusal, and medico-legal traceability.

Inference Platform & Operations
— HA bare-metal platform based on vLLM (multi-model serving, continuous batching, KV cache, tensor parallelism, GPU scheduling) and Infinity for large-scale embeddings.
— Kubernetes orchestration of AI/data services: API, vector store, PostgreSQL, monitoring, MinIO encrypted storage, CI/CD, and audit logs.
— Operational processes: SLA, technical and business monitoring, incident management, and service continuity.
Platform Architecture, RAG, LLM Fine-tuning, Sovereign AI, Bare Metal