Henri Bertrand

AI Architect | LLMOps | GenAI | Agents | RAG

€750/day
Paris, FR
8-15 years of experience

Average response time: 24 hours

About Henri

🚀 AI Platforms & LLMOps Architect | From idea to truly operational AI

I help companies transform generative AI into a reliable, secure, and profitable service capable of operating at scale.
My expertise lies in designing and operating production LLM and RAG inference platforms, built for demanding contexts: high volume, strict SLAs, sensitive data, and integration with existing IT systems.

🌟 What I bring

Industrializing AI, not just demonstrating it
Transitioning from PoC to an operational platform: inference performance, high availability, controlled costs, and real operability.

Useful RAG for the business
Reliable, traceable, and explainable augmented search engines, adapted for regulatory, financial, or medical use cases.

A complete LLMOps approach
Model CI/CD, prompt and dataset governance, drift monitoring, quota management, and cost optimization.

Robust architectures
On-premise or cloud multi-GPU infrastructures, Kubernetes/OpenShift, vLLM/Triton, scaling and resilience strategies.

📌 Examples of impact

- Banking group LLM platform: >150k users, controlled latency, p99 SLA, secure multi-site operation.
- Clinical AI platform: traceable decision support on health data, compliance, and practitioner adoption.
- Business agents: automation of complex reports and high-value document search.

🎯 My promise

To deliver a production GenAI platform with:
- A solid and scalable architecture
- Operational governance
- Controlled operation
- Managed costs
- Value-generating business applications

Languages

  • French: Native or bilingual
  • English: Native or bilingual
  • German: Conversational

Can work on-site
Paris (up to 50km)

Experience

  • BNPP
    AI Platform Architect & Owner
    BANKING AND INSURANCE
    August 2025 - Present (10 months)
    Montreuil, France
    Group AI Platform Architecture & Operation
    — Design, deployment, and operation of the BNP group's AI inference platform, providing LLM and ML capabilities to all entities (standardized and custom models).
    — Operation of a multi-site on-premise GPU cluster via HyperShift, hosting dedicated, highly available OpenShift clusters for AI with inter-site redundancy.
    — Implementation of OpenShift AI clusters integrating Kubernetes, SDN, Service Mesh, Operators, Prometheus, Grafana, Alertmanager, Loki, Jaeger, Pipelines, RBAC, and Network Policies.

    Scalability & Performance
    — Sizing of multi-GPU nodes for models from 7B to 600B parameters, MIG optimization, scheduling, NUMA, and NVLink topologies.
    — Operation under industrial constraints: tens of thousands of concurrent users, >150k MAU, strict SLAs, optimized TTFT, p99 latency < 3s.
    — Advanced scaling, batching, and prioritization strategies on shared non-production clusters and dedicated production clusters.

    Serving & Critical Workloads
    — Serving of LLMs, embeddings, and financial ML models (scoring, forecasting, anomaly detection) on shared infrastructure and isolated, encrypted production environments.
    — Design of strong network, compute, storage, and secrets isolation for sensitive contexts.

    Storage & Resilience
    — Hybrid architecture combining HA NAS and shared local storage for performance and fault tolerance.
    — Multi-site redundancy, DRP, backups, and service continuity.

    Governance & Ecosystem
    — Structuring product governance: roles, committees, offer lifecycle, service catalog, and internal contracting.
    — Vendor and critical dependency management.
    — Operation of the Red Hat ecosystem: OpenShift, OpenShift AI, HyperShift, Quay, ACM, ArgoCD, Pipelines, Service Mesh, Keycloak, ODF.
    — Alignment with group standards for security, compliance, observability, and operations.
    OpenShift, Kubernetes, LLMOps, LLMs, Governance
  • KPMG (SA)
    Lead Data Scientist - LLM
    CONSULTING AND AUDITS
    October 2024 - August 2025 (10 months)
    Courbevoie, France
    LLM / RAG Agents
    — Design of advanced RAG agents (ReAct, Multihop, Plan-Search-Respond) for Risk Management, Audit, Business Analysis, and IFRS using Python, Haystack, LangGraph, DSPy, LiteLLM, Pydantic, Azure OpenAI, Mistral.
    — Production deployment of a multi-risk report generation agent (climate, geography, human rights) via LangChain, Tavily, GPT-4o, and Llama 3.1.
    — Multi-level indexing strategies, peripheral context management, hybrid search (chunk, embeddings, full-text).
    — Indexing of images and non-textual content in documents (GPT-4o, YOLO, Azure OCR, ColPali).

    Architecture / MLOps
    — Industrialization of CI/CD for Data Science projects: build, tests, packaging, deployment, and monitoring of ML/LLM pipelines.
    — Co-design of the Azure AI foundation with the IT department: Azure ML, AKS, Blob, Functions, and Durable Functions.
    — Inference architectures combining streaming, batch, and event-driven orchestration via queues and message buses.
    — Distributed asynchronous pipelines (fan-out/fan-in, retry, idempotence, fault tolerance).
    — Azure ML model deployment: autoscaling, versioning, blue/green, canary, rollback.
    — SOTA evaluation stack: context relevancy/recall, ATS, nDCG@k with dedicated pipelines.
    — Setup of agent store, config store, and dataset store for governance.
    — Tracking of LLM costs by user/use case with quotas and alerting.

    Lead Data Science
    — Technical leadership of a team of 4 Data Scientists.
    — Management of the DSLP backlog and Scrum in Azure DevOps (Kanban, boards by use case).
    — Creation of a dedicated AI codebase following Python/DS best practices: uv, pre-commit, Makefile, DevContainer, Ruff.
    — Comprehensive documentation of algorithms, metrics, and indexing.
    — Unit, integration, and E2E testing strategy.
    — Code quality: pylint, black, isort, bandit, safety, ruff, mypy, coverage integrated into CI/CD.
    — Use case qualification with program management.
    Tech Lead, LLM, Data Scientist, LLMOps, Production deployment, Team coordination
  • STEALTH CLINICAL CONTEXT
    Lead LLMOps – Platform Architect
    BIOTECH
    August 2024 - November 2025 (1 year and 3 months)
    Paris, France
    Clinical AI Platform / GenAI Architecture
    — Design and industrialization of a decision support platform for patients with chronic kidney disease, operated in production under health data constraints (security, sovereignty, compliance).
    — End-to-end architecture: ingestion, normalization, pseudonymization, RAG engine, LLM stack, inference layer, business API, and user interfaces.
    — Multi-source medical RAG engine leveraging patient records, biology, and clinical repositories (FAISS/Qdrant, biomedical embeddings, hybrid retrieval, reranking, longitudinal context management).
    — Clinician interface similar to a decision support chat with context visualization, response justification, and feedback (Gradio).
    — Product management: roadmap, iterations, user workshops, and impact measurement on decision quality.

    LLM Engineering & Governance
    — Fine-tuning of Llama-3 8B, Mistral 7B, Qwen on medical corpora (Transformers, PEFT, QLoRA/LoRA, TRL).
    — Supervised alignment and RLHF pipelines with human-in-the-loop.
    — Comprehensive governance: dataset/model/prompt versioning, metrics, audits, and traceability of clinical decisions.
    — Responsibility framework: confidence thresholds, human fallback, controlled refusal, and medico-legal traceability.

    Inference Platform & Operations
    — HA bare metal platform based on vLLM (multi-model, continuous batching, KV cache, tensor parallel, GPU scheduling) and Infinity for large-scale embeddings.
    — Kubernetes orchestration of AI/data services: API, vector store, PostgreSQL, monitoring, MinIO encrypted storage, CI/CD, and audit logs.
    — Operational processes: SLA, technical and business monitoring, incident management, and service continuity.
    Platform Architecture, RAG, LLM, Fine-tuning, Sovereign AI, Bare Metal

Recommendations

Youness M., Thomas Moreau Bisotti, Teddy Toussaint, and 1 other person have recommended Henri.

Education

  • Master 2 Embedded Deep Learning
    Université de Cergy-Pontoise
    2017
