Maximiliano Villanueva

Senior AI Engineer | LLM Systems, RAG & Production

€315/day
Barcelona, ES
8-15 years of experience

Average response time: 1 hour

About Maximiliano

AI Engineer with 10+ years of experience in software engineering and 4+ years focused on building production AI systems.

I specialize in designing and deploying efficient and secure AI solutions, including LLM-based applications, RAG systems, autonomous agents, fine-tuning workflows, and MLOps pipelines.

I have worked as both a consultant and technical lead in early-stage startups, helping teams design AI-driven products from scratch, integrate AI into existing workflows, and scale systems used in production.

Key achievements:

- Reduced AI operational costs by up to 70% by optimizing inference pipelines, model usage, and infrastructure design.
- Designed and deployed multiple AI agents (internal and customer-facing) handling thousands of daily requests with ~95% task success rate.
- Led technical AI strategy and architecture decisions to align product development with business objectives.

Languages

  • Spanish
    Native or bilingual
  • Catalan
    Native or bilingual
  • English
    Fluent

Remote only
Primarily works remotely

Experience

  • Deepdots
    AI Tech Lead
    DIGITAL AND IT
    January 2025 - Today (1 year and 5 months)
    Copenhagen, Denmark
    • Led the full lifecycle of production LLM systems (Danish↔English translation, summarization and information extraction), from business definition to deployment: self-hosted open-source models on GCP Cloud Run with a single NVIDIA L4 GPU (24 GB VRAM), serving ~15K requests/day at ~1s latency per request.
    • Cut inference costs by 70% by migrating from third-party APIs to self-hosted open-source models, selecting and sizing candidates (Qwen3 14B, Mistral Small) based on multilingual quality and VRAM footprint.
    • Built an observability and cost-control layer with LiteLLM as a unified gateway: per-request logging, token/latency/spend tracking, fallbacks and weekly cost reporting per product line.
    • Designed RAG systems and agentic workflows that turned customer pain points into production features, improving retention (from non-returning users to recurring usage every 1–2 days).
    • Defined the AI strategy and owned the technical leadership of development, aligning model and infrastructure decisions with commercial objectives.
    Tech stack: Python, Gemini, OpenAI, Google ADK, LangChain, LiteLLM, vector databases, FastAPI, GCP Cloud Run, Docker, open-source models (Qwen, Mistral).
    Python, Google Cloud, Artificial Intelligence, MLOps
  • Saber
    Machine Learning Engineer
    DIGITAL AND IT
    January 2024 - January 2025 (1 year)
    Amsterdam, Netherlands
    • Sole ML decision-maker in a startup environment: designed the platform's RAG architecture (naive, dense and hybrid retrieval strategies) and the agentic workflows with LLM orchestration.
    • Optimized RAG pipeline consumption, reducing cost and tokens per query through retrieval and context-management improvements.
    • Built a daily-signals feature generated from data collected each day, orchestrating collection, processing and automated delivery.
    • Reported directly to the CEO, acting as technical advisor on the AI product roadmap and feasibility assessments.
    Tech stack: TypeScript, Node.js, Python, OpenAI, GCP, MongoDB.
    TypeScript, Artificial Intelligence, Google Cloud, MongoDB, Back-End Development
  • Sciling
    Machine Learning Engineer & Python Developer
    TECH
    January 2022 - January 2024 (2 years)
    Valencia, Spain
    • Developed NLP pipelines based on embeddings and classifiers (some trained and deployed by me) for information extraction in a regulated medical domain.
    • Built RAG systems achieving recall@K >90% in a specific domain, combining dense and hybrid retrieval.
    • Implemented knowledge graphs for expert-system information extraction.
    • Developed a conversational chatbot in the medical domain (IBM Watson Assistant + Speech-to-Text).
    • Fine-tuned open-source models for specific use cases and built backend services with Python, FastAPI and Docker.
    Docker, RAG, Machine Learning, MLOps, Back-End Development

Education

  • MSc
    Universidad Europea
    2025
  • MLOps Specialization
    DeepLearning.AI
    2023
