Freelancer profile translated to English.

Description

💡 Data Engineer | Big Data & Cloud Pipelines Expert

4 years of experience in designing, optimizing, and deploying distributed data pipelines (Spark, Airflow, Trino, Kubernetes, S3, BigQuery).

👉 I make your processes faster and more reliable, reduce your cloud costs, and deliver scalable, documented solutions.

✅ Performance & Scalability: processing times reduced by up to 40%, compute costs halved.

✅ Reliability & Governance: implementation of automatic recovery mechanisms and robust monitoring.

✅ Collaboration & Delivery: seamless integration with your Product, Backend & Data teams to deliver workflows that fit your constraints and are ready for production.

🎯 My goal: turn your pipelines into levers for performance and growth, with a results-oriented approach and strong technical autonomy.

Languages

French
Native or bilingual
English
Fluent
Spanish
Basic
Arabic
Native or bilingual

Workplace preferences

Can work on-site

Paris, 12th arrondissement (up to 50 km)

Realytics
Data Engineer
TELECOMMUNICATIONS
November 2023 - August 2025 (1 year and 9 months)
Paris, France
Participation in the modernization and scaling of Realytics' analytical pipelines for the BEE product, which measures the impact of TV campaigns on web sessions in real time.

End-to-end design of Big Data pipelines, from ingestion to delivery (PySpark, Trino, Hive), including Hadoop administration (HDFS management, Spark job monitoring and optimization).
Direct contribution to the migration to Airflow on Kubernetes with Helm: implementation of dynamic triggers, Spark worker configuration, DAG supervision.
Implementation of automated recovery and restart mechanisms for incidents, with fine-grained error handling.
Daily production (run) support: monitoring Airflow executions, analyzing Spark logs, detecting and resolving production anomalies (partition corruption, Spark SQL errors, loss of S3 connectivity).
Regular interactions with Backend, Product, Frontend, and Data Analyst teams to adapt workflows to their constraints and synchronize deployments.
Continuous deployment via Jenkins and ArgoCD, writing Ansible playbooks to standardize initialization and testing tasks.
Advanced use of Linux (CLI, Cron, memory management, system logs) to analyze abnormal behaviors.
Proactive approach to technical choices and Spark optimization (partitioning, shuffle tuning, broadcast joins).

Results:
Reduction of processing times by approximately 40%, with compute costs halved.
Improved reliability of processes: 95% success rate for critical DAGs.
Strong autonomy in resolving production incidents and contribution to internal documentation.

Technical Environment: PySpark, Trino, Hive, Spark SQL, HDFS, S3, Airflow, Helm, Jenkins, ArgoCD, Docker, Kubernetes, Ansible, Linux, Git, Grafana, Jira.
Airflow, Kubernetes, Hive, PySpark, Hadoop
ZELROS
Data Engineer
TECH
October 2022 - October 2023 (1 year)
Paris, France
Implementation of an analytical pipeline on GCP to support customer recommendations in the insurance sector.
Deployment of a complete pipeline in production: ingestion from Cloud Storage, processing, and populating BigQuery tables.
Performance optimization through BigQuery partitioning, ensuring response times suitable for a real-time engine.
Production technical support: troubleshooting cloud permission issues, scheduling errors, and incoming data anomalies.
Collaboration with Product and Backend teams to ensure functional consistency of exposed data.
Implementation of unit tests (Pytest), an alerting system, and participation in functional testing phases.
Contribution to CI/CD maintenance (GitHub Actions, dependency management via Poetry, code quality control with Ruff).
Results:
Stable production pipeline with an SLA < 30 min.
Zero critical errors after implementing automated tests.

Technical Environment: GCP, BigQuery, Cloud Storage, Airflow, Python, GitHub Actions, Ruff, Poetry, Unix.
Cloud, GCP, BigQuery, Airflow, Ruff, Bash
Apneal
Apnea Data Engineer
HEALTH AND WELLNESS
May 2022 - September 2022 (4 months)
Paris, France
Participation in the development of a data pipeline for a sleep apnea screening device.
Preparation of data from SQLite databases and polysomnography files.
Orchestration of S3 ingestion and export flows.
Processing of physiological signals.
Industrialization of modules as a documented Python package (Sphinx) deployed on AWS (S3, EC2, SageMaker).
Amazon Web Services (AWS), EC2, Python