Freelancer profile translated to English.

Description

Data Engineer / Tech Lead (13 years), specialist in Cloudera environments, Spark/Structured Streaming, Databricks (AWS & Azure), Delta Lake, Kafka, and Airflow. Designs and operates cloud and on-prem data platforms, industrializes batch and streaming pipelines, and implements data quality and observability practices. Performance expert: end-to-end diagnosis and tuning, latency reduction, and cost optimization (FinOps). Former Java developer who works daily in Python and Java, leads teams, and promotes best practices.

Languages

French
Native or bilingual
English
Fluent

Workplace preferences

Can work on-site

Paris (up to 50km)

Confidential
Tech Lead Data Engineer
ENERGY AND UTILITIES
June 2023 - Today (3 years)
Paris, France
Tech Leadership & Architecture

Cross-functional leadership of Data teams: disseminating Software Craftsmanship culture, design patterns, and industrialization standards (CI/CD, tests, code reviews, documentation).
Methodological support for the transition to a Data Mesh organization (domain-based governance, team accountability, and Data Product delivery).
Design and delivery of a generic ingestion framework and a cross-functional application dedicated to data quality (data profiling, monitoring, and alerts).

Engineering, Migration & Big Data at Scale

Design and optimization of critical data pipelines handling billions of records per day and terabytes of data (domains: gas/electricity valuation, billing, measurements).

Strategic migration & modernization of legacy Cloudera architectures (Spark 2.4) to a modern Cloud ecosystem (AWS S3, Glue Catalog, Delta Lake, Unity Catalog).

Spark, Databricks & FinOps Expertise

Advanced Spark/Databricks performance optimization (job tuning, partitioning strategies, shuffle reduction), significantly shortening processing times.

Technical audits and FinOps approach: significant reduction of AWS/Databricks costs through consumption monitoring and cluster right-sizing.

Technologies:

Data & Cloud: Databricks, Spark, Delta Lake, AWS (S3, Glue, Lambda, CloudWatch, EventBridge), Unity Catalog, Cloudera (Hadoop).
Streaming & Orchestration: Structured Streaming, Kafka, Airflow, dbt.
Languages & Dev: Python, Java, SQL, GitLab CI/CD, Parquet, Avro.
Databricks, AWS S3, Java, Python
SGSS
Senior Data Engineer
BANKING AND INSURANCE
May 2021 - Today (5 years)
Paris, France
➢ DataHub Foundation Project:

Setting up a Lakehouse platform from scratch on Azure, with a data-centric strategy.
Development of a multi-channel ingestion tool (batch/Spring Batch, SFTP/Spring Integration, CDC/Informatica, streaming/Kafka).
Development of a configurable processing engine (batch & streaming); consolidated views and ELT pipelines (Spark/Hive).
Execution of POCs to validate technical choices.
Automated unit and integration tests.
➢ Reporting Island Project:

Migration and redesign of the "Îlot" application on Azure (decommissioning of the on-prem data lake).

Development of an Oracle → Azure ingestion tool.
Consolidated views and normalization using datasets conforming to the enterprise model.
Business Datamarts and exposure via API.
Automated integration and unit tests.
➢ CSDR Project:

Performance tuning of Spark jobs and their orchestration.
Optimization of Spark processing (skew, partitioning, cache).
Elimination of bottlenecks and FinOps approach.
Reduction of the workflow runtime from approximately 10 hours to about 1 hour.

Technologies Used:

AKS (Azure Kubernetes Service), Databricks, Azure HDInsight (managed Spark cluster), ADLS Gen2, Delta Lake, Spark/Structured Streaming, Kafka, Airflow, Docker, Azure PostgreSQL, Azure Key Vault, Spring (IoC, Integration, Batch), Scala/Java/Python, PySpark, Pandas, Poetry, pyenv, Zeppelin, Jupyter/VS Code, Elastic Stack (Elasticsearch, Kibana), Grafana, Alerta.
Cloud, Azure, Kubernetes, Python, Java, Kafka
SCOR
Data Engineer
BANKING AND INSURANCE
February 2019 - June 2021 (2 years and 4 months)
Paris, France
➢ SOLEM Project:

Implementation of a real-time processing platform to generate customer recommendations.
Construction and normalization of data to produce reliable datasets.
Implementation of an integration and deployment process (CI/CD).
Development of unit and integration tests.
Audits of existing applications.
Ensuring daily data accuracy and availability.

Technologies Used:

Scala, Java, Kafka, Spark Streaming, Tomcat, Git, Oracle, Redis, Docker, Oozie, Azure, Power BI, Jupyter, Avro, Parquet, Jenkins, Python, Kubernetes, Apache Sqoop
Kafka, Java