Description

I have been a Data Engineer since 2016, including three years in consulting, and I am now looking for new challenges.

Over these past few years, I have had the opportunity to work on different aspects of the data universe:

- Management of on-premise Hadoop and AWS infrastructures

- Client support on the use of big data and data science technologies

- Implementation and onboarding of tools (Airflow, H2O, Jupyter, Zeppelin, Docker, JanusGraph, etc.)

- Development of a Spark processing framework

- Creation of dashboards and data feeds

I am looking for a data engineer role that will allow me to continue consolidating my developer skills.

Languages

English: Conversational
French: Native or bilingual

Workplace preferences

Can work on-site

Paris (up to 10km)

Bpifrance
Data Engineer
BANKING AND INSURANCE
June 2022 - August 2024 (2 years and 1 month)
Maisons-Alfort, France
Complete overhaul of the Market Data department at BPI: retrieval and management of external data used by various trading desks within BPI.
Migration of Python jobs to PySpark to optimize performance and scalability.
Implementation of CI/CD pipelines and Infrastructure as Code (IaC) to improve deployment efficiency and reliability.
Review and optimization of the use of various technologies (AWS Glue, MWAA, AWS Lambda, etc.).
Creation of BPI-WORKSPACE to facilitate development and access to resources for data engineers within our team (IaC + usage scripts).
Implementation of Data Vault for better data management and organization.
Setup and maintenance of observability tools to ensure effective system monitoring and tracking.
Cost optimization related to cloud service usage.
Adherence to security policies: migration to HSM and implementation of assumed roles within our jobs.
Support for new Spark developers for quick and efficient integration.
Implementation of an internal framework for the department and participation in the development of a PySpark framework for BPI.
Populating and maintaining MongoDB databases and Kafka topics.
PySpark, Data Vault, Terraform, AWS Glue, AWS Athena, MongoDB, Kafka, AWS Lambda, Bash, Jenkins, Sonar, Airflow
Wigglytrout Software
CTO
SOFTWARE PUBLISHING
September 2021 - May 2022 (8 months)
Creation of a notebook-type platform to assist security teams. All POCs were carried out on GCP with a Dockerized product:
- Securityhub is a notebook platform based on Zeppelin that allows creating, scheduling, and sharing notebooks. Installed on Kali Linux, this solution aims to make all tools available to security teams and improve information exchange with other teams.
- Integration of Zeppelin with authentication through an AD server via Apache Shiro.
- Kraken is a framework based on Trino/Presto, dbt, and Hive Metastore. Its purpose is to allow users to connect to a wide range of data sources: S3, GCP, BigQuery, Hive, HDFS, etc.
- GitHub Actions: build and deploy a Docker image to Container Registry (GCP) and Docker Hub, deploy the image on Cloud Run, and run a series of automated tests on the container.
- Data provision via Google Cloud Storage and querying through BigQuery.
Google Cloud, Docker, Zeppelin, Java
Adaltas
Study Engineer
CONSULTING AND AUDITS
July 2019 - February 2022 (2 years and 7 months)
Boulogne-Billancourt, France
IT consultant with two main missions:

DATAKILI Paris – Big Data Engineer
- Development of Spark jobs in Scala
- Development of Java Spring jobs
- Modification of multi-tenant databases with Liquibase
- Correction and integration of client files
- Testing and deployment of the solution
- Implementation of metrics via Kibana

EDF R&D Paris/Saclay, Adaltas – InfraOps, Big Data Engineer
- Support and training for business teams, project support
- Deployment, operation, and supervision of HDP, HDF, Docker, and R clusters
- Deployment of new components: Airflow, HDF5, H2O AutoML
- Optimization and security of Hadoop clusters
- Study and implementation of visualization libraries integrated with Python/PySpark: Streamlit, GeoSpark
- Docker support in a Data Science environment (GPU integration, Conda, Jupyter, R)
- Automation of data ingestion pipelines with Airflow, PySpark, and Python
- Support on AWS tools
Hadoop, Spark, AWS, Docker, Python, Linux, Java, Bash