Description

I am a data engineer with solid experience in production environments. I help companies build reliable, scalable, AI-ready data architectures.

My data pipeline optimizations have cut production costs by several hundred thousand euros. That is what I bring to each engagement: a rigorous technical approach focused on performance and real business impact.

I work across the entire data chain: ETL/ELT pipeline design, orchestration, modeling, and data quality assurance. I have worked on complex, large-scale data projects, particularly on enterprise platforms like Palantir Foundry.

What sets me apart: with a Master's in Data Science, I understand the needs of ML and AI teams from the outset. I build pipelines designed for real-world use (feature stores, training data, monitoring) that data scientists do not have to rework.

Key Skills:

- Data Engineering: Python, Spark, SQL, ETL/ELT, orchestration

- Platforms: Palantir Foundry, Databricks

- Cloud: GCP, AWS

- MLOps: Data CI/CD, model deployment, monitoring

- AI: LLM integration, RAG pipelines, data for AI agents

Languages

French
Native or bilingual
English
Fluent

Workplace preferences

Can work on-site

Paris (up to 50km)

Société Générale
Data Engineer
BANKING AND INSURANCE
September 2023 - Present (2 years and 11 months)
La Défense, France
As part of this long-term assignment, I act as a Data Engineer in a large-scale production environment.
Key achievements:
→ Optimization of the data infrastructure, generating several hundred thousand euros in savings by redesigning underperforming Spark pipelines and significantly reducing processing times and compute costs.
→ Design and development of scalable data pipelines (ETL/ELT) on Palantir Foundry, ensuring reliability and maintainability in production.
→ Optimization of Big Data workflows on Apache Spark and Python: shorter execution times, better memory management, and improved data partitioning.
→ Implementation of automated data quality control processes, reducing manual interventions and production errors.
→ Close collaboration with Data Science teams to provide reliable, documented, and reusable datasets, accelerating the production deployment of ML models.
→ Contribution to the overall data architecture with a focus on scalability, performance, and infrastructure cost reduction.
Technical environment: Palantir Foundry, Apache Spark, Python, SQL, ETL/ELT, Big Data, MLOps, Data Quality, Pipeline Orchestration.
Palantir Foundry, Spark, Python, SQL, MLOps
Société Générale
Data Scientist
BANKING AND INSURANCE
March 2023 - September 2023 (6 months)
Paris, France
As part of my final-year internship (Master's in Data Science), I worked on a synthetic data generation project, a topic at the intersection of data science, data privacy, and dataset quality.
Key achievements:
→ Design and development of a synthetic data generation engine capable of reproducing the statistical distributions of real data while ensuring the confidentiality and security of sensitive data.
→ Application of advanced Machine Learning and statistical modeling techniques to generate realistic artificial datasets, usable for testing and analysis phases without exposing personal data.
→ Implementation of an evaluation framework for synthetic data quality: similarity measurement, statistical fidelity metrics, and usability tests to ensure the reliability of generated datasets.
→ Integration of the generation pipeline into the company's internal testing environment, with complete technical documentation facilitating adoption by teams.
Impact: Reduced reliance on real data for testing phases, accelerated development cycles, and enhanced compliance with GDPR / Privacy by Design requirements.
Technical environment: Python, Airflow, Jenkins, Docker, Git, Machine Learning, Statistical Modeling, Synthetic Data Generation, Data Privacy, Data Quality, GDPR, Pandas, Scikit-learn.
Python, Airflow, Machine Learning, Artificial Intelligence (AI), Jenkins

Double Degree MSIAM Data Science
ENSIMAG
2023
Engineer
ENSIMAG - Grenoble INP
2023

Certified Palantir Foundry Data Engineer Professional
Palantir
2025
https://verify.skilljar.com/c/eg5yjx5ck8e8
Git, PySpark, Palantir Foundry, Big Data, Spark, Python, SQL
Databricks Certified Data Engineer Associate
Databricks
2026
https://credentials.databricks.com/cb89a33c-2183-447b-bcf8-4ff7c8536c43
Git, PySpark, Big Data, Databricks, Spark, ETL, SQL, Python

Data Engineer

AI engineer

Soufiane Lemrabet

Data Engineer & AI Palantir Foundry
