Soufiane B.

Data Engineer – Specialized in Big Data Pipelines

€300/day
Paris 12e Arrondissement, FR
3-7 years

Average response time: 1 hour

About Soufiane

💡 Data Engineer | Big Data & Cloud Pipelines Expert

4 years of experience in designing, optimizing, and deploying distributed data pipelines (Spark, Airflow, Trino, Kubernetes, S3, BigQuery).
👉 I make your processes faster and more reliable, reduce your cloud costs, and deliver scalable, documented solutions.

✅ Performance & Scalability: processing times reduced by up to 40%, compute costs halved.
✅ Reliability & Governance: implementation of automatic recovery mechanisms and robust monitoring.
✅ Collaboration & Delivery: seamless integration with your Product, Backend & Data teams to deliver workflows that are tailored to their needs and ready for production.

🎯 My goal: turn your pipelines into drivers of performance and growth, with a results-oriented approach and strong technical autonomy.

Languages

  • French: Native or bilingual
  • English: Fluent
  • Spanish: Basic
  • Arabic: Native or bilingual

Can work on-site
Paris 12e Arrondissement (up to 50km)

Experience

  • Realytics
    Data Engineer
    TELECOMMUNICATIONS
    November 2023 - August 2025 (1 year and 9 months)
    Paris, France
    Contributed to modernizing and scaling Realytics' analytical pipelines for the BEE product, which measures the impact of TV campaigns on real-time web sessions.

    • End-to-end design of Big Data pipelines, from ingestion to delivery (PySpark, Trino, Hive), including Hadoop administration (HDFS management, Spark job monitoring and optimization).
    • Direct contribution to the migration to Airflow on Kubernetes with Helm: implementation of dynamic triggers, Spark worker configuration, DAG supervision.
    • Implementation of automated recovery and restart mechanisms in case of incidents (fine-grained error management).
    • Daily RUN support: monitoring Airflow executions, analyzing Spark logs, detecting and resolving production anomalies (partition corruption, SparkSQL errors, S3 connectivity loss).
    • Regular interactions with Backend, Product, Frontend, and Data Analyst teams to adapt workflows to their constraints and synchronize deployments.
    • Continuous deployment via Jenkins and ArgoCD, writing Ansible playbooks to standardize initialization and testing tasks.
    • Advanced use of Linux (CLI, Cron, memory management, system logs) to analyze abnormal behaviors.
    • Proactive approach to technical choices and Spark optimization (partitioning, shuffle tuning, broadcast join).

    Results:
    • Reduction of processing times by approximately 40%, with compute costs halved.
    • Improved reliability of processes: 95% success rate for critical DAGs.
    • Strong autonomy in resolving production incidents and contribution to internal documentation.

    Technical Environment: PySpark, Trino, Hive, Spark SQL, HDFS, S3, Airflow, Helm, Jenkins, ArgoCD, Docker, Kubernetes, Ansible, Linux, Git, Grafana, Jira.
    Skills: Airflow, Kubernetes, Hive, PySpark, Hadoop
  • ZELROS
    Data Engineer
    TECH
    October 2022 - October 2023 (1 year)
    Paris, France
    • Implementation of an analytical pipeline on GCP to support customer recommendations in the insurance sector.
    • Deployment of a complete pipeline in production: ingestion from Cloud Storage, processing, and populating BigQuery tables.
    • Performance optimization through BigQuery partitioning, ensuring response times suitable for a real-time engine.
    • Production technical support: troubleshooting cloud permission issues, scheduling errors, and incoming data anomalies.
    • Collaboration with Product and Backend teams to ensure functional consistency of exposed data.
    • Implementation of unit tests (Pytest), an alerting system, and participation in functional testing phases.
    • Contribution to CI/CD maintenance (GitHub Actions, dependency management via Poetry, code quality control with Ruff).
    Results:
    • Stable production pipeline with an SLA < 30 min.
    • Zero critical errors after implementing automated tests.

    Technical Environment: GCP, BigQuery, Cloud Storage, Airflow, Python, GitHub Actions, Ruff, Poetry, Unix.
    Skills: Cloud, GCP, BigQuery, Airflow, Ruff, Bash
  • Apneal
    Apnea Data Engineer
    HEALTH AND WELLNESS
    May 2022 - September 2022 (4 months)
    Paris, France
    Participation in the development of a data pipeline for a sleep apnea screening device:
    • Preparing data from SQLite databases and polysomnography files.
    • Orchestrating S3 ingestion/export flows.
    • Processing physiological signals.
    • Industrializing modules via a documented Python package (Sphinx) deployed on AWS (S3, EC2, SageMaker).
    Skills: Amazon Web Services (AWS), EC2, Python

Education

  • Master in Data Science
    Université Paris Dauphine
    2022
