
Hassina Saddedine

Data Engineer | Data Quality | AWS, GCP, IBM Cloud

€600/day
Nanterre, FR
3-7 years

Average response time: 1 hour


About Hassina

Data & AI Engineer – Data Quality, Big Data & Cloud (AWS / GCP / IBM)

Data & AI Engineer specializing in data quality, reliability, and extracting value from data at scale. I work across the entire data lifecycle, from collection to production, with a strong focus on Data Quality by design, governance, and the industrialization of data pipelines in cloud environments (AWS, GCP).

Design and orchestration of robust, automated ETL pipelines with systematic quality controls: ingestion, cleaning, normalization, schema validation, and data traceability (AWS Glue, PySpark, Airflow, GCP Dataflow). Management of large, multi-source datasets (audio, text, sensors, documents), with storage and history tracking on S3, GCS, BigQuery, Redshift, and PostgreSQL. Implementation of monitoring and alerting to ensure completeness, consistency, and continuity of processing.
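As an illustration of the kind of quality controls mentioned above, here is a minimal Python sketch of completeness, conformity, and uniqueness checks on a batch of records. It is not taken from any of the projects listed in this profile: the schema, the field names (`id`, `text`, `duration_s`), and the function names are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"id": str, "text": str, "duration_s": float}


def check_record(record, schema=SCHEMA):
    """Return the list of rule violations for one record (empty means valid)."""
    issues = []
    for field, expected in schema.items():
        value = record.get(field)
        if value is None or value == "":
            # Completeness: every schema field must be present and non-empty.
            issues.append(f"completeness: missing {field}")
        elif not isinstance(value, expected):
            # Conformity: the value must have the type declared in the schema.
            issues.append(f"conformity: {field} is not {expected.__name__}")
    return issues


def validate_batch(records, key="id"):
    """Split a batch into valid records and (record, issues) rejections."""
    # Uniqueness: count how often each key value appears in the batch.
    counts = Counter(r.get(key) for r in records if r.get(key) is not None)
    valid, rejected = [], []
    for record in records:
        issues = check_record(record)
        if counts.get(record.get(key), 0) > 1:
            issues.append(f"uniqueness: duplicate {key}")
        if issues:
            rejected.append((record, issues))
        else:
            valid.append(record)
    return valid, rejected


if __name__ == "__main__":
    batch = [
        {"id": "a", "text": "hello", "duration_s": 1.5},
        {"id": "b", "text": "", "duration_s": 2.0},
        {"id": "c", "text": "x", "duration_s": "3"},
    ]
    ok, bad = validate_batch(batch)
    print(len(ok), "valid;", len(bad), "rejected")
```

In this sketch, rejected records keep the list of rules they violated, so they can be logged or routed to a quarantine area rather than silently dropped.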

Experience in Data Science and AI projects (NLP, speech recognition, computer vision, embedded ML), with particular attention to the quality, consistency, and usability of training datasets. Training, evaluation, and deployment of models on AWS SageMaker and GCP AI Platform, integrating reliable and reproducible data preparation pipelines.

Accustomed to working in complex and constrained environments, I support business and technical teams in structuring reliable, auditable data ready for analytical, ML, and AI uses, while optimizing costs and performance through a FinOps and cloud-native approach.
  • French

    Native or bilingual

  • English

    Fluent

Can work on-site
Nanterre (up to 50km), Nanterre (up to 10km), Saint-Quentin-en-Yvelines (up to 10km), Paris (up to 20km)

Experience

  • BNP Paribas
    Data Engineer - Data Quality
    BANKING AND INSURANCE
    September 2025 - December 2025 (3 months)
    Montreuil, France
    Scoping & Business Requirements
    Workshops with Product Owners and Data Scientists to define virtual assistant service requirements: Q&A business rules, data quality criteria, and IT constraints (banking security, DMZR, IBM COS, Elasticsearch).

    Architecture & Design (DDD)
    Design of a Domain / Application / Infrastructure architecture. Modeling of key entities (Document, Chunk, Embedding, IndexRecord) and implementation of a modular, scalable, and maintainable pipeline.

    Ingestion & Data Quality (ETL)
    Development of a complete ingestion pipeline from IBM COS: automatic format detection (CSV/JSON), robust parsing, normalization, quality controls, and data lifecycle (raw → parsed → enriched → indexed → dead_letter).

    Data Quality & Reliability
    Definition and implementation of Data Quality rules (completeness, consistency, uniqueness, conformity). Anomaly detection (missing data, duplicates, errors), error management, and processing traceability.

    Data Security & Access
    Development of a secure Python connector for IBM COS (DMZR) with dynamic credential retrieval via Vault and secure logging.

    Structuring & Embeddings
    Implementation of a chunking strategy adapted to the banking context (semantic consistency, controlled sizes). Embedding generation with batch management, retries, and structured logs.

    Elasticsearch Industrialization
    Creation and management of indexes with optimized mappings (custom analyzers, nested fields, multi-fields). Bulk indexing with partial error management and atomic alias switching without downtime.

    Documentation & Agility
    Writing technical documentation on Confluence. Working in Agile Scrum methodology, managing technical user stories, and tracking via Jira.
    Vault Python Domain Driven Design Elasticsearch IBM Cloud
  • Letxbe
    Data Engineer – Data Quality & Governance - Cloud AWS
    SOFTWARE PUBLISHING
    December 2023 - August 2025 (1 year and 8 months)
    Paris, France
    Data Scoping & Requirements
    Gathering requirements from business and technical stakeholders with a strong focus on data quality, reliability, and governance: business rules, security requirements, IT constraints, costs, and cloud service choices.

    Data Quality by Design
    Definition and implementation of data quality rules (completeness, consistency, uniqueness, schema conformity).
    Integration of automated quality controls in ingestion and indexing pipelines to detect anomalies (missing data, inconsistencies, partial errors).

    Data Platform & Infrastructure
    Deployment and industrialization of OpenSearch on AWS via Terraform: secured clusters (IAM, TLS/KMS), CloudWatch logging, multi-AZ private subnets, and VPC Endpoints ensuring data integrity and confidentiality.

    Reliable & Scalable Pipelines
    Design of Python indexing and search pipelines with systematic data validation: dynamic mappings, custom analyzers, nested fields, and consistency checks before exposure.
    Query optimization and low-latency API exposure.

    Data Migration & Reliability
    Migration from ArangoDB to OpenSearch: extraction, cleaning, transformation, and post-migration quality checks to ensure data completeness and conformity.

    Monitoring & Governance
    Proactive monitoring of data quality and freshness (alerts on errors, volumes, shards, snapshots).
    Securing flows via AWS Transfer Family (SFTP), SQS → Lambda → API automation, and FinOps tracking for sustainable data governance.
    Terraform Transfer Family Textract ArangoDB
  • Stellantis
    Data Engineer – Data Quality & Pipeline Industrialization (GCP | Autonomous Vehicles)
    AUTOMOTIVE
    September 2021 - December 2023 (2 years and 2 months)
    Paris, France
    Data Scoping & Requirements
    Collaboration with Data, ML, and Vehicle Engineering teams to define data quality requirements for road test data: sensor stream reliability, temporal consistency, analytical and ML usability, volume and performance constraints.

    Ingestion & Data Pipelines (GCP)
    Implementation of automated pipelines for collecting, synchronizing, and transferring sensor data (video, audio, LIDAR, CAN logs) to Google Cloud Storage, orchestrated by Apache Airflow and triggered upon raw file reception.

    Data Processing & Data Quality
    Development of distributed processing with Dataflow to ensure data quality: cleaning (audio filtering, redundant frame removal), multi-sensor timestamp normalization, completeness and consistency checks, enrichment with metadata (vehicle ID, GPS, weather conditions).

    Reliability & Quality Controls
    Implementation of Data Quality rules on incoming and transformed data: automatic detection of corrupted, incomplete, or inconsistent data, isolation of non-compliant streams, and securing datasets used for analysis and ML.

    Storage & Structuring
    Structuring data in BigQuery (partitioned tables, controlled schemas), with monitoring of freshness, volumes, and traceability of flows from source to final datasets.

    Orchestration & Monitoring
    Complete pipeline orchestration with Airflow, integrating quality controls at each key stage, job monitoring, failure management, and automatic recovery to ensure processing continuity.

    ML Datasets & Deployment
    Preparation of reliable datasets for model training on Vertex AI, then deployment of validated models on embedded platforms (NVIDIA Jetson), using Docker, RTMaps, and ROS2 to ensure reproducibility and robustness.
    Big Data Docker GitLab Airflow Google Cloud
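
The record lifecycle named in the BNP Paribas entry (raw → parsed → enriched → indexed → dead_letter) can be pictured as a small state machine. The Python sketch below is only an illustration under stated assumptions, not the code used in that project: the `Stage` enum, `advance`, and `run_lifecycle` are hypothetical names, and each processing step is assumed to be a plain callable that raises an exception on failure.

```python
from enum import Enum


class Stage(Enum):
    RAW = "raw"
    PARSED = "parsed"
    ENRICHED = "enriched"
    INDEXED = "indexed"
    DEAD_LETTER = "dead_letter"


# Allowed forward transitions; INDEXED and DEAD_LETTER are terminal.
NEXT_STAGE = {
    Stage.RAW: Stage.PARSED,
    Stage.PARSED: Stage.ENRICHED,
    Stage.ENRICHED: Stage.INDEXED,
}


def advance(stage, step):
    """Run one step: return the next stage on success, DEAD_LETTER on failure."""
    if stage not in NEXT_STAGE:
        return stage  # terminal stage, nothing left to do
    try:
        step()
    except Exception:
        # Any failure routes the record to the dead-letter stage.
        return Stage.DEAD_LETTER
    return NEXT_STAGE[stage]


def run_lifecycle(steps):
    """Apply the steps in order and return the list of stages visited."""
    stage = Stage.RAW
    history = [stage]
    for step in steps:
        stage = advance(stage, step)
        history.append(stage)
        if stage in (Stage.INDEXED, Stage.DEAD_LETTER):
            break
    return history


if __name__ == "__main__":
    def ok():
        pass

    def broken():
        raise ValueError("unparseable payload")

    print([s.value for s in run_lifecycle([ok, ok, ok])])
    print([s.value for s in run_lifecycle([ok, broken, ok])])
```

Keeping the visited stages in a list is one simple way to get the processing traceability the entry mentions: a record that ends in `dead_letter` still shows how far it got before failing.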

Education

  • Master 2
    Créteil
    2020
    Distributed Systems and Data Science Technologies

Certifications

  • ROS
    Orsys
    2023
  • Hands-on Machine Learning with NVIDIA and AWS
    Coursera
    2023
