Chen Zang

Data Engineer

€650/day
Paris, FR
15+ years

Average response time: 1 hour

Freelancer profile translated to English.

About Chen

Senior Data Engineer specializing in AWS cloud architectures, with strong expertise in designing scalable, event-driven data platforms. I focus on implementing event-driven serverless architectures (EventBridge, SQS, Lambda) and advanced workflow orchestration with Step Functions (SFN) to manage complex, distributed processing.

As a Databricks expert, I design and productionize high-performance batch pipelines in PySpark (ELT/ETL), optimized for processing very large data volumes in Lakehouse environments (Delta Lake). Experienced with AWS services (EMR, S3, DynamoDB, Redshift, MWAA), I build robust, automated, multi-environment end-to-end data solutions, with a particular focus on performance, reliability, and scalability.

Languages

  • Chinese: Native or bilingual
  • French: Fluent
  • English: Fluent

Can work on-site
Paris (up to 30km)

Experience

  • ENGIE
    Data Engineer
    ENERGY AND UTILITIES
    August 2024 - Present (1 year and 10 months)
    Paris, France
    Orchestration of the billing system for the BSH+, BSH, BSMA, and 100SPOT offers

    - Design and implementation of a complete AWS infrastructure with Terraform, managing a multi-service architecture (Databricks Workflows, Lambda, API Gateway, EventBridge, DynamoDB, S3, Step Functions, SQS, KMS, CloudWatch…)
    - Building large-scale ETL pipelines on Databricks with PySpark for billing data processing
    - Implementation of an event-driven architecture (EventBridge + SQS + Lambda) to decouple and orchestrate billing system components
    - Development of a serverless data distribution layer with DynamoDB for high-concurrency access
    - Design, development, and deployment of RESTful APIs via API Gateway and Lambda, exposing normalized data to other billing system components
    - Setup of a multi-environment CI/CD pipeline (dev/staging/preprod/prod) with GitHub Actions, ensuring reliable and repeatable deployments
    Spark, Python, Databricks, AWS, Event-driven architecture
  • Dalkia
    Data Solutions Architect
    ENERGY AND UTILITIES
    July 2023 - July 2024 (1 year)
    Paris, France
    - Design of the target architecture for IoT data: Definition of a Lakehouse on AWS for sensor streams (temperature, pressure). Specification of differentiated ingestion modes (initial, incremental, replay) via Spark/EMR, structured storage in the S3 standardized layer, deduplication by Kafka offset, and hourly partitioning. Writing of the technical design document detailing the layers (raw → standardized), S3 buckets, and IAM roles.
    - Data warehouse governance and industrialization: Comparative audit of Provisioned Redshift (for scheduled ETLs) vs. Serverless (for business self-service). Writing a technical design document detailing the governance strategy: fine-grained access control (users, roles, IAM policies), manual Workload Management (WLM) configuration, and transactional merge mechanism to ensure historical integrity during incremental updates or replays.
    - Project support and technical alignment: Facilitating workshops with Dev, PO, Urbanization, and Business teams to translate needs into technical specifications. Solution validation via PoCs (PySpark, Airflow) and design of generic Airflow DAGs with anti-concurrency locking.
    Cloud, AWS, PySpark, Python, Apache Kafka, Amazon Redshift
  • Education Zhixing
    Big Data Engineer
    EDUCATION AND E-LEARNING
    February 2022 - May 2023 (1 year and 3 months)
    Shanghai, China
    - Design and deployment of a data warehouse from scratch: Layered modeling (ODS, DIM, DWD/DWM/DWS) to centralize business data (visits, intentions, registrations, attendance). Management of slowly changing dimensions (SCD Type 2 via "zipper" tables) to ensure historical consistency. Development of 30+ tables and 10+ key metrics (conversion rate, retention, attendance), with daily incremental ingestion (~16 GB/day) automated via Airflow.
    - Implementation of a real-time recommendation system: Kafka → Spark Structured Streaming pipeline to analyze student responses in micro-batches. Dynamic calculation of metrics (Top questions by subject/level) and generation of personalized recommendations via a Spark MLlib ALS (Collaborative Filtering) model. Results exposed in MySQL for web and BI teams.
    - Optimization of the Big Data platform (Cloudera Hadoop): Advanced tuning of Hive (partitioning, vectorization, map joins, skew management) and Spark (repartitioning, memory tuning) to process 300k records/day/table without out-of-memory (OOM) errors. Automation of full and incremental ETLs (Sqoop, PySpark, Shell) on a 10-node cluster (200 TB raw).
    Spark, Kafka, Cloudera, Hadoop, Airflow, Python

Education

  • Master of Computer Science, specialization in Distributed Systems and Applications
    Université de Paris VI
    2008

Skill set

Categories