About Chen
Languages
- Chinese: Native or bilingual
- French: Fluent
- English: Fluent
Experience
- ENGIE | Data Engineer | Energy and Utilities | August 2024 - Today (1 year and 10 months) | Paris, France
  Billing: orchestration of the billing system for the BSH+, BSH, BSMA and 100SPOT offers
  - Design and implementation of a complete AWS infrastructure with Terraform, managing a multi-service architecture (Databricks Workflows, Lambda, API Gateway, EventBridge, DynamoDB, S3, Step Functions, SQS, KMS, CloudWatch…)
  - Building large-scale ETL pipelines on Databricks with PySpark for billing data processing
  - Implementation of an event-driven architecture (EventBridge + SQS + Lambda) to decouple and orchestrate the billing system components
  - Development of a serverless data distribution layer on DynamoDB for high-concurrency access
  - Design, development, and deployment of RESTful APIs via API Gateway and Lambda, exposing normalized data to other billing system components
  - Set-up of a multi-environment CI/CD pipeline (dev/staging/preprod/prod) with GitHub Actions, ensuring reliable and repeatable deployments
- Dalkia | Data Solutions Architect | Energy and Utilities | July 2023 - July 2024 (1 year) | Paris, France
  - Design of the target architecture for IoT data: definition of a Lakehouse on AWS for sensor streams (temperature, pressure). Specification of differentiated ingestion modes (initial, incremental, replay) via Spark on EMR, structured storage in the S3 *standardized* layer, deduplication based on Kafka offsets, and hourly partitioning. Writing of the technical design document detailing the layers (*raw* → *standardized*), S3 buckets, and IAM roles.
  - Data warehouse governance and industrialization: comparative audit of Redshift Provisioned (for scheduled ETLs) vs. Redshift Serverless (for business self-service). Writing of a technical design document detailing the governance strategy: fine-grained access control (users, roles, IAM policies), manual Workload Management (WLM) configuration, and a transactional merge mechanism to preserve historical integrity during incremental updates or replays.
  - Project support and technical alignment: facilitation of workshops with Dev, PO, Urbanization (enterprise architecture), and Business teams to translate needs into technical specifications. Solution validation via PoCs (PySpark, Airflow) and design of generic Airflow DAGs with locking to prevent concurrent runs.
- Education Zhixing | Big Data Engineer | Education and E-learning | February 2022 - May 2023 (1 year and 3 months) | Shanghai, China
  - Design and deployment of a data warehouse from scratch: layered modeling (ODS, DIM, DWD/DWM/DWS) to centralize business data (visits, intentions, registrations, attendance). Management of slowly changing dimensions (SCD Type 2 via "zipper" tables) to preserve historical consistency. Development of 30+ tables and 10+ key metrics (conversion rate, retention, attendance), with daily incremental ingestion (~16 GB/day) automated with Airflow.
  - Implementation of a real-time recommendation system: Kafka → Spark Structured Streaming pipeline analyzing student responses in micro-batches. Dynamic computation of metrics (top questions by subject/level) and generation of personalized recommendations with a Spark MLlib ALS (collaborative filtering) model. Results exposed in MySQL for the web and BI teams.
  - Optimization of the Big Data platform (Cloudera Hadoop): advanced tuning of Hive (partitioning, vectorization, map joins, skew handling) and Spark (repartitioning, memory tuning) to process 300k records/day/table without OOM errors. Automation of full/incremental ETLs (Sqoop, PySpark, Shell) on a 10-node cluster (200 TB raw).
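For the ENGIE event-driven layer (EventBridge + SQS + Lambda), the consuming side can be sketched as an SQS-triggered Lambda handler. This is a minimal illustration, not the production code: it assumes the SQS event source mapping has `ReportBatchItemFailures` enabled, that each SQS body is the raw EventBridge envelope (no input transformer), and that billing payloads carry a `contract_id` field. `process_billing_event` and `contract_id` are hypothetical.

```python
import json
from typing import Any


def process_billing_event(payload: dict[str, Any]) -> None:
    """Hypothetical business step: reject billing events without a contract id."""
    if "contract_id" not in payload:
        raise ValueError("missing contract_id")


def handler(event: dict[str, Any], context: Any = None) -> dict[str, list[dict[str, str]]]:
    """SQS-triggered Lambda handler returning a partial batch response.

    Only failed messages are reported, so SQS redelivers those alone
    (assumes ReportBatchItemFailures is enabled on the event source mapping).
    """
    failures = []
    for record in event.get("Records", []):
        try:
            body = json.loads(record["body"])
            # Assumption: EventBridge -> SQS without input transformer,
            # so the business payload sits under "detail".
            payload = body.get("detail", body)
            process_billing_event(payload)
        except Exception:
            # Any failure marks only this message for retry.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Reporting only the failed `messageId`s lets SQS retry those messages instead of the whole batch, so successfully processed billing events are not reprocessed.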
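The Dalkia ingestion rules (deduplication on Kafka offsets, hourly partitioning) can be illustrated in plain Python. In the described design this logic runs in Spark on EMR; the sketch below only shows the idea, assuming each record keeps its Kafka topic, partition, and offset, and assuming a hypothetical Hive-style `standardized/date=YYYY-MM-DD/hour=HH/` S3 layout.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import NamedTuple


class SensorRecord(NamedTuple):
    topic: str
    partition: int
    offset: int
    sensor_id: str
    ts: datetime  # event time, assumed timezone-aware
    value: float


def deduplicate(records: list[SensorRecord]) -> list[SensorRecord]:
    """Keep one record per (topic, partition, offset): a replay re-reads the same offsets."""
    seen: set[tuple[str, int, int]] = set()
    unique = []
    for r in records:
        key = (r.topic, r.partition, r.offset)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def hourly_prefix(ts: datetime) -> str:
    """Hypothetical Hive-style hourly partition path in the standardized layer (UTC)."""
    ts = ts.astimezone(timezone.utc)
    return f"standardized/date={ts:%Y-%m-%d}/hour={ts:%H}/"


def partition_by_hour(records: list[SensorRecord]) -> dict[str, list[SensorRecord]]:
    """Group deduplicated records by their hourly S3 prefix."""
    groups: dict[str, list[SensorRecord]] = defaultdict(list)
    for r in deduplicate(records):
        groups[hourly_prefix(r.ts)].append(r)
    return dict(groups)
```

Because a Kafka offset is unique within a topic partition, the `(topic, partition, offset)` key removes exactly the duplicates introduced by a replay, whatever the payload.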
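The "zipper" tables mentioned for Education Zhixing are SCD Type 2 dimensions: each version of a row carries a validity interval, and the current version has an open end date. In a Hive warehouse this merge is usually written in SQL; the Python sketch below only shows the logic, with a hypothetical `DimRow` schema and the common `9999-12-31` open-end convention as assumptions.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # conventional "still current" marker


@dataclass(frozen=True)
class DimRow:
    key: str            # business key, e.g. a student id (hypothetical)
    attrs: tuple        # tracked attributes, compared as a whole
    start_date: date
    end_date: date = OPEN_END


def apply_zipper_update(history: list[DimRow], changes: dict[str, tuple], load_date: date) -> list[DimRow]:
    """Merge one day's incoming rows (key -> attrs) into an SCD Type 2 zipper table."""
    current = {r.key: r for r in history if r.end_date == OPEN_END}
    result = []
    for row in history:
        new_attrs = changes.get(row.key)
        if row.end_date == OPEN_END and new_attrs is not None and new_attrs != row.attrs:
            # Close the current version the day before the new one starts.
            result.append(replace(row, end_date=load_date - timedelta(days=1)))
        else:
            result.append(row)  # closed versions and unchanged rows are kept as-is
    for key, attrs in changes.items():
        old = current.get(key)
        if old is None or old.attrs != attrs:
            # New key, or changed attributes: open a new current version.
            result.append(DimRow(key, attrs, load_date))
    return result
```

Re-running the merge with unchanged attributes leaves the table as it was, which keeps daily incremental loads safe to replay.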
Education
- Master of Computer Science, specialization in Distributed Systems and Applications | Université de Paris VI | 2008