Freelancer profile translated to English.

Description

I am a **Google Cloud Professional Certified Data Engineer**, passionate about transforming data into strategic assets. With over 5 years of experience, I excel in designing and optimizing robust and scalable data architectures.

What I offer:

Certified Expertise: As a Google Cloud Certified Professional Data Engineer, I master GCP tools and services, including BigQuery, Dataflow, Dataproc, and Cloud Composer (Airflow).
Leadership in Design and Optimization: As a Data Engineer at EDF, I led the strategic migration of data infrastructures to GCP, optimizing performance and reducing costs. I designed medallion data pipeline architectures, enabling efficient data management across different layers (Bronze, Silver, Gold). For example, with very large data sources, I improved pipeline speed by over 90% and reduced costs by the same ratio using Dataproc in batch mode.
Strategic Documentation: I write detailed technical architecture documents and develop data processing strategies, ensuring optimal long-term management. My approach guarantees clarity and traceability of technical decisions.
Tailored Support: Whether for migration to GCP, setting up new environments, or developing custom solutions, I support you throughout the process. At SEALK, I created a framework for data pipeline management, facilitating the integration of new technologies and optimizing data ingestion and transformation.

I am determined to transform your data challenges into strategic opportunities through a professional and certified approach. Together, let's make your data a true asset for your business!

Languages

French
Native or bilingual
English
Native or bilingual
Arabic
Native or bilingual

Workplace preferences

Remote only

Primarily works remotely

EDF
Data Engineer
ENERGY AND UTILITIES
September 2023 - June 2025 (1 year and 9 months)
92800 Puteaux, France
Project Context
As a Data Engineer at EDF, I led the migration of data infrastructures to Google Cloud Platform (GCP). The project aimed to reduce costs, optimize performance, and shorten the execution time of complex data pipelines while handling large data volumes.

Responsibilities and Achievements
Architecture Design
I designed architectures tailored to business needs, considering data sources and types. Service choices were made to ensure a robust and scalable solution.

Architecture Examples
Medallion Data Pipeline:
Bronze Layer: Raw storage.
Silver Layer: Transformation into structured format.
Gold Layer: Data ready for analysis.
Batch Processing: Use of Dataproc to execute PySpark jobs on large data volumes.
Migration to GCP: Evaluation of sources, setup of the GCP environment, and migration orchestrated by Cloud Composer.
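
To make the Bronze, Silver, and Gold layering above concrete, here is a minimal sketch in plain Python. It is an illustration only, not the PySpark code run on Dataproc in the project; the sample data, the column names (`meter_id`, `reading_kwh`, `day`), and the function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input; the columns are illustrative, not from the project.
RAW_CSV = """meter_id,reading_kwh,day
A1,12.5,2024-01-01
A1,not_a_number,2024-01-02
B7,8.0,2024-01-01
A1,7.5,2024-01-03
"""


def bronze(raw_text):
    """Bronze: keep every row exactly as received, as untyped strings."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(bronze_rows):
    """Silver: enforce types and drop rows that fail validation."""
    clean = []
    for row in bronze_rows:
        try:
            reading = float(row["reading_kwh"])
        except ValueError:
            continue  # invalid rows remain available in bronze only
        clean.append(
            {"meter_id": row["meter_id"], "reading_kwh": reading, "day": row["day"]}
        )
    return clean


def gold(silver_rows):
    """Gold: aggregate into an analysis-ready total per meter."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["meter_id"]] += row["reading_kwh"]
    return dict(totals)


bronze_rows = bronze(RAW_CSV)
silver_rows = silver(bronze_rows)
gold_totals = gold(silver_rows)
```

Each layer only reads from the one before it, so a bad row can always be traced back to its untouched copy in Bronze.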
Documentation and Monitoring
Architectural Decisions
Architecture Decision Records (ADRs) were created to document critical choices, such as adopting GCP for its scalability and using Dataproc for complex data processing.

Infrastructure and Tools
I used Terraform to manage infrastructure, including the configuration of GCS buckets and BigQuery. Setup was coordinated with several teams to deploy necessary environments.

Impact and Collaboration
The medallion architecture improved data access for analytical teams. GCP training sessions were organized to enhance the team's autonomy.

Team and Methodology
The project involved collaboration with Data Architects, Data Engineers, and DevOps, using the SAFe methodology to foster agility. Technologies used include Terraform, Cloud Composer, PySpark, and BigQuery.
BigQuery, Airflow, Terraform, Google Cloud Platform, PySpark
Sealk
Data Engineer
PRIVATE EQUITY
June 2022 - July 2023 (1 year and 1 month)
Paris, France
Project Context:
As a Data Engineer, I designed and created a framework for data pipeline management, aiming to optimize ingestion and transformation while ensuring smooth integration with various environments.

Responsibilities and Achievements:
Framework Creation:
Developed a framework based on hexagonal architecture, allowing the isolation of application logic from external tools, facilitating testing and technological evolution.
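
As a rough illustration of the hexagonal (ports and adapters) idea described above, the sketch below keeps the pipeline logic dependent only on two small interfaces. It is not the actual framework; every name here (`RecordSource`, `RecordSink`, `run_pipeline`, the in-memory adapters) is hypothetical.

```python
from typing import Callable, Iterable, List, Protocol


class RecordSource(Protocol):
    """Port: anything the core logic can read records from."""

    def read(self) -> Iterable[dict]: ...


class RecordSink(Protocol):
    """Port: anything the core logic can write records to."""

    def write(self, records: List[dict]) -> None: ...


def run_pipeline(
    source: RecordSource, sink: RecordSink, transform: Callable[[dict], dict]
) -> int:
    """Core logic: knows only the ports, never a concrete storage tool."""
    records = [transform(record) for record in source.read()]
    sink.write(records)
    return len(records)


class InMemorySource:
    """Adapter used for tests; a GCS or database adapter would replace it."""

    def __init__(self, records: List[dict]) -> None:
        self._records = records

    def read(self) -> Iterable[dict]:
        return iter(self._records)


class InMemorySink:
    """Adapter that collects output in a list instead of a real warehouse."""

    def __init__(self) -> None:
        self.written: List[dict] = []

    def write(self, records: List[dict]) -> None:
        self.written.extend(records)


def normalize_name(record: dict) -> dict:
    """Example transformation: trim and title-case the name field."""
    return {**record, "name": record["name"].strip().title()}


source = InMemorySource([{"name": "  acme corp "}, {"name": "GLOBEX"}])
sink = InMemorySink()
count = run_pipeline(source, sink, normalize_name)
```

Because `run_pipeline` never imports a storage client, it can be tested with the in-memory adapters, and swapping one external tool for another only means writing a new adapter.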
Data Management:
Managed various file types (text, XML, CSV, JSON) from sources like LinkedIn and Creditsafe, and used databases such as MongoDB and Oracle DB.
Pipeline Setup:
Established synchronization chains between data sources and Google Cloud Storage (GCS), and prepared pipelines for data transformation and model evaluation, sequencing pipeline steps according to graph theory logic.
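
One common way to sequence pipeline steps with graph logic is to model them as a directed acyclic graph and run them in topological order. The sketch below assumes that interpretation; it uses Kahn's algorithm, and the step names and the `execution_order` helper are hypothetical rather than taken from the framework.

```python
from collections import deque

# Hypothetical steps; each step maps to the steps it depends on.
DEPENDENCIES = {
    "sync_to_gcs": [],
    "transform": ["sync_to_gcs"],
    "train_model": ["transform"],
    "evaluate_model": ["train_model", "transform"],
}


def execution_order(dependencies):
    """Return steps so that each one comes after all of its dependencies."""
    # Number of unmet dependencies per step.
    remaining = {step: len(deps) for step, deps in dependencies.items()}
    # Reverse edges: which steps are waiting on a given step.
    dependents = {step: [] for step in dependencies}
    for step, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(step)

    # Start with the steps that depend on nothing.
    ready = deque(sorted(step for step, n in remaining.items() if n == 0))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for waiting in dependents[step]:
            remaining[waiting] -= 1
            if remaining[waiting] == 0:
                ready.append(waiting)

    if len(order) != len(dependencies):
        raise ValueError("cycle detected: the pipeline graph must be acyclic")
    return order


order = execution_order(DEPENDENCIES)
```

The cycle check matters in practice: a step that indirectly depends on itself can never be scheduled, so it is better to fail when the graph is defined than at run time.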
Using Apache Beam on Dataflow:
Implemented Apache Beam for real-time and batch data processing, creating robust and scalable pipelines.
Resource Optimization:
Clustering & partitioning on BigQuery tables.
Staging BigQuery tables.
Training and Support:
Trained Data Engineers on the framework and provided customer support to ensure successful adoption of solutions.
Impact:
This architecture improved continuous integration and data management, with positive feedback from clients. A junior Data Engineer was able to generate complex pipelines quickly, demonstrating the framework's effectiveness.
Team Collaboration:
Worked in Scrum with a team of Data Architects and Data Engineers, supported by Google.
Technologies Used:
Orchestration: Cloud Composer
Processing: Apache Beam, Dataflow
Storage: BigQuery, Google Cloud Storage
Languages: Python
Databases: Oracle, PostgreSQL, MongoDB
Apache Beam, Google Cloud Platform, PySpark, Airflow, Cloud Architecture
Agence des Monts
Data Engineer
January 2021 - May 2022 (1 year and 4 months)
Tunisia
Project: AI-Powered SEO Optimized Article Generator (GPT Model)
As a Data Engineer, I developed an advanced system for generating SEO-optimized content, aiming to produce relevant articles and increase website traffic.

Responsibilities:
Web Crawling
Data Processing: Developed processing chains in Google Cloud Storage (GCS).
Data Pipelines: Designed pipelines for data transformation and evaluation.
APIs: Developed APIs for fine-tuning deep learning models on Google Compute Engine.
Data Warehouse: Designed a Data Warehouse on BigQuery for data analysis.
Dashboards: Visualization of SEO performance.
Impact:
Traffic Increase: Optimized content leading to a significant increase in traffic.
SEO Improvement: Better visibility of generated content.
Client Satisfaction: Positive feedback on article quality.
Project: Plagiarism Detection System
Promoted to Tech Lead, I supervised the development of a plagiarism detection system, reducing costs by 90% compared to SaaS solutions.

Responsibilities:
Needs Analysis: Feasibility study to define technical specifications.
Algorithm Development: Text search using Natural Language Processing (NLP) techniques.
Transfer to GCS and BigQuery with Python and PySpark.
Web Crawling via residential proxies
API Development with Socket.IO for real-time communication.
Impact:
Cost Reduction: Lowered plagiarism detection costs while maintaining quality.
Quality Improvement: Effective identification of plagiarism cases.
Client Satisfaction: Clients satisfied with the flexibility and efficiency of the solutions.
Technologies Used:
GCP, Google Cloud Scheduler, Apache Spark, PySpark, BigQuery, Python, Scala, Flask, GitLab, SonarQube, Nginx, Socket.IO.


Education

Google Cloud Professional Data Engineer certification
GCP
2024
Engineer's degree, Data science
Ecole Supérieure Privée d'Ingénierie et de Technologies - ESPRIT
2020

Certifications

Google Cloud Professional Data Engineer certification
Google
2024
https://google.accredible.com/cce73ad1-6347-4c11-893a-ab24110c427c

Skill set

Airflow, BigQuery, Google Composer, PySpark, GCS, Dataproc, Spark, Dataflow, SQL, Python

Categories

DevOps

Firas Ben Younes

Cloud Data Engineer GCP BigQuery Airflow Spark