Description

I am a Data Engineer and have completed several Big Data assignments.

I have worked for La Banque Postale, Économie d'Énergie, and MyMoneyBank.

With a multidisciplinary background, I adapt easily and produce high-quality deliverables.

Languages

French
Native or bilingual
English
Fluent

Workplace preferences

Can work on-site

Paris (up to 50km)

MyMoneyBank
Data Engineer
BANKING AND INSURANCE
January 2020 - Today (6 years and 4 months)
Courbevoie, France
MyMoneyBank faced the shutdown of its credit management software (FIBOS) and consequently had to develop all the replacement components in-house (in collaboration with Sopra for the Cassiopae software). This project was named GROM, for Grand Raid d'Outre Mer.

I spent 2 years on the GROM project. I was asked to get involved at every level to enable the development of the Spark jobs for accounting processing (about fifty of them). These jobs read from a DataLake on EMR (Elastic MapReduce), where the data was available as Parquet files. To schedule all of the processing, I took part in developing the workflows (DAGs) on Airflow.

In addition, outside the GROM project, some specific processing needs (which did not go through the DataLake) led to the implementation of dedicated batches written in Java (Spring).

In more detail, I mainly worked on:
- Creating more than sixty Scala Spark jobs that retrieve data from AWS S3 via AWS EMR, then filter, format and aggregate it, and finally save it to a database (AWS RDS);
- Creating (with the business team) a non-payment calculation algorithm, which was subsequently made available to the entire Finance team;
- Implementing a database management process by versioning SQL scripts with Flyway;
- Creating about ten Airflow DAGs (Python) to schedule the Spark jobs and Java (Spring) batches, and contributing to more than thirty Airflow DAGs maintained by the Accounting team;
- Delivering about ten Java Spring batches that retrieve data from a database to generate files that can be loaded into the accounting interpreter;

Malt limits the number of characters...
Scala, Python, GitLab, Hadoop, Apache Kafka, Apache Spark, Apache Airflow, Docker, Kibana, Amazon EMR, Amazon RDS, Apache Hadoop, Spring Boot, SQL, HashiCorp Vault
La Banque Postale
Data Engineer
November 2018 - July 2019 (8 months)
Ivry-sur-Seine, France
La Banque Postale wanted to launch the "Vision 360" project to get a complete view of all its clients. It therefore set out to recruit data engineers to work on feeding a DataLake.

In more detail, I mainly worked on:
- Implementation of NiFi workflows: Apache NiFi is a task orchestrator that automates tasks in a sequence tailored to the need. In my case, the need was to retrieve (text) files, validate them, transform them, and then ingest them into a DataLake (here, HDFS);
- Implementation of an in-house ingestion engine: since Apache NiFi has its limits on large volumes, I initiated the development of an in-house ingestion engine (in Spark) that reads different file sources, validates them, transforms them and loads them into HDFS;
- Implementation of HQL scripts and Spark jobs to transform data stored on HDFS and ingested into Hive;
- Resolution of production anomalies and data cleaning.
Apache NiFi, Python, Apache Spark, GitLab, Apache Hadoop, SQL, Scala
Économie d'Énergie SAS
Data Engineer
September 2018 - October 2018 (1 month)
Économie d'Énergie is a company that enables French people to have insulation work carried out for a symbolic €1 (with government aid).
The company had several clients, and therefore a large number of documents. The goal was to categorize all of these documents by building predictive models, in order to target new clients. Documents of all types (forms, invoices, technical notices, etc.) arrived as scans or images.

Mission:
With 6 machines available, the objective was to classify 700,000 documents ranging from 500 KB to 5 MB in size.
The mission was divided into two parts: extracting text from the files (data engineering) and classifying the files based on that text (data science).

I worked on the first part: extracting text from documents.
The first step was to write a Python program that took a file as input and extracted its text using OCR (Optical Character Recognition). Processing a single file took between 30 seconds and 5 minutes, so the work had to be parallelized.

To parallelize the processing across the 6 machines, I set up a Kafka broker to carry messages containing the locations of the files to process. Docker containers started on the 6 machines listened to the Kafka topic and extracted the text from each file. The resulting text files were made available on an NFS share so that the Data Scientist could retrieve them and continue with the second part.
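
The fan-out described above can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not the original implementation: an in-process `queue.Queue` stands in for the Kafka topic, threads stand in for the Docker containers on the 6 machines, a dictionary stands in for the NFS share, and `extract_text`, `worker` and `run` are hypothetical names.

```python
import queue
import threading

NUM_WORKERS = 6   # one worker per machine in the description above
_STOP = object()  # sentinel telling a worker to shut down


def extract_text(path: str) -> str:
    """Hypothetical placeholder for the OCR step (not a real OCR call)."""
    return f"text extracted from {path}"


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Consume file locations until the stop sentinel arrives."""
    while True:
        path = tasks.get()
        if path is _STOP:
            break
        text = extract_text(path)
        with lock:
            results[path] = text  # stands in for writing a text file to the NFS share


def run(paths: list[str]) -> dict:
    tasks: queue.Queue = queue.Queue()  # stands in for the Kafka topic
    results: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    for path in paths:  # the producer publishes one message per file location
        tasks.put(path)
    for _ in threads:   # one stop sentinel per worker
        tasks.put(_STOP)
    for t in threads:
        t.join()
    return results
```

Each file location is taken by exactly one worker, so slow files do not block the others. In Kafka, a comparable distribution is typically obtained by placing all consumers in the same consumer group on a topic with at least as many partitions as consumers.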
Apache Kafka, Docker, Python, Ansible, GitLab


Education

Computer Engineering
ENSIIE - National School of Computer Science for Industry and Business
2018
Engineering degree program, Software Engineering specialization
Master 2 (M2) - DataScale
Université Paris-Saclay
2018
Data Management in a Digital World (DataScale)

Certifications

Docker Certified Associate
Docker, Inc
2019
https://credentials.docker.com/mf1yyoau

Data Engineer

Aimen Sijoumi

Big Data Engineer, UI-UX Designer, Full Stack Dev
