Dioula D.

Data Engineer - Data Scientist

€650/day
Paris, FR
3-7 years

Average response time: 1 hour

About Dioula

Experienced data consultant, specializing in data engineering and NLP with in-depth knowledge of the Azure, Databricks, and Python technical ecosystem, and cloud computing principles and best practices. I have expertise in data product development, process optimization, and project management. I have successfully implemented advanced analytical solutions to improve business operations and customer experience in the retail, banking, and insurance sectors. Passionate about leveraging data to extract relevant insights and generate value.

Languages

  • French: native or bilingual
  • English: fluent

Can work on-site
Paris (up to 50km), Lille (up to 10km)

Experience

  • Cawoylel
    AI/Data Engineer
    RESEARCH
    May 2023 - Today (3 years and 1 month)
    Paris, France
    Cawoylel is an organization whose goal is to implement advanced language technologies (NLP) in the Fulani language.

    1. Collection and transformation of audio and text data from various sources and ingestion into Google Drive
    - Audio data alignment pipelines with their transcriptions
    - Creation of speech recognition datasets, published as open source on Hugging Face
    - Writing technical documentation

    2. Development of a collaborative web platform for data collection and annotation with Python - Flask - PostgreSQL
    - Dockerization and Deployment of the platform on AWS EC2
    - Implementation of CI/CD process with GitLab

    3. Extraction and alignment of translation data from PDF documents (Fulani - French dictionaries)

    - OCR with Google Cloud Vision to accurately extract textual data
    - Few-shot prompting with Langchain, Vertex AI, and Kor to align each Fulani word/phrase with its French translation

    4. Development of AI models for speech recognition and translation
    - Implementation of Windanam, a speech recognition solution trained on Meta's open-source MMS model
    - Deployment of the solution on a Hugging Face Space instance
    - Implementation of an interactive web application with Streamlit allowing users to interact with the model
    Technical skills: PyTorch, Python, Google Cloud Platform (GCP), Colab, Hugging Face, LLM, ASR, Whisper, MMS, Flask, AWS S3, PostgreSQL, Artificial Intelligence, CUDA, RunPod, Vast.ai, Deep Learning, Streamlit, LangChain, Vertex AI, Cloud Vision, Google Cloud Storage, GitLab CI/CD, Amazon EC2
  • KPMG SA - France
    Experienced Consultant - Data Engineer - Science
    CONSULTING AND AUDITS
    March 2021 - May 2023 (2 years and 2 months)
    Paris, France
    1. Data Product Lead

    Technical lead for the development and implementation of a data quality tool in PySpark on Azure using the open-source Great Expectations library.

    Achievements:
    - Development of a data quality control and validation framework integrated into the client's data ingestion processes.
    - Development of dashboards allowing business users to consult the latest data quality indicators.
    - Timely delivery of high-value data quality products that identified 100% of problems/errors and reported their evolution over time.
    - Improved visibility of data quality for sales and business teams, giving them a 360-degree view of the data, leading to more informed decision-making and increased confidence in data-driven decisions.
    - Recommendation of new data sources to integrate, expanding the organization's customer understanding and enabling new use cases.

    Technical skills: Azure Databricks, Azure Data Lake, Azure Data Factory, Power BI, SendGrid Email API, PySpark, Great Expectations (open-source Python library).

    2. Data Engineer Consultant - Customer Data Platform Deployment

    Implementation and deployment of a Customer Data Platform (CDP), leveraging Azure Data Lake as the primary data source, for hyper-personalized digital marketing activities.

    Achievements:
    - Conducted use case evaluations and feasibility analyses with 5 different local markets, leading to the identification and prioritization of use cases maximizing business value.
    - Ran PI Planning sessions with Mural to plan, execute, and monitor all project iterations.
    - Automated data collection, transformation, and ingestion (batch, real-time) from various sources, including APIs and Azure EventHub, and ingested data into Azure Data Lake and the Customer Data Platform.
    - Built customer segmentation and personalized content recommendations based on unified customer data profiles.
    - Documented CDP governance and processes, ensuring consistency and best practices across different markets and future use cases.
    - Developed interactive visualizations to provide a comprehensive view of customer behavior and campaign results.

    Technical skills: ETL Pipelines, Databricks, Azure Data Lake, Azure Data Factory, Azure Event Hubs, Azure SQL, Spark Streaming, Mural (collaboration tool), customer segmentation, recommendation systems, Power BI.

    3. Data Science Consultant - Automated System for Analysis and Modeling of Laboratory Test Results
    Development of an automated solution for the analysis and modeling of laboratory test results from multiple providers.

    Achievements:
    - Strengthened quality control by implementing an automated email alert system triggered by defined warning and control limits.
    - Developed a trend modeling solution, improving forecasts by 15% and enabling accurate anticipation of upward and downward trends.
    - Implemented PowerBI dashboards providing business teams with actionable insights and relevant information.
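
    The alerting on warning and control limits mentioned above can be illustrated with a minimal sketch. It assumes the limits are set at 2 and 3 standard deviations around a baseline mean, a common quality-control convention that the profile does not confirm; the function name `classify_results` and its parameters are hypothetical, and the email step is left out.

```python
from statistics import mean, stdev
from typing import List, Sequence


def classify_results(
    baseline: Sequence[float],
    new_values: Sequence[float],
    warn_k: float = 2.0,      # assumed warning limit, in standard deviations
    control_k: float = 3.0,   # assumed control limit, in standard deviations
) -> List[str]:
    """Label each new test result as 'ok', 'warning', or 'control'."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    labels = []
    for value in new_values:
        deviation = abs(value - center)
        if deviation > control_k * sigma:
            labels.append("control")   # outside the control limits
        elif deviation > warn_k * sigma:
            labels.append("warning")   # outside the warning limits only
        else:
            labels.append("ok")
    return labels


if __name__ == "__main__":
    history = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0]
    print(classify_results(history, [10.5, 12.0, 13.0]))
```

    In a setup like this, any result labelled "warning" or "control" would be the trigger for an alert email.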

    Technical skills: Azure Databricks, PySpark, Azure Data Lake, Azure Data Factory, SendGrid Email API, Time Series Forecasting and Modeling, Power BI

    4. Data Science Consultant - Automated System for Analysis of Life Insurance Beneficiary Clauses
    Implementation of an algorithm automating the validation process of life insurance beneficiary clauses, improving the accuracy and efficiency of their validation by insurers.

    Achievements:
    - Reduced manual work time for beneficiary clause validation by over 90%.
    - Exceeded project expectations by achieving 94.33% performance, ensuring high accuracy in identifying and validating beneficiary clauses.

    Technical skills: Python, Google Cloud Platform, NLP, spaCy, Regex, Sentence Embeddings, Semantic Similarity, TensorFlow, Universal Sentence Encoder

    5. Data Consultant - Information Extraction
    Automation of the extraction of employment and temporary worker information from databases.

    Achievements:
    - Developed an automated employment information extraction process, reducing manual workload by 100%.
    - Extracted information such as first names, last names, start dates, job titles, job descriptions, etc.

    Technical skills: Python stack for data science, Named Entity Recognition, Data Mining

    6. Data Migration Consultant
    Carried out a data migration from local data sources of different markets to APIs.

    Achievements:
    - Rigorous management of compliance with regulations and conformity requirements, enabling the migration to be completed on time and without issues.
    - Development and implementation of scripts for retrieving data from Amazon S3, processing it, transforming it, and mapping it to the expected schemas at the API level.
    - Deployment of scripts capable of calculating and capturing differences between two data snapshots, allowing precise identification of additions, deletions, and updates within the dataset.
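
    The snapshot comparison described above can be sketched as follows. This is an illustration only: it assumes each record carries a unique key, and it uses plain Python dictionaries for brevity rather than the PySpark scripts used on the project. The names `diff_snapshots`, `SnapshotDiff`, and the `id` field are hypothetical.

```python
from typing import Any, Dict, Hashable, List, NamedTuple, Tuple

Record = Dict[str, Any]


class SnapshotDiff(NamedTuple):
    added: Dict[Hashable, Record]                    # keys only in the new snapshot
    deleted: Dict[Hashable, Record]                  # keys only in the old snapshot
    updated: Dict[Hashable, Tuple[Record, Record]]   # (old, new) for changed rows


def diff_snapshots(old: List[Record], new: List[Record], key: str = "id") -> SnapshotDiff:
    """Compare two snapshots of the same dataset, matching rows on `key`."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    added = {k: row for k, row in new_by_key.items() if k not in old_by_key}
    deleted = {k: row for k, row in old_by_key.items() if k not in new_by_key}
    updated = {
        k: (old_by_key[k], new_by_key[k])
        for k in old_by_key.keys() & new_by_key.keys()
        if old_by_key[k] != new_by_key[k]
    }
    return SnapshotDiff(added, deleted, updated)


if __name__ == "__main__":
    yesterday = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lille"}]
    today = [{"id": 2, "city": "Lyon"}, {"id": 3, "city": "Nice"}]
    print(diff_snapshots(yesterday, today))
```

    Keying both snapshots first keeps the comparison linear in the number of rows; on large datasets the same idea is usually expressed as joins on the key column.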

    Technical skills: PySpark, Azure Data Lake, Azure Data Factory, Amazon S3

    Skills (all projects): Data Science, Data Engineering, Machine Learning, Data Visualization, Microsoft Azure, Databricks, Customer Data Platform, Data Quality, PySpark, Azure Event Hubs, Power BI, Life Insurance, Retail, Time Series, SendGrid API, Great Expectations, Azure Data Lake, Mural, Azure DevOps, Python, NLP, spaCy, TensorFlow, Azure Synapse Analytics, Customer Analysis and Segmentation, Spark Streaming, Azure Data Factory, Amazon S3
  • KPMG SA - France
    Transaction Services Analyst
    CONSULTING AND AUDITS
    July 2020 - September 2020 (2 months)
    Paris, France
    Participation in consulting missions using tools, best practices, and methods developed internationally by KPMG.

    Technical skills: Python, Excel, Microsoft PowerPoint

Education

  • Statistical modeling and application
    Institut Polytechnique de Paris - Télécom SudParis
    2021
    Statistics, Probability, Machine Learning, Deep Learning, Optimization

Certifications

  • Microsoft Azure AI Fundamentals (AI-900)
    Microsoft
    2021
  • Natural Language Processing Specialization
    DeepLearning.AI - Coursera
    2020
