About Dioula
- French: Native or bilingual
- English: Fluent
Experience
- Cawoylel
  AI/Data Engineer
  Research
  May 2023 - Today (3 years and 1 month)
  Paris, France
  Cawoylel is an organization whose goal is to bring advanced language technologies (NLP) to the Fulani language.
  1. Collection and transformation of audio and text data from various sources and ingestion into Google Drive
     - Pipelines aligning audio data with its transcriptions
     - Creation of speech recognition datasets and open-source publication on Hugging Face
     - Writing technical documentation
  2. Development of a collaborative web platform for data collection and annotation with Python, Flask, and PostgreSQL
     - Dockerization and deployment of the platform on AWS EC2
     - Implementation of a CI/CD process with GitLab
  3. Extraction and alignment of translation data from PDF documents (Fulani - French dictionaries)
     - OCR with Google Cloud Vision to accurately extract textual data
     - Few-shot prompting with LangChain, Vertex AI, and Kor to align each Fulani word/phrase with its French translation
  4. Development of AI models for speech recognition and translation
     - Implementation of Windanam, a speech recognition solution trained on Meta's open-source MMS model
     - Deployment of the solution on a Hugging Face Space instance
     - Implementation of an interactive web application with Streamlit allowing users to interact with the model
- KPMG SA - France
  Experienced Consultant - Data Engineer - Science
  Consulting and Audits
  March 2021 - May 2023 (2 years and 2 months)
  Paris, France
  1. Data Product Lead
     Technical lead for the development and implementation of a data quality tool in PySpark on Azure using the open-source Great Expectations library.
     Achievements:
     - Development of a data quality control and validation framework integrated into the client's data ingestion processes.
     - Development of dashboards allowing business users to consult the latest data quality indicators.
     - Timely delivery of high-value data quality products that identified 100% of problems/errors and reported their evolution over time.
     - Improved visibility of data quality for sales and business teams, giving them a 360-degree view of the data and increasing confidence in data-driven decisions.
     - Recommendation of new data sources to integrate, expanding the organization's customer understanding and enabling new use cases.
     Technical skills: Azure Databricks, Azure Data Lake, Azure Data Factory, Power BI, SendGrid Email API, PySpark, Great Expectations (open-source Python library).
  2. Data Engineer Consultant - Customer Data Platform Deployment
     Implementation and deployment of a Customer Data Platform (CDP), leveraging Azure Data Lake as the primary data source, for hyper-personalized digital marketing activities.
     Achievements:
     - Conducted use case evaluations and feasibility analyses with 5 different local markets, leading to the identification and prioritization of the use cases that maximize business value.
     - Ran PI Planning sessions with Mural to plan, execute, and monitor all project iterations.
     - Automated data collection, transformation, and ingestion (batch and real-time) from various sources, including APIs and Azure Event Hub, into Azure Data Lake and the Customer Data Platform.
     - Segmentation and recommendation of personalized content based on unified customer data profiles.
     - Documented CDP governance and processes, ensuring consistency and best practices across different markets and future use cases.
     - Developed interactive visualizations to provide a comprehensive view of customer behavior and campaign results.
     Technical skills: ETL Pipelines, Databricks, Azure Data Lake, Azure Data Factory, Azure Event Hub, Azure SQL, Spark Streaming, Mural (collaboration tool), customer segmentation, recommendation systems, Power BI.
  3. Data Science Consultant - Automated System for Analysis and Modeling of Laboratory Test Results
     Development of an automated solution for the analysis and modeling of laboratory test results from multiple providers.
     Achievements:
     - Strengthened quality control by implementing an automated email alert system triggered by defined warning and control limits.
     - Developed a trend modeling solution, improving forecasts by 15% and enabling accurate anticipation of upward and downward trends.
     - Implemented Power BI dashboards providing business teams with actionable insights and relevant information.
     Technical skills: Azure Databricks, PySpark, Azure Data Lake, Azure Data Factory, SendGrid Email API, Time Series Forecasting and Modeling, Power BI.
  4. Data Science Consultant - Automated System for Analysis of Life Insurance Beneficiary Clauses
     Implementation of an algorithm automating the validation of life insurance beneficiary clauses, improving the accuracy and efficiency of their validation by insurers.
     Achievements:
     - Reduced manual work time for beneficiary clause validation by over 90%.
     - Exceeded project expectations by achieving 94.33% performance, ensuring high accuracy in identifying and validating beneficiary clauses.
     Technical skills: Python, Google Cloud Platform, NLP, spaCy, Regex, Sentence Embeddings, Semantic Similarity, TensorFlow, Universal Sentence Encoder.
  5. Data Consultant - Information Extraction
     Automation of the extraction of employment and temporary worker information from databases.
     Achievements:
     - Developed an automated employment information extraction process, reducing manual workload by 100%.
     - Extracted information such as first names, last names, start dates, job titles, job descriptions, etc.
     Technical skills: Python stack for data science, Named Entity Recognition, Data Mining.
  6. Data Migration Consultant
     Carried out a data migration from the local data sources of different markets to APIs.
     Achievements:
     - Rigorous management of regulatory and compliance requirements, enabling the migration to be completed on time and without issues.
     - Development and implementation of scripts that retrieve data from Amazon S3, process and transform it, and map it to the schemas expected by the APIs.
     - Deployment of scripts that compute the differences between two data snapshots, allowing precise identification of additions, deletions, and updates within the dataset.
     Technical skills: PySpark, Azure Data Lake, Azure Data Factory, Amazon S3.
- KPMG SA - France
  Transactions Services Analyst
  Consulting and Audits
  July 2020 - September 2020 (2 months)
  Paris, France
  Participation in consulting missions using tools, best practices, and methods developed internationally by KPMG.
  Technical skills: Python, Excel, Microsoft PowerPoint
Education
- Statistical modeling and application
  Institut Polytechnique de Paris - Télécom SudParis
  2021
  Statistics, Probability, Machine Learning, Deep Learning, Optimization
Certifications
- Microsoft Azure AI 900
  Microsoft
  2021
- Natural Language Processing Specialization
  DeepLearning.AI - Coursera
  2020