Toni Badr

Data Engineer | Azure | Databricks | Palantir

€600/day
Paris, FR
8-15 years

Average response time: 1 hour

About Toni

Are your data pipelines slow, expensive, or difficult to industrialize? I design and deploy robust, production-ready Azure/Databricks architectures.
What I bring concretely:

Design of batch and streaming ETL/ELT pipelines (ADF, Databricks, PySpark)
Delta Lake architecture (Bronze/Silver/Gold), data quality, and cost optimization
DataOps industrialization: monitoring, partitioning, performance
Experience with Palantir Foundry for demanding environments
Recent projects: Azure serverless platform for real-time Bloomberg ingestion (Finance/CORUM), multi-source Denodo data virtualization (Orange Business Services), ADF pipelines for Cloud migration (CNAS).
Stack: Databricks, Spark/PySpark, Python, SQL, ADF, Synapse, Azure, Kafka, Denodo, Palantir
Available in Paris for data platform build, migration, and optimization missions
  • Arabic

    Native or bilingual

  • English

    Fluent

  • French

    Fluent

Can work on-site
Paris (up to 50km), Lille (up to 10km)

Experience

  • CORUM
    Data Engineer
    BANKING AND INSURANCE
    January 2026 - Today (5 months)
    Paris, France
    Design and development of an end-to-end Azure serverless platform for ingesting, processing, and exposing market data from the Bloomberg Data License API, covering investment, valuation, and portfolio tracking needs.
    Development of Azure Functions in Python to automate Bloomberg flows (DataRequest, HistoryRequest), with OAuth2 / JWT HS256 authentication, asynchronous polling, retry policy, and exponential back-off for long-running requests.
    Optimization of large financial data retrieval (CSV, CSV.gz, ZIP) with streaming read, Python parsing, schema normalization, and quality controls: detection of missing values, forward-fill on business days, and complete traceability of REAL / FORWARD_FILLED / FALLBACK statuses.
    Automation of quantitative processing on financial historical data: calculation of returns, NAV, valuation, and temporal aggregations in Python, producing datasets directly usable by Finance, Risk, and Investment teams.
    Incremental ingestion of financial data: after each Bloomberg execution, data is uploaded incrementally to Azure SQL via Azure Data Factory (ADF) pipelines, with delta management and flow orchestration between environments.
    Daily delivery to an SFTP server: a dedicated ADF pipeline reads the data stored in Azure SQL and generates a daily file that is automatically dropped on the target SFTP server, ensuring reliable, scheduled delivery to consuming systems.
    Storage and exposure of data in Azure Cosmos DB and Azure SQL, with collection modeling, SQL queries for interrogation and aggregation, and development of stored procedures to encapsulate critical business processes.
    Containerization of Azure Functions with Docker and multi-environment deployment (dev / preprod / prod) via YAML Azure DevOps CI/CD pipelines.
    Microsoft Azure Bloomberg Python SQL Data Engineer
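The forward-fill and status traceability described in this mission could be sketched roughly as follows. This is an illustration under stated assumptions, not the code delivered on the project: the function name `forward_fill_business_days`, the `fallback` parameter, and the rule that FALLBACK marks a day with no earlier observation are hypothetical, and public holidays are ignored (only weekends are skipped).

```python
from datetime import date, timedelta


def forward_fill_business_days(observed, start, end, fallback=None):
    """Return one (day, value, status) row per weekday between start and end.

    observed maps a date to a value (or None when the source had no value).
    Status semantics here are assumptions made for the sketch.
    """
    rows = []
    last_value = None
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday to Friday only; holidays are not handled
            value = observed.get(day)
            if value is not None:
                last_value = value
                rows.append((day, value, "REAL"))
            elif last_value is not None:
                # Carry the most recent real observation forward
                rows.append((day, last_value, "FORWARD_FILLED"))
            else:
                # Nothing to carry forward yet: use the caller-supplied default
                rows.append((day, fallback, "FALLBACK"))
        day += timedelta(days=1)
    return rows


if __name__ == "__main__":
    prices = {date(2026, 1, 5): 100.0, date(2026, 1, 7): 101.5}
    for row in forward_fill_business_days(prices, date(2026, 1, 2), date(2026, 1, 7)):
        print(row)
```

Keeping the status next to each value lets downstream consumers filter out or flag rows that were not actually observed.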
  • CNAS
    Data Engineer
    TRAVEL AND TOURISM
    June 2025 - December 2025 (5 months)
    Guyancourt, France
    Analysis, redesign, and securing of Azure integration flows for the Voyagiste project following the migration of SharePoint sources to SFTP (FileZilla), within an Azure Cloud environment.
    Design and development of Azure Data Factory (ADF) pipelines, including Data Flows for ingestion, transformation, and automatic orchestration of multi-format CSV and TXT files.
    Centralization of data in Azure Data Lake Storage Gen2 (ADLS) through the implementation of a standardized landing zone, ensuring schema consistency.
    Implementation of data quality rules (cleaning, typing, normalization, consistency checks) directly within ADF Mapping Data Flows to ensure the reliability of the Azure Data Lake.
    Advanced management of ingestion errors (inconsistent schemas, corrupted files, missing data) via logging, alerting, and exception handling mechanisms in Azure Data Factory.
    Support and maintenance of historical Talend flows, correction of incident tickets, and impact analysis in coordination with the RUN team.
    Support for the technical transition from Talend to Azure Data Factory, ensuring service continuity and gradual scaling of Azure processes.
    Contribution to the High-Level Design (HLD/HLDF) of the Azure integration architecture, in collaboration with the Data Architect, integrating principles of Cloud scalability, maintainability, and evolvability.
    Azure Data Factory DBeaver Talend Azure Databricks Data Engineer
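The quality rules and error handling in this mission were built in ADF Mapping Data Flows. The sketch below only illustrates the same kind of checks (schema consistency, typing, missing values, rejection logging) in plain Python. The column names, the `;` delimiter, and the `validate_csv` helper are hypothetical and do not come from the project.

```python
import csv
import io

# Hypothetical schema, chosen only for the example
EXPECTED_COLUMNS = ["booking_id", "departure_date", "amount"]


def validate_csv(text, delimiter=";"):
    """Split rows into (valid, rejected), recording a reason per rejection."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames != EXPECTED_COLUMNS:
        # Inconsistent schema: fail the whole file rather than guess
        raise ValueError(f"unexpected schema: {reader.fieldnames}")
    valid, rejected = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["booking_id"] or "").strip():
            rejected.append((line_no, "missing booking_id"))
            continue
        try:
            # Normalize decimal commas before typing the amount
            row["amount"] = float((row["amount"] or "").replace(",", "."))
        except ValueError:
            rejected.append((line_no, "amount is not numeric"))
            continue
        valid.append(row)
    return valid, rejected


if __name__ == "__main__":
    sample = (
        "booking_id;departure_date;amount\n"
        "B1;2025-07-01;120,50\n"
        ";2025-07-02;80\n"
        "B3;2025-07-03;abc\n"
    )
    good, bad = validate_csv(sample)
    print(good)
    print(bad)
```

Returning rejected rows with a line number and reason gives the alerting layer something concrete to report instead of a bare failure.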
  • Personal project
    Data Engineer - LLM
    TELECOMMUNICATIONS
    May 2025 - September 2025 (4 months)
    Paris, France
    - Collection, ingestion, and preparation of textual data from Goodreads and Project Gutenberg (titles, authors, genres, summaries, ratings) via structured Python pipelines, with HTML cleaning, field normalization, UTF-8 encoding, and advanced corpus structuring to ensure the quality of data used by LLMs.
    - Generation of semantic embeddings via OpenAI text-embedding-ada-002 for vector representation of book meaning, tone, and style, combined with large-scale indexing using FAISS for high-performance semantic search across thousands of documents.
    - Design and implementation of a RAG (Retrieval-Augmented Generation) architecture with LangChain RetrievalQA, enabling LLMs to answer natural language queries contextually, accurately, and reliably, by leveraging structured knowledge bases.
    - Implementation of a semantic and business reranking system, combining embeddings, metadata (SQL: ratings, popularity, genres), and user context to improve the relevance, diversity, and personalization of generated responses.
    - Optimization of the LLM pipeline: adaptive chunking, dynamic context adjustment, fine-tuning of similarity thresholds, and prompt versioning to balance response quality and scalability.
    - Development of an interactive GenAI application with Streamlit, offering personalized recommendations, intelligent conversational exploration of the catalog, and a natural language querying interface.
    - Implementation of rigorous LLMOps practices: prompt versioning, query logging, continuous evaluation of response quality through relevance metrics, performance monitoring, and iterative improvement of production models.
    data-cleaning-and-preprocessing LLM Fine-tuning Langchain Data Engineer
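The semantic and business reranking described above can be illustrated with a minimal sketch. It assumes embeddings are already computed (short toy vectors stand in for real ones), that ratings are on a 0 to 5 scale, and that a single `weight` blends the two signals. The `rerank` function and the field names are hypothetical, not the project's actual code, and no FAISS or LangChain call is involved.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, candidates, weight=0.7):
    """Order candidates by a blend of semantic similarity and normalized rating.

    weight is the share given to semantic similarity; the rest goes to rating.
    """
    scored = []
    for book in candidates:
        semantic = cosine(query_vec, book["embedding"])
        rating = book["rating"] / 5.0  # assumes a 0-5 rating scale
        scored.append((weight * semantic + (1 - weight) * rating, book["title"]))
    return [title for _, title in sorted(scored, reverse=True)]


if __name__ == "__main__":
    books = [
        {"title": "A", "embedding": [1.0, 0.0], "rating": 2.0},
        {"title": "B", "embedding": [0.8, 0.6], "rating": 5.0},
        {"title": "C", "embedding": [0.0, 1.0], "rating": 5.0},
    ]
    print(rerank([1.0, 0.0], books))
```

With the default weight, a slightly less similar but much better rated book can outrank the closest semantic match. Setting `weight=1.0` falls back to pure similarity ordering.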

Education

  • Analysis, Data Management, and Innovation
    Université Gustave Eiffel
    2022
    - Data ingestion and transformation (ETL / ELT)
    - Design of batch data pipelines
    - Distributed processing with Spark / Databricks
    - Analytical modeling (facts, dimensions)
    - SQL querying and transformations
    Data Engineer, Hadoop, Power BI, Scrum, Azure Data Engineering, Databricks, Palantir, Python, SQL
