Freelancer profile translated to English.

Description

Profile: Data & Cloud Expert – Databricks | Spark | Terraform | dbt

Senior professional with over 15 years of experience in IT, including more than 8 years specializing in data projects. Recognized expert in cloud and Big Data platforms, with in-depth mastery of Azure and Databricks environments, as well as Apache Spark, Terraform, and dbt.

Capable of designing and implementing scalable, secure, and performance-optimized data architectures. Significant experience in migrating on-premises platforms to the cloud, industrializing data pipelines, data governance with Unity Catalog, and end-to-end CI/CD automation (Azure DevOps).

Key skills:

Azure (Data Factory, Key Vault, Private Link, RBAC, Networking)

Databricks (SQL Warehouse, Unity Catalog, Delta Lake, Jobs orchestration)

Apache Spark (batch & streaming, performance tuning)

Terraform (infrastructure as code, Azure provider)

dbt (modeling, testing, documentation, CI/CD orchestration)

Cloud & data architecture, security, cost optimization

Languages

French
Native or bilingual

Workplace preferences

Can work on-site

Paris (up to 50km)

SAUR
Databricks Data Architect
ENVIRONMENTAL
April 2023 - Present (3 years and 2 months)
Paris, France
Project: DIAG360: Application framework for permanent diagnostics of sanitation systems
• Migration of an AKS Spark architecture to Azure Databricks
• Creation of a Diagperm IaC module that provisions all Databricks-related resources with Terraform (Storage Account, Key Vault, Private Link, Service Principal, Databricks Workspace, Databricks SQL Warehouse, Databricks Compute Cluster, Catalogs, Schemas, Volumes, RBAC, etc.)
• Creation of Azure DevOps pipelines for automatic deployment of the Terraform-managed Azure and Databricks resources (providers: azurerm and databricks)
• Implementation of a hybrid strategy: dbt-type Databricks jobs for analytical data processing on a Databricks SQL Warehouse, and PySpark-type jobs on a Compute Cluster for heavy and complex processing
• Development and deployment of a new IODA module following the new medallion architecture and the Modern Data Stack (ELT) approach
• Orchestration of dbt-type and PySpark-type Databricks jobs per execution environment (dev, rec, prod) via Databricks Asset Bundles (DAB), integrated into an event-driven architecture
• Design of an ingestion pipeline with Azure Data Factory to collect web data from piezometers and store it in a dedicated Azure Blob container
...
Environment: Azure Databricks (Jobs, Clusters, SQL Warehouses, Unity Catalog, Schemas, Notebooks, External Locations, Volumes), Azure DevOps (Repos, Pipelines, Releases), Azure Kubernetes Service, Lens, Azure Monitor, Azure Event Grid, Azure Functions, Azure Cost Management, Log Analytics, Spark 3.5, Python 3.10, Delta Lake 2.2, Azure Data Lake Storage Gen2, Azure Blob Storage, Azure Active Directory, Azure Portal, Azure Data Factory, IaC, Terraform HCL, dbt (Models, Sources, Macros, Tests), IoT, Power BI, Power BI Desktop, DAX, DirectQuery, DLT
Spark Cloud Azure Databricks PySpark Python
BNP Paribas
Spark Data Architect
BANKING AND INSURANCE
November 2019 - March 2023 (3 years and 5 months)
Paris, France
Project: STRESS TESTING RISK: Application framework for measuring the bank's resilience to financial and economic stress
• Design and development of a new Flair module (Multi-Stress Management), applying Spark/Scala/Java development best practices
• Study and implementation of a technical solution to scale the Icaap application from 30K to 100K scenarios: code optimization, Spark configuration tuning
• Scalability and performance tuning of Spark Engine applications (partitioning, data skew, broadcast, Spark configuration, DAG, JVM memory)

...
Environment: Spark 2.4.7, Scala 2.11.12, Spark SQL 2.4.7, HDP 2.6.5, CDP 7.1.7, Hive 3.1, Hadoop 3.1, HDFS 3.1, Yarn 3.1, ZooKeeper 3.5, Oozie 5.1, Hue 4.5, Knox 1.3, Ranger 2.0, Tez 0.9, Cloudera Manager 7.1, Python 2.7, Dataiku 10.0, PySpark, Jupyter, JDK 1.8, Spring 5.0.5, Maven 3.6, Git, Jenkins Pipeline, SonarQube
Spark Scala Java Java/JEE Hadoop
Société Générale
Spark Data Architect
BANKING AND INSURANCE
September 2016 - October 2019 (3 years and 2 months)
Paris, France
Project: AXE-INTERNET: Cross-functional framework ensuring authentication for Internet and Mobile applications of Crédit du Nord
• Implementation of the Big Data application architecture feeding the ALS alerting system; configuration and development of its components
• Development of a module for fraud detection by analyzing application logs
• Development and planning of Spark jobs for extracting operational statistics from the Data Lake
• Redesign of the application log collection system, based on SyslogNG, to store and process the logs within a Hadoop cluster
...
Environment: Redis, Redisson 3.10, Spark 1.6, SyslogNG, Log4J, Kafka Appender, Kafka 2.1, Spark Streaming 1.6, JDK 1.8, Scala 2.10, Spark submit, Hive 2.1, HBase 1.2, Hadoop 2.6, HDFS, Yarn 5.0, ZooKeeper 3.4, Oozie 4.10, Maven 3, NoSQL, LDAP, WebLogic Server 10.3
Spark Streaming Kafka Java Hadoop Jenkins