Freelancer profile translated to English.

Description

Nearly 30 years of experience in AI (Artificial Intelligence):

Recent years: Deep Learning, Data Science, Big Data

Key Skills: Deep Learning (CNN, RNN, TensorFlow, PyTorch, etc.), Deep NLP (Natural Language Processing: BERT, ULM-FiT, ELMo, Transfer Learning, OpenNMT, OpenAI Transformer, AllenNLP, Stanford CoreNLP), Data Science (Apache Spark MLlib, PySpark with Optimus, Mahout, R, spaCy, Anaconda), Hybrid Models (predefined structures + neural network + weights / stochastics, e.g. LSTM (Long Short-Term Memory), GRU (Gated Recurrent Units), Attention, Feast AI), PMML, ONNX, OpenScoring.io, Deep Learning Intermediate State Storage + Models, Knowledge Representation and Inference, Semantics, Virtualization, Management with Docker, Kubernetes, Airflow, etc.

Professional Activity Start (1998 - 2010) during AI Downturn: Semantic Search, Web Content Scraping and Analysis, Discreet and Secure Communication, Text Watermarking, Competitive Intelligence

Key Skills: Stochastic, Statistical and Scientific Data Libraries, Semantic Web, Semantic Search with Ontologies/Thesauri/Lexical Data Structured with Stochastic Similarity Measures on Terms/Content, OWL, DAML+OIL, NLP Analysis with Formal Grammars such as HPSG, LFG, Chart Parser, Generative Lexicon, MRS (Minimal Recursion Semantics), Expert Systems, Constraints, AI Planning Systems/Workflow Management System (WMS), Data Mining, Business Intelligence (BI) with Relational and Object-Oriented Databases, Helpdesk Automation, Office Automation (OCR + ICR: e.g. Examination of Medical Bills, Insurance Claims, Text Element Proposals for Responding to Letters).

Languages

French
Native or bilingual
German
Native or bilingual
English
Fluent

Workplace preferences

Can work on-site

Strasbourg (up to 50 km), Strasbourg (up to 100 km), Basel (up to 100 km), Zurich (up to 100 km)

German Railways
IT Architect, Agile Coach and Technical Project Manager
TRANSPORTATION
April 2019 - Today (7 years and 2 months)
Frankfurt am Main, Germany
Design of an Open Source SOC (Security Operations Center)
1. Agile Coaching: SAFe + Design Thinking, improving productivity and collaboration.
2. Requirements Engineering, Use Cases 2.0: Engineering of SIEM/SOC features in general and in the railway context. Analysis of cost/benefit aspects of use cases and their dependencies as input for value-based agile product management/product owner activities.
3. Research, Testing and Analysis of leading Open Source SIEM/SOC Systems: Apache Metron / HCP (Hortonworks Cybersecurity Platform), Apache Spot, dataShark, Alienvault OSSIM, Graylog, SIEMonster, Hunting ELK (HELK), Wazuh, MozDef, OSSEC, Prelude OSS, Snort, QuadrantSec Sagan, Suricata, OpenStack Vitrage.
4. Splunk: Installation, Configuration, Analysis and Connection to Input Sources, Creation of Splunk Analysis and Visualization Use Cases with SPL (Search Processing Language).
5. Creation of a general SOC architecture with scopes for minimal, basic, advanced and premium configurations with up to 100 components. Based on this, analysis and presentation of opportunities/costs/risks to meet requirements and use cases towards management and engineering groups.
6. Vision of the future of SOC architecture based on Apache Metron + Kafka + Spark + Elastic/ELK stack (ElasticSearch, LogStash, Kibana) and design of its component architecture - preferably with open source tools to reduce costs. Numerous concrete suggestions for improving the SOC (Security Operations Center), creating a new SOC architecture with AI elements: Big Data/Data Science approach for attack/malware/APT detection with machine learning and focus on reducing false positives. Visualization concept for suspected attack cases with respective security contexts via Design Thinking.
7. Open Source SOC PoC (Proof of Concept): Collection of requirements / use cases, creation of the architecture based on 3 pillars: log processing with Solr/Elastic, Open Source SOC elements (RegEx, Match Expressions with Spark, Kafka, Solr etc.) as well as an AI pillar composed of Data Science and rule-based AI with Spark as well as Deep Learning with TensorFlow and PyTorch.
8. Creation and coordination of the Open Source SOC PoC project plan and architecture with the general management of the railway company (CISO, technical office), creation of 7 job profiles and interviews for recruitment and hiring based on this.
9. Acquisition of a PC and server with Deep Learning GPUs as well as cloud access.
10. Design of the introduction of Docker/Kubernetes for machine learning with TensorFlow and PyTorch: Comparison with the containerd alternative with gRPC, Docker Registries with YAML for Kubernetes, Flannel (Layer 3 network configuration). Kubernetes Tools: kubelet (primary node agent), kube-proxy, Container Runtime, High Availability (HA) endpoints, kubernetes-ha, kube-apiserver, kubeadm, cluster autoscaler, scheduler, Helm (Kubernetes Package Manager, Microservices), Tiller (Helm server part), Ingress (load balancing, SSL termination, virtual hosting), kube-keepalived-vip (Kubernetes virtual IPs using keepalived), Kubespray (deployment of a production-ready Kubernetes cluster). Analysis of Kubernetes and Airflow failures for risks and derivation of best practices/recommendations.
11. Optimized scheduling concepts for maximum performance and throughput in Apache Spark through caching with Alluxio, data locality optimization and minimization of data-intensive operations: Custom Spark Scheduler / Backup Tasks / SubDAG Combiner for Dynamic Workflows (in-memory optimization), Deep Learning Pipelines, Horovod, TensorFlowOnSpark, TensorBoard, TensorFrames, Data Lineage Optimizations.
12. Development of a comprehensive test management concept to improve the stability of developed code, with a focus on data acquisition, AI, DevOps, the CI/CD pipeline (Continuous Integration/Continuous Deployment with Jenkins and Sonar(Qube)), metadata and IT security, in order to channel and improve code through development and integration tests and pre-production environments.
13. Analysis of technologies that could succeed Deep Learning, such as Hierarchical Temporal Memory (HTM), Graph Neural Networks / Memory / Transformers (Convolutional Neural Networks), including their freely available implementations, and Probabilistic Logic Networks (PLNs): [Naive] Bayesian Belief Networks (BBN), Markov Logic Networks (MLN), Conditional Random Fields (CRF), Directed Graphical Models (DGM), Statistical Relational Learning (SRL), Stochastic And-Or Grammars (AOG/SAOG), Probabilistic Relational Models (PRMs), Relational Dependency Networks (RDN), Bayesian Logic Programs (BLP), Probabilistic Graphical Models (PGM), Markov Random Fields (MRF), Contextual Graph Markov Models (CGMM), Hidden Markov Models (HMMs), Human Brain Neurons (HBNs).
14. Development of a new Explainable AI (XAI) method that can replace Deep Learning by combining and advancing several other models and techniques.
15. Grant application prepared for the federal government's AI grant program for IT security: Innovative ideas developed, latest AI, Data Science and Big Data procedures and further developments proposed for the detection of unusual behavior/attacks/malware as well as the latest NLP procedures for automatic analysis of text attacks and malware descriptions on the Internet or in emails/wikis as well as the application of Cyber Grand Challenge elements via Deep Learning, RNNs, CNNs. Development of the corporate strategy and business plan for the separate commercialization of the planned innovations.
16. Creation of security concepts for Windows and Linux PCs and servers through numerous security settings, more logging, etc. and installation of up to 50 analysis and monitoring tools such as Sigar, Config. Discovery, File Integrity Checker (Afick), CGC (Cyber Grand Challenge) Tools: BinaryAnalysisPlatform bap, angr, s2e, KLEE, Strace, ZZUF, BitBlaze.
17. Design of classic data science analyses concerning suspicious activities with GBM (Gradient Boosting Machine), XGBoost, CatBoost, LightGBM, Stacked Ensembles, Blending, MART (Multiple Additive Regression Trees), Generalized Linear Models (GLM), Distributed Random Forest (DRF), eXtremely Randomized Tree (XRT), Labelling, Bootstrap Aggregating (Bagging), ROC/AUC (Receiver Operating Characteristic).
18. Analysis of the best Deep Learning implementations in the respective sub-domains: ResNet, ResNext, DenseNet, MSDNet (Multi-Scale DenseNet), RepMet, EfficientNet as well as the following NLP implementations (e.g., for extracting structured descriptions from text IoCs - Indicators of Compromise): BERT, FastBert, SenseBERT, RoBERTa, GPT, GPT-2.
19. Design/Development of Neural Deep Learning Architectures for TensorFlow, Keras, PyTorch with these elements: (De)Convolution, [Dimensional] [Min/Max/Mean] (Un-)Pooling, Activation Functions, ReLU (Rectified Linear Units), ELU (Exponential Linear Unit), SELU (Scaled Exponential Linear Unit), GELU (Gaussian Error Linear Unit), SNN (Self-Normalizing Network), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), Differentiable Associative Memory (Soft RAM/Hash Table), Episodic Memory, Memory Networks, Self-Attention, (Masked Multi) Self-Attention, NAC (Neural Accumulator), NALU (Neural Arithmetic Logic Unit), Squeeze-and-Excitation (SE) / SENet, SPN (Sum-Product Network), VAE (Variational Auto-Encoder), FCLs (Fully Connected Layers), PLNs (Probabilistic Logic Networks), GANs (Generative Adversarial Networks), Capsule Networks, gcForest, Differentiable Programming, Neural Architecture Search (NAS), Differentiable Neural Networks, [Transposed] (De)Convolutions, ETL (Extract, Transform, Load) with Input/Output Integration, (Layer) Normalization, Softmax, Deep Machine Learning, Large Memory Layers with Product Keys, (Double) Deep Q-Learning, Semi-Supervised Learning (SSL), various operations (Add, Concatenate, Segment, Linearize, Conv.), Reinforcement Learning, Q-Learning, Convolutional Models/Learning, Google Dopamine.
20. Design of Deep Learning Architectures for the following Use Cases / Use Case Slices: Malware dissemination across security zones, detection of malware behavior (checking, propagation, hijacking), frequent attacks, including OS-API attacks, code injection, etc., CPU cycles stolen by malware, possibly by hooks in event queues to detect their execution, ROP (Return Oriented Programming) with ROPNNN variant on standard libraries by comparison with usual entry points to evaluate; meta-level models: analysis of network metadata, detail level: analysis of user data to exploit code/data, etc., current threats, known IoCs, neural analysis of Afick/tripwire data, encryption detection and key exchange.
21. Detailed Comparisons: Elastic with Solr; major JavaScript frameworks (React, Angular and Vue.js), respective native frameworks (Ionic etc.) and the Electron platform; major clouds (Amazon AWS, Google GCP and Microsoft Azure) as well as Docker/Kubernetes; WebSockets vs REST; GraphQL vs OData vs ORDS; comparison of appropriate DBs, e.g., for interval analyses, AWS Redshift vs Athena.
22. Detailed design of Solr aspects: SolrCloud/HDP Search, integration with Apache Ranger + Sentry + Atlas, performance-optimized SolrJ client with parallel queries, distributed indexing, index sharding, index splitting and rebalancing (also during runtime), cross data center replication (CDCR), Solr security (Kerberos, AD login, SASL, SSL), versioning with Avro & LDP (Linked Data Platform) and Apache Marmotta/RFC 7089, Extended Cluster vs Synchronized Multi-Cluster, scaling, definition of Solr Index Identifier (UID), High Availability (HA) and Disaster Recovery (DR) mechanisms, Solr HA, Load Balancing Concept (HW-based via F5, Ping against SolrCloud Node, Solr healthcheck, ZooKeeper, ContentQuery related Test Collection, SolrJ Client), replication, design of overlay networks (SDN, Software-Defined Networking).
23. Design of Amazon AWS cloud architecture with cloud migration concept and from monolithic to microservices approach, risk avoidance strategy, virtualization, efficient JavaScript UI with React, cloud security concept, microservice architecture, microservice versioning strategies, optimized data exchange, use of AWS Storage Gateway, Redshift, DDD (Domain-Driven Design) and Bounded Contexts, Product Line Architecture, Single-Sign-on Concept (SSO), etc.
24. Research and analysis of available data on security incidents and hacking for classic machine learning (Spark MLlib etc.) and deep learning (TensorFlow, PyTorch). There are about a hundred different sources, but with varying quality labeling, different conversion and adaptation efforts, etc.
25. Generation of our own IT security data for machine learning (ML) via fully equipped Linux and Windows environments (PC, VMware), in which about 50 PenTesting tools like Metasploit, AutoSploit etc. were executed. Instructions for normalizing and linking the data thus created as well as external data. Creation/extraction of regular expressions and generation of similar attacks/payloads based on this.
26. Design + Development of a control library in Scala for recognition and AI, which monitors and controls all key SOC elements.
27. Design + Development of a Scala UI and query library, which visualizes intelligent analyses in the Kibana dashboard with React and performs query mapping in SQL, HQL, Solr and similar dialects via Apache Drill with Drillbits. Here, we largely recreated Splunk's SPL (Search Processing Language) as our OPL (Open Processing Language). It is essentially SQL extended with information on how results are represented in the UI.
28. Research/Analysis/Extension of current ideas/tools to technical friction points in the (sub-)projects or direct proposal of solutions:
a. Analysis of semantic tools, symbolic AI and explainable AI for the AI funding program for security as well as for new work packages: KL-ONE: Protégé, LOOM, Knowledge Engineering Environment (KEE), Pellet, RacerPro, FaCT++ & HermiT, Non-Linear Planner, CBR (Case-Based Reasoning), RDF (Resource Description Framework)/SPARQL (SPARQL Protocol and RDF Query Language), OpenCog (AtomSpace, Atomese, MOSES/MetaCog, Link-Grammar), induction/deduction technology such as OWL/OWL-DL (Web Ontology Language, Description Logics), leading implementation: Apache Jena OWL, HPSG (Head-driven Phrase Structure Grammar) Parsing: DELPH-IN PET Parser, Enju, Grammix, Stanford CoreNLP, OpenEphyra, Frame-Logic, Explainable AI with LOCO (Leave-One-Covariate-Out).
b. NLP (Natural Language Processing) / Computational Linguistics Research & Evaluation: Analyze/parse natural scene images as well as textual analysis of image captions/descriptions from the Internet to train image processing models (Stanford CoreNLP approach); classify incident tickets / texts into categories/realities; maintenance / lessons learned: Analysis of textual reports from technicians on IT/driving problems and autonomous driving difficulties (misclassifications/reactions) for NLP-level knowledge/feedback.
Tools/Algorithms: OpenAI GPT/GPT-2 (Generative Pre-trained Transformer), Facebook XLM (Cross-lingual Language Model Pretraining), Facebook PyText (NLP Modeling Framework, on PyTorch), Google BERT (Bidirectional Encoder Representations from Transformers), Combined Multi-Task Model NLP, Pre-training of complete (language/deep learning) models with hierarchical representations, attentional models, DLNLP (Deep Learning NLP: Embed, Encode, Attend, Predict), HMTL (Hierarchical Multi-Task Learning Model), semi-supervised learning algorithms for creating proxy labels on unlabeled data, BiLSTM, Salesforce MetaMind approach, DeepMind, Deep Transfer Learning for NLP, pre-trained language models, word embeddings / bag-of-words, Sequence-to-Sequence Models, memory-based networks, contrastive learning, reinforcement learning, semantic role identification, representation learning, text classification with TensorFlow estimators, word2vec, vector space model/feature mapping for integration, skip-grams, Seq2seq Encoder Decoder, ULM-FiT, ELMo, Transfer Learning, OpenAI Transformer, spaCy + Cython for Acceleration, genSim, OpenNMT (Neural Machine Translation), AllenNLP (on PyTorch), OpenNLP, Reinforcement Learning for learning correct classifications/label assignments/Questions & Answers, Deep Latent Variable Models, Visual Commonsense Reasoning, Model-Agnostic Meta-Learning (MAML), Multi-Hop Reasoning, Attention Masks for (Self-Attention) GANs (SAGAN), TensorFlow Lingvo (NLP sequence models), OpenEphyra (part of IBM Watson).
c. For NLP Generation: OpenAI GPT/GPT-2 (Generative Pre-trained Transformer), Facebook XLM (Cross-lingual Language Model Pretraining), Google BERT (Bidirectional Encoder Representations from Transformers).
d. AI / Data Science / Big Data: Algorithms and Tools: LSTM vs. GRU, Feast AI Feature Store, K8s Sidecar Injector, TensorFlow 2.0 (Update/Migration Benefits), Tensor Comprehensions, Neural Ordinary Differential Equations, Visual Commonsense Reasoning, Deep Learning, RNN, CNN for Autonomous Cars / Temporally Coherent Virtual 3D City Generation Logic, Deep Labeling for Semantic Image Segmentation with Keras / TensorFlow, Design Models for Deep Learning, RNN, CNN Architectures, DeepMind (Kapitan, Scalable Agent, Learning to Learn, TF Reinforcement Learning Agents), Via QoS (QoS Load Management), Fusion.js (JS infrastructure supporting React, Redux & pre-configured optimized boilerplate, hot module reloading, data-oriented server-side rendering, bundle splitting, plugin architecture, observability, I18n), Horovod (distributed training framework for TensorFlow, Keras, PyTorch), Ludwig (train and test deep learning models without coding), AresDB (GPU-powered real-time analytics engine from Uber), Uber's Sparse Blocks Network (SBNet, TensorFlow algorithm), Google Reinforcement Learning framework based on TensorFlow, Kubernetes Operator for Apache Spark, FastAI Deep Learning, Polygon-RNN++, Flow Framework: Agile Project-to-Product Process, IntelAI OpenVINO (inference service component for AI models), IntelAI Nauta distributed computing environment for DL model training, TensorFlow Extended (TFX), Salesforce Einstein TransmogrifAI (Machine Learning Automation with AutoML), OpenCV (Open Source Computer Vision Library), GluonCV, Angel-ML (processing of higher-dimensional ML models), Acumos AI (design, integration and deployment of AI models; AI Model Marketplace), PaddlePaddle EDL (Elastic Deep Learning: optimizes deep learning work and latency in the cluster; Kubernetes Controller and fault-tolerant deep learning framework for PaddlePaddle & TensorFlow), Pyro (Deep Probabilistic Programming Language), Jaeger (distributed tracing system from OS, optimized for microservices).
e. Suggestions for accelerating deep learning e.g. with recent publications (e.g., model compression, use of hardware properties) as well as integration of domain knowledge / semantics / rules / decision tables / ontologies / explainable AI knowledge results into deep learning; Development of optimized hybrid learning models (deep learning [reinforcement] combined with classic learning methods).
f. Concept for AIops (Artificial Intelligence Operations) / AI-based operational optimization in the context of metadata management and acquisition:
i. Concept for the implementation of a CMS / Security Management System (SGMS) to minimize human errors in script programming / execution: all relevant hard-coded parameters were extracted and stored in a separate CMS database or, at a minimum, in environment-specific configuration / property files, with one parameter set each for the development environment, the test environment, .... through to the production environment (Python NetworkX, Snowflake, ...).
ii. Concept for scaling and accelerating AI workloads, managing complex workloads, accelerating the development and deployment of statistical models, pre-optimizing AI workload platforms: Data acquisition and preparation, data modeling and training, data provisioning and operationalization, integration of machine learning with pre-established plans for Chef, Puppet, Airflow and automatic storage capacity provisioning, predictive memory optimization (in hyperconverged environments), AI that configures hyperconverged application acceleration hardware, password and PII (Personally Identifiable Information) discovery, knowing when to start workloads with high CPU/GPU requirements and durations (resources may currently be unavailable; for example, because of deadlocks or timing issues, other jobs may have to wait), when to start deep learning or IE jobs with lower priority and when to transfer resources to high-priority jobs or workloads, when to start diagnostic collection processes after warnings, errors or failures, ....
29. NLP (Natural Language Processing) analysis of log and web content:
a. Extraction of IoC (Indicator of Compromise) content in continuous text in STIX format for further semi-automated processing, such as automated file hash lookup, analysis and blocking of open ports and incoming/outgoing connections.
b. Semantic categorization (problem category, error severity and possible effects/risks, urgency) and textual NLP analysis of log content with genSim, spaCy and partly also with Google BERT, GPT, Graph-ConvNets with Octavian, Google Sling, TensorFlow graph_nets & gcn (Graph Convolutional Networks), PyTorch Geometric.
30. Data Science consulting as well as management and conversion concepts for machine learning models with ONNX (Open Neural Network Exchange: High-performance optimizer and inference engine for machine learning models and converter between TensorFlow, CNTK, Caffe2, Theano, PyTorch, Chainer formats).
DS Approach (Data Science): A mix of anomaly detection, principal component analysis, nearest neighbor methods, neural networks, time series analysis + seasonality analysis, association, maximum likelihood estimator, random forest, gradient boosting, CatBoost, LightGBM, SHAP (SHapley Additive exPlanations), stacked ensembles, blending, MART (Multiple Additive Regression Trees), AutoML, Auto-Keras, Dopamine, Generalized Linear Models (GLM), Distributed Random Forest (DRF), eXtremely Randomized Tree (XRT), Labeling, Bootstrap Aggregating (Bagging), Receiver Operating Characteristic / AUC, Cubist (extension of Quinlan's M5 model tree), C4.5, Association Analysis, (Non)linear Regression, Multiple Regression, Priority Analysis, Classification Analysis, Link Analysis Networks; TensorFlow + Keras and PyTorch - also for semantic security analysis: labeling and supervised learning for correct classification, distributed hyperparameter tuning, partial dependence plots [model leaks, if statements in if statements, ....]; Model storage in PMML with OpenScoring.io and HBase / MapR-DB + Apache Phoenix, metadata visualization, KPIs, UMAP dimension reduction, STN-OCR.
Libraries / Tools: Docker, Kubernetes, Scala, Python, Airflow, Kubeflow, CeleryExecutor, RADOS + Ceph, TensorFlow stack with Keras, AutoKeras or PyTorch + Auto-PyTorch + AddOns (Ignite, PennyLane, Geometric), Uber Horovod, Apache Spark Stack with Spark Stark, Spark SQL, MLlib, GraphX, Alluxio, TransmogrifAI, TensorFlowOnSpark, PySpark with Optimus, Jupyter, Zeppelin, MXNet, Trainer, XGBoost, CatBoost, RabbitMQ, ONNX, Hydrhibing, Server (Agility Continuous Testing Agility), Red Hat OpenShift, Elastic / Elasticsearch, MS Azure Hybrid Cloud, Kafka, Kafka-REST Proxy, KafkaCat Integration, Confluent, Ansible, OpenTSDB, Apache Ignite with TensorFlow / ML, CollectD, Python 3.x, Flask (Python microframework: REST, UI), Coconut functional programming for Python, Robot Framework (ATDD), CNTLM, Samba, Nginx, Grafana, Jenkins, Nagios, Databricks (Spark, Kafka, connectors to R, TensorFlow, etc.), snowflake schema, scikit-learn, RHEL, Ubuntu, Scrum + Design Thinking + SAFe.
PenTesting Tools: AutoSploit, Metasploit, Burp Suite, Nexpose, Nessus, Tripwire, Impact CORE, Kali Linux, Snort, Bro, Argus, SiLK, tcpdump, WireShark, parosproxy, mitmproxy, nmap, Security Onion, Sguil, Squert, CyberChef, NetworkMiner, Netsniff-NG, Syslog-NG, Stenographer, osquery, GRR Rapid Response, Sysdig Falco, Fail2Ban, ClamAV, Rsyslog, Enterprise Log Search and Archive (ELSA), Nikto, OWASP Zap, Naxsi, modsecurity, SGU, Mimikatz.
Log Processing Tools: OpenSCAP, Moloch, ntopng, Wireshark plugins +, Fluentd message parser, SQL queries: SploutSQL, Norikra + Esper (Stream/Event Processing)
Cyber Grand Challenge (CGC) Tools: BinaryAnalysisPlatform bap, angr, s2e, KLEE, AFL (American Fuzzy Lop), Strace, ZZUF, Sulley / boofuzz, BitBlaze, Shellphish / Mechaphish Tools: how2heap, fuzzer, driller, rex
Protocols: AES, RSA, SHA, Kerberos, SSL / TLS, Diffie-Hellman
DBs: HBase + Phoenix, Hive, PostgreSQL, Druid, Aerospike, Lucene / Solr / Elasticsearch, SploutSQL
NLP Stack with Google BERT / Sling, spaCy, GPT-2, Stanford CoreNLP, AllenNLP, OpenEphyra, DELPH-IN PET parser, Enju, Grammix
Logic / Semantic Tools: Protégé, LOOM, RDF (Resource Description Framework) / SPARQL, OpenCog, Apache Jena OWL, Frame Logic
OCR / ICR Libraries: Tesseract OCR Engine, OCRopus, Formcraft, Kofax KTM (Kofax Transformation Modules), STN-OCR
Other Security Tools: IDS / IPS, NetFlow and log collection and analysis tools, such as Snort, Suricata, Bro, Argus, SiLK, tcpdump or WireShark, Cuckoo Malware Analysis, Disassembler, Prometheus Monitoring, OCS Inventory NG, System Configuration + Activity Analysis: Sigar, Config. Discovery, File Integrity Checker (Afick), Apache NiFi Data Flow / Hortonworks, Elastic Stack (Beats, Logstash, Elasticsearch, Kibana, React + Kibana), Solr Stack (SolrCloud, SolrJ client, Banana), Apache Drill Queries, UIs, Drillbits development, DSL (Domain Specific Language), Eclipse Parser, JavaCC, Antlr, Lex, yacc / bison, Flex, JFlex, GLR / LALR / LL Parser, Ansible, Juju, MAAS, Kubernetes / K8s + Docker, Minikube, Microk8s if applicable, Flash Incident Response, HDFS, Data Lake, Zookeeper, Hive, JDBC, Management Tools (Ambari, Ranger, etc.), Hadoop Secure Mode, SSO (Single Sign-On), Identity and Access Management (IAM / IdM), LDAP, Role Mapping, Kerberos, TLS, OAuth, OpenID Connect.
SOC, SIEM, Cybersecurity, Computer Vision, Kubernetes, Docker, Cloud Computing
BNP Paribas Personal Investors (Consorsbank, DAB)
Coach: Data Architecture, Data Science Aspects and Use Case Evaluation
BANKING AND INSURANCE
April 2017 - September 2017 (5 months)
Nuremberg, Germany
Preparation of make-or-buy marketing decisions regarding an in-house programmatic advertising solution for cross-selling across various customer touchpoints, dynamic offering, NPS (Net Promoter Score) optimization, and DDS (Data-Driven Sales) via the Data Management Platform (DMP).
1. Marketing Strategy Consulting through Design Thinking with Customer Journey Mapping and documentation of customer-company touchpoints or interactions, conveying relevant knowledge about the latest programmatic marketing approaches and the corresponding data science fundamentals. Introduction to Customer Data Platforms (CDP) and Marketing Automation Platforms (MAP). Initiated and chaired SWOT discussions (Strengths, Weaknesses, Opportunities, Threats).
2. Research of potential vendors in the aforementioned areas, especially regarding Customer Data Platforms (CDP) and Marketing Automation Platforms (MAP) and contact with vendors: IBM Interact, Oracle Real-Time Decisioning (RTD), SAS Customer Decision Hub, Pega Customer Decision Hub, Adobe Marketing Suite/Cloud, Prudsys, SC-Networks Evalanche, PIA/Dymatrix DynaCampaign, DynaMine.
3. Development of Use Cases according to the Use Case 2.0 approach (including MVP - Minimum Viable Product) with the marketing team (focus on possible real-time needs / use cases) and evaluation of possible cash flows as well as various KPIs such as ROI, NPV (Net Present Value), IRR (Internal Rate of Return), WSJF (Weighted Shortest Job First), NPS (Net Promoter Score), NBI (Net Banking Income).
4. Create a basic Hadoop architecture with effort estimation as a possible solution based on Apache Spark with streaming, Alluxio Caching, QBit Microservices, Aerospike DB, Cassandra DB, jBPM, Drools, Oryx 2, WEKA, MOA, Sqoop 1/2, SAS. It was subsequently used for purchasing to negotiate prices.
5. Consulting on possible data science algorithms around the KNIME system for customer segmentation and derivation of product affinities or potential customer interests and customer journeys: DynaMine, gradient boosting (xgboost), non-linear regression, random forest, C4.5, etc.
6. Consulting on the parallel project "Corporate Data Hub" based on Spark, Cassandra DB and PostgreSQL, especially regarding the possibilities of connection with marketing solutions and how they can be used as a PoC (Proof of Concept) for the data center.
7. Design of a product extension of Dymatrix DynaCampaign called HintLog: With minimal effort, all participants in bonus or marketing programs could receive messages if errors occurred or if they risked dropping out of the program due to detailed regulation: customers then generally had extended deadlines and thus the NPV value could be significantly increased by avoiding embarrassing situations (i.e., customer satisfaction).
8. Consulting on the introduction of the Use Case 2.0 approach, as well as the subsequent introduction of new Lean-Startup principles, as well as microservices, scalable architecture, connection to a mobile app and development of appropriate versions.
9. Review of existing BPM models in Camunda and extension of these models to Camunda with new use cases for marketing and campaigns.
10. Concept for semantic analysis and control of marketing campaigns based, for example, on customer interests, customer situation, current market trends as well as company interests, e.g., in the form of combined/concerted discount actions between different parts of the offering or to reiterate superior marketing statements in subordinate actions and to achieve overall consistency and rigor in statements. Recognized customer locations/segments, interests and support needs can be used as precisely as possible, so that they are appreciated by customers as useful and can be recommended later (product/service) on a trust basis.
11. Natural Language Processing (NLP): Analysis of customer feedback/sentiments with spacy.io in Python (Net Promoter Score (NPS) survey and improvement).
12. Participation in the Digital David project as a technology and NLP consultant, creation of a chatbot with IBM Watson technology (now online at consorsbank.de): Vision: Chatbot that knows all customer investment and banking preferences, including accounts, deposits, and WKN/ISIN numbers with charts / trends / dependencies and any search for investment opportunities (with RoboAdvisor in the background) and thus achieves high customer loyalty and significant sales revenue. My work: Analysis of expected dialogue script efforts (due to technically outdated features for chatbot developers) and the total cost of ownership (TCO) of the IBM Watson solution and comparison with a new Open Source-based DLNLP (Deep Learning Natural Language Processing) architecture in the context of price negotiations for purchases. Elements of my open source chatbot architecture with DLNLP tools: Seq2seq, word2vec, ULM-FiT, ELMo, OpenAI Transformer / GPT, Transfer Learning, spaCy, Stanford CoreNLP, AllenNLP and virtualization with Docker/Kubernetes for cloud training.
Time series analysis, anomaly detection, a priori analysis, supervised classification, association analysis, maximum likelihood estimator, customer segmentation techniques, e.g. according to Personas with KNIME, DynaMine, gradient boosting (xgboost), non-linear regression, Random Forests, C4.5.
Schwarz Group (Lidl & Kaufland), the largest European retail group: BI & Analytics, online and offline
Coach: Architecture and Data Science
RETAIL (LARGE RETAILERS)
September 2017 - December 2017 (3 months)
Stuttgart, Germany
Development of platforms and environments for various predictive analytics sub-projects (especially for marketing effects and supply chain forecasts regarding required quantities/prices, etc.)
1. For familiarization & coaching purposes: assessment of the current situation regarding tools, algorithms and IT environments; cooperation in the creation of Ab Initio Graphs/Lineages as an ETL pipeline with integration of Teradata BTEQs/ActiveBatch/SQL, R, Python, Spark, Hive, SAP, MicroStrategy.
2. Big Data Architecture Consulting: R on Spark with SparklyR vs SparkR, Hive/Beeline Query Optimization, integration with Teradata QueryGrid/Teradata Connector for Hadoop (based on Sqoop).
3. Design and Development of Ab Initio ETL Pipelines with GDE/TRMC/EME, Express>It (BRE), Conduct>It (CC), Query>It, Metadata Hub (EME).
4. Suggestion and consulting regarding the definition/selection of BI and Analytics Use Cases: Promotions (special offers/price changes (PV)), dynamic pricing, cook-chill scheme, category management, pallet factor, parcel sorting, purchasing missions, purchasing planning, logistics planning, re-shipment/return planning.
5. Participation in predictive modeling processes for logistics and prediction of the effect of special discount offers and various advertising measures.
6. Consulting on the choice of a workflow management tool: Oozie, ActiveBatch, Azkaban (LinkedIn), Airflow (Airbnb), Scripting.
7. Authorization concept with Apache Ranger, rights database and LDAP for Hortonworks Hadoop.
8. Development of cross-platform packaging, versioning, deployment and dependency management for Python, R, Big Data (Spark, Hive, etc.), Teradata, SAP, Ab Initio, MicroStrategy with Conda/Anaconda, Python, sbt, Java 9, Java Platform Module System (JPMS) = Project Jigsaw, etc.
9. Created virtualization concepts for all tools with VMware, Docker, Kubernetes and Rancher, including network connectivity, debugging, tracking and monitoring.
10. Creation of a comprehensive 400-page test management concept including ETL and BI tests with IT security for 6 test environments: Python, R, Big Data (Spark, Hive, etc.), Teradata, SAP, Ab Initio, MicroStrategy and continuous integration/deployment with Jenkins and Sonar (Qube).
Random Forest, Gradient Boosting (GBM, xgboost), Cubist (extension of Quinlan's M5 model tree), Time Series Analysis, Association Analysis, (Non-linear) Regression, Multiple Regression, Anomaly Detection, A Priori Analysis, Basket Analysis, Supervised Classification, Link Analysis Networks, Maximum Likelihood Estimators, Multi-level classic and fraud detection method (see separate section).
OpenStack, Docker, Kubernetes, Rancher, Python, R, Big Data (Spark, Hive, etc.), PySpark with Optimus, Teradata, SAP, Ab Initio, MicroStrategy, MS Visio, Scrum, Large Scale Scrum (LeSS).