About Stefan
German: Native or bilingual
English: Fluent
Experience
Natural Language Processing (NLP), Knowledge Graphs, Graph-RAG
October 2024 - July 2025 (9 months)
Results: Development of a program for automated information extraction from text documents (e.g., websites, PDF documents), aggregation of the information into a graph database (knowledge graph), and querying of that information using a generative language model (Graph-RAG). By querying the knowledge base, the program decides on its own which information in a document is not yet known and adds it. It can collect all information, or the user can restrict the topic areas.
Methods:
- Extraction: Named entity recognition (NER) and extraction of relations between entities from documents, using either a generative language model or, if topic areas are specified, more efficient NER models.
- Knowledge Graph Creation: Information collected from documents is compared with the current knowledge base and, if not already known, added to the database (either as new nodes/edges or as additional information on existing nodes/edges). If the information is already known from other documents, this increases the reliability of the information.
- Graph-RAG: The database can be queried in natural language. A generative language model translates the question into a database query and generates an answer from the retrieved information.
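The knowledge graph creation step can be illustrated with a minimal sketch. It assumes facts are stored as (subject, relation, object) triples and that reliability is approximated by the number of distinct source documents. The `KnowledgeGraph` class and its method names are hypothetical, and an in-memory dictionary stands in for the graph database used in the project.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory stand-in for a graph database (hypothetical, for illustration)."""

    def __init__(self):
        # Maps (subject, relation, object) -> set of source document ids.
        self.sources = defaultdict(set)

    def add_triple(self, subject, relation, obj, source):
        """Record a triple from a document; return True if it was previously unknown."""
        key = (subject, relation, obj)
        is_new = key not in self.sources
        # A known triple only gains an additional supporting source.
        self.sources[key].add(source)
        return is_new

    def support(self, subject, relation, obj):
        """Number of distinct documents stating this triple (assumed reliability proxy)."""
        return len(self.sources.get((subject, relation, obj), ()))
```

A triple seen in a second document is not duplicated; only its support count grows, which mirrors the idea that repeated confirmation increases reliability.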
Natural Language Processing (NLP), Large Language Models (LLMs), Chatbots
SOFTWARE PUBLISHING
April 2024 - July 2024 (3 months)
Results: Development of a Llama-3 chatbot that can use various tools (such as customer databases, web search, and chat) to answer user questions. The bot can be hosted entirely on the customer's own servers, which also enables the processing of sensitive data.
Methods:
- Data Pipeline: PyTorch Lightning module that maps text documents into a vector space using a small language model (Jina-V2 text embeddings) and stores them, together with arbitrary (even nested) metadata, in a vector database.
- Chatbot: LangGraph agent with a Llama-3 LLM to answer user questions. The agent has various tools available to gather information about the question from the connected databases and the internet, or to ask the user follow-up questions if the question is unclear. Based on the collected information, the agent then creates an answer (Retrieval-Augmented Generation, RAG).
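The data pipeline idea (embed a document, store the vector with nested metadata, retrieve by similarity) can be sketched as follows. This is an illustration under stated assumptions only: the hashed bag-of-words `embed` function is a toy replacement for the Jina-V2 embedding model, and the `VectorStore` class is a hypothetical in-memory stand-in for the vector database.

```python
import hashlib
import math

DIM = 64  # assumed toy dimensionality, not the real model's


def embed(text):
    """Toy hashed bag-of-words embedding (stand-in for a real embedding model)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize so that the dot product equals cosine similarity.
    return [v / norm for v in vec] if norm else vec


class VectorStore:
    """Hypothetical in-memory vector store keeping arbitrary nested metadata."""

    def __init__(self):
        self.records = []

    def add(self, text, metadata):
        self.records.append({"text": text, "vector": embed(text), "metadata": metadata})

    def search(self, query, k=1):
        """Return the k records most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self.records,
            key=lambda r: sum(a * b for a, b in zip(q, r["vector"])),
            reverse=True,
        )
        return ranked[:k]
```

In a retrieval-augmented setup, the records returned by `search` would be passed to the language model as context for the answer.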
Computer Vision, Optical Character Recognition (OCR)
SOFTWARE PUBLISHING
February 2024 - April 2024 (2 months)
Results: Module to recognize text in images and compare it with a customer-specific template.
Methods:
- OCR: Identification of text blocks in images and recognition of the text.
- Matching: Local similarity determination of the recognized text with a customer-specific template, taking into account possible OCR errors.
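Error-tolerant local matching of this kind can be sketched with the Python standard library. The profile does not state which similarity measure was used, so `difflib.SequenceMatcher` is an assumption here, and the function name `best_local_match` is hypothetical. The sketch slides a template-sized window over the OCR output and keeps the most similar window.

```python
from difflib import SequenceMatcher


def best_local_match(template, ocr_text):
    """Return (score, start, snippet) for the window of ocr_text most similar to template."""
    n = len(template)
    if n == 0 or len(ocr_text) < n:
        # Text shorter than the template: compare the two strings directly.
        score = SequenceMatcher(None, template, ocr_text, autojunk=False).ratio()
        return score, 0, ocr_text
    best = (0.0, 0, "")
    for start in range(len(ocr_text) - n + 1):
        window = ocr_text[start:start + n]
        score = SequenceMatcher(None, template, window, autojunk=False).ratio()
        if score > best[0]:  # keep the first window with the highest score
            best = (score, start, window)
    return best
```

Because the score is a ratio rather than an exact comparison, typical OCR confusions such as `l` for `I` or `0` for `o` lower the score only slightly, so a threshold can still accept the match.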
Education
- M.Sc. Bioinformatics, Friedrich Schiller University Jena, 2015