About

I am a data scientist at the Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP) in Hamburg. I work at the interface of drug discovery, medical data science and generative AI, with a focus on applying large language models, graph-based methods and large-scale analytics to biomedical and chemical data. I contribute to several large European and German-funded consortia in translational medicine and pharmacology, including the Innovative Health Initiative project SYNTHIA and the EU project Proxidrugs.

I studied in the MSc programme Intelligent Adaptive Systems at Universität Hamburg, Germany, where my research interests centered on generative AI (text and vision), MLOps and the use of large language models in bioinformatics and cheminformatics. In my master’s thesis, “Leveraging LLMs for Enhanced Drug Discovery: Extracting Insights from Patents”, I developed an LLM-driven retrieval-augmented generation pipeline that extracts qualitative and quantitative molecular information (e.g. protein–ligand relationships, dosage information, PK/PD parameters and experimental conditions) from regulatory documents, scientific literature and patents. I also evaluated which metrics are suitable for novel extraction tasks that lack gold-standard data. The thesis can be accessed here.

My technical work focuses on generative AI, including large language models (Qwen, Llama, Mistral), multimodal models (GPT-4o) and generative vision models such as generative adversarial networks (GANs) and variational autoencoders (VAEs). I specialize in engineering hallucination-resistant retrieval-augmented generation (RAG) pipelines for medical settings where errors are unacceptable, with an emphasis on reproducibility, grounding and clinical attribution. In medical data science, I use GANs and VAEs for latent-space exploration of high-dimensional data and for synthetic data generation, and I apply algorithmic fairness methods to mitigate systemic bias in decision-support systems. I prioritize data interoperability, using the OMOP Common Data Model (CDM) for heterogeneous healthcare datasets. My engineering stack includes CNNs and YOLO for diagnostic imaging, supported by backends in Python (FastAPI, Django, Flask). I bring AI systems into production through MLOps and Infrastructure as Code (Terraform), including container orchestration with Docker and Kubernetes across Azure, GCP, AWS and German Edge Cloud (GEC) environments. My current work centers on translating these methods into secure, clinical-grade solutions for drug discovery and translational medicine.

Previously, I worked as a data and analytics Werkstudent (working student) at Crossnative, where I developed and deployed client-specific LLM pipelines for business and finance. I also worked as a Werkstudent for ML development at Adalab, and as a machine learning engineer intern at Robofied and Technocolabs.

You can find more about my experience, publications and academic background on my CV, Google Scholar and GitHub.