Research

Multi-agent RL with an eye on production coordination

PhD at the University of Groningen. I study Dec-POMDPs and strategic world models, and connect that work to how production agents coordinate.

Research focus

Strategic world models for multi-agent deep RL (PhD thesis, University of Groningen)
Dec-POMDPs and decentralized control under partial observability
SeqPPO: ~3× sample efficiency vs. MAPPO, HATRPO, and HAPPO in our benchmarks
Centralized training, decentralized execution (CTDE) with university collaborators
Incentive alignment and robustness applied to production multi-agent design
Sampled Policy Gradient extensions (MSc thesis): off-policy actor-critic with distributional RL

ScrollSearch: Decentralised Control Made Learnable
In preparation
Event-Triggered Interference Pricing: Pareto-Efficient Communication for MARL Distributed Spectrum Access
In preparation

Extensions of Sampled Policy Gradient for Continuous Action Control
MSc thesis, University of Groningen
Gait learning using reinforcement learning
Advanced Computing (Springer, 2022)
Autonomous Swarm Intelligence
IJEAM (2019)

Sampled Policy Gradient and variants (MSc thesis implementations)
Realty hybrid RAG platform: natural-language queries over structured and vector data; cut broker turnaround from two days to minutes