Bharath

Staff AI Engineer · Production Multi-Agent Platforms

Research

Multi-agent RL with an eye on production coordination

PhD at the University of Groningen. I study Dec-POMDPs and strategic world models, and connect that work to how production agents coordinate.

Research loop

  1. Hypothesis: coordination under partial observability
  2. Experiment: multi-agent policies
  3. Compare: baselines + metrics
  4. Inform product: handoff design

Bridge to engineering

  • Dec-POMDP framing: who sees what
  • Message protocols: when agents talk
  • Production agents: robust handoffs

Research informs coordination design. Consulting stays delivery-focused.

Research focus

  • Strategic world models for multi-agent deep RL (PhD thesis, University of Groningen)
  • Dec-POMDPs and decentralized control under partial observability
  • SeqPPO: ~3× better sample efficiency than MAPPO, HATRPO, and HAPPO in our benchmarks
  • Centralized training, decentralized execution (CTDE) with university collaborators
  • Incentive alignment and robustness applied to production multi-agent design
  • Sampled Policy Gradient extensions (MSc thesis): off-policy actor-critic with distributional RL

Work in preparation

  • ScrollSearch: Decentralised Control Made Learnable

    In preparation

  • Event-Triggered Interference Pricing: Pareto-Efficient Communication for MARL Distributed Spectrum Access

    In preparation

Published work

  • Extensions of Sampled Policy Gradient for Continuous Action Control

    MSc thesis, University of Groningen

  • Gait learning using reinforcement learning

    Advanced Computing (Springer, 2022)

  • Autonomous Swarm Intelligence

    IJEAM (2019)


Open source

  • Sampled Policy Gradient and variants (MSc thesis implementations)
  • Realty hybrid RAG platform: natural-language queries over structured and vector data; cut broker turnaround from two days to minutes