I currently work on reach-avoid control for stochastic systems.
The core question is straightforward: once the initial set, target set, unsafe set, and probability threshold are fixed, how do we actually produce a controller with a formal reach-avoid guarantee, rather than a policy that merely looks good in simulation? My current pipeline has four steps: (1) train a reference controller with reinforcement learning; (2) approximate it with a polynomial controller whose approximation error carries a PAC-style bound; (3) synthesize a stochastic barrier-like certificate via sum-of-squares (SOS) programming, solved as a semidefinite program (SDP); and (4) when synthesis fails, iterate on the controller or the certificate and try again.
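To make the second step concrete, here is a minimal, self-contained sketch of distilling a reference policy into a polynomial controller and then estimating its approximation error on fresh samples, in the spirit of a PAC / scenario-style check. Everything here is illustrative and assumed, not my actual implementation: `rl_controller` is a stand-in for the learned policy (a fixed `tanh` map, not a trained network), the fit is a plain least-squares Vandermonde solve in one state dimension, and the final number is only an empirical sup-error over i.i.d. validation samples, from which scenario-approach bounds would give the probabilistic guarantee.

```python
import math
import random

def rl_controller(x):
    """Stand-in for the RL reference policy (assumption: a smooth 1-D map)."""
    return math.tanh(2.0 * x)

def poly_controller(x, coeffs):
    """Polynomial surrogate controller: u(x) = sum_k c_k x^k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def fit_poly(degree, xs, ys):
    """Least-squares polynomial fit via the normal equations.

    Plain-Python Gaussian elimination; fine for the small systems
    (degree + 1 unknowns) that arise here.
    """
    n = degree + 1
    # A^T A and A^T y for the Vandermonde matrix A with A[i][k] = xs[i]**k.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aty[r] - sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / ata[r][r]
    return coeffs

random.seed(0)
train = [random.uniform(-1.0, 1.0) for _ in range(200)]
coeffs = fit_poly(5, train, [rl_controller(x) for x in train])

# Scenario-style validation: empirical sup error over fresh i.i.d. samples.
# If the error stays below eps on all N samples, scenario-approach bounds
# limit (with high confidence) the measure of the set where it exceeds eps.
test = [random.uniform(-1.0, 1.0) for _ in range(5000)]
eps = max(abs(poly_controller(x, coeffs) - rl_controller(x)) for x in test)
print(f"degree-5 surrogate, empirical sup error: {eps:.4f}")
```

The surrogate matters because step (3) needs a controller the SOS machinery can handle: substituting a polynomial u(x) into polynomial dynamics keeps the certificate conditions polynomial, so they can be relaxed to an SDP, while the validation bound lets the certificate account for the gap between the surrogate and the original RL policy.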


