Task definition
I study infinite-horizon reach-avoid problems for discrete-time stochastic dynamical systems, with explicitly specified initial, target, and unsafe sets and a required probability threshold.
My current research focuses on reach-avoid control for stochastic systems. The system evolves under random disturbances, the controller should steer it toward a target set while keeping it out of unsafe regions, and the resulting guarantee should be a formal lower bound on the success probability rather than a simulation-based estimate. What matters most to me now is how learning, approximation, certification, and iteration can be connected into one usable research pipeline.
In one sentence: once the initial set, target set, unsafe set, and required probability threshold are fixed, how can we obtain a controller that both performs well in control terms and comes with a formal lower bound on its success probability?
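The question above can be stated compactly. Writing X_0 for the initial set, T for the target set, U for the unsafe set, p for the threshold, and pi for the controller (symbol names chosen here for illustration; the text fixes only the sets and the threshold), the infinite-horizon reach-avoid specification for the closed-loop system x_{k+1} = f(x_k, pi(x_k), w_k) is:

```latex
\Pr_{x_0}\Big[\,\exists k \ge 0 :\; x_k \in T \;\wedge\; \forall j \le k :\; x_j \notin U\,\Big] \;\ge\; p
\qquad \text{for all } x_0 \in X_0 .
```

The goal is then a controller \(\pi\) for which this inequality can be certified formally, not merely estimated from rollouts.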
A reinforcement learning policy is trained first, giving the pipeline a strong candidate controller as a starting point.
The neural controller is then rewritten into a polynomial form that is more suitable for formal reasoning and optimization.
Stochastic barrier-like certificates, computed via sum-of-squares (SOS) programming solved as semidefinite programs (SDPs), are used to obtain the formal lower bound and to indicate where the next iteration should go.
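The conversion step in the middle of this pipeline can be sketched concretely. The snippet below is a minimal, self-contained illustration (not the actual pipeline code): a fixed smooth nonlinear map stands in for the trained neural controller, and a polynomial in monomial features is fitted by least squares over the region of interest. The reported sup-norm error is exactly the approximation-error term a certificate would later have to absorb.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained neural controller (hypothetical; a fixed smooth
# nonlinear map is used so the example is self-contained).
def neural_policy(x):
    return np.tanh(1.5 * x[:, 0] - 0.7 * x[:, 1]) + 0.3 * x[:, 1]

def monomial_features(x, degree):
    """All monomials x1^i * x2^j with i + j <= degree."""
    cols = [x[:, 0] ** i * x[:, 1] ** j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.stack(cols, axis=1)

# Sample states from the region of interest and fit by least squares.
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
u = neural_policy(X)
Phi = monomial_features(X, degree=5)
coeffs, *_ = np.linalg.lstsq(Phi, u, rcond=None)

# Sup-norm error on a fresh validation sample: the quantity the
# certificate must account for when the polynomial replaces the network.
Xv = rng.uniform(-1.0, 1.0, size=(5000, 2))
err = np.max(np.abs(monomial_features(Xv, 5) @ coeffs - neural_policy(Xv)))
print(f"max approximation error on [-1,1]^2: {err:.4f}")
```

A least-squares fit is only one option; interval-bounded or minimax fits give tighter sup-norm guarantees at higher cost, which feeds directly into the degree/sample/error trade-off discussed later.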
Fix the initial set, target set, unsafe set, and the probability requirement.
Soft Actor-Critic (SAC) is currently used to train a candidate controller that provides a strong starting point.
The learned controller is converted into a polynomial form that is easier to certify and optimize over.
Stochastic barrier-like certificates and SOS / SDP are used to produce the probability lower bound.
If the first result is still conservative, the next step is to update the controller or the certificate template rather than stop there.
Work such as "Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates" is directly relevant because it shows how high-probability reach-avoid specifications can be verified once a neural controller is already given.
My present focus begins one step earlier: the controller is not assumed to be fixed. It must be learned first, then converted into a certifiable form, and only then pushed through certificate solving and possible iteration. What matters most to me is how these stages can be connected into one usable workflow.
I want to keep updating the controller when the first certified result is still conservative.
The trade-off among polynomial degree, sample size, and approximation error still needs to be better characterized.
Harder geometries and higher-dimensional systems will require more expressive certificate templates and more numerically robust solvers.
I also want to move toward more complex dynamics and tighter coordination between learning and verification.
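The degree/error side of the trade-off mentioned above can be made tangible with a small 1-D experiment (entirely illustrative): fitting a saturating nonlinearity like tanh, the kind of shape a trained policy often exhibits, with Chebyshev polynomials of increasing degree and recording the sup-norm error that a certificate would have to absorb.

```python
import numpy as np

# Toy illustration of the degree-vs-error trade-off: approximate tanh
# on [-2, 2] with Chebyshev fits of increasing degree and record the
# sup-norm residual on a dense grid.
xs = np.linspace(-2.0, 2.0, 4001)
target = np.tanh(xs)

errs = []
for deg in (1, 3, 5, 7, 9):
    coeffs = np.polynomial.chebyshev.chebfit(xs, target, deg)
    err = np.max(np.abs(np.polynomial.chebyshev.chebval(xs, coeffs) - target))
    errs.append(err)
    print(f"degree {deg}: max error {err:.4f}")
```

Each extra degree shrinks the error term but enlarges the SOS program, and in higher dimensions the monomial count grows combinatorially, which is exactly why this trade-off needs careful treatment.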