Research

Reach-avoid control with probabilistic guarantees for stochastic systems

My current research focuses on reach-avoid control for stochastic systems. The system evolves under random disturbances; the controller must guide it toward a target set while keeping it out of unsafe regions; and the result should come with a formal lower bound on the success probability rather than rely on simulation alone. What matters most to me now is how learning, approximation, certification, and iteration can be connected into one usable research pipeline.

Problem setting

What is the concrete problem?

In one sentence: once the initial set, target set, unsafe set, and required probability threshold are fixed, how can we obtain a controller that both performs well as a controller and comes with a formal lower bound on the success probability?

Task definition

I study infinite-horizon reach-avoid problems for discrete-time stochastic dynamical systems under explicit initial, target, unsafe, and probability requirements.
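Concretely, with initial set X_0, target set X_T, unsafe set X_U, and threshold p, the requirement can be written in the following standard form (the symbols here are illustrative notation, not fixed notation from any particular paper):

```latex
x_{k+1} = f\bigl(x_k,\ \pi(x_k),\ w_k\bigr), \qquad w_k \sim \mathcal{D},
\qquad
\mathbb{P}\Bigl[\exists k \ge 0:\ x_k \in X_T \ \wedge\ \forall j \le k:\ x_j \notin X_U \Bigm| x_0\Bigr] \ \ge\ p
\quad \text{for all } x_0 \in X_0 .
```

That is, under the closed-loop dynamics with policy π and i.i.d. disturbances w_k, every initial state must reach the target while avoiding the unsafe set with probability at least p.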

Candidate controller

A reinforcement learning policy is trained first, giving the pipeline a strong candidate controller to start from.

Verifiable approximation

The neural controller is then rewritten into a polynomial form that is more suitable for formal reasoning and optimization.

Probability guarantee

Stochastic barrier-like certificates, computed via sum-of-squares (SOS) programming and semidefinite programming (SDP), are used to obtain the formal lower bound and to indicate where the next iteration should go.

Method chain

The current pipeline

1. Write the specification clearly

Fix the initial set, target set, unsafe set, and the probability requirement.

2. Learn a reference policy

Soft Actor-Critic (SAC) is currently used to train a candidate controller that provides a strong starting point.
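Before any certification, the candidate policy's reach-avoid performance can be sanity-checked by Monte Carlo rollout. A minimal sketch on a toy one-dimensional linear system with Gaussian noise (the dynamics, sets, and the linear stand-in policy are illustrative only; in the real pipeline the policy is the trained SAC network):

```python
import random

def step(x, u, rng):
    """Toy scalar dynamics x' = 0.9*x + u + noise (illustrative only)."""
    return 0.9 * x + u + rng.gauss(0.0, 0.05)

def policy(x):
    """Stand-in candidate controller (a trained SAC policy in practice)."""
    return -0.4 * x

def reach_avoid_success(x0, horizon, rng,
                        target=(-0.1, 0.1), unsafe=2.0):
    """True if the trajectory enters the target interval before its
    magnitude ever exceeds the unsafe bound."""
    x = x0
    for _ in range(horizon):
        if abs(x) >= unsafe:             # entered the unsafe set
            return False
        if target[0] <= x <= target[1]:  # reached the target set
            return True
        x = step(x, policy(x), rng)
    return False                         # horizon exhausted (count as failure)

rng = random.Random(0)
trials = [reach_avoid_success(rng.uniform(0.5, 1.0), 200, rng)
          for _ in range(1000)]
print(sum(trials) / len(trials))         # empirical success frequency
```

Such an estimate only motivates the candidate; the point of the rest of the pipeline is to replace it with a formal bound.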

3. Build a PAC-guided approximation

The learned controller is converted into a polynomial form with PAC-style (probably approximately correct) error bounds, which is easier to certify and optimize over.
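The conversion step can be illustrated by a sample-based least-squares fit: draw states from the region of interest, evaluate the (here mocked) neural policy, fit a polynomial, and estimate the approximation error on fresh samples, which is the quantity a PAC-style argument would control. The function, degree, and ranges below are illustrative:

```python
import numpy as np

def neural_policy(x):
    """Stand-in for the learned controller (a smooth nonlinear map)."""
    return np.tanh(-1.5 * x)

rng = np.random.default_rng(0)

# Fit: sample states from the region of interest, solve least squares.
x_train = rng.uniform(-1.0, 1.0, size=500)
degree = 5
coeffs = np.polynomial.polynomial.polyfit(
    x_train, neural_policy(x_train), degree)

def poly_policy(x):
    """Certifiable polynomial surrogate of the neural controller."""
    return np.polynomial.polynomial.polyval(x, coeffs)

# Validate: estimate the worst-case gap on fresh samples; a PAC-style
# bound turns this empirical sup into a high-confidence error bound.
x_val = rng.uniform(-1.0, 1.0, size=2000)
err = np.max(np.abs(poly_policy(x_val) - neural_policy(x_val)))
print(f"empirical sup error on validation samples: {err:.4f}")
```

The certified probability bound then has to account for this approximation error, which is why the degree/sample/error trade-off matters downstream.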

4. Solve for certificates and bounds

Stochastic barrier-like certificates and SOS / SDP are used to produce the probability lower bound.
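The certificate conditions behind this step take, in one common form for the safety half (notation illustrative; the reach component adds a decrease condition toward the target), the following shape, with γ the certified violation level:

```latex
B(x) \ge 0 \ \ \forall x, \qquad
B(x) \ge 1 \ \ \forall x \in X_U, \qquad
B(x_0) \le \gamma \ \ \forall x_0 \in X_0, \qquad
\mathbb{E}_{w}\!\left[ B\bigl(f(x,\pi(x),w)\bigr) \right] \le B(x) \ \ \forall x \notin X_U .
```

The last condition makes B a supermartingale along closed-loop trajectories, so by Ville's inequality the probability of ever entering X_U from x_0 is at most B(x_0) ≤ γ, i.e. safety holds with probability at least 1 − γ. With polynomial dynamics, controller, and B, each inequality is relaxed to an SOS constraint, and the whole search becomes an SDP.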

5. Iterate when needed

If the first result is still conservative, the next step is to update the controller or the certificate template rather than stop there.
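The five steps above can be summarized as a loop. Everything below is a schematic with stub functions standing in for each stage, not an actual implementation:

```python
def learn_policy(spec):
    """Stub for step 2 (SAC training in the real pipeline)."""
    return {"gain": 0.1}

def approximate(policy):
    """Stub for step 3 (PAC-guided polynomial approximation)."""
    return dict(policy)

def certify(poly, spec):
    """Stub for step 4: pretend the certified bound grows with the gain."""
    bound = min(1.0, 0.5 + poly["gain"])
    return bound, {"gap": spec["p_required"] - bound}

def refine(policy, certificate):
    """Stub for step 5: nudge the controller using certificate feedback."""
    return {"gain": policy["gain"] + 0.2}

def run_pipeline(spec, max_iters=10):
    """Learn -> approximate -> certify -> refine until the bound clears spec."""
    policy = learn_policy(spec)
    for _ in range(max_iters):
        poly = approximate(policy)
        bound, certificate = certify(poly, spec)
        if bound >= spec["p_required"]:
            return poly, certificate, bound   # certified success
        policy = refine(policy, certificate)  # iterate instead of stopping
    return None                               # give up after max_iters

result = run_pipeline({"p_required": 0.9})
```

The point of the skeleton is the control flow: certification failure feeds back into either the controller or the certificate template rather than terminating the process.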

What I care about

Why this pipeline matters to me

  • I am not interested in treating learning and verification as two disconnected tasks.
  • The real goal is to connect policy learning, verifiable approximation, certification, and update decisions into one process.
  • That is also why I care about controller complexity, certificate complexity, and the direction of the next iteration, not only whether one simulation run looks good.

Related literature

How this relates to verification-oriented work

Work such as "Policy Verification in Stochastic Dynamical Systems Using Logarithmic Neural Certificates" is very relevant because it shows how high-probability reach-avoid specifications can be verified once a neural controller is already given.

My present focus

Where my current work starts earlier

My present focus begins one step earlier. The controller is not assumed to be fixed. It needs to be learned first, then converted into a certifiable form, and only then pushed through certificate solving and possible iteration. What matters most to me is how these stages can be connected into one usable workflow.

Next step

What I want to improve next

Controller updates

I want to keep updating the controller when the first certified result is still conservative.

Better PAC approximation

The trade-off among polynomial degree, sample size, and approximation error still needs a sharper characterization.
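One standard way to make the sample-size side of this trade-off concrete is a Hoeffding-style bound: to estimate a [0, 1]-bounded error statistic to within ε at confidence 1 − δ, N ≥ ln(2/δ) / (2ε²) i.i.d. samples suffice. A quick calculator (this is the generic textbook bound, not a specific PAC analysis from my work):

```python
import math

def hoeffding_samples(eps, delta):
    """Samples needed so an empirical mean of a [0, 1]-bounded quantity
    is within eps of its true mean with probability at least 1 - delta
    (Hoeffding's inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

print(hoeffding_samples(0.01, 1e-6))  # tighter eps or smaller delta -> more samples
```

The quadratic dependence on 1/ε is what makes tight approximation error expensive in samples, while the degree of the polynomial enters separately through how small ε can be made at all.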

More expressive certificates

Harder geometries and higher-dimensional systems will require stronger certificate templates and more numerically robust solvers.

More complex settings

I also want to move toward more complex dynamics and tighter coordination between learning and verification.