Navigating the Proximity-Safety Balance: Constraint Decomposition for Human Following in Pedestrian Crowds

Shiting Gong¹,*
Jianpeng Yao²,*
Jinfeng Wang²
Marco Pavone³
Jiachen Li⁴,†

¹University of Pennsylvania ²University of California, Riverside ³Stanford University ⁴Georgia Institute of Technology

* Equal contribution † Corresponding author

Abstract

Following a target human in crowded environments involves an inherent conflict between staying close to the target and navigating safely among surrounding pedestrians and obstacles. This conflict becomes more severe in dense scenarios, where aggressive following risks collisions and conservative margins lead to target loss, especially when pedestrian behaviors are unfamiliar or unpredictable. Existing reinforcement learning (RL) methods typically encode these competing objectives into a single dense reward, but the resulting proximity-safety balance is implicit and difficult to adjust across conditions. To address this, we decompose the human-following task into a sparse task reward and independent cost constraints within a multi-constraint RL formulation. Each constraint is managed through a cost threshold with direct behavioral meaning rather than through implicit reward weight ratios, which allows explicit and tunable control over the trade-off. We further quantify the prediction uncertainty of human motions and integrate these estimates into the RL costs to enhance safety under unpredictable conditions. Extensive experiments across both in-distribution and out-of-distribution settings demonstrate that our method achieves a more effective proximity-safety balance than the baselines. Real-robot deployment further validates the feasibility of our method in real-world scenarios; demonstrations are provided in the supplementary video.
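The decomposition described in the abstract can be read as a standard constrained policy optimization problem. The block below is a generic sketch of that formulation, not necessarily the exact objective used in the paper. The symbols J_R, J_{C_i}, γ, K, and λ_i are notation assumed here for illustration; δ_i stands for the per-constraint cost thresholds (δ_F and δ_H in Table 2 would be two of them).

```latex
\begin{aligned}
\max_{\pi}\quad
  & J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t} r_t\Big] \\
\text{s.t.}\quad
  & J_{C_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t} c_{i,t}\Big] \le \delta_i,
  \qquad i = 1, \dots, K .
\end{aligned}

% Lagrangian relaxation commonly used by PPO-Lagrangian-style methods:
\min_{\lambda \ge 0}\; \max_{\pi}\;
  \mathcal{L}(\pi, \lambda)
  = J_R(\pi) - \sum_{i=1}^{K} \lambda_i \big( J_{C_i}(\pi) - \delta_i \big)
```

Under this reading, a threshold δ_i bounds an expected accumulated cost directly. This is what gives it behavioral meaning, in contrast to a reward weight whose effect depends on the scale of every other reward term.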

Key Ideas and Contributions

Method overview: the state is processed by a unified policy network, which is trained with four critics under a constrained RL framework using PPO-Lagrangian.

Interaction among the critics and the generation of the action loss.

1) Constraint Decomposition for Explicit Trade-off Control: We propose a framework that decomposes human-following objectives into a sparse task reward and independent cost constraints, with the trade-off managed through behaviorally meaningful cost thresholds and jointly optimized using PPO-Lagrangian.
2) Uncertainty-Aware Safe Following: We integrate prediction uncertainty of human motions into the observation space and cost formulation, and design a unified policy network with spatial constraint reasoning to enable safe human following in uncertain and dynamic pedestrian environments.
3) Comprehensive Evaluation and Real-World Deployment: We extend CrowdNav with static obstacles, conduct extensive evaluations across varying crowd densities, pedestrian behaviors, and environment layouts, and deploy the policy on a real robot running ROS 2 to validate real-world effectiveness.
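To make the role of the cost thresholds in PPO-Lagrangian concrete, here is a minimal sketch of the two pieces that usually distinguish it from plain PPO: a projected dual-ascent update of the Lagrange multipliers, and a multiplier-weighted advantage that feeds the policy loss. It is an illustration under stated assumptions, not the authors' implementation. The function names, the learning rate, the normalization by `1 + sum(lambdas)`, and the use of exactly two constraints are all hypothetical choices. The threshold values 3.6 are borrowed from the Balanced setting in Table 2 purely as example numbers.

```python
import numpy as np


def update_multipliers(lambdas, mean_costs, thresholds, lr=0.05):
    """Projected dual ascent on the Lagrange multipliers.

    A multiplier grows when its measured episode cost exceeds the threshold
    and shrinks otherwise; it is clipped at zero so it stays non-negative.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    violation = np.asarray(mean_costs, dtype=float) - np.asarray(thresholds, dtype=float)
    return np.maximum(0.0, lambdas + lr * violation)


def combined_advantage(adv_reward, adv_costs, lambdas):
    """Mix reward and cost advantages into one signal for the PPO surrogate.

    adv_reward: shape (batch,)
    adv_costs:  shape (num_constraints, batch), one row per cost critic
    lambdas:    shape (num_constraints,)
    The division by (1 + sum(lambdas)) is an assumed rescaling that keeps the
    magnitude of the mixed advantage bounded as multipliers grow.
    """
    adv_reward = np.asarray(adv_reward, dtype=float)
    adv_costs = np.asarray(adv_costs, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    penalty = lambdas @ adv_costs  # weighted sum over constraints
    return (adv_reward - penalty) / (1.0 + lambdas.sum())


# Illustrative values only: two constraints with thresholds named after
# delta_F and delta_H from Table 2.
thresholds = np.array([3.6, 3.6])
lambdas = update_multipliers([0.0, 1.0], mean_costs=[4.0, 3.0],
                             thresholds=thresholds, lr=0.5)
mixed = combined_advantage([1.0, 2.0], [[1.0, 0.0], [0.0, 2.0]], lambdas)
```

In this sketch, lowering a threshold makes the corresponding violation term positive more often, so its multiplier grows and that cost weighs more heavily in the policy update. The weight is adapted during training rather than fixed by hand.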

Quantitative Results

Table 1. In-Distribution Test Results
Method	SR ↑	CR ↓ (Overall)	CR ↓ (Human)	CR ↓ (Obstacle)	TLR ↓	AFD
SG-HA*	1.84%	81.52%	79.92%	1.60%	16.64%	1.91
SG-ORCA	17.68%	64.88%	27.84%	37.04%	17.44%	1.65
SG-MPC	30.96%	42.72%	30.08%	12.64%	26.32%	2.64
OGM-HEIGHT	52.32%	34.72%	21.44%	13.28%	12.96%	2.19
RL	68.00%	24.08%	15.76%	8.32%	7.92%	2.24
RL+ACI	71.60%	20.72%	12.80%	7.92%	7.68%	2.28
Ours	78.08%	16.16%	10.72%	5.44%	5.76%	2.35

Table 2. Effect of Constraint Tuning vs. Reward Weight Tuning
Intent	Method	SR ↑	CR ↓ (Overall)	CR ↓ (Human)	TLR ↓	AFD
Safety	Ours (δ_F=4.0, δ_H=3.2)	71.68%	18.24%	8.80%	10.08%	2.54
Safety	RL+ACI (w_H×2)	71.60%	20.72%	12.80%	7.68%	2.28
Following	Ours (δ_F=3.2, δ_H=4.0)	74.40%	22.40%	14.24%	3.20%	2.23
Following	RL+ACI (w_F×2)	63.92%	27.36%	15.44%	8.72%	2.26
Balanced	Ours (δ_F=3.6, δ_H=3.6)	78.08%	16.16%	10.72%	5.76%	2.35
Balanced	RL+ACI (w_F=w_H=1)	70.80%	21.12%	12.88%	8.08%	2.26

Table 3. Out-of-Distribution Test Results
Environment	Method	SR ↑	CR ↓ (Overall)	CR ↓ (Human)	CR ↓ (Obstacle)	TLR ↓	AFD
Corridor	SG-HA*	2.64%	88.64%	88.56%	0.08%	8.72%	1.46
	SG-ORCA	50.64%	47.20%	25.28%	21.92%	2.16%	1.48
	SG-MPC	48.88%	39.20%	37.04%	2.16%	11.92%	1.84
	OGM-HEIGHT	56.24%	37.60%	28.96%	8.64%	6.16%	1.50
	RL	77.52%	21.84%	21.04%	0.80%	0.64%	1.50
	RL+ACI	82.96%	16.72%	15.12%	1.60%	0.32%	1.53
	Ours	89.76%	8.64%	8.48%	0.16%	1.60%	1.78
15% Rushing Humans	SG-HA*	0.16%	93.68%	92.72%	0.96%	6.16%	1.93
	SG-ORCA	12.48%	81.76%	37.28%	44.48%	5.76%	1.39
	SG-MPC	15.76%	72.48%	57.84%	14.64%	11.76%	2.66
	OGM-HEIGHT	47.04%	38.08%	26.56%	11.52%	14.88%	2.22
	RL	61.60%	35.36%	27.36%	8.00%	3.04%	2.15
	RL+ACI	68.48%	28.48%	23.04%	5.44%	3.04%	2.18
	Ours	70.56%	22.72%	19.36%	3.36%	6.72%	2.23
SF Pedestrian Model	SG-HA*	0.72%	84.88%	84.08%	0.80%	14.40%	1.74
	SG-ORCA	22.24%	68.80%	14.40%	54.40%	8.96%	1.53
	SG-MPC	29.60%	45.92%	31.20%	14.72%	24.48%	2.43
	OGM-HEIGHT	35.84%	50.40%	22.88%	27.52%	13.76%	2.16
	RL	56.96%	40.96%	21.28%	19.68%	2.08%	2.15
	RL+ACI	60.64%	34.48%	21.68%	12.80%	4.88%	2.19
	Ours	64.32%	33.68%	20.08%	13.60%	2.00%	2.21
Groups	SG-HA*	0.08%	92.64%	92.64%	0.00%	7.28%	1.83
	SG-ORCA	27.68%	46.48%	46.48%	0.00%	25.84%	1.86
	SG-MPC	26.88%	64.08%	64.08%	0.00%	9.04%	2.43
	OGM-HEIGHT	36.32%	53.28%	53.28%	0.00%	10.40%	2.09
	RL	49.36%	46.64%	46.64%	0.00%	4.00%	1.87
	RL+ACI	56.00%	39.20%	39.20%	0.00%	4.80%	2.17
	Ours	59.68%	32.64%	32.64%	0.00%	7.68%	2.14

SR: success rate; CR: collision rate (overall / with humans / with obstacles); TLR: target-lost rate; AFD: average following distance. Best in each column is in bold.
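The metrics above can be aggregated from per-episode outcomes as in the following sketch. In the reported rows, SR, overall CR, and TLR sum to 100%, and the human and obstacle collision rates sum to the overall CR, which is consistent with each episode ending in exactly one outcome. The outcome labels, the `summarize` function, and the choice to average AFD over all episodes are assumptions made for illustration; the source does not specify how AFD is averaged.

```python
from collections import Counter

# Hypothetical outcome labels; each episode is assumed to end in exactly one.
OUTCOMES = ("success", "human_collision", "obstacle_collision", "target_lost")


def summarize(episodes):
    """Aggregate (outcome, mean_following_distance) pairs into table metrics.

    Rates are returned as percentages. AFD is the mean of the per-episode
    following distances over all episodes (an assumed convention).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    counts = Counter(outcome for outcome, _ in episodes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    n = len(episodes)

    def pct(k):
        return 100.0 * k / n

    human = counts["human_collision"]
    obstacle = counts["obstacle_collision"]
    return {
        "SR": pct(counts["success"]),
        "CR_overall": pct(human + obstacle),
        "CR_human": pct(human),
        "CR_obstacle": pct(obstacle),
        "TLR": pct(counts["target_lost"]),
        "AFD": sum(dist for _, dist in episodes) / n,
    }


# Made-up episodes for demonstration; these are not results from the paper.
demo = (
    [("success", 2.0)] * 7
    + [("human_collision", 2.0)]
    + [("obstacle_collision", 2.0)]
    + [("target_lost", 2.0)]
)
metrics = summarize(demo)
```

Because the outcomes are mutually exclusive, a method can lower its collision rate simply by losing the target more often, so SR, CR, and TLR are best read together rather than individually.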

Qualitative Results

Real-World Experiments

Zero-shot deployment on a ROSMASTER X3 robot following a target human through pedestrians and static obstacles.

Citation

@article{gong2026navigating,
    title={Navigating the Proximity-Safety Balance: Constraint Decomposition for Human Following in Pedestrian Crowds},
    author={Gong, Shiting and Yao, Jianpeng and Wang, Jinfeng and Pavone, Marco and Li, Jiachen},
    year={2026}
}