GATO

GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

Research @ A2R Lab, Columbia University
Authors: Alexander Du, Emre Adabag, Gabriel Bravo, Brian Plancher

GATO Overview
Figure 1: GATO parallelizes across batches of trajectory optimization solves on the GPU through algorithm-software-hardware co-design. This approach enables real-time performance for batch sizes of tens to low-hundreds of solves.

While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying nonlinear trajectory optimization (TO) problems online, whether singly or in batches, remains computationally demanding. Existing GPU-accelerated approaches often fail to bridge the gap between single-solve real-time performance and large-scale (non-real-time) batch throughput, or they support only a restricted class of models.

We present GATO, an open-source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput at moderate batch sizes (tens to low-hundreds of solves). Our approach combines block-, warp-, and thread-level parallelism both within and across solves.

Architecture & Design

GATO is designed to solve tens to low-hundreds of problems simultaneously. This is achieved via hierarchical parallelism both across and within underlying computations for efficient problem matrix formation, linear system solves, and line search iterate computations.
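
As a rough illustration of parallelism "across and within" solves, the sketch below emulates a batched line search in plain, sequential Python. It is not GATO's implementation: the function names, the toy merit function, and the candidate step sizes are all hypothetical. The point is only that every (problem, step size) pair can be evaluated independently, so on a GPU the outer loop could plausibly map to thread blocks and the inner loop to warps or threads.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def batched_line_search(
    xs: Sequence[Vector],
    dxs: Sequence[Vector],
    merit: Callable[[Vector], float],
    alphas: Sequence[float] = (1.0, 0.5, 0.25, 0.125),
) -> List[Tuple[float, Vector]]:
    """For each problem in the batch, try every step size and keep the best.

    Returns one (chosen_alpha, new_iterate) pair per problem. An alpha of 0.0
    means no candidate improved on the current iterate.
    """
    results = []
    for x, dx in zip(xs, dxs):  # across solves (assumed: one thread block each)
        best_alpha, best_x, best_cost = 0.0, list(x), merit(x)
        for alpha in alphas:  # within a solve (assumed: one warp/thread each)
            candidate = [xi + alpha * di for xi, di in zip(x, dx)]
            cost = merit(candidate)
            if cost < best_cost:
                best_alpha, best_x, best_cost = alpha, candidate, cost
        results.append((best_alpha, best_x))
    return results


def toy_merit(x: Vector) -> float:
    """Hypothetical merit function: squared distance from the origin."""
    return sum(xi * xi for xi in x)


# Two toy problems; the second step direction overshoots at full step length.
xs = [[2.0, 0.0], [1.0, 1.0]]
dxs = [[-2.0, 0.0], [-4.0, -4.0]]
results = batched_line_search(xs, dxs, toy_merit)
```

In this toy batch the first problem accepts the full step, while the second falls back to a shorter one. Because each candidate is evaluated independently, the per-problem choice reduces to a minimum over costs, which is the kind of operation that parallel hardware handles well.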

GATO Architecture
Algorithm-software-hardware co-design architecture.

Performance

We demonstrate the effectiveness of our approach through simulated benchmarks showing speedups of 18-21× over CPU baselines and 1.4-16× over GPU baselines as batch size increases.

Performance Scaling (left), Performance Heatmap (right)
Scaling analysis against CPU and GPU baselines (left) and throughput heatmap (right).

Case Studies

We highlight the benefits of real-time batched optimization through case studies demonstrating improved disturbance rejection and convergence behavior.

Case Study 1: Constant external force rejection during figure-8 tracking.
Case Study 2: Convergence behavior of batched $\rho$ values.
Case Study 3: External force disturbance rejection during pick-and-place.

Hardware Validation

The solver was validated on hardware using an industrial manipulator, demonstrating real-time capability and robustness.

Hardware Setup
View Columbia AI Summit Poster (PDF)
