GATO

GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

Research @ A2R Lab, Columbia University
Authors: Alexander Du, Emre Adabag, Gabriel Bravo, Brian Plancher

Figure 1: GATO parallelizes across batches of trajectory optimization solves on the GPU through algorithm-software-hardware co-design. This approach enables real-time performance for batch sizes of tens to low-hundreds of solves.

While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying nonlinear trajectory optimization (TO) problems online, whether singly or in batches, remains computationally demanding. Existing GPU-accelerated approaches often fail to bridge the gap between real-time performance for a single solve and large-scale, non-real-time batch throughput, or they support only a restricted class of models.

We present GATO, an open-source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput at moderate batch sizes (tens to low hundreds of solves). Our approach combines block-, warp-, and thread-level parallelism within and across solves to achieve high performance.

Architecture & Design

GATO is designed to solve tens to low-hundreds of problems simultaneously. This is achieved via hierarchical parallelism both across and within underlying computations for efficient problem matrix formation, linear system solves, and line search iterate computations.
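
As a rough illustration of the staged structure described above, the following Python sketch advances a small batch of toy problems through the same three stages: forming the problem data, solving a linear system, and evaluating line search candidates. It is not GATO's implementation. The scalar cost, the step sizes in ALPHAS, and the function names are all assumptions made for this example, and the sequential loops only stand in for work that a GPU would run in parallel across solves (outer loop) and within a solve (the candidate evaluations).

```python
# Toy stand-in for one batched solver iteration; NOT GATO's implementation.
# Each "solve" minimizes f(x) = 0.5*(x - target)^2 + 0.25*x^4 over a scalar x.

ALPHAS = (1.0, 0.5, 0.25, 0.125)  # candidate line search step sizes (assumed)


def cost(x, target):
    return 0.5 * (x - target) ** 2 + 0.25 * x ** 4


def batched_newton_step(xs, targets):
    """Advance every solve in the batch by one Newton iteration."""
    new_xs = []
    # Parallelism ACROSS solves: on a GPU, each pass of this loop
    # would be an independent unit of work.
    for x, target in zip(xs, targets):
        # Stage 1: form the problem data (gradient and Hessian here).
        grad = (x - target) + x ** 3
        hess = 1.0 + 3.0 * x ** 2  # always positive for this toy cost
        # Stage 2: solve the (scalar) linear system hess * dx = -grad.
        dx = -grad / hess
        # Stage 3: evaluate all candidate steps. Parallelism WITHIN a
        # solve: these evaluations are independent of each other.
        candidates = [x + alpha * dx for alpha in ALPHAS]
        best = min(candidates, key=lambda c: cost(c, target))
        # Keep the current iterate if no candidate lowers the cost.
        new_xs.append(best if cost(best, target) <= cost(x, target) else x)
    return new_xs


def solve_batch(targets, iterations=20):
    xs = [0.0] * len(targets)  # every solve starts from x = 0
    for _ in range(iterations):
        xs = batched_newton_step(xs, targets)
    return xs


targets = [2.0, -3.0, 0.5, 0.0]
solutions = solve_batch(targets)
```

Every solve in the batch moves through the stages in lock-step, which is what makes the work easy to distribute. In a real trajectory optimization problem each stage operates on block-structured matrices over the time horizon rather than on scalars.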

Algorithm-software-hardware co-design architecture.

Performance

We demonstrate the effectiveness of our approach through simulated benchmarks showing speedups of 18-21× over CPU baselines and 1.4-16× over GPU baselines as batch size increases.

Scaling analysis against CPU and GPU baselines (left) and throughput heatmap (right).

Case Studies

We highlight the benefits of real-time batched optimization through case studies demonstrating improved disturbance rejection and convergence behavior.

Constant external force rejection during figure-8 tracking.

Convergence behavior of batched $\rho$ values.
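
To give a sense of why solving with several $\rho$ values at once can help convergence, here is a minimal Python sketch. It assumes, purely for illustration, that $\rho$ acts as a scalar damping term added to the Hessian in a Newton step; the role of $\rho$ in GATO itself is not described on this page. The cost function, starting point, tolerance, and function names are likewise hypothetical.

```python
import math


def damped_newton_iterations(rho, x0=2.0, tol=1e-6, max_iters=50):
    """Minimize f(x) = sqrt(1 + x^2) with the damped Newton step
    x <- x - f'(x) / (f''(x) + rho).

    Returns the number of iterations needed to reach |f'(x)| < tol,
    or None if max_iters is exhausted first.
    """
    x = x0
    for k in range(max_iters):
        r = math.hypot(1.0, x)       # sqrt(1 + x^2)
        grad = x / r                 # f'(x)
        if abs(grad) < tol:
            return k
        hess = 1.0 / r ** 3          # f''(x), small far from the minimum
        x -= grad / (hess + rho)     # rho damps the step (assumed role)
    return None


def sweep_rho(rhos):
    """Run one solve per rho value and pick the fastest converging one."""
    # On a GPU, each rho would be one independent solve in the batch.
    results = {rho: damped_newton_iterations(rho) for rho in rhos}
    converged = {rho: k for rho, k in results.items() if k is not None}
    best = min(converged, key=converged.get) if converged else None
    return results, best


results, best_rho = sweep_rho([0.01, 0.1, 0.3, 1.0, 10.0])
```

In this toy problem, too little damping makes the iterates overshoot and oscillate, while too much damping makes progress too slow to finish within the iteration limit. Running the whole set of values together means the best setting does not have to be guessed ahead of time.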

External force disturbance rejection during pick-and-place.

Hardware Validation

The solver was validated on hardware using an industrial manipulator, demonstrating real-time capability and robustness.

View Columbia AI Summit Poster (PDF)
