Balancing a Cart-Pole
For our first example, we'll simulate a cart-pole system balancing its pendulum in the upright equilibrium using a Linear Quadratic Regulator (LQR). Simulating controllers which balance and stabilise robotic systems is a common use-case for MuJoCo, so this is a good starting point. There are plenty of different ways to design stabilising controllers for the cart-pole system (eg: PID, MPC, reinforcement learning, etc...). Feel free to extend this example with your own controllers and play around with it!
Loading a model
Our first step is to load a MuJoCo model of the cart-pole system. You can find a copy of the standard cartpole.xml
file used by DeepMind here.
using MuJoCo
model = load_model("cartpole.xml")
data = init_data(model)
println("Initial position: ", data.qpos)
println("Initial velocity: ", data.qvel)
Initial position: [0.0; 0.0;;]
Initial velocity: [0.0; 0.0;;]
The system starts off with zero velocity, at the origin (data.qpos[1] == 0
) and with the pendulum perfectly upright (data.qpos[1] == 0
). This is the state we'll want to keep it in with our controller.
A quick review of LQR
There are many textbooks, tutorials, and course notes describing LQRs in considerable depth (see here for an example). We'll give a brief, high-level recap here. Suppose you have a system with state vector $x$ (eg: joint positions and velocities) and some control inputs $u$ (eg: forces and/or torques). We often write the system dynamics as $x_{t+1} = f(x_t, u_t)$, where the function $f$ describes how the states $x$ evolve at each time-step $t$.
In linear control design, our job is often to figure out what the control inputs $u_t$ should be to keep the system at some desired set-point $(x^*, u^*)$. For our cart-pole, we want it to be stationary and upright at the origin, so $(x^*, u^*) = ($zeros(4)
$, 0)$. Let's look at a linear approximation of the system about this set-point:
\[\delta x_{t+1} = A \delta x_t + B \delta u_t\]
Here, we've defined $\delta x_t = x_t - x^*$ and $\delta u_t = u_t - u^*$ as the differences between the current and desired states and controls (respectively). The matrices $A,B$ defining the linear model are the Jacobians
\[A = \frac{\partial f}{\partial x} \quad B = \frac{\partial f}{\partial u}.\]
Our task is to design a controller that will keep $\delta x_t$ and $\delta u_t$ small. In particular, we can define a cost-function that we want our controller to optimise. For an LQR, we define a quadratic cost
\[J(\delta x_t, \delta u_t) = \delta x_t^\top Q \delta x_t + \delta u_t^\top R \delta u_t\]
to be minimised, where $Q,R$ are positive-definite matrices of our choosing. With a little bit of matrix algebra and vector calculus based on the Bellman equation, it turns out that the optimal control law is $u_t = -K x_t$ where
\[K = (R + B^\top P B)^{-1} B^\top P A\\ P = A^\top P A + Q - A^\top P B (R + B^\top P B)^{-1} B^\top P A.\]
Thankfully, this last equation is a well-studied Riccati equation, so we can use packages like MatrixEquations.jl
or ControlSystems.jl
to solve it for us.
Linearising the dynamics
We can use the underlying C API for MuJoCo to easily linearise the dynamics for our cart-pole system. The function mjd_transitionFD
numerically computes the $A$ and $B$ matrices with a finite difference method through the MuJoCo simulator.
# Number of states and controlled inputs
nx = 2*model.nv
nu = model.nu
# Finite-difference parameters
ϵ = 1e-6
centred = true
# Compute the Jacobians
A = mj_zeros(nx, nx)
B = mj_zeros(nx, nu)
mjd_transitionFD(model, data, ϵ, centred, A, B, nothing, nothing)
@show A, B
(A, B) = ([1.0 -0.00020933539434596422 0.00999696728367769 7.112993351890879e-6; 0.0 1.0027918363759287 7.112993351890878e-6 0.009905136378663615; 0.0 -0.020933539434596422 0.9996967283677689 0.000711299335189088; 0.0 0.2791836375928739 0.0007112993351890878 0.9905136378663615], [0.0030327163223112376; -0.007112993351890878; 0.30327163223112374; -0.7112993351890878;;])
Note the use of mj_zeros
to initialise $A,B$. See Row vs. Column-Major Arrays for more details.
Designing a controller
Now that we have our linear model, all we need to do is compute our LQR gain matrix $K$ based on some cost-function with weights $Q,R$. The state vector for our cart-pole system is x = [cart_position, pole_angle, cart_velocity, pole_velocity]
. We'll choose $Q,R$ to heavily penalise any movement of the pole away from vertical. This may mean the cart slides around a little bit, but if our goal is to keep the pole upright this is fine.
using LinearAlgebra
Q = diagm([1, 10, 1, 5]) # Weights for the state vector
R = diagm([1]) # Weights for the controls
To compute $K$, we could directly use the lqr
function provided in ControlSystemsBase.jl
with K = lqr(A,B,Q,R)
. For those who don't want to load the rather large control systems package, we can directly use ared
from MatrixEquations.jl
, which is what lqr
calls under the hood.
using MatrixEquations
S = zeros(nx, nu)
_, _, K, _ = ared(A,B,R,Q,S)
@show K
1×4 Matrix{Float64}:
-0.467417 -5.59272 -0.843668 -1.44487
Simulate and visualise
After all that, we now have a stabilising controller for our cart-pole system! Let's test it out with the visualiser. First, we'll need a controller
function that takes in the MuJoCo model
and data
structs and sets the control variables.
function lqr_balance!(m::Model, d::Data)
state = vcat(d.qpos, d.qvel)
d.ctrl .= -K * state
nothing
end
This function captures non-const global variables and will take a performance hit (see Performance Tips) for more details. To remedy the performance hit, one can use functors instead, as described at the end of the Humanoid LQR example.
Now we can visualise the model and see how it goes.
init_visualiser()
visualise!(model, data; controller=lqr_balance!)
When we apply small forces to the pole or the cart, the controller immediately responds to the perturbation and returns the system to its equilibrium position. If the force is big enough, the system will become unstable and the pole will fall down. Feel free to play around with the weights in $Q,R$ to see their effect on the system too!