Jon Woolfrey: PhD; Lean Six Sigma Black Belt; BE Mechatronics (Hons.); Cert. IV Mechanical Trades (https://woolfrey.github.io)

Gauge Repeatability & Reproducibility (2025-10-06)

Whenever you measure something, there can be multiple sources of error: the measuring device, the thing being measured, or the person or agent making the observation. If you want to be precise with your estimates, it's important to understand where the error is coming from. In this post I show a gauge study I did on 2 sets of scales for measuring the weights of bags of M&Ms.

You can find the data I collected here:
gauge_study_kitchen_scales.csv
gauge_study_lab_scales.csv

🧭 Navigation

Value for Money?

The advertised weight for this small bag of M&Ms is 36 grams. However, when I weighed it on my kitchen scales, I got about 37 grams.


Is the advertised weight accurate? Or are you measuring it incorrectly?

Is the bag heavier than advertised?

Or are my kitchen scales poor?

Or did I place it on the scales incorrectly?

Whenever something is measured, or “observed”, there can be natural variation in:

  • The object itself,
  • The measurement instrument being used, or
  • The observer taking the measurement.

The individual observation of a part can be more explicitly written as the sum of these factors:

\[y_{ijk} = \bar{y}_{\cdot\cdot\cdot} + o_i + p_j + (o\times p)_{ij} + e_{ijk} \tag{1}\]

where:

  • $\bar{y}_{\cdot\cdot\cdot}$ is the overall mean of all measurements,
  • $o_i$ is the deviation from the mean due to the $i^{th}$ observer,
  • $p_j$ is the deviation from the mean due to the $j^{th}$ object or part,
  • $(o\times p)_{ij}$ is the deviation of the $i^{th}$ observer measuring the $j^{th}$ part, and
  • $e_{ijk}$ is the error from the measuring device itself.

Importantly, we assume that all the deviations follow zero-mean Gaussian distributions:

\[\begin{align} o&\sim\mathcal{N}(0,\sigma_o^2) \tag{2a} \\ p&\sim\mathcal{N}(0,\sigma_p^2) \tag{2b} \\ (o\times p)&\sim\mathcal{N}(0,\sigma_{op}^2) \tag{2c} \\ e&\sim\mathcal{N}(0,\sigma_e^2). \tag{2d} \end{align}\]

The diagram below shows the composition of these error components.


(Random) measurement error can be decomposed into repeatability and reproducibility.

We can use a gauge study to separate out these effects and answer some important questions:

  1. Repeatability: Does the measuring device give consistent measurements for the same part?
  2. Reproducibility: Do different observers give consistent measurements of the same part?
  3. Is the instrument precise enough to detect differences between parts?

🔝 Back to top.

Setting Up a Gauge Study

In a crossed gauge study we have:

  • $n_o$ observers measure
  • $n_p$ parts, with
  • $n_r$ replicates or repeat measurements.

By having multiple observers measure the same parts, we can filter out the bias that any individual may have, and account for the interaction between operator and part.


In a crossed gauge study, every observer measures every object or part.

Every observer measures every part multiple times. It is also important to randomise the order in which observations are taken to reduce any bias or systematic effects.

The table below shows how a data collection form for a gauge study should look. Programs like Minitab can generate these automatically for you, though it is simple enough to create one in a spreadsheet.

Standard Order   Randomised Order   Part Number   Observer   Replicate   Measurement
      3                 1                3            A           1           38
     21                 2                1            B           1           37
     14                 3                4            A           2           38
     13                 4                3            A           2           37
     53                 5                3            C           2           37
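As a sketch, a randomised run sheet like the one above can be generated in a few lines of Python. The function name and column keys are my own choices, and the shuffling seed is arbitrary:

```python
import itertools
import random

# A randomised run sheet for a crossed gauge study: every observer
# measures every part n_reps times, in shuffled order. A sketch of
# what Minitab or a spreadsheet would generate for you.
def gauge_study_sheet(observers, n_parts, n_reps, seed=None):
    runs = [{"observer": o, "part": p, "replicate": r}
            for o, p, r in itertools.product(
                observers, range(1, n_parts + 1), range(1, n_reps + 1))]
    for i, run in enumerate(runs, start=1):
        run["standard_order"] = i           # order before shuffling
    random.Random(seed).shuffle(runs)       # randomise to reduce bias
    for i, run in enumerate(runs, start=1):
        run["randomised_order"] = i
    return runs

sheet = gauge_study_sheet(["A", "B", "C"], n_parts=10, n_reps=2, seed=42)
print(len(sheet))   # 3 observers x 10 parts x 2 replicates = 60 runs
```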

🔝 Back to top.

Calculations

1. Total Mean

We first begin by calculating the mean of all measurements:

\[\bar{y}_{\cdot\cdot\cdot} = \frac{1}{n_o n_p n_r}\sum_{i=1}^{n_o} \sum_{j=1}^{n_p}\sum_{k=1}^{n_r} y_{ijk} \tag{3}\]

This becomes our reference, or anchor point, for separating out the repeatability and reproducibility.

2. Operator

We then compute operator means:

\[\bar{y}_{i\cdot\cdot} = \frac{1}{n_p n_r}\sum_{j=1}^{n_p}\sum_{k=1}^{n_r} y_{ijk}. \tag{4}\]

Then we compute the (sample) variance for all the operator measurements:

\[s_o^2 = \frac{1}{n_o - 1}\sum_{i=1}^{n_o}\big(\bar{y}_{i \cdot \cdot } - \bar{y}_{\cdot \cdot \cdot}\big)^2. \tag{5}\]

3. Part

Next we compute the mean for each part:

\[\bar{y}_{\cdot j\cdot} = \frac{1}{n_o n_r}\sum_{i=1}^{n_o} \sum_{k=1}^{n_r} y_{ijk} \tag{6}\]

and its accompanying variance:

\[s_p^2 = \frac{1}{n_p - 1} \sum_{j=1}^{n_p} \left(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot}\right)^2. \tag{7}\]

4. Operator-by-Part

Next we compute the mean effect between operators and parts:

\[\bar{y}_{ij\cdot} = \frac{1}{n_r} \sum_{k=1}^{n_r} y_{ijk} \tag{8}\]

Now, computing the variance for the operator-by-part interaction is complicated. But consider that, when we average over the replicates, the measurement error $e_{ijk}$ (approximately) averages out since its mean is zero:

\[\bar{y}_{ij\cdot} \approx \bar{y}_{\cdot\cdot\cdot} + o_i + p_j + (o\times p)_{ij}. \tag{9}\]

We may then rearrange Eqn. (9) to obtain: \(\begin{align} (o\times p)_{ij} &\approx \bar{y}_{ij\cdot} - \overbrace{(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot \cdot \cdot})}^{o_i} - \overbrace{(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot \cdot \cdot})}^{p_j} - \bar{y}_{\cdot \cdot \cdot} \tag{10a}\\ &= \bar{y}_{ij\cdot} - \bar{y}_{i\cdot \cdot} - \bar{y}_{\cdot j \cdot} + \bar{y}_{\cdot \cdot \cdot} \tag{10b} \end{align}\)

Then the variance is computed as:

\[s_{op}^2 = \frac{1}{(n_o-1)(n_p -1)} \sum_{i=1}^{n_o}\sum_{j=1}^{n_p} \underbrace{(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j \cdot} + \bar{y}_{\cdot \cdot \cdot})}_{(o\times p)_{ij}} {}^2 . \tag{11}\]

5. Instrument

Finally we can compute the instrument errors as the difference of any individual observation from the operator-by-part mean: \(e_{ijk} = y_{ijk} - \bar{y}_{ij\cdot}. \tag{12}\)

The interpretation here is that $\bar{y}_{ij\cdot}$ averages out any errors from individual operators or parts.

We compute the (sample) variance of the instrument error as:

\[s_e^2 = \frac{1}{n_o n_p (n_r - 1)} \sum_{i=1}^{n_o} \sum_{j=1}^{n_p} \sum_{k=1}^{n_r} \left(y_{ijk} - \bar{y}_{ij\cdot}\right)^2. \tag{13}\]

6. Gauge R&R

Now, having computed the variances for each source of error, we may determine the gauge repeatability and reproducibility.

The total variance of our gauge study $s^2$ is the sum of all the individual variances in which:

  • $s_e^2$ is the repeatability of the instrument,
  • $s_o^2 + s_{op}^2$ is the reproducibility, and
  • $s_p^2$ is the process, or part-to-part variation.
\[s^2 = \underbrace{ \overbrace{\,s_e^2\,}^{\text{repeatability}} +~ \overbrace{s_o^2 + s_{op}^2}^{\text{reproducibility}}}_{s_{grr}}+ ~~ s_p^2 \tag{14}\]

The sum of the repeatability and reproducibility is, as you would have guessed, the Gauge R&R.

📝 NOTE: Programs like Minitab can perform this analysis for you. But, being a masochist, I learnt and programmed the math myself.
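In the same masochistic spirit, here is a minimal Python sketch of Eqns (3)-(14), assuming the measurements are stored as nested lists indexed by observer, part, and replicate. Note that it follows the method-of-means presented above, not the full ANOVA that programs like Minitab use, so the numbers may differ slightly from commercial software:

```python
from statistics import mean

def gauge_rr(y):
    """Variance components for a crossed gauge study (Eqns 3-14).

    y[i][j][k] is the k-th replicate of observer i measuring part j.
    A sketch of the method-of-means calculations; an ANOVA-based
    Gauge R&R adjusts these raw variances for sampling effects."""
    n_o, n_p, n_r = len(y), len(y[0]), len(y[0][0])

    grand = mean(y[i][j][k] for i in range(n_o)
                 for j in range(n_p) for k in range(n_r))             # Eqn 3
    op = [mean(y[i][j][k] for j in range(n_p) for k in range(n_r))
          for i in range(n_o)]                                        # Eqn 4
    part = [mean(y[i][j][k] for i in range(n_o) for k in range(n_r))
            for j in range(n_p)]                                      # Eqn 6
    cell = [[mean(y[i][j]) for j in range(n_p)] for i in range(n_o)]  # Eqn 8

    s_o = sum((m - grand)**2 for m in op) / (n_o - 1)                 # Eqn 5
    s_p = sum((m - grand)**2 for m in part) / (n_p - 1)               # Eqn 7
    s_op = (sum((cell[i][j] - op[i] - part[j] + grand)**2
                for i in range(n_o) for j in range(n_p))
            / ((n_o - 1) * (n_p - 1)))                                # Eqn 11
    s_e = (sum((y[i][j][k] - cell[i][j])**2 for i in range(n_o)
               for j in range(n_p) for k in range(n_r))
           / (n_o * n_p * (n_r - 1)))                                 # Eqn 13
    return {"operator": s_o, "part": s_p, "interaction": s_op,
            "repeatability": s_e, "grr": s_e + s_o + s_op}            # Eqn 14

example = gauge_rr([[[36, 37], [38, 38]],
                    [[37, 37], [39, 38]],
                    [[36, 36], [38, 38]]])   # 3 observers, 2 parts, 2 reps
print(example["grr"])
```

As a sanity check, feeding it a perfectly repeatable, operator-independent data set such as `[[[10, 10], [20, 20]], [[10, 10], [20, 20]]]` gives zero for every component except the part-to-part variance.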

🔝 Back to top.

How Good is Your Gauge?

Number of Distinct Categories

Normally when you perform a gauge study with a program like Minitab it prints out statistical information, including something called “number of distinct categories”.

According to ChatGPT, and a few sources I read on the internet, this is computed as:

\[NDC = \frac{\sqrt{2} s_p}{s_{grr}}. \tag{15}\]

A value of $NDC \ge 5$ is considered quite good.
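Eqn (15) transcribes directly to Python. I have truncated the result to an integer, which I assume matches the usual reporting convention:

```python
from math import sqrt

def ndc(s_p, s_grr):
    """Number of distinct categories (Eqn 15). Inputs are standard
    deviations; the result is truncated to an integer, which I
    assume follows the usual reporting convention."""
    return int(sqrt(2) * s_p / s_grr)

print(ndc(5.0, 1.0))   # → 7
```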

Try as I might, I could not reverse engineer this equation, nor make any sense of the $\sqrt{2}$ in the numerator. From what I could determine, it is simply a heuristic.

In the following section I propose a more mathematically sound metric.

Better Metrics

Gauge Capability

For a process that produces goods or services, the capability is defined as the upper specification limit (USL) minus the lower specification limit (LSL), divided by 6 standard deviations of the process variation:

\[C_p = \frac{USL - LSL}{6\sigma} \tag{16}\]

A good process has $C_p \approx 1$. As shown in the diagram below, the specification limits then sit $\pm 3\sigma$ either side of the mean, meaning 99.7% of the process variation falls within the specification limits.


A $6\sigma$ process has upper and lower specification limits 3 standard deviations either side of the mean $\mu$.

We can extend this idea to our gauge. We can take the ratio of the process standard deviation over 6 times the gauge R&R:

\[C_{gauge} = \frac{s_p}{6\cdot s_{grr}} \tag{17}\]

That is, for an excellent gauge the process variation is 6 times larger than the gauge R&R, giving $C_{gauge} \approx 1$.


An excellent gauge will have a gauge R&R of 1/6 of the process variation.
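Eqn (17) is simple enough to compute anywhere; in this sketch the inputs are assumed to be standard deviations:

```python
def gauge_capability(s_p, s_grr):
    """Gauge capability (Eqn 17): part-to-part standard deviation
    relative to 6 gauge R&R standard deviations."""
    return s_p / (6.0 * s_grr)

print(gauge_capability(6.0, 1.0))   # → 1.0 for an excellent gauge
```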

Gauge Resolution

Given the gauge variance, i.e. the uncertainty in our measurement, what is the minimum value $\Delta p$ that it can detect? If we assume a Normal distribution, then we can use a z-score to determine different confidence levels.

For example, a z-score of 1.96 equates to 95% probability, or confidence. We can solve the following equation to find the minimum detectable value with 95% confidence:

\[\begin{align} z = \frac{\Delta p}{s_{grr}} &= 1.96 \tag{18a}\\ \Delta p &= 1.96 \cdot s_{grr} \tag{18b} \end{align}\]

We can do this for other values as well.

Z-Score   Confidence Level
1.645     90%
1.96      95%
2.576     99%

Any difference smaller than the gauge resolution cannot be distinguished from measurement error.
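Python's standard library can produce the z-scores in the table above, so a sketch of the resolution calculation in Eqn (18) needs no external packages:

```python
from statistics import NormalDist

def gauge_resolution(s_grr, confidence=0.95):
    """Minimum detectable difference (Eqn 18): the two-sided z-score
    for the given confidence level, times the gauge R&R standard
    deviation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * s_grr

print(round(gauge_resolution(1.0, 0.95), 2))   # → 1.96
```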

🔝 Back to top.

Case Studies

Below are gauge studies conducted on 2 different sets of scales:

  1. My kitchen scales, and
  2. Expensive scales in my laboratory at work.

The gauge study consisted of measuring:

  • 10 bags of M&Ms ($n_p = 10$), between
  • 3 people ($n_o = 3$), with
  • 2 replicates ($n_r = 2$)

for a total of $10 \times 3 \times 2 = 60$ measurements. Measurement order was randomised for each study to reduce bias.

If you want to perform your own data analysis, you can find all the raw data in CSV format for:

  • The kitchen scales here, and
  • The lab scales here.

Kitchen Scales

I bought my kitchen scales from Kasanova when I was living in Italy. The digital display has a resolution of 1g.


Bilancia elettronica Bambù da cucina, portata 5 kg

Below is a stem and leaf plot of the measurements taken during the gauge study:

Stem Leaves
3 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9
4 0 0 0

The median value is 37g; this is 1g higher than advertised. There is also an outlier of 40g.

The table below shows the results of the gauge study:

The estimated gauge capability is $C_{gauge} \approx 0.2 < 1$, which is not good. Likewise, the number of distinct categories was estimated to be $2 < 5$, which again is not good.

The estimated resolution is about 1.4 grams (99% confidence), so probably closer to 2g if we were to be conservative and round up.

The Pareto chart below shows the cumulative contribution of each source of measurement variance. There is significant variance coming from the scales themselves; the repeatability.

When we plot individual observations by operator we can see that they line up with the 1g resolution of the scales themselves. Not much insight here.

Plotting the observations by part reveals another story. Despite the low resolution of the scales, the part measurements are quite inconsistent across operators. Look at bag no. 7, for example: there was a 2g discrepancy between Operator A and Operator C. Operators weren't even able to measure the same part consistently.

Conclusion: These scales are not good if we want to control the weight of M&M bags in a production line!

🔝 Back to top.

Lab Scales

The lab scales are from Kern & Sohn, and are about 20 times more expensive (thankfully I didn’t have to pay for them!).


Kern PCB Economy Precision Balance

The digital display has a resolution of 0.001g (remember this for later).

The stem and leaf plot below shows the distribution of measurements obtained during the gauge study. Immediately we can see, due to the higher resolution, a wider distribution of values. But interestingly, the median still lies in the 37g range:

Stem Leaves
37 .136 .138 .138 .139 .140 .142 .143 .146 .146 .146 .146 .148 .507 .511 .516 .521 .523 .523 .532 .533 .533 .537 .542 .572 .842 .849 .850 .853 .854 .855 .885 .885 .894 .894 .894 .898 .945 .956 .962 .962 .964 .974
38 .117 .117 .118 .119 .121 .122
39 .429 .429 .433 .436 .442 .444 .475 .476 .476 .485 .486 .489

The table below shows the results of the gauge study:

The estimated gauge capability is $C_{gauge} \approx 20 \gg 1$, so this is a very good gauge. The number of distinct categories is also $170 \gg 5$, which supports this.

My estimated resolution of the gauge is 0.02g (99% confidence), so the 3rd decimal place on the display is effectively meaningless.

The Pareto chart below shows that, within the study, all the variance in the measurements was due to differences in the bags themselves. This is in stark contrast to the kitchen scales.

When we look at observations by operator, we get consistent distributions:

And when we examine operator-by-part, we can see that all operators are measuring each part consistently. So, unlike the kitchen scales, this measuring device does not appear to be susceptible to operator idiosyncrasies.

Conclusion: These scales are extremely precise; probably too precise for weighing bags of M&Ms. It means we could use a potentially cheaper set of scales for controlling production.

🔝 Back to top.

Key Takeaways

A measurement or observation of a quantity can have multiple sources of variance:

  • The thing itself being observed,
  • The measuring device, or
  • The thing doing the observing (the observer).

By performing a systematic gauge study we can isolate these sources of error.

We have also seen that the interaction between operator and the measuring device can have an effect on measurement variance, as evinced by the study of my kitchen scales.

Moreover, it is important to select a gauge that is appropriate to the measuring task. My kitchen scales are sufficient for cooking at home, but probably not for a production line. Conversely, the lab scales are probably too precise for such a task.

It seems that Mars Inc. (who produce M&Ms) are over-filling their bags. Value for money!

I also now have a lot of chocolate to eat.


Statistics can be fun, and delicious.

I made the graphs for this post using my newly released tufteplotlib package for Python.

🔝 Back to top.

Nonlinear Feedback Control in 3 Easy Steps (2025-07-05)

Many real autonomous systems are nonlinear, so we need more sophisticated nonlinear control methods to regulate them. In this article I start by showing how energy gives a much easier, and intuitive approach for reasoning about the stability of dynamic systems. Using this as a framework, I then introduce Lyapunov stability as a method for nonlinear analysis. Finally, I apply it to quaternions for orientation feedback control.

📄 Download a PDF version.

🧭 Navigation

Thinking Like a Physicist

The swinging pendulum on a grandfather clock moves back and forth in perpetuity (well, almost). Normally in control theory, when we talk about stability, we talk about system states where it is not moving. Or, in the case of trajectory tracking, the tracking error converges to zero. But a swinging pendulum isn’t exactly unstable. Its rhythmic motion is deliberately controlled at a rate of 1Hz. How can we reason about this kind of stability?


The pendulum of a grandfather clock swings back-and-forth, consistently, at 1Hz

Force?

If we took a Newtonian approach to the pendulum we would write the equations of motion as:

\[\begin{align} \overbrace{ml^2}^{I}\ddot{q} &= -mgl\cdot\sin(q) \tag{1a} \\ \ddot{q} &= -\tfrac{g}{l}\cdot\sin(q) \tag{1b} \end{align}\]

where:

  • $m\in\mathbb{R}^+$ is its mass (kg),
  • $l\in\mathbb{R}^+$ is the length of the pendulum (m),
  • $g\in\mathbb{R}$ is gravitational acceleration (m/s$^2$), and
  • $q \in [0, 2\pi)$ is the angle from vertical alignment.


Physical modeling of a swinging pendulum.

Now, at this point, depending how much of a masochist you are, you can solve this nonlinear differential equation as:

\[q(t) = 2\cdot\arcsin\left(k\cdot sn(\omega t, k)\right) \tag{2}\]

where:

  • $k = \sin(\tfrac{1}{2}q(0))$,
  • $\omega = \sqrt{\tfrac{g}{l}}$, and
  • $sn(\cdot)$ is the Jacobi elliptic sine function.

Or we can do what most lazy academics do and abstract away all utility by assuming $\sin(q) \approx q$ when $q \approx 0$ such that:

\[\begin{align} \ddot{q} &\approx -\tfrac{g}{l}q \tag{3a}\\ \Longrightarrow q(t) &= A\cdot\cos(\omega t+\phi) \tag{3b} \end{align}\]

where $A$ and $\phi$ are obtained from the initial conditions $q(0),~\dot{q}(0)$.

Either way, we end up with some complicated, trigonometric functions that:

  1. Don’t give us a clear method for reasoning about stability, and
  2. Don’t allow us to reason abstractly about other nonlinear systems.

🔝 Back to top.

No, Energy

Clearly, brute forcing mathematics won’t get us anywhere, which signifies we should change strategy. There is another physics paradigm we can appeal to instead: energy. The total energy in the system is:

\[E = \tfrac{1}{2}ml^2\dot{q}^2 + mgl\left(1 - \cos(q)\right) \tag{4}\]

such that its time derivative is:

\[\begin{align} \dot{E} &= ml^2\dot{q}\ddot{q} + mgl\dot{q}\cdot \sin(q) \tag{5a} \\ &= -mgl\dot{q}\cdot\sin(q) + mgl\dot{q}\cdot\sin(q) \tag{5b}\\ &= 0. \tag{5c} \end{align}\]

From the conservation of energy, its time derivative is zero. And, if we were to add a tiny bit of damping $b$:

\[ml^2\ddot{q} = -mgl\cdot\sin(q) - b\cdot\dot{q} \tag{6}\]

we would arrive at:

\[\dot{E} = -b\cdot\dot{q}^2 \le 0 ~\forall\dot{q}. \tag{7}\]

which is non-increasing. We can conclude that:

  1. A system is stable if its energy is non-increasing, and
  2. It is a better kind of stable if its energy is strictly decreasing.
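To see Eqn (7) at work numerically, here is a short simulation of the damped pendulum; the mass, length, and damping values are arbitrary choices of mine:

```python
from math import sin, cos

# Damped pendulum (Eqn 6, with damping torque -b*q_dot) integrated
# with semi-implicit Euler. We check that the total energy (Eqn 4)
# has decayed, as Eqn 7 predicts. Parameter values are illustrative.
m, l, g, b = 1.0, 1.0, 9.81, 0.5
q, qd, dt = 1.0, 0.0, 1e-3   # released from 1 rad, at rest

def energy(q, qd):
    return 0.5 * m * l**2 * qd**2 + m * g * l * (1.0 - cos(q))

E0 = energy(q, qd)
for _ in range(20000):       # simulate 20 seconds
    qdd = -(g / l) * sin(q) - (b / (m * l**2)) * qd
    qd += qdd * dt           # update velocity first (semi-implicit)
    q += qd * dt
E_final = energy(q, qd)
print(E_final < E0)          # the energy has dissipated
```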

🔝 Back to top.

The Hamiltonian

The combination of $q$ and $\dot{q}$ is known as state space, and is very common in system dynamics. But Sir William Rowan Hamilton took an alternative approach using $q$ (the configuration), and momentum $p$. Combined, these form phase space. The momentum for the pendulum is:

\[p = ml^2\dot{q}. \tag{8}\]

The Hamiltonian (i.e. total energy) of the system is:

\[\mathcal{H}(p,q) = \frac{p^2}{2ml^2} + mgl(1-\cos(q)). \tag{9}\]

Since the Hamiltonian takes two inputs and maps them to a single (positive) output $\mathcal{H}:\mathbb{R}\times\mathbb{R}\mapsto\mathbb{R}^+$, we can visualise this as a 3D surface. Moreover, the time derivatives of the phase space coordinates are just partial derivatives of the Hamiltonian:

\[\dot{q} = \frac{\partial \mathcal{H}}{\partial p}~,~ \dot{p} = -\frac{\partial\mathcal{H}}{\partial q} \tag{10}\]

These give a gradient vector which points in the direction that the system is changing, which we can plot on the energy surface. In the conservative case, we would see that a point on the phase space follows the same contour line (level set) along the surface, which corresponds to constant energy. A damped system will always move down from its current energy level.


Phase portrait of a swinging pendulum. A conservative system remains on the same level set (contour line). A dissipative system always moves below its current level set.
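We can sanity-check Hamilton's equations (Eqn 10) numerically, by comparing the analytic partial derivatives against finite differences of the Hamiltonian; the phase-space point and parameters below are arbitrary:

```python
from math import cos, sin

# Hamiltonian of the pendulum (Eqn 9) and Hamilton's equations
# (Eqn 10), cross-checked with central finite differences.
m, l, g = 1.0, 1.0, 9.81

def H(q, p):
    return p**2 / (2 * m * l**2) + m * g * l * (1 - cos(q))

def hamilton_rhs(q, p):
    """(q_dot, p_dot) from the analytic partial derivatives."""
    return p / (m * l**2), -m * g * l * sin(q)

q0, p0, h = 0.7, 0.3, 1e-6
qd_num = (H(q0, p0 + h) - H(q0, p0 - h)) / (2 * h)    # dH/dp
pd_num = -(H(q0 + h, p0) - H(q0 - h, p0)) / (2 * h)   # -dH/dq
qd, pd = hamilton_rhs(q0, p0)
print(abs(qd - qd_num) < 1e-6, abs(pd - pd_num) < 1e-6)
```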

🔝 Back to top.

Lyapunov Stability

Aleksandr Mikhailovich Lyapunov published a thesis on A General Problem of the Stability of Motion1. As you might have guessed, there are some mathematical definitions named after him that classify the different degrees of stability. But, from my experience, asking a mathematician about concepts in control theory is the very definition of masochism.

masochism: (noun)
Asking a mathematician about control theory.

I’ll circumvent the torture by giving a straightforward explanation. First, suppose we have a configuration $\mathbf{x}\in\mathbb{R}^n$, and some positive, scalar function that is zero for $\mathbf{x} = \mathbf{0}$:

\[V(\mathbf{x}) \ge 0 ~\forall\mathbf{x}\ne\mathbf{0} \quad,\quad V(\mathbf{0}) = 0. \tag{11}\]

From this we may denote 3 levels of stability.

Stable in the Sense of Lyapunov:
If the Lyapunov function remains bounded within a finite region $\epsilon$, then the system is said to be stable:

\[V(\mathbf{x}(t)) \le \epsilon ~\forall t > 0. \tag{12}\]

For brevity, it is often referred to as Lyapunov stable. We can see from the pendulum example that the energy $E(q,\dot{q})$ is a natural choice for a Lyapunov function. In the undamped case, $\epsilon$ is its initial energy which remains constant (bounded) for all time.

Asymptotically Stable:
A system is asymptotically stable if we can show the time derivative of the Lyapunov function is strictly negative away from the origin:

\[\dot{V}(\mathbf{x},\dot{\mathbf{x}}) < 0 ~\forall\mathbf{x}\ne\mathbf{0}. \tag{13}\]

That is, $V$ is strictly decreasing until it reaches zero. (If we can only show $\dot{V} \le 0$, the system is merely Lyapunov stable; asymptotic convergence then needs an extra argument such as LaSalle's invariance principle.) The damped pendulum's energy decreases whenever it moves, and it can only remain at rest at the bottom, so it is asymptotically stable.

Exponentially Stable:
This is a more advanced case of the previous. If we can show that the time derivative is proportional to its current value, then it must be exponentially decreasing:

\[\dot{V}(\mathbf{x},\dot{\mathbf{x}}) = -\alpha\cdot V(\mathbf{x}) ~\Longrightarrow~ V(\mathbf{x}(t)) = e^{-\alpha t}\cdot V(\mathbf{x}(0)). \tag{14}\]

It is strictly decreasing, hence will converge to zero faster than the asymptotic case.
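A quick numerical check of Eqn (14): integrating $\dot{V} = -\alpha V$ forward in time reproduces the closed-form exponential decay. The value of $\alpha$ and the step size are arbitrary:

```python
from math import exp

# Forward-Euler integration of V_dot = -alpha*V, compared against
# the closed-form solution V(t) = exp(-alpha*t) * V(0) in Eqn 14.
alpha, dt, steps = 2.0, 1e-4, 10000   # integrates over t = 1 s
V = 1.0                               # V(0)
for _ in range(steps):
    V += -alpha * V * dt
print(abs(V - exp(-alpha * 1.0)) < 1e-3)   # matches e^{-alpha t}
```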

We can see that these 3 definitions form a nested hierarchy (see figure below):

  • All exponentially stable systems are asymptotically stable, and
  • All asymptotically stable systems are Lyapunov stable.


Every exponentially stable system is asymptotically stable, and every asymptotically stable system is Lyapunov stable.

🔝 Back to top.

A 3-Step Process

In a previous article, I showed a 3 step process for solving feedback control of linear systems. The exact same process applies here. If we have a position or configuration vector $\mathbf{x}\in\mathbb{R}^n$ then:

  1. Define the error from some desired configuration $\boldsymbol{\epsilon} = \mathbf{x}_d - \mathbf{x}$,
  2. Evaluate the time derivative $\dot{\boldsymbol{\epsilon}} = ~?$
  3. Choose the input that forces the error to decay: $\dot{\boldsymbol{\epsilon}} = -\mathbf{K}\boldsymbol{\epsilon}~\Longrightarrow~\boldsymbol{\epsilon}(t) = e^{-\mathbf{K}t}\boldsymbol{\epsilon}(0)$.

For the non-linear case, we need only make a slight modification:

  1. Define the error from some desired configuration $\boldsymbol{\epsilon} = \mathbf{x}_d - \mathbf{x}$,
    • Formulate a Lyapunov function $V(\boldsymbol{\epsilon}) \ge 0 ~\forall\boldsymbol{\epsilon}$.
  2. Evaluate the time derivative $\dot{V}(\boldsymbol{\epsilon},\dot{\boldsymbol{\epsilon}}) = ~?$
  3. Choose the control input that forces the error to asymptotically decay $\dot{V}(\boldsymbol{\epsilon},\dot{\boldsymbol{\epsilon}}) \le 0$.

There are no rules or guidelines for choosing the Lyapunov candidate function, other than that it must be positive (and zero only at the equilibrium). But as we saw from the pendulum example, an energy-like quantity is an excellent choice. It allows us to appeal to physics principles, which gives intuitive results.

Another thing to consider is that energy is quadratic with respect to velocity, and in some cases, with respect to configuration as well. This is really nice because quadratic functions have a single, global minimum, and we can visualise the energy of 1D systems easily (see figure below). For example, a mass-spring-damper system has the energy:

\[E(x,\dot{x}) = \frac{1}{2}m\dot{x}^2 + \frac{1}{2}k(x-x_0)^2 \tag{15}\]

where $x_0$ is the resting position. It is quadratic in both position (configuration), and velocity. In a more abstract sense, it contains the sum-of-squares $x^2,~\dot{x}^2$. So, the sum-of-squared errors is often the best choice for a Lyapunov candidate function.



The energy in a mass-spring-damper system is quadratic with respect to both position, and velocity.

🔝 Back to top.

Quaternions

Quaternions are sophisticated mathematical objects that are used to represent orientation in 3D space. They are used in animation, videogames, aerospace, and robotics. In the latter two fields, orientation control is particularly important. The nonzero quaternions $\mathbb{H}$ form a Lie group under multiplication, and those which represent orientation are the unit quaternions $\mathbb{S}^3\subset\mathbb{H}$. Lie groups have specific rules for combining objects, which can make them highly nonlinear. In such cases, Lyapunov stability is the most straightforward method for stability proofs.

A quaternion contains four elements, often represented as a scalar part and vector part:

\[\boldsymbol{v} = \begin{bmatrix} \eta \\ \boldsymbol{\varepsilon} \end{bmatrix} \in\mathbb{S}^3 \subset \mathbb{H} \tag{16}\]

which, in the case of orientation, we have:

  • $\eta = \cos(\tfrac{1}{2}\alpha) \in [-1, 1]$ as the scalar,
  • $\boldsymbol{\varepsilon} = \sin(\tfrac{1}{2}\alpha)\hat{\mathbf{a}} \in\mathbb{R}^3$ as the vector,
  • $\alpha \in [0,2\pi)$ is the angle of rotation, and
  • $\hat{\mathbf{a}}\in\mathbb{R}^3 ~:~ \hat{\mathbf{a}}^T\hat{\mathbf{a}} = 1$ is the axis of rotation.

The four components satisfy the unit norm condition (these are the Euler-Rodrigues parameters):

\[\eta^2 + \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} = 1. \tag{17}\]

Before we can formulate the feedback control problem, we will need several important properties to exploit.

🔝 Back to top.

Properties of Quaternions

Closure:
This is a fundamental property of Lie groups. When we combine 2 elements in a Lie group, we get a 3rd. For the quaternion, we follow a unique arithmetic for combining rotations together:

\[\boldsymbol{v}_1\cdot\boldsymbol{v}_2 = \begin{bmatrix} \eta_1\eta_2 - \boldsymbol{\varepsilon}_1^T\boldsymbol{\varepsilon}_2 \\ \eta_1\boldsymbol{\varepsilon}_2 + \eta_2\boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_1\times\boldsymbol{\varepsilon}_2 \end{bmatrix} \in \mathbb{S}^3\subset\mathbb{H}. \tag{18}\]

Identity:
This is the element of a group that results in no change. By multiplying a quaternion with the identity, we end up with the original quaternion. The identity of a quaternion contains zero in the vector component:

\[\boldsymbol{\iota} = \begin{bmatrix} 1 \\ \mathbf{0} \end{bmatrix}~\Longrightarrow~ \boldsymbol{v}\cdot\boldsymbol{\iota} = \boldsymbol{v}. \tag{19}\]

We can reverse engineer this to see that it equates to zero rotation: $\alpha = 2\cdot\arccos(1) =0$.

Inverse:
Applying the closure property to the inverse element of a Lie group leads to the identity. For quaternions, we negate the vector component, otherwise known as its conjugate:

\[\bar{\boldsymbol{v}} = \begin{bmatrix} \phantom{-}\eta \\ -\boldsymbol{\varepsilon} \end{bmatrix} ~\Longrightarrow~ \boldsymbol{v}\cdot\bar{\boldsymbol{v}} = \boldsymbol{\iota}. \tag{20}\]

Time Derivative:
Evaluating the time derivative for a quaternion involves appealing to L’Hopital’s rule, and the closure property. It’s a little complex, so the proof is outside the scope of this article. Simply stated, the time derivative is:

\[\begin{bmatrix} \dot{\eta} \\ \dot{\boldsymbol{\varepsilon}} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 0 & -\boldsymbol{\omega}^T \\ \boldsymbol{\omega} & -S(\boldsymbol{\omega}) \end{bmatrix} \begin{bmatrix} \eta \\ \boldsymbol{\varepsilon} \end{bmatrix} \tag{21}\]

where $\boldsymbol{\omega}\in\mathbb{R}^3$ is the angular velocity vector (rad/s), and:

\[S(\boldsymbol{\omega}) = \begin{bmatrix} \phantom{-}0 & -\omega_z & \phantom{-}\omega_y \\ \phantom{-}\omega_z & \phantom{-}0 & -\omega_x \\ -\omega_y & \phantom{-}\omega_x & 0 \end{bmatrix} \tag{22}\]

is a skew-symmetric matrix.

Quaternion Error:
Now we can give a proper definition to the quaternion error. We apply the closure and inverse between the desired and actual:

\[\boldsymbol{e} = \boldsymbol{v}_d\cdot\bar{\boldsymbol{v}} = \begin{bmatrix} \eta_e \\ \boldsymbol{\varepsilon}_e \end{bmatrix} = \begin{bmatrix} \eta_d\eta + \boldsymbol{\varepsilon}_d^T\boldsymbol{\varepsilon} \\ \eta\boldsymbol{\varepsilon}_d -\eta_d\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}_d\times\boldsymbol{\varepsilon} \end{bmatrix} \tag{23}\]

We can see that if $\boldsymbol{v} = \boldsymbol{v}_d$ then this leads to the identity.

An analogy is addition over real vectors $\mathbf{x}\in\mathbb{R}^n$ which also form a Lie group. To define error, we would use addition (closure) with subtraction (inverse):

\[\boldsymbol{\epsilon} = \mathbf{x}_d + (-\mathbf{x}). \tag{24}\]

When $\mathbf{x} = \mathbf{x}_d$, we get the identity element (zero):

\[\mathbf{x} = \mathbf{x}_d~\longrightarrow~\boldsymbol{\epsilon} = \mathbf{0}. \tag{25}\]
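The closure, inverse, and error properties above fit in a few lines of Python. This is only a sketch with quaternions stored as plain tuples, not a substitute for a proper library:

```python
import math

# Minimal quaternion helpers, with a quaternion stored as a tuple
# (eta, ex, ey, ez): the scalar part followed by the vector part.
def qmul(a, b):
    """Closure (Eqn 18): compose two rotations."""
    ea, ax, ay, az = a
    eb, bx, by, bz = b
    return (ea*eb - (ax*bx + ay*by + az*bz),
            ea*bx + eb*ax + (ay*bz - az*by),
            ea*by + eb*ay + (az*bx - ax*bz),
            ea*bz + eb*az + (ax*by - ay*bx))

def conj(v):
    """Inverse of a unit quaternion (Eqn 20)."""
    return (v[0], -v[1], -v[2], -v[3])

def qerror(vd, v):
    """Orientation error (Eqn 23): desired composed with inverse of actual."""
    return qmul(vd, conj(v))

# A 90-degree rotation about z composed with its own inverse gives
# the identity, up to floating point error.
v = (math.cos(math.pi/4), 0.0, 0.0, math.sin(math.pi/4))
print(qerror(v, v))
```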

🔝 Back to top.

Feedback Control

This proof for quaternion feedback control is from a research paper you can find online2. To replicate it, we will follow my 3 step process. First, we define a Lyapunov candidate function as the sum-of-square errors:

\[\begin{align} V(\boldsymbol{e}) &= \left(\eta_d - \eta\right)^2 + \left(\boldsymbol{\varepsilon}_d - \boldsymbol{\varepsilon}\right)^T\left(\boldsymbol{\varepsilon}_d - \boldsymbol{\varepsilon}\right) \tag{26a} \\ &= 2 - 2\left(\eta_d\eta +\boldsymbol{\varepsilon}_d^T\boldsymbol{\varepsilon}\right) \ge 0 \tag{26b} \end{align}\]

Notice that, with sufficient algebraic manipulation, this reduces to $2(1 - \eta_e)$: a function of the scalar component of the quaternion error.

Second, we take the time derivative, substituting in the quaternion velocity equation to obtain:

\[\dot{V}(\boldsymbol{e},\dot{\boldsymbol{e}}) = -2\dot{\eta}_e = \left(\boldsymbol{\omega}_d - \boldsymbol{\omega}\right)^T\boldsymbol{\varepsilon}_e. \tag{27}\]

Third, we choose our control input $\boldsymbol{\omega}$ so that this will asymptotically decay:

\[\boldsymbol{\omega} \triangleq \boldsymbol{\omega}_d + \mathbf{K}\boldsymbol{\varepsilon}_e \tag{28}\]

where $\mathbf{K}\in\mathbb{R}^{3\times 3}$ is a gain matrix. If we substitute this in, then:

\[\dot{V}(\boldsymbol{e},\dot{\boldsymbol{e}}) = -\boldsymbol{\varepsilon}_e^T\mathbf{K}\boldsymbol{\varepsilon}_e < 0 ~\forall \boldsymbol{\varepsilon}_e\ne\mathbf{0}. \tag{29}\]

If we design $\mathbf{K}$ so that it is positive definite (symmetric, with positive, real eigenvalues), then $\dot{V}$ is guaranteed to be negative whenever $\boldsymbol{\varepsilon}_e \ne \mathbf{0}$. Thus the feedback control law is asymptotically stable. An easy choice for $\mathbf{K}$ is a diagonal matrix with positive entries.

The animation below shows a robot using quaternion feedback control. It is a standard part of RobotLibrary.


Quaternion feedback used to control the orientation.

A very important note here: both $\boldsymbol{v}$ and $-\boldsymbol{v}$ represent the same orientation with quaternions! This can cause your robot to take the long way around (up to 360$^\circ$) toward the desired orientation. Since quaternions can be represented as 4D vectors, we can check whether they point in the same direction using the dot product. If $\boldsymbol{v}_d \cdot \boldsymbol{v} < 0$, simply use $-\boldsymbol{\varepsilon}_e$ in the feedback control law to spin the opposite direction.
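Putting the pieces together, here is a sketch of the closed loop: the control law of Eqn (28) with $\boldsymbol{v}_d$ as the identity, $\boldsymbol{\omega}_d = \mathbf{0}$, and a scalar gain standing in for a diagonal $\mathbf{K}$, including the sign check just described. Forward Euler with renormalisation keeps the unit norm condition of Eqn (17):

```python
import math

# Closed-loop check of the quaternion feedback law (Eqn 28), with
# the goal v_d = identity and w_d = 0. Quaternions are tuples
# (eta, ex, ey, ez); k and dt are illustrative values.
def conj(v):
    return (v[0], -v[1], -v[2], -v[3])

k, dt = 2.0, 0.01
half = math.radians(60.0)             # start 120 degrees from the goal
v = (math.cos(half), 0.0, 0.0, math.sin(half))

for _ in range(1000):                 # simulate 10 seconds
    e = conj(v)                       # error vs the identity (Eqn 23)
    if e[0] < 0.0:                    # sign check: take the short way
        e = tuple(-c for c in e)
    w = (k * e[1], k * e[2], k * e[3])          # control input (Eqn 28)
    eta, ex, ey, ez = v
    # quaternion kinematics (Eqn 21): eps_dot = (eta*w - w x eps)/2
    deta = -0.5 * (w[0]*ex + w[1]*ey + w[2]*ez)
    dex = 0.5 * (eta*w[0] - (w[1]*ez - w[2]*ey))
    dey = 0.5 * (eta*w[1] - (w[2]*ex - w[0]*ez))
    dez = 0.5 * (eta*w[2] - (w[0]*ey - w[1]*ex))
    v = (eta + deta*dt, ex + dex*dt, ey + dey*dt, ez + dez*dt)
    n = math.sqrt(sum(c * c for c in v))
    v = tuple(c / n for c in v)       # renormalise (Eqn 17)

print(v[0] > 0.999)                   # the orientation has converged
```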

🔝 Back to top.

  1. Lyapunov, A. M. (1892). The General Problem of the Stability of Motion. Kharkov Mathematical Society. Originally in Russian. English translation by A.T. Fuller, London: Taylor & Francis, 1992. 

  2. Yuan, J. (1988). Closed-loop manipulator control using quaternion feedback. IEEE Journal on Robotics and Automation, 4(4):434–440. 

]]>
<![CDATA[Many real autonomous systems are nonlinear, so we need more sophisticated nonlinear control methods to regulate them. In this article I start by showing how energy gives a much easier, and intuitive approach for reasoning about the stability of dynamic systems. Using this as a framework, I then introduce Lyapunov stability as a method for nonlinear analysis. Finally, I apply it to quaternions for orientation feedback control.]]>
Quantifying Quality2025-06-26T00:00:00+00:002025-06-26T00:00:00+00:00https://woolfrey.github.io/customer/six%20sigma/define/quality/lss/2025/06/26/quantifying-quality<![CDATA[

What does quality mean for your customer? A core tenet of Lean Six Sigma is empiricism, so it is necessary to articulate, and quantify, exactly what quality means to the customer. The Critical to Quality (CTQ) tree is a useful tool and thinking exercise for developing quantifiable measures of quality. These become the basis for data collection in the Measure phase, but are also useful as engineering specifications, and key performance indicators.

🧭 Navigation

What Is Quality, Precisely?

Lean Six Sigma (LSS) is a project management methodology used to improve the performance and efficiency of business and engineering processes. It combines the heuristics of the Toyota Production System with statistical process control from Motorola. An LSS project is divided into 5 stages:

  1. Define the problem,
  2. Measure the current performance,
  3. Analyse root causes of the problem,
  4. Improve the system, and
  5. Control the process.

In the Define phase it is also necessary to articulate who the customer of a process is. In a previous post, I showed how to use the SIPOC tool to identify the customer(s). Once identified, it is then necessary to speak with the customer in order to quantify what quality means to them.

The problem is, a customer might not always be clear or precise in what they mean. But, since LSS is based on empiricism, it is necessary to translate vague customer requirements into quantifiable metrics. That way we can collect and scrutinise the data in the Measure and Analyse phases.

The Critical-To-Quality (CTQ) tree is a standard LSS tool used in the Define phase. It helps refine vague or subjective notions about quality into objective measurements.1

A CTQ tree contains 3 (or 4) components:

  1. What the customer needs,
  2. Drivers of quality,
  3. Requirements, or critical-to-quality-factors, and
  4. How the measurement is taken.


The structure of a CTQ tree.

🔝 Back to top.

Quality, Step-by-Step

1) Defining the Need

The first step is to provide a concise description of what the customer requires. It should be 1 sentence long, and ideally use adjectives to give a notion of quality factors. It should describe what quality is, but not how it is defined. Any lay person should have a basic understanding of what is being asked for.

Here are some bad examples I found on the internet, and what I think is a better definition:

  • I need my paycheck. $\longrightarrow$ A paycheck delivered regularly, on time, in the correct amount.
  • Ease of operation and maintenance. $\longrightarrow$ Consistent operation with minimal defects & breakdowns; effective maintenance with minimal downtime.
  • Monthly project report. $\longrightarrow$ A timely monthly report with sufficient information on progress.

Notice that none of the original examples gives adequate descriptors to build upon. The “Need” should be a counter-factual to an existing problem (hence why an LSS project is being undertaken). For example, “A monthly project report” could simply be an A4 page with the single sentence “All good.”, delivered any time within a 4 week period. But if the problem is that project reports are constantly late, and lacking detail, then the need should describe the converse: timely, and with sufficient information on progress.

2) Elaborating quality drivers

The next step is to list what quality entails. These should be adjectives. It can be difficult for people to jump straight to numerical quantities, so using descriptive words helps build momentum. A useful question to ask is: what does good look like? Conversely, if you’re having trouble coming up with ideas, ask the opposite: what does bad look like?

In a previous post, I mentioned a poor experience I had at a hotel. Below are some examples of turning a bad experience into a performance requirement:

  • Tepid water in the shower $\longrightarrow$ Hot water consistently available in the bathroom.
  • Having to turn the tap really hard to stop water flowing $\longrightarrow$ Faucet can be turned off with minimal effort.
  • Coffee is weak and watery $\longrightarrow$ Espresso available for breakfast.

3) Defining performance requirements

The next step is to turn these descriptions into quantifiable metrics. Each should give a precise number with some kind of constraint, preferably with a mathematical qualifier $=, >, <$. Continuing my example from above, we could define:

  • Hot water consistently available in the bathroom $\longrightarrow$ Water reaches 50 $^\circ$ C.
  • Faucet can be turned off with minimal effort $\longrightarrow$ Water stops flowing between 0.5 Nm and 2 Nm.
  • Espresso available for breakfast $\longrightarrow$ Yes / No.

4) Define operational measurements

It’s often good to define how the measurement should be taken. This way we can:

  1. Agree that everyone is measuring the same thing, and
  2. Ensure different people measure consistently.

In the Define phase, this operational definition only needs to be high-level. If a more detailed procedure is necessary, it can be elaborated on in the Measure phase as part of the data collection plan.

  • Water reaches 50 $^\circ$ C $\longrightarrow$ Take the temperature of the water within 1 cm of the shower head with a thermometer.
  • Water stops flowing between 0.5 Nm and 2 Nm $\longrightarrow$ Use a torque wrench set to 2 Nm.

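The four components above map naturally onto a simple data structure, which is handy for keeping CTQs alongside a data collection plan. A minimal sketch; the class and field names are my own, not a standard LSS artifact:

```python
from dataclasses import dataclass, field

@dataclass
class CTQ:
    """One branch of a CTQ tree: driver -> requirement -> measurement."""
    driver: str        # quality driver (an adjective-style description)
    requirement: str   # quantifiable metric with a constraint (=, <, >)
    measurement: str   # how the measurement is operationally taken

@dataclass
class CTQTree:
    need: str                          # one-sentence customer need
    branches: list = field(default_factory=list)

# The shower example from above:
shower = CTQTree(need="Hot water consistently available in the bathroom")
shower.branches.append(CTQ(
    driver="Water is hot",
    requirement="Water reaches 50 degrees C",
    measurement="Thermometer within 1 cm of the shower head"))
```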
🔝 Back to top.

Examples

Lamingtons

In a previous post I used an example of the time I made lamingtons for friends and colleagues whilst I was living in Italy. These are an Australian delicacy, consisting of a sponge cake, dipped in chocolate sauce, and rolled in coconut shavings. I developed a SIPOC, which is a tool used to identify who receives the output of a process, i.e. the customer. Then based on customer feedback, we can proceed with developing the CTQ tree.


Delicious lamingtons, made by me!

One of the problems I had was that my first batch was perfect, and all subsequent batches were too dense and flat. Below is how a CTQ tree might be developed for making good lamingtons:


A CTQ tree for properly made lamingtons.

On my first batch (where the sponge cake quality was perfect), the chocolate layer was quite thick. I thought it was too much, but my friend said it was perfect. She didn’t like that commercially made lamingtons have thin chocolate layers in order to save money. This highlights the necessity of getting direct customer feedback when creating the CTQs (thanks Trina!).

🔝 Back to top.

Medical Phantom Organ

Whilst working on the Terabotics project with the University of Leeds, other postdocs and I won a competition for our mini research proposal. The idea was to make artificial limbs with the same optical and mechanical properties as a human (a phantom limb / organ). It could be used for testing and experiments using THz sensing in skin contact measurements.

In late 2024, we met at the University of Warwick for a workshop to develop a plan for how we were going to make these things. It had never been done before (phantom organs exist, but not phantom limbs for this specific technology), so the CTQ tree was the perfect tool to take a vague definition and refine it into quantitative engineering specifications.

Below is the initial draft we developed, followed by my refined version.



A CTQ tree to define engineering specifications of a phantom forearm for skin contact sensing.

One of the interesting outcomes from applying this tool was that, during discussions, we realised the compression of the skin decays asymptotically over time. This can be quantified using a mathematical property called a time constant. The process of developing the tool itself helped us articulate and define this important metric.

Another thing to note is that the first draft we developed (pictured above) had a few question marks, uncertainties, and deficiencies. It can be difficult to produce a correct CTQ tree on the first go. After some thought and reflection, I developed a better one. It should be standard practice to take some extra time to ensure the metrics are correctly defined, as this will have a big impact on what data is collected during the Measure phase. Conversely, it’s also OK to get things wrong, and go back and change it as necessary.

🔝 Back to top.

Customer vs Process

From about 2006 to early 2007, I worked as a barista. I became quite skilled at making coffee, and I had a very strict process. The proof of my efforts was the reputation I developed for making excellent quality coffee.


A cappuccino I made working as a barista in 2007.

Making good coffee is surprisingly technical. For example:

  • The extraction time for the crema must be within 24 to 26 seconds.
    • $<$ 24 seconds, and the coffee is too weak.
    • $>$ 26 seconds, and the coffee is too bitter.
  • The milk has to be heated to a maximum of 60 $^\circ$ C. If it gets too hot, the fat melts, and you can’t generate good foam.
  • Humidity affects the extraction time, and you have to adjust the grind size, and tamping force throughout the day.
  • You need to pour the milk onto the crema immediately, otherwise it starts losing flavour.

A CTQ tree is developed based on customer requirements, but how we achieve those requirements internally might be different. In this case, it is necessary to use tools like the quality function deployment (QFD) that maps the relationship between what the customer wants (i.e. the CTQs), versus how to achieve it internally within the process.

For example, here is a customer-centric CTQ tree of what makes a good cappuccino:


A CTQ tree for a cappuccino.

But based on my expert knowledge above, here is the QFD that relates the quality factors for a good coffee to the technical aspects involved in making it. Notice that there are multiple factors that affect the flavour, and, from a production side, many of them are interrelated (as seen in the “roof” part of the diagram).


A matrix diagram, or quality function deployment (QFD), relating customer requirements to technical specifications for coffee production.

The customer doesn’t care about technical details like extraction time, the precise temperature measurements, etc. They only care that the coffee tastes good, is well made, and is sufficiently hot. But in order to meet these customer expectations, the actual coffee-making process must be very tightly controlled.

🔝 Back to top.

Key Takeaway

The Critical to Quality tree is a useful tool and structured thinking exercise for translating vague, subjective customer needs into quantifiable performance metrics. These become the basis for developing a data collection plan in the Measure phase of the project. A well thought-out CTQ tree can provide useful KPIs for a business or product.

Some of the important things I’ve learned over the years is to:

  • Have a short, but descriptive need. It should convey basic descriptors of quality, without saying too much.
  • Quality drivers are subjective, they should involve adjectives.
  • It’s helpful to pose a question when developing drivers: “What does good look like?”, or “What does bad look like?”.
  • The requirements, or CTQs, must have a quantifiable metric attached, and some sort of constraint $=, >, <$ where possible.
  • It’s useful to provide an operational definition for how a measurement is taken.

🔝 Back to top.

  1. The Tree Diagram is one of the 7 Management & Planning Tools

]]>
<![CDATA[What does quality mean for your customer? A core tenet of Lean Six Sigma is empiricism, so it is necessary to articulate, and quantify, exactly what quality means to the customer. The Critical to Quality (CTQ) tree is a useful tool and thinking exercise for developing quantifiable measures of quality. These become the basis for data collection in the Measure phase, but are also useful as engineering specifications, and key performance indicators.]]>
Hamiltonian Mechanics2025-06-25T00:00:00+00:002025-06-25T00:00:00+00:00https://woolfrey.github.io/lagrange/hamiltonian/mechanics/hamilton/2025/06/25/hamiltonian-mechanics<![CDATA[

In classical mechanics, the Lagrangian is defined as the difference between kinetic and potential energy. We use this to solve for the equations of motion for systems of rigid bodies. But what is its relationship to the conservation of energy, which states the sum of kinetic and potential is constant? In this article I show how to derive the Hamiltonian from the Lagrangian, i.e. the sum of kinetic and potential for rigid body systems. I then show how momentum is used in lieu of velocity to define phase space, and touch on its implications with respect to the Hamiltonian.

📄 Download a PDF version.

🧭 Navigation

The Conservation of Energy

One of the most important principles in physics is the conservation of energy. When an apple falls from a tree, it loses potential energy but gains kinetic energy. The total energy remains constant for all time (until it hits someone on the head).


A falling apple loses potential energy and gains kinetic energy as it falls.

We can use this principle to solve for the state of an object at any given time. If $x\in\mathbb{R}$ is its position, and $\dot{x}\in\mathbb{R}$ is its velocity, then at any 2 given points in time it must hold that:

\[\tfrac{1}{2}m\dot{x}_1^2 + mgx_1 = \tfrac{1}{2}m\dot{x}_2^2 + mgx_2. \tag{1}\]

in which $m\in\mathbb{R}^+$ is the mass (kg), and $g\in\mathbb{R}$ is gravitational acceleration. So, for example, given $x_1,~\dot{x}_1$ and $x_2$ we could determine the speed just before it hits the ground $\dot{x}_2$.
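For example, Eqn. (1) can be rearranged for $\dot{x}_2$ (the mass cancels) and evaluated directly. A quick numeric sketch:

```python
import math

def impact_speed(x1, v1, x2, g=9.81):
    """Solve Eqn. (1) for the speed at height x2, given state (x1, v1).

    From (1/2)*m*v1**2 + m*g*x1 = (1/2)*m*v2**2 + m*g*x2, mass cancels:
        v2 = sqrt(v1**2 + 2*g*(x1 - x2))
    """
    return math.sqrt(v1**2 + 2.0 * g * (x1 - x2))

# Apple released from rest 3 m above the ground:
v2 = impact_speed(x1=3.0, v1=0.0, x2=0.0)   # ~7.67 m/s
```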


We can use conservation of energy to solve for state variables at different points in time.

In fact, we can use the conservation of energy to derive Newton’s second law: \(\begin{align} \frac{d}{dt}\left(\tfrac{1}{2}m\dot{x}^2 + mgx\right) &= 0 \tag{2a} \\ m\dot{x}\ddot{x} + mg\dot{x} &= 0 \tag{2b} \\ m\ddot{x} &= -mg. \tag{2c} \end{align}\)

🔝 Back to top.

Systems of Rigid Bodies

The Lagrangian

Lagrangian mechanics extends Newton’s second law and enables us to solve the dynamic equations of motion for systems of rigid bodies. Suppose $\mathbf{q}\in\mathbb{R}^n$ is the configuration vector, and $\dot{\mathbf{q}}\in\mathbb{R}^n$ is the velocity vector. Hamilton noted that we could first define a functional as the difference between kinetic energy (which, for a system of rigid bodies, is also configuration dependent) and potential energy:1

\[\mathcal{L}(\mathbf{q},\dot{\mathbf{q}}) = \mathcal{K}(\mathbf{q},\dot{\mathbf{q}}) - \mathcal{P}(\mathbf{q}) : \mathbb{R}^{n}\times\mathbb{R}^n\mapsto\mathbb{R}. \tag{3}\]

This is known as the Lagrangian. Then, from the calculus of variations, this function is an extremum (maximum or minimum) when its variation is zero $\delta\mathcal{L} = 0$. This leads to the Euler-Lagrange equation:

\[\frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}}\right) - \frac{\partial\mathcal{L}}{\partial\mathbf{q}} = \mathbf{0}. \tag{4}\]

Using Hamilton’s definition Eqn. (3) gives Lagrange’s equations of motion.2

This is strange, though, right? From the conservation of energy we would expect the sum of kinetic and potential, yet the Lagrangian requires the difference between the two. What is the relationship between them?

🔝 Back to top.

Deriving the Hamiltonian

If we take the time derivative of Eqn. (3), then substitute in the Euler-Lagrange equation (4) for $\partial\mathcal{L}/\partial\mathbf{q}$, we obtain:

\[\begin{align} \dot{\mathcal{L}} &= \dot{\mathbf{q}}^T\frac{\partial\mathcal{L}}{\partial\mathbf{q}} + \ddot{\mathbf{q}}^T\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}} \tag{5a} \\ &= \underbrace{\dot{\mathbf{q}}^T\frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}}\right) + \ddot{\mathbf{q}}^T\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}}}_{\frac{d}{dt}\left(\dot{\mathbf{q}}^T\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}}\right)} \tag{5b} \end{align}\]

But now we can integrate with respect to time to get back an expression containing the original Lagrangian:

\[\mathcal{L}(\mathbf{q},\dot{\mathbf{q}}) = \dot{\mathbf{q}}^T\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}} + \text{const.} \tag{6}\]

Notice how the constant appears due to the rules of integral calculus. Now we can simply re-arrange and define the Hamiltonian:

\[\mathcal{H}(\mathbf{q},\dot{\mathbf{q}}) = \dot{\mathbf{q}}^T\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}} - \mathcal{L}(\mathbf{q},\dot{\mathbf{q}}) \tag{7}\]

which is constant in a conservative system.

In a rigid body system the kinetic energy is:

\[\mathcal{K}=\frac{1}{2}\dot{\mathbf{q}}^T\mathbf{M}(\mathbf{q})\dot{\mathbf{q}} \tag{8}\]

where $\mathbf{M}(\mathbf{q})=\mathbf{M}(\mathbf{q})^T\in\mathbb{R}^{n\times n}$ is the generalised inertia matrix. By putting this back into Eqn. (7) we obtain a much more familiar form:

\[\mathcal{H}(\mathbf{q},\dot{\mathbf{q}}) = \mathcal{K}(\mathbf{q},\dot{\mathbf{q}}) + \mathcal{P}(\mathbf{q}). \tag{9}\]
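We can sanity-check Eqn. (9) numerically: for any symmetric $\mathbf{M}$ and velocity $\dot{\mathbf{q}}$, the construction in Eqn. (7) returns $\mathcal{K} + \mathcal{P}$. A quick check with arbitrary values standing in for a real inertia matrix and potential:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive-definite matrix stands in for M(q):
A = rng.standard_normal((3, 3))
M = A @ A.T + 3.0 * np.eye(3)
qdot = rng.standard_normal(3)          # an arbitrary velocity vector
P = 4.2                                # an arbitrary potential energy value

K = 0.5 * qdot @ M @ qdot              # kinetic energy, Eqn. (8)
Lag = K - P                            # Lagrangian, Eqn. (3)
p = M @ qdot                           # generalised momentum, Eqn. (11)
H = qdot @ p - Lag                     # Hamiltonian, Eqn. (7)

assert np.isclose(H, K + P)            # recovers Eqn. (9)
```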

🔝 Back to top.

Momentum

What Newton Really Said

Newton didn’t actually express his second law as “force = mass x acceleration”, as people often recite. What he said was:

“Lex II: Mutationem motus proportionalem esse vi motrici impressae, et fieri secundum lineam rectam qua vis illa imprimitur.” 3

or translated to English:

“Law II: The change of motion is proportional to the motive force impressed; and is made in the direction of the straight line in which that force is impressed.”

Here, “motion” is better conceptualised as “momentum”, i.e. the product of mass and velocity. The time derivative of momentum is equal to the forces applied. In a 1D system we would write:

\[p = m\dot{x}~\Longrightarrow~ f = \frac{dp}{dt} = m\ddot{x}. \tag{10}\]

For a system of rigid bodies, we denote the generalised momentum as:

\[\mathbf{p} \triangleq \mathbf{M}(\mathbf{q})\dot{\mathbf{q}} = \frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}} \tag{11}\]

which, from Eqn. (3) and Eqn. (8), is the partial derivative of the Lagrangian with respect to velocity.

🔝 Back to top.

Re-Visiting the Lagrangian

If we re-arrange Eqn. (4) and substitute in Eqn. (11) we obtain:

\[\begin{align} \frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}}\right) &= \frac{\partial\mathcal{L}}{\partial\mathbf{q}} \tag{12a}\\ \dot{\mathbf{p}} &= \frac{\partial\mathcal{K}}{\partial\mathbf{q}} - \frac{\partial\mathcal{P}}{\partial\mathbf{q}} \tag{12b} \\ \mathbf{M}\ddot{\mathbf{q}} + \dot{\mathbf{M}}\dot{\mathbf{q}} &= \frac{1}{2}\dot{\mathbf{q}}^T\frac{\partial\mathbf{M}}{\partial\mathbf{q}}\dot{\mathbf{q}} - \mathbf{g}. \tag{12c} \end{align}\]

where $\mathbf{g} = \frac{\partial \mathcal{P}}{\partial\mathbf{q}}$ is the generalised gravitational force vector.

On the left hand side of Eqn. (12c) we have the familiar mass by acceleration $\mathbf{M}\ddot{\mathbf{q}}$. But a new term appears, $\dot{\mathbf{M}}\dot{\mathbf{q}}$. This is because, in a system of rigid bodies, the distribution of mass can also change over time. Then, on the right hand side, we see the effect of gravity $\mathbf{g}$, but also the forces due to a configuration change.

So Lagrange’s equations of motion are just a generalisation of Newton’s second law. The time derivative of momentum is equal to the force applied. But it accounts for the change in configuration, and the subsequent change in the distribution of mass.

If we consider the case where $\dot{\mathbf{q}} = \mathbf{0}$, then Eqn. (12c) reduces to a much more familiar form:

\[\mathbf{M}\ddot{\mathbf{q}} = -\mathbf{g}. \tag{13}\]

🔝 Back to top.

Phase Space

If we have the system configuration $\mathbf{q}$ and its time derivative $\dot{\mathbf{q}}$, then we have all the information we need to reconstruct its equations of motion under its own impetus.4 The concatenation of the two gives the state space vector:

\[\mathbf{x} = \begin{bmatrix} \mathbf{q} \\ \dot{\mathbf{q}} \end{bmatrix} ~\longrightarrow ~ \dot{\mathbf{x}} = \begin{bmatrix} \dot{\mathbf{q}} \\ \ddot{\mathbf{q}} \end{bmatrix}. \tag{14}\]

We could instead consider phase space as configuration and momentum:

\[\mathbf{y} = \begin{bmatrix} \mathbf{q} \\ \mathbf{p}\end{bmatrix}. \tag{15}\]

Now using Eqn. (7) & (11) we can express the Hamiltonian as a function of momentum:

\[\mathcal{H}(\mathbf{p},\mathbf{q}) = \dot{\mathbf{q}}^T\mathbf{p} - \mathcal{L}(\mathbf{q},\dot{\mathbf{q}}). \tag{16}\]

We can actually use this to generate the dynamic equations of motion. The trick is to treat $\mathbf{p}$ and $\mathbf{q}$ as independent variables. First, we can easily recover the velocity from Eqn. (16) by taking the partial derivative with respect to momentum:

\[\dot{\mathbf{q}} = \frac{\partial\mathcal{H}}{\partial\mathbf{p}}. \tag{17}\]

Then from Eqn. (12) & (16) we can get the time derivative of momentum:

\[\begin{align} \dot{\mathbf{p}} = \frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\dot{\mathbf{q}}}\right) = \frac{\partial\mathcal{L}}{\partial\mathbf{q}} = -\frac{\partial\mathcal{H}}{\partial\mathbf{q}}. \tag{18} \end{align}\]

This is interesting because the Hamiltonian is a scalar, so we can conceptualise it as an energy surface over phase space. The time derivative of the phase space vector then defines a flow across this surface:

\[\dot{\mathbf{y}} = \begin{bmatrix} \dot{\mathbf{q}} \\ \dot{\mathbf{p}} \end{bmatrix} = \begin{bmatrix} \phantom{-}\partial\mathcal{H}/\partial\mathbf{p} \\ -\partial\mathcal{H}/\partial\mathbf{q} \end{bmatrix}. \tag{19}\]

A conservative system ($\dot{\mathcal{H}} = 0$) will follow a single contour line along this surface, i.e. a fixed level set.
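Hamilton's equations (19) can be integrated directly. A minimal sketch for a simple pendulum, where $\mathcal{H}(q,p) = p^2/(2ml^2) - mgl\cos q$, using a symplectic (semi-implicit) Euler step so the trajectory stays close to its initial contour line:

```python
import math

# Simple pendulum: H(q, p) = p**2/(2*m*l**2) - m*g*l*cos(q).
m, l, g = 1.0, 1.0, 9.81

def hamiltonian(q, p):
    return p**2 / (2.0 * m * l**2) - m * g * l * math.cos(q)

def step(q, p, dt):
    """Symplectic Euler on Eqn. (19):
       q_dot =  dH/dp =  p / (m*l**2)
       p_dot = -dH/dq = -m*g*l*sin(q)
    """
    p = p - dt * m * g * l * math.sin(q)   # p_dot = -dH/dq
    q = q + dt * p / (m * l**2)            # q_dot =  dH/dp
    return q, p

q, p = 0.5, 0.0                  # released from rest at 0.5 rad
E0 = hamiltonian(q, p)
for _ in range(10000):           # integrate 10 s at dt = 1 ms
    q, p = step(q, p, 1e-3)

# The state stays on (close to) the initial contour of the energy surface:
drift = abs(hamiltonian(q, p) - E0)
```

A symplectic integrator is used here deliberately: a plain explicit Euler step would drift steadily off the contour line, spiralling outward in the phase portrait.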


A conservative system will remain on a single contour line in the phase portrait.

🔝 Back to top.

  1. [Hamilton, 1835] Hamilton, W. R. (1835). Second essay on a general method in dynamics. Philosophical Transactions of the Royal Society of London, 125:95–144. 

  2. [Lagrange, 1788] Lagrange, J.-L. (1788). Mécanique analytique. Imprimerie de la République, Paris. Available online at various archives. 

  3. [Newton, 1687] Newton, I. (1687). Philosophiæ Naturalis Principia Mathematica. Royal Society, London. First edition. 

  4. We can simply re-arrange Eqn. (12c) to solve for $\ddot{\mathbf{q}}$. 

]]>
<![CDATA[In classical mechanics, the Lagrangian is defined as the difference between kinetic and potential energy. We use this to solve for the equations of motion for systems of rigid bodies. But what is its relationship to the conservation of energy, which states the sum of kinetic and potential is constant? In this article I show how to derive the Hamiltonian from the Lagrangian, i.e. the sum of kinetic and potential for rigid body systems. I then show how momentum is used in lieu of velocity to define phase space, and touch on its implications with respect to the Hamiltonian.]]>
Customer Satisfaction: Not Everything Is Equal2025-06-21T00:00:00+00:002025-06-21T00:00:00+00:00https://woolfrey.github.io/customer/six%20sigma/define/quality/lss/2025/06/21/kano<![CDATA[

Not all features of a product or service are of equal value. The Kano model is a concept for categorising and prioritising them. Using it, we can distinguish between what we need just to do business, what will attract customers, and what makes a market leader. I also give examples of how I’ve applied it to some of my engineering projects. It’s a good tool for task prioritisation.

🧭 Navigation

Overview

The Kano model is a tool often used in the Lean Six Sigma (LSS) project management methodology to enumerate and categorise customer requirements. LSS projects are divided into 5 phases:

  1. Define the problem,
  2. Measure the current performance,
  3. Analyse the root cause,
  4. Improve the process, and
  5. Control the process.

In the Define phase, it is necessary to articulate the quality metrics of a product or service with respect to the customer. Often customers have many requirements, needs, and wants, which can be subjective and conflicting. Noriaki Kano developed a conceptual model that can help categorise and prioritise quality features.

What Makes a Good (or Bad) Experience?

I travel a lot for work, so I’ve spent a decent amount of time in a variety of hotel rooms. I stayed in a cheap hotel recently (for a leisurely weekend away), and there were a few things that made it a dissatisfying experience:

  • Warm (not hot) water in the shower.
  • Toilet not flushing properly.
  • Faucet in the bathroom sink hard to turn on / off.
  • Hash browns for breakfast were partially cold.
  • Coffee was weak and diluted (I love a potent espresso in the morning).

What bemused me was how nonchalant the owner was about me having to open up the cistern to manually fiddle with it and flush the toilet every time.

Conversely, when I went to Japan in 2022 for a conference, I stayed at an incredible hotel in Kyoto.


A photo I took of the Prince Kyoto Takaragaike from the conference center.

Some of the things that stood out were:

  • An interior courtyard,
  • Western & traditional Japanese breakfast,
  • A bellboy who carried my luggage,
  • A koi pond outside,
  • A traditional Japanese teahouse,
  • Enormous rooms,
  • Beautiful scenery.


The view from my hotel room at the Prince Kyoto Takaragaike.

Clearly, there are minimum expectations we have about a decent hotel room, like functional plumbing. And there are things we would expect to get better the more we pay for it, like breakfast options, and room sizes. But there are also things that amaze us; koi ponds, tea houses, etc.

🔝 Back to top.

The Kano Model

The Kano model categorises features of a product or service into 3 categories:

  • Minimum requirements: The bare essentials that you need to start a business.
  • Performance requirements: Enable you to compete with rival businesses, and generate profit.
  • Innovative features: Makes you a market leader.

We can plot this on a Cartesian graph with 2 axes:

  1. Customer satisfaction, ranging from dissatisfied to satisfied, and
  2. Level of implementation, from absent to fully implemented.


The Kano model conceptualises customer satisfaction versus level of implementation.

According to Kano’s conceptual model, minimum requirements must be implemented. But, no matter how much effort you put into them, your customer will not be impressed. If they’re absent, however, or done poorly, your customer will be very unhappy. A hot shower is a hot shower, but a tepid shower on a cold, rainy day in England is awful!

Conversely, customer satisfaction increases proportionally to the level of implementation of the performance requirements. A bigger hotel room, and more breakfast options available? Yes please! And if they can be done for the same price, or cheaper, you will easily put your rivals out of business.

Innovative features (sometimes called delighters, or wow factors) are unexpected, but amaze the customer. A koi pond, and traditional Japanese tea house? Wow! These are features that can transform an industry. Free Wi-Fi at a hotel used to be an exciting feature, and distinguished a quality hotel from its rivals. Now, however, it’s become a minimum expectation. Innovative features often become minimum standards over time, especially if they can be done economically.

To help with determining the different features of a Kano model, I developed my own sorting algorithm.


My sorting algorithm for Kano model categories.
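The decision flow can be approximated in code. This is my guess at a two-question formulation (would the customer be dissatisfied if the feature were absent? satisfied if present?), not necessarily the exact logic in the figure:

```python
def kano_category(dissatisfied_if_absent, satisfied_if_present):
    """Rough Kano sort based on two yes/no questions.

    A sketch of the decision logic:
      - absent -> unhappy, present -> indifferent : minimum requirement
      - absent -> unhappy, present -> happy       : performance requirement
      - absent -> neutral, present -> happy       : innovative feature
      - neither                                   : unnecessary
    """
    if dissatisfied_if_absent and not satisfied_if_present:
        return "minimum requirement"
    if dissatisfied_if_absent and satisfied_if_present:
        return "performance requirement"
    if satisfied_if_present:
        return "innovative feature"
    return "unnecessary"

# A hot shower: expected, and nobody raves about it.
kano_category(True, False)   # -> "minimum requirement"
```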

🔝 Back to top.

Examples

Underwater Robot

From about 2015 to 2018, I worked on the submerged pile inspection robot (SPIR) as part of my PhD. This was an underwater robot designed to clean marine growth off underwater bridge columns. In May 2017 I hosted a design workshop with the team to review problems with the previous 2 prototypes, and what we needed to do for the 3rd prototype.


The third prototype of the submerged pile inspection robot (SPIR).

We began by brainstorming all the kinds of problems we had when working with the previous prototypes, and what we needed to improve. This included use-case scenarios like:

  • Assembly & maintenance,
  • Transportation, and
  • Operation.

We then used Affinity Diagrams to group ideas together based on common themes. This made the vast number of ideas easier to manage.


Brainstorming and affinity diagrams for the SPIR prototype development.

We sorted all these ideas using the algorithm above. We also added a few ideas of what would be really cool to implement (if we had the time).



The Kano model developed for the SPIR.

We each received $n = \frac{15}{3} = 5$ (number of ideas divided by 3) votes to place on what we thought was most important to work on. Notice that we did not vote on the basic features / minimum requirements. These must be done.

We can put these votes into a Pareto chart to see what the team thought was most important.


A Pareto chart of votes on the most important features to implement.

Note: This Pareto chart doesn’t follow the 80/20 rule very well, which implies that the features haven’t been adequately categorised or articulated.
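The 80/20 check can be made concrete: sort the vote counts in descending order and count how many of the top categories cover 80% of the total. A sketch with made-up vote tallies (not the actual SPIR data):

```python
def pareto_cutoff(votes, share=0.8):
    """Return how many of the top categories account for `share` of the votes."""
    counts = sorted(votes.values(), reverse=True)
    total, running = sum(counts), 0
    for i, c in enumerate(counts, start=1):
        running += c
        if running / total >= share:
            return i
    return len(counts)

# Hypothetical vote tallies for illustration only:
votes = {"waterproofing": 12, "weight": 9, "thrust": 4, "cabling": 3, "GUI": 2}
k = pareto_cutoff(votes)   # 12 + 9 + 4 = 25 of 30 votes -> first 3 categories
```

If `k` is a small fraction of the categories, the chart follows the 80/20 rule; if most categories are needed to reach 80%, the features were probably not categorised sharply enough.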

At the end, we had a list of engineering specifications, and performance requirements. The basic / minimum requirements become a design checklist of things to be achieved. The weighted performance requirements then became a way to manage time, resources, and priorities.

To recap, the procedure we followed was:

  1. Brainstorm ideas based on each use-case scenario (assembly & maintenance, transport, operation),
  2. Group them with Affinity Diagrams,
  3. Sort into Kano model categories, and
  4. Prioritise non-critical tasks with voting.

🔝 Back to top.

Humanoid Robot

In 2023 I was working for the Italian Institute of Technology (IIT) on the ergoCub robot as part of the Humanoid Sensing & Perception (HSP) team. We had to showcase our human-robot interaction module at the International Conference on Robotics and Automation (ICRA) in London. We had a very tight deadline, and a lot of different features to get up and running (decision trees, control algorithms, object recognition, software interfaces etc.).


The ergoCub robot can recognise a human waving, and respond.

I started by hosting a workshop with my team to review the performance of our previous demo at the start of the year. We brainstormed a bunch of ideas around 3 questions:

  • What did we do well (competitive features),
  • What can we improve (minimum requirements), and
  • What can we do to impress people (innovations).

You can see that I didn’t frame this explicitly as a Kano model, but there was almost a direct mapping. Afterwards, we used Affinity Diagrams to group all these ideas into common categories. This enabled us to assign responsibility based on subject matter expertise.



A brainstorming session for the ergoCub human-robot interaction demo.

Afterwards, we explicitly categorised each of the tasks based on the Kano model. Like before, all the minimum requirements were things that we had to do. For the performance requirements, we used a Prioritisation Matrix1 to compare them all. This revealed where we should invest most of our effort with limited time. We used this later as part of our project planning & monitoring.

Group Item Category
Action Recognition Incorrect action recognition when holding object (box, phone) Minimum Requirement
  Additional reactions beyond wave and handshake Extra
  Ability to change actions mid-task Extra
  Idle actions when nothing is happening Extra
Administration Book trip to London Minimum Requirement
  Plan with AMI to work on the robot Minimum Requirement
Behaviour Tree Behaviour tree not responsive (clarification?) Performance Requirement
Code Code works cross-platform (iCub2, ergoCub, Gazebo simulation) Minimum Requirement
  Successful communication over network Minimum Requirement
  Modules from different packages integrate successfully Minimum Requirement
  Code takes a long time to compile (separate .cpp from .h) Performance Requirement
  Load parameters from a configuration file Performance Requirement
  Use thrift communication instead of strings over yarp port Performance Requirement
Control Robot can execute joint control Minimum Requirement
  Robot can avoid singularities Minimum Requirement
  Robot moves quickly Performance Requirement
  Robot moves smoothly, naturally Performance Requirement
  Robot can jump over obstacles Extra
Grasping Robot grasps a box without it slipping Minimum Requirement
  Robot can grasp an object from given hand transforms Performance Requirement
  Robot can grasp box from different poses Performance Requirement
  The robot can grasp different objects Performance Requirement
  Robot can more accurately shake hands Performance Requirement
  Force control on the box Performance Requirement
  Hands can follow box as it moves Extra
  Collaboratively grasp and lift a box in a short time Extra
Hardware Demonstration runs on ergoCub Minimum Requirement
  Ambient lighting affecting vision Minimum Requirement
  Hardware fails so we can’t use the robot Minimum Requirement
Head / Gaze Control Reliable human focus detection Minimum Requirement
  We can send commands to move the head Minimum Requirement
  Robot looks at & follows object Performance Requirement, Extra?
  Robot follows human gaze Extra
  Robot changes focus to different things in environment Extra
  Neck moves to stabilize head while walking Unnecessary
Marketing Live demonstration executes as planned Minimum Requirement
  Presentation summarizing research development Minimum Requirement
Navigation Robot can navigate successfully in simulation Minimum Requirement
  Navigation works on real ergoCub Extra
  Robot localizes without artificial landmarks Extra
Robot Communication Robot responds to voice commands Extra
  Robot follows human on command Extra
  Robot talks to people Extra
  Robot changes facial expressions based on actions Extra
  Robot can learn on the fly (clarification for this one?) Extra

To recap, the overall procedure here was:

  1. Brainstorm ideas
  2. Group with Affinity Diagrams
  3. Categorise with Kano model
  4. Rank non-critical tasks using the Prioritisation Matrix.

🔝 Back to top.

Key Takeaway

The Kano model is a conceptual method of categorising features of a product or service. It helps reveal the priorities you need to have a successful business, and can also be a great way to prioritise tasks in a project.

Some key lessons I’ve learned over the years are:

  1. If you’re having trouble thinking of features, try thinking about what makes a bad experience.
  2. Use prioritisation methods to rank the performance requirements by level of importance.
  3. Innovative features don’t have to be realistic; think laterally, be creative, challenge norms. That’s why they’re innovative.
  4. Integrate the Kano model with other tools:
    • Brainstorming & affinity diagrams
    • Voting method
    • Prioritisation matrix, etc.
  5. Innovative features migrate to minimum requirements over time.

🔝 Back to top.

  1. This is one of the Seven Management & Planning Tools

]]>
<![CDATA[Not all features of a product or service are of equal value. The Kano model is a concept for categorising and prioritising them. Using them we can distinguish between what we need to even do business, versus what will attract customers, versus what makes a market leader. I also give examples of how I’ve applied it to some of my engineering projects. It’s a good tool for task prioritisation.]]>
Who Is Your Customer?2025-06-19T00:00:00+00:002025-06-19T00:00:00+00:00https://woolfrey.github.io/customer/six%20sigma/define/sipoc/lss/2025/06/19/who-is-your-customer<![CDATA[

Who is the customer of your business? Who receives the output of your work? Surprisingly, it’s not always who you think, and in this article I’d like to demonstrate why. The SIPOC is a fundamental tool in the Lean Six Sigma project management method. When done correctly, it can reveal important insights into a business process. It’s an important step before establishing quality metrics and key performance indicators of your work.

🧭 Navigation

Whom Do You Serve?

Lean Six Sigma (LSS) is a project management methodology used to optimise the performance of business and engineering systems. It combines the heuristics for process optimisation from Toyota’s Lean production with statistical process control from Motorola’s Six Sigma.

An LSS project is divided into 5 phases:

  1. Define the problem,
  2. Measure the current performance,
  3. Analyse root causes of the problem
  4. Improve the process, and
  5. Control the process.

This is abbreviated as DMAIC.

A core principle of LSS is to define the quality of a product or service with respect to the customer, not what the business itself believes. As such, one of the first steps in the Define phase of a project is to:

  1. Identify who the customer of a product / service is, and
  2. Use this to define measures of quality, product specifications, key performance indicators, etc.

There is a canonical tool that we use to help articulate this. But before introducing it, I want to take you through a thinking exercise. Hopefully it will show the value in applying this tool correctly, but also the utility of using project management tools as structured thinking.

A Thought Exercise

Who is the customer for a Bachelor’s degree program at a university? The student? The student pays for the tuition fees, therefore the student is the customer, right? This is the wrong way to think about it, and I will demonstrate why.

First let’s think of the Bachelor’s degree program as a process. The basic steps are:

  1. Enrol students,
  2. Teach students,
  3. Assess students, then
  4. Graduate (or fail!).

Next, who enrols in the university system?

  • Highschool graduates,
  • Mature-age people,
  • Vocational transfers (e.g. tradespeople),
  • International students,
  • Transfer students.

And what comes out?

  • A graduate.

Now we have a clearly defined Input-Process-Output.

The Supplier

The next important step is to identify where all these inputs come from:

  • High schools
  • Technical / vocational colleges
  • Bridging programs
  • Other universities

These are the suppliers. It is important to connect them directly to inputs so we can trace problems back to the origin.

The Customer

Finally, where do all these graduates go?

  • Private sector / industry,
  • Public sector, and
  • Grad schools.

These are the customers of a Bachelor’s degree program. In light of this, students are the product, not the customer. This means we should frame the key performance indicators of a Bachelor’s degree program with respect to the customer’s requirements.

If we were to begin by naively asking “What makes a good University?” at the beginning of the project, then we might answer with things like:

  • Campus facilities,
  • Cost of tuition,
  • Study spaces,
  • Food & dining,
  • Social groups,
  • etc.

But this will lead us to the wrong conclusions. It tells us nothing about the quality of the students coming out of the program.

Instead, by asking the customer, we might get responses like:

  • Subject matter expertise,
  • Communication skills,
  • Initiative & independence,
  • Teamwork,
  • etc.

Of course, the KPIs will be specific to the field of study. I’m an engineer, so I would frame them in terms of mathematical ability, programming skills, and the ability to use software. Whereas a degree like history might emphasise knowledge, writing, and research synthesis.

In light of this, we might measure the quality of graduates through things like:

  • Employment rate,
  • Customer satisfaction surveys,
  • Starting salaries,
  • Graduate outputs like publications or patents,
  • etc.

Admittedly, there is a danger in treating scholasticism as business. A University degree may devolve into merely producing technical competencies, rather than the development of the intellect. The former should be the purview of technical colleges, in my opinion, but I digress.

⬆️ Back to top.

SIPOC

The Supplier-Input-Process-Output-Customer (SIPOC) tool is a staple of the Define phase in a Six Sigma project. Its purpose is to:

  • Identify suppliers (as potential sources of error),
  • Provide a high-level process for project stakeholders, and
  • Identify customers.

Firstly, knowing who is supplying the inputs to a process can be an important first step in resolving quality issues in a product or process. In LSS there is the adage “rubbish in = rubbish out”. If we are receiving poor quality materials & products from our suppliers, this can cause issues within our processes.

Second, having a high-level process map can help with early identification of potential problem areas in the system.

Third, identifying the customer is integral to the success of the project. The next step in the Define phase is usually to develop the Voice of the Customer (VoC). This often involves interviews and focus groups to obtain primary evidence about what the customer actually wants, versus opinions of what we think they want.

It is also important for establishing the product or service specifications (critical to quality factors). By knowing who the customer is, we define quality with respect to their needs. These metrics are what we use in the later phases of the project:

  • Measure: Determining how close the current performance is to the customer requirements.
  • Improve: Demonstrating the new process meets or exceeds customer requirements.
  • Control: Monitoring the process with respect to customer requirements.

⬆️ Back to top.

Example #1: Train Membrane Dryers

When I was working for Sydney Trains, circa 2013, I did a sabbatical over the summer as a train technician. One thing we had to do was replace faulty membrane dryers from the trains. These were devices that removed moisture from the air before it entered all the pneumatic systems. They were failing quite frequently, and were being replaced often.

When I went back to corporate in the Autumn, I was sitting in on the Six Sigma Green Belt training course. Since I was already employed in the Six Sigma group, I ended up helping other students with their projects. One of the engineering managers was investigating why membrane dryers were failing. We developed the SIPOC, and I, having worked on the trains myself, added some subject matter expertise.

I told him the output is the defective membrane dryer, and we should define who receives it (a customer). It turns out they get put in a box in the storeroom. The supplier was never informed of the problem.

Immediately, just from working on the SIPOC, we identified a crucial broken point in the overall system. How can our suppliers fix the problem if they haven’t received a defective one to inspect? This might have fixed the problem immediately if our suppliers had been informed.

To me this highlighted 2 important things:

  1. Being diligent with project management tools, because they can reveal vital information, and
  2. The necessity of formal feedback to our suppliers when their products aren’t working as intended.

⬆️ Back to top.

Example #2: Baking Lamingtons

Lamingtons are an Australian delicacy. They are a sponge cake, coated in chocolate sauce, and dipped in coconut shavings. They are best enjoyed with tea or coffee.

When I was living in Italy, I baked lamingtons for my friends & colleagues. The very first batch I ever made turned out perfectly. All the batches after were poor quality; too firm, too dry. I made this as part of the Six Sigma online course that I’m a guest lecturer for.

Some key insights from this example are:

  • I only listed inputs that are transformed by the process. All the kitchen utensils are not considered in this tool.
  • Waste products are an output. I had a lot of leftover coconut shavings. I could control my baking process by carefully measuring how much coconut I need to cover a given surface area of cake.
  • The baking paper is pure waste. It might be better to use a non-stick pan instead.

I never actually figured out what was wrong, but I suspect the self-raising flour had lost its potency. This is a good lesson: check the quality of the inputs (ChatGPT suggested testing the flour’s reaction to warm water).

⬆️ Back to top.

Example #3: How NOT to do a SIPOC

I asked ChatGPT to generate a SIPOC for the Bachelor’s degree program scenario above. Since the AI learns from examples on the internet, I think it’s amalgamated many poor habits in developing a SIPOC.

Here is what I think it’s done wrong, or poorly:

  • ❌ Listing faculty & staff as an input. These are not transformed by the process, so they should not be considered in the SIPOC. The effect that staff have on student quality should be considered in the Analyse phase of the project.
  • ❌ Not connecting suppliers to inputs, or outputs to customers. Who supplies the academic records? How can we trace it back if there are errors? Who receives “completed courses”?
  • ❌ Listing completed courses as an output. The course itself is not a product or service, only a process. Who is its customer?
  • ❌ Silos. Components are visually separated into S, I, P, O, and C boxes. It doesn’t illustrate process flow.

What I do think was good was:

  • ✅ Listing the degree (or academic record maybe?) as a process output. Often these are required by employers as evidence of credentials.

⬆️ Back to top.

Key Takeaways

To summarise, a diligent application of the SIPOC is crucial to correctly identifying the customer of a business process. This will frame how quality and key performance indicators are developed. It can also provide early insight into potential problem areas for further investigation (or identify them immediately!).

Here are some tips for making a good SIPOC:

  • Link every input to a supplier. Rubbish in = rubbish out, so it’s important to trace defects back to their source.
  • Match every output to a customer. It’s important to identify who is receiving them (if at all!).
  • Waste should be recorded as an output.
  • Only list things that are transformed by the process. This is integral for identifying “value-add” process steps (work that produces value, and hence revenue).
  • Try to identify feedback channels to the supplier.

⬆️ Back to top.

]]>
<![CDATA[Who is the customer of your business? Who receives the output of your work? Surprisingly, it’s not always who you think, and in this article I’d like to demonstrate why. The SIPOC is a fundamental tool in the Lean Six Sigma project management method. When done correctly, it can reveal important insights into a business process. It’s an important step before establishing quality metrics and key performance indicators of your work.]]>
Lagrangian Mechanics Is Just A Generalisation of Newtonian Mechanics2025-06-17T00:00:00+00:002025-06-17T00:00:00+00:00https://woolfrey.github.io/lagrange/lagrangian/mechanics/hamilton/2025/06/17/lagrangian-mechanics<![CDATA[

Lagrangian mechanics is a sophisticated method for deriving the equations of motion for a dynamic system. The key principle is that it minimises the difference between kinetic and potential energy, integrated across time. But why? In this article, I trace the derivation from Newton’s second principle, to Lagrange’s formulation, to Hamilton’s principle of least action. I show that Lagrangian mechanics is just a generalisation of Newton’s law, extended to multi-body systems.

📄 Download a PDF version.

🧭 Navigation

Force or Energy?

Between 1589 and 1592, Galileo Galilei supposedly dropped two objects of different masses from the Leaning Tower of Pisa to show that acceleration is independent of mass.


Galileo demonstrated that acceleration is independent of mass by dropping two different objects from the Tower of Pisa.

About 100 years later, in 1687, Sir Isaac Newton published his laws of motion in the Principia Mathematica1. His second law of motion codified what Galileo had observed: that the acceleration due to gravity $\frac{d^2 x}{dt^2} = \ddot{x}$ is independent of mass. In light of Galileo’s experiment we would write the equation of motion for a falling object as:

\[m\ddot{x} = -mg ~\Longrightarrow~ \ddot{x} = -g \tag{1}\]

where:

  • $m$ is the mass (kg), and
  • $g$ is gravitational acceleration (m/s²).

Assuming the object starts with zero velocity, we can compute its speed when it impacts the ground using integration:

\[\dot{x}_f = \int_{t_0}^{t_f} \ddot{x}~dt. \tag{2}\]

But there’s another way we could solve this problem. The potential energy of an object at any given height is:

\[\mathcal{P}(x) = mgx. \tag{3}\]


The gravitational potential energy in an object is a function of its height. This is converted to kinetic energy as it falls.

And when it hits the ground all of this potential energy is converted to kinetic energy:

\[\mathcal{K}(\dot{x}) = \frac{1}{2}m\dot{x}^2. \tag{4}\]

By equating the two we can solve:

\[\begin{align} \frac{1}{2}m\dot{x}_f^2 &= mgx_0 \tag{5a} \\ \dot{x}_f &= \sqrt{2 g x_0}. \tag{5b} \end{align}\]

So there are 2 ways to frame this problem that result in the same solution: force, or energy.
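Both framings can be checked numerically. Below is a minimal Python sketch (assuming a drop height of 10 m, which is not from the post): it computes the impact speed using the energy method of Eqn. (5b), and again by directly integrating the constant acceleration of Eqn. (1). The two agree.

```python
import math

g = 9.81      # gravitational acceleration (m/s^2)
x0 = 10.0     # assumed drop height (m)

# Energy method, Eqn. (5b): equate potential and kinetic energy.
v_energy = math.sqrt(2 * g * x0)

# Force method, Eqn. (2): integrate the constant acceleration numerically.
dt = 1e-5
x, v = x0, 0.0
while x > 0.0:
    v += g * dt   # speed increases under constant acceleration
    x -= v * dt   # height decreases
v_force = v

print(v_energy, v_force)  # both ≈ 14.0 m/s
```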

⬆️ Back to top.

Lagrangian Mechanics

Lagrange’s Generalisation

It is well established that forces in a potential field are the negative of the gradient:

\[\mathcal{P}(x) = mgx ~\Longrightarrow~ f_g = -\frac{d\mathcal{P}}{dx} = -mg. \tag{6}\]

But the dynamic forces $m\ddot{x}$ may also be expressed in terms of derivatives of kinetic energy. Specifically, we could re-write Newton’s second law as:

\[\underbrace{\frac{d}{dt}\left(\frac{d\mathcal{K}}{d\dot{x}}\right)}_{m\ddot{x}} = \underbrace{-\frac{d\mathcal{P}}{dx}\vphantom{\begin{bmatrix} a\\ b\end{bmatrix}}}_{-mg}. \tag{7}\]

Newton’s laws concern particles; individual, rigid bodies. But Lagrange’s genius was to generalise these principles to systems of rigid bodies2. Now we consider the configuration for a rigid body system denoted by $\mathbf{q}\in\mathbb{R}^n$ (e.g., a vector of joint angles for a robot), and the associated velocities $\dot{\mathbf{q}}\in\mathbb{R}^n$.

If the energy in a closed system is conserved, then it follows that an infinitesimal change in the kinetic energy must equal an infinitesimal change in potential energy:

\[\delta\mathcal{K} = \delta\mathcal{P}. \tag{8}\]

Three things to keep in mind here:

  1. We don’t assign the signs here, as you might expect from Eqn. (7); they are resolved implicitly.
  2. Lagrange actually appealed to d’Alembert’s principle, but I think this approach is a little more straightforward.
  3. Kinetic energy is now configuration dependent: $\mathcal{K}(\mathbf{q},\dot{\mathbf{q}})$.

Taking the variation, we consider the effect of infinitesimal changes in configuration $\delta\mathbf{q}$ and velocity $\delta\dot{\mathbf{q}}$ on energy balance:

\[\delta\mathbf{q}^T\frac{\partial\mathcal{K}}{\partial\mathbf{q}} + \delta\dot{\mathbf{q}}^T\frac{\partial\mathcal{K}}{\partial\dot{\mathbf{q}}} = \delta\mathbf{q}^T\frac{\partial\mathcal{P}}{\partial\mathbf{q}}. \tag{9}\]

Then we can use integration by parts to eliminate $\delta\dot{\mathbf{q}}$:

\[\delta\dot{\mathbf{q}}^T\frac{\partial\mathcal{K}}{\partial\dot{\mathbf{q}}} = -\delta\mathbf{q}^T\frac{d}{dt}\left(\frac{\partial \mathcal{K}}{\partial\dot{\mathbf{q}}}\right). \tag{10}\]

Now putting Eqn. (10) back in to Eqn (9) we obtain:

\[\begin{align} \delta\mathbf{q}^T\left(\frac{\partial\mathcal{K}}{\partial\mathbf{q}} -\frac{d}{dt}\left(\frac{\partial \mathcal{K}}{\partial\dot{\mathbf{q}}}\right)\right) &= \delta\mathbf{q}^T\frac{\partial\mathcal{P}}{\partial\mathbf{q}} \tag{11a} \\ \frac{d}{dt}\left(\frac{\partial \mathcal{K}}{\partial\dot{\mathbf{q}}}\right) - \frac{\partial\mathcal{K}}{\partial\mathbf{q}} &= -\frac{\partial\mathcal{P}}{\partial\mathbf{q}}. \tag{11b} \end{align}\]

Equation (11a) is d’Alembert’s principle. It is the projection of a virtual displacement $\delta\mathbf{q}$ on to the forces acting on the system, which should sum to zero.3 More importantly, Eqn. (11b) gives Lagrange’s equations for the dynamics of a rigid body system. Note its structural similarity to (a generalisation of) Eqn. (7).

⬆️ Back to top.

What Does It All Mean?

What is Eqn. (11b) telling us?

Firstly, Newton didn’t state his second law as “force equals mass times acceleration”, as often recited. What he wrote was:

“Lex II: Mutationem motus proportionalem esse vi motrici impressae, et fieri secundum lineam rectam qua vis illa imprimitur.” 1

or in English:

“Law II: The change of motion is proportional to the motive force impressed; and is made in the direction of the straight line in which that force is impressed.”

In modern parlance we would say that force is equal to the time derivative of momentum. For a system of rigid bodies, we would denote its generalised inertia matrix as $\mathbf{M}(\mathbf{q}) = \mathbf{M}(\mathbf{q})^T\in\mathbb{R}^{n\times n}$. Then its kinetic energy is:

\[\mathcal{K}(\mathbf{q},\dot{\mathbf{q}}) = \frac{1}{2}\dot{\mathbf{q}}^T\mathbf{M}(\mathbf{q})\dot{\mathbf{q}} \tag{12}\]

and the momentum:

\[\mathbf{p} = \mathbf{M}(\mathbf{q})\dot{\mathbf{q}} = \frac{\partial\mathcal{K}}{\partial\dot{\mathbf{q}}}. \tag{13}\]

So:

  • $\frac{d}{dt}\left(\frac{\partial\mathcal{K}}{\partial\dot{\mathbf{q}}}\right) = \frac{d\mathbf{p}}{dt}$ is the change in momentum,
  • $\frac{\partial\mathcal{K}}{\partial\mathbf{q}}$ are internal forces from a configuration change, and
  • These are both induced by the potential field through $-\frac{\partial\mathcal{P}}{\partial\mathbf{q}}$.
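As a concrete check of Eqn. (11b), here is a minimal Python sketch for a simple pendulum (a hypothetical single-link example; the mass, length, and initial angle are assumed values, not from the post). With $\mathcal{K} = \frac{1}{2}ml^2\dot{\theta}^2$ and $\mathcal{P} = -mgl\cos\theta$, Eqn. (11b) reduces to $ml^2\ddot{\theta} = -mgl\sin\theta$. Simulating this, the total energy $\mathcal{K} + \mathcal{P}$ stays (numerically) constant, consistent with the conservation assumption in Eqn. (8).

```python
import math

# Simple pendulum as a one-DOF rigid-body system (assumed values):
m, l, g = 1.0, 1.0, 9.81   # mass (kg), length (m), gravity (m/s^2)

def energy(theta, omega):
    K = 0.5 * m * l**2 * omega**2      # kinetic energy, as in Eqn. (12)
    P = -m * g * l * math.cos(theta)   # potential energy (datum at the pivot)
    return K + P

# Eqn. (11b) for this system: m*l^2 * theta_dd = -m*g*l*sin(theta)
theta, omega, dt = 1.0, 0.0, 1e-4
E0 = energy(theta, omega)
for _ in range(100_000):               # simulate 10 s with semi-implicit Euler
    omega += -(g / l) * math.sin(theta) * dt
    theta += omega * dt

print(abs(energy(theta, omega) - E0))  # energy drift stays small
```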

⬆️ Back to top.

The Principle of Least Action

There was a different thread running through history at the same time. In 1744 Pierre-Louis Moreau de Maupertuis philosophised that:

“…in all changes that happen in nature, the amount of action is as small as possible.” 4

He would later denote this as the integral of momentum over distance, which is twice the kinetic energy across time:

\[A = \int m\dot{x} ~dx = \int m\dot{x}^2~dt. \tag{14}\]

This didn’t pan out, as evinced by history. Sir William Rowan Hamilton would propose its canonical form, still used in classical mechanics today5. He observed that Lagrange’s equations of motion, Eqn. (11b), may first be written as the functional:

\[\mathcal{L}(\mathbf{q},\dot{\mathbf{q}}) = \mathcal{K}(\mathbf{q},\dot{\mathbf{q}}) - \mathcal{P}(\mathbf{q}) ~:~ \mathbb{R}^{n}\times\mathbb{R}^n\mapsto\mathbb{R} \tag{15}\]

This is, unsurprisingly, referred to as the Lagrangian. Then, via the calculus of variations we obtain the (surprise!) Euler-Lagrange equation:

\[\frac{d}{dt}\left(\frac{\partial\mathcal{L}}{\partial\mathbf{\dot{q}}}\right) - \frac{\partial\mathcal{L}}{\partial\mathbf{q}} = \mathbf{0}. \tag{16}\]

This is equivalent to (11b). Reverse-engineering this, the action is defined as:

\[A = \int \underbrace{\mathcal{K}(\mathbf{q},\dot{\mathbf{q}}) - \mathcal{P}(\mathbf{q})\vphantom{\begin{matrix} a \\ b \end{matrix}}}_{\mathcal{L}(\mathbf{q},\dot{\mathbf{q}})}~dt \tag{17}\]

which has the SI units of joule-seconds. It follows that, for a conservative system, the equations of motion are an extremum of the action:

\[\delta A = \int \delta\mathcal{L}~dt = 0 ~\Longrightarrow~\delta \mathcal{L} = 0 \tag{18}\]

whose solution is (16). The second variation, with respect to $\dot{\mathbf{q}}$, is:

\[\frac{\partial^2\mathcal{L}}{\partial\dot{\mathbf{q}}^2} = \mathbf{M}(\mathbf{q}) \succ 0. \tag{19}\]

The inertia matrix is positive definite, such that kinetic energy is always positive.6 Hence Eqn. (16), equivalently (11b), is a minimum of the action.

Newton’s law is about the instantaneous balance of forces. Equation (17) is a metric across time. That is, the trajectory that a system of rigid bodies follows through a potential field minimises the difference between kinetic and potential energy, integrated across time.
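The minimisation can be checked numerically. Below is a minimal Python sketch (assuming a unit mass falling for 1 second, discretised into 2000 steps; these values are illustrative): it computes a discretised version of the action in Eqn. (17) for the true free-fall path and for a perturbed path with the same endpoints. The true path yields the smaller action.

```python
import math

g, T, N = 9.81, 1.0, 2000          # gravity, duration (s), time steps (assumed)
dt = T / N
m, x0 = 1.0, 0.0                   # unit mass, starting at the origin

def action(path):
    """Discretised Eqn. (17): sum of (K - P) * dt along a path."""
    A = 0.0
    for k in range(N):
        v = (path[k + 1] - path[k]) / dt         # finite-difference velocity
        x = 0.5 * (path[k + 1] + path[k])        # midpoint height
        A += (0.5 * m * v**2 - m * g * x) * dt   # Lagrangian K - P
    return A

t = [k * dt for k in range(N + 1)]
true_path = [x0 - 0.5 * g * tk**2 for tk in t]   # solution of Eqn. (1)

# Perturb the interior of the path (endpoints fixed, as variations require):
bumped = [x + 0.05 * math.sin(math.pi * tk / T) for x, tk in zip(true_path, t)]
bumped[0], bumped[-1] = true_path[0], true_path[-1]

print(action(true_path) < action(bumped))  # True: the real path minimises action
```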

  1. Newton, I. (1687). Philosophiæ Naturalis Principia Mathematica. Royal Society, London. First edition.  2

  2. Lagrange, J.-L. (1788). Mécanique analytique. Imprimerie de la République, Paris. Available online at various archives. 

  3. Virtual displacements do no net work, since they’re not real. Obviously. 

  4. Maupertuis, P. L. M. (1744). Accord de différentes loix de la nature qui avoient jusqu’ici paru incompatibles. Mémoir de l’Académie Royale des Sciences de Paris, pages 417–426. 

  5. Hamilton, W. R. (1835). Second essay on a general method in dynamics. Philosophical Transactions of the Royal Society of London, 125:95–144. 

  6. A matrix $\mathbf{A} = \mathbf{A}^T \in\mathbb{R}^{n\times n}$ is positive definite if $\mathbf{x}^T\mathbf{A}\mathbf{x} > 0$ for all $\mathbf{x}\in\mathbb{R}^n$, $\mathbf{x} \ne \mathbf{0}$. 

]]>
<![CDATA[Lagrangian mechanics is a sophisticated method for deriving the equations of motion for a dynamic system. The key principle is that it minimises the difference between kinetic and potential energy, integrated across time. But why? In this article, I trace the derivation from Newton’s second principle, to Lagrange’s formulation, to Hamilton’s principle of least action. I show that Lagrangian mechanics is just a generalisation of Newton’s law, extended to multi-body systems.]]>
Quaternions for Dummies2025-06-15T00:00:00+00:002025-06-15T00:00:00+00:00https://woolfrey.github.io/orientation/robot/quaternion/hamilton/2025/06/15/quaternions-for-dummies<![CDATA[

Quaternions are sophisticated mathematical objects that are used to represent orientation in 3D for robotics, animation, and aerospace. In this article I trace a logical sequence from using complex numbers as rotations toward the derivation of the quaternion itself. I then derive the Lie group properties for combining and inverting quaternions. Lastly, I show how they can be used to rotate vectors, and some of their advantages over rotation matrices.

📄 Download a PDF version.

🧭 Navigation

Complex Numbers as Rotations

Euler’s formula states that:

\[e^{i\psi} = \cos(\psi) + i\cdot\sin(\psi) \in\mathbb{C} ~,~ i = \sqrt{-1}. \tag{1}\]

We can think of this as a rotation into the complex plane (Fig. 1). When we multiply powers together, we add the exponents. This equates to adding rotations together (Fig. 1):

\[e^{i\psi}\cdot e^{i\phi} = e^{i(\psi + \phi)} = \cos(\psi + \phi) + i\cdot\sin(\psi + \phi). \tag{2}\]


Figure 1: A complex number represents a rotation into the complex plane. Multiplying complex numbers is equivalent to adding rotations.

If we took a complex number:

\[\mathrm{z} = \mathrm{x} + i\cdot\mathrm{y}\in\mathbb{C} \tag{3}\]

and multiplied it by Eqn. (1) then we would get:

\[\begin{align} e^{i\psi}\cdot \mathrm{z} &= \left(\cos(\psi) + i\cdot\sin(\psi)\right)\left(\mathrm{x} +i\cdot \mathrm{y}\right) \tag{4a} \\ &= \mathrm{x}\cdot\cos(\psi) - \mathrm{y}\cdot\sin(\psi) + i\left(\mathrm{x}\cdot\sin(\psi) + \mathrm{y}\cdot\cos(\psi)\right) \tag{4b} \end{align}\]

But we could also represent Eqn. (3) as a vector:

\[\mathbf{v} = \begin{bmatrix} \mathrm{x} \\ \mathrm{y} \end{bmatrix} \begin{matrix} \leftarrow \text{Real part}\phantom{abcd} \\ \leftarrow \text{Complex part} \tag{5} \end{matrix}\]

In the same manner, we could write Eqn. (4) as:

\[\begin{bmatrix} \mathrm{x}\cdot\cos(\psi) - \mathrm{y}\cdot\sin(\psi) \\ \mathrm{x}\cdot\sin(\psi) + \mathrm{y}\cdot\cos(\psi) \end{bmatrix} = \underbrace{ \begin{bmatrix} \cos(\psi) & -\sin(\psi) \\ \sin(\psi) & \phantom{-}\cos(\psi) \end{bmatrix} }_{\mathbf{R}} \underbrace{ \begin{bmatrix} \mathrm{x} \\ \mathrm{y} \end{bmatrix} }_{\mathbf{v}}. \tag{6}\]

This matrix $\mathbf{R}$ is in fact a 2D rotation matrix. It belongs to the Special Orthogonal group:

\[\mathbb{SO}(n) = \left\{ \mathbf{R}\in\mathbb{R}^{n\times n} ~\big|~ \mathbf{RR}^T = \mathbf{I}~,~ det(\mathbf{R}) = 1 \right\}. \tag{7}\]

Multiplying a complex number by Euler’s equation is equivalent to rotating a 2D vector with a 2D rotation matrix. But this isn’t the only connection between complex numbers and 2D rotations. An eigenvector $\mathbf{v}$ of $\mathbf{R}\in\mathbb{SO}(2)$ satisfies the identity:

\[\mathbf{Rv} = \lambda\mathbf{v} \tag{8}\]

where $\lambda$ is the corresponding eigenvalue. We can find the eigenvalue(s) of a 2D matrix using the shortcut:

\[\begin{align} \lambda^2 - trace(\mathbf{R}) \lambda + det(\mathbf{R}) &= 0 \tag{9a} \\ \lambda^2 -2\cos(\psi)\lambda + 1&= 0 \tag{9b} \end{align}\]

where

  • $trace(\cdot)$ is the sum of diagonal elements, and
  • $det(\cdot)$ is the determinant.

We can then solve Eqn. (9) with the quadratic formula and some trigonometric identities:

\[\begin{align} \lambda = \cos(\psi) &\pm \sqrt{\cos^2(\psi)-1 } \tag{10a}\\ \cos(\psi) &\pm \sqrt{-\sin^2(\psi)} \tag{10b} \\ \cos(\psi) &\pm i\cdot\sin(\psi) \in\mathbb{C}. \tag{10c} \end{align}\]

The eigenvalue of $\mathbb{SO}(2)$ is a complex number. Is this surprising? Take a look at Eqn. (4), (6) and (8) again:

\[e^{i\psi}\cdot \mathrm{z} = \lambda\mathbf{v} = \mathbf{R}\mathbf{v}. \tag{11}\]
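We can verify this equivalence numerically. The following Python sketch (with an arbitrary angle and complex number, chosen for illustration) rotates $\mathrm{z}$ both ways:

```python
import cmath, math

psi = 0.7            # rotation angle (rad), arbitrary choice
z = 2.0 + 1.0j       # the complex number from Eqn. (3)

# Rotation via Euler's formula, Eqn. (4):
rotated = cmath.exp(1j * psi) * z

# Rotation via the matrix-vector product in Eqn. (6):
c, s = math.cos(psi), math.sin(psi)
vx = c * z.real - s * z.imag
vy = s * z.real + c * z.imag

print(rotated, complex(vx, vy))  # the same vector
```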

⬆️ Back to top.

Complex Numbers in Higher Dimensions?

Now you may be thinking: if 1 complex element gives rotation in 2D, then 2 complex elements are needed for rotation in 3D. Let’s declare an “extended” complex number where $j = \sqrt{-1}$:

\[\mathrm{x} + i\cdot \mathrm{y} + j \cdot \mathrm{z} \in\mathbb{C}^2. \tag{12}\]

What happens when we multiply two of them together?

\[\begin{align} \left(\mathrm{x} + i\cdot \mathrm{y} + j\cdot \mathrm{z}\right)\left(\mathrm{x} + i\cdot \mathrm{y} + j\cdot \mathrm{z}\right) &= \underbrace{\mathrm{x}^2 - \mathrm{y}^2 - \mathrm{z}^2}_{\text{Real}} + \underbrace{i\cdot 2\mathrm{xy} + j\cdot 2\mathrm{xz}}_{\text{Complex}} + \underbrace{\left(ij + ji\right)\cdot \mathrm{yz} }_{\text{???}} \notin\mathbb{C}^2 \tag{13} \end{align}\]

What are $ij$ and $ji$? The mathematical object on the right is different from the object on the left. The problem is that Eqn. (12) is not a Lie group.

Lie groups are mathematical objects that satisfy 4 properties:

  1. Closure: Combining 2 elements within the group produces another element within the group.
  2. Associativity: The manner in which we cluster a series of closure operations doesn’t matter, as long as the sequence remains the same.
  3. Identity: The element that results in no change.
  4. Inverse: The element that leads to the identity.

Complex numbers form a Lie group. This is why we could rotate another complex number using Eqn. (4): we multiply 2 complex numbers, and get a 3rd. Equation (13) violates the closure property. To represent rotations in 3D, we need a Lie group so that we can use the closure property to combine them.

  Group Properties of $\mathbb{C}$ (Over Multiplication)
Closure: $\mathrm{z}_1,\mathrm{z}_2\in\mathbb{C}~:~ \mathrm{z}_1 \mathrm{z}_2 \in\mathbb{C}$
Associativity: $\left(\mathrm{z}_1 \mathrm{z}_2\right) \mathrm{z}_3 = \mathrm{z}_1 \left(\mathrm{z}_2 \mathrm{z}_3\right)$
Identity: $1 \equiv 1+i\cdot0\in\mathbb{C}: 1\mathrm{z} = \mathrm{z}$
Inverse: $\mathrm{z}^{-1} = \frac{\bar{\mathrm{z}}}{\mathrm{z}\bar{\mathrm{z}}} ~:~ \mathrm{z}^{-1}\mathrm{z} = 1 + i\cdot 0$
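These group properties are easy to check numerically; for example, the inverse from the table (using an arbitrary complex number):

```python
z = 3 + 4j                          # an arbitrary complex number
z_conj = z.conjugate()
z_inv = z_conj / (z * z_conj).real  # inverse from the table: conj(z) / (z * conj(z))

print(z * z_inv)                    # (1+0j), the identity
```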

⬆️ Back to top.

Hamilton’s Epiphany

Sir William Rowan Hamilton proposed the now famous quaternion:

\[\boldsymbol{q} = \mathrm{w} + i\cdot \mathrm{x} + j\cdot \mathrm{y} + k\cdot \mathrm{z} \in\mathbb{H} \tag{14}\]

where $i^2 = j^2 = k^2 = -1$. By multiplying 2 of them together with standard rules for arithmetic we obtain:

\[\begin{align} \boldsymbol{q}_1\cdot\boldsymbol{q}_2 &= (\mathrm{w}_1\mathrm{w}_2 - \mathrm{x}_1\mathrm{x}_2 - \mathrm{y}_1\mathrm{y}_2 -\mathrm{z}_1\mathrm{z}_2) \nonumber \\ &+ i\cdot(\mathrm{w}_1 \mathrm{x}_2 + \mathrm{x}_1 \mathrm{w}_2) + j\cdot(\mathrm{w}_1 \mathrm{y}_2 + \mathrm{y}_1 \mathrm{w}_2) + k\cdot(\mathrm{w}_1 \mathrm{z}_2 + \mathrm{z}_1 \mathrm{w}_2) \nonumber \\ &+ ij\cdot\mathrm{x}_1\mathrm{y}_2 + ji\cdot\mathrm{y}_1\mathrm{x}_2 + jk\cdot\mathrm{y}_1\mathrm{z}_2 + kj\cdot\mathrm{z}_1\mathrm{y}_2 + ki\cdot\mathrm{z}_1\mathrm{x}_2 + ik\cdot\mathrm{x}_1\mathrm{z}_2. \tag{15} \end{align}\]

On October 16th, 1843, he had an epiphany about how to resolve the closure property. His insight was to declare that $ijk = -1$. He inscribed this now famous identity onto Brougham Bridge in Dublin (Fig. 2).


Figure 2: A plaque on Brougham (Broom) Bridge commemorating Hamilton's invention.
(JP, William Rowan Hamilton Plaque, CC BY-SA 2.0)

  Quaternion Multiplication Rules
$\times$ $\phantom{-}i$ $\phantom{-}j$ $\phantom{-}k$
$i$ $-1$ $\phantom{-}k$ $-j$
$j$ $-k$ $-1$ $\phantom{-}i$
$k$ $\phantom{-}j$ $-i$ $-1$

The key is that quaternions obey their own rules for multiplication. Specifically, we resolve $ij = k$, $ji = -k$, etc. That way $ijk = k^2 = -1$. We may now complete Eqn. (15):

\[\begin{align} \boldsymbol{q}_1\cdot\boldsymbol{q}_2 &= \phantom{h\cdot}(\mathrm{w}_1\mathrm{w}_2 - \mathrm{x}_1\mathrm{x}_2 - \mathrm{y}_1\mathrm{y}_2 -\mathrm{z}_1\mathrm{z}_2) \nonumber \\ &+ i\cdot(\mathrm{w}_1 \mathrm{x}_2 + \mathrm{x}_1 \mathrm{w}_2 + \mathrm{y}_1\mathrm{z}_2 - \mathrm{z}_1\mathrm{y}_2) \nonumber\\ &+ j\cdot(\mathrm{w}_1 \mathrm{y}_2 + \mathrm{y}_1 \mathrm{w}_2 + \mathrm{z}_1\mathrm{x}_2 - \mathrm{x}_1\mathrm{z}_2 ) \nonumber \\ &+ k\cdot(\mathrm{w}_1 \mathrm{z}_2 + \mathrm{z}_1 \mathrm{w}_2 + \mathrm{x}_1\mathrm{y}_2 - \mathrm{y}_1\mathrm{x}_2) \in\mathbb{H} \tag{16} \end{align}\]

which satisfies the closure property for a Lie group.
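Equation (16) is easy to verify numerically. Below is a minimal sketch using NumPy; the function name `quat_multiply` and the `[w, x, y, z]` array layout are my own choices, not a standard API:

```python
import numpy as np

def quat_multiply(q1, q2):
    """Hamilton product of two quaternions [w, x, y, z], following Eqn. (16)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,    # real part
        w1*x2 + x1*w2 + y1*z2 - z1*y2,    # i component
        w1*y2 + y1*w2 + z1*x2 - x1*z2,    # j component
        w1*z2 + z1*w2 + x1*y2 - y1*x2,    # k component
    ])

# Closure: the product of i and j is another quaternion, namely k,
# matching the multiplication table above.
i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(quat_multiply(i, j))   # [0. 0. 0. 1.], i.e. k
print(quat_multiply(j, i))   # [0. 0. 0. -1.], i.e. -k
```

Note that swapping the arguments flips the sign, exactly as $ij = k$ and $ji = -k$ in the table.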

If the exponential of a purely imaginary complex number represents a rotation, Eqn. (1), what about a purely complex quaternion?

\[\boldsymbol{p} = i\cdot\mathrm{x} + j\cdot\mathrm{y} + k\cdot\mathrm{z} \in\mathbb{H}. \tag{17}\]

When exponentiating Eqn. (17) we obtain:

\[e^{\boldsymbol{p}} = \sum_{n=0}^{\infty} \frac{(\|\boldsymbol{p}\|\cdot\hat{\boldsymbol{p}})^n} {n!} \tag{18}\]

where $\hat{\boldsymbol{p}} = \frac{\boldsymbol{p}}{\|\boldsymbol{p}\|}$ such that $\hat{\boldsymbol{p}}^2 = -1$. We can split this into even and odd terms and simplify them a little:

\[\begin{align} (\|\boldsymbol{p}\|\cdot\hat{\boldsymbol{p}})^{2n\phantom{+1}} &= (-1)^n \cdot\|\boldsymbol{p}\|^{2n} \tag{19a} \\ (\|\boldsymbol{p}\|\cdot\hat{\boldsymbol{p}})^{2n+1} &= (-1)^n \cdot\|\boldsymbol{p}\|^{2n+1} \cdot \hat{\boldsymbol{p}}. \tag{19b} \end{align}\]

By substituting Eqns. (19a) & (19b) into (18) we arrive at:

\[e^{\boldsymbol{p}} = \underbrace{\sum_{n=0}^{\infty} \frac{(-1)^n \cdot\|\boldsymbol{p}\|^{2n}} {(2n)!}}_{\cos(\|\boldsymbol{p}\|)} + \underbrace{\sum_{n=0}^{\infty} \frac{(-1)^n \cdot\|\boldsymbol{p}\|^{2n+1}} {(2n+1)!}}_{_{\sin(\|\boldsymbol{p}\|)}}\cdot\hat{\boldsymbol{p}} \in\mathbb{H} \tag{20}\]

which is itself a quaternion. In this context,

  • $\|\boldsymbol{p}\|$ is equivalent to the magnitude of rotation, and
  • $\hat{\boldsymbol{p}}$ is the axis of rotation.

This is exactly what Euler’s rotation theorem states: any 3D rotation may be parameterised by an angle of rotation about a fixed axis. Thus, we can use quaternions to represent rotation. But not just any quaternion; it must be the exponential of a purely imaginary quaternion.
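The series of Eqn. (20) collapses to a cosine plus a sine times the unit axis, which we can check numerically. A minimal sketch using NumPy (the function name `quat_exp` is my own):

```python
import numpy as np

def quat_exp(p):
    """Exponential of a pure quaternion p = [x, y, z], per Eqn. (20).

    Returns [cos(|p|), sin(|p|) * p/|p|] as a [w, x, y, z] array."""
    angle = np.linalg.norm(p)
    if angle < 1e-12:            # exp(0) is the identity quaternion
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = np.asarray(p) / angle
    return np.concatenate(([np.cos(angle)], np.sin(angle) * axis))

q = quat_exp([0.0, 0.0, np.pi / 2])
print(q)                     # [~0, 0, 0, 1]: cos and sin of pi/2
print(np.linalg.norm(q))     # 1.0: the result always has unit norm
```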

⬆️ Back to top.

Euler-Rodrigues Parameters & The Versor

I am now going to switch notation, and from (14) I am going to define:

\[\eta = \mathrm{w} ~,~ \boldsymbol{\varepsilon} = \begin{bmatrix} \mathrm{x} \\ \mathrm{y} \\ \mathrm{z} \end{bmatrix} ~\longrightarrow~ \boldsymbol{q} = \begin{bmatrix} \eta \\ \boldsymbol{\varepsilon} \end{bmatrix}. \tag{21}\]

From careful inspection of Eqn. (16) we can now re-write the product of 2 quaternions using 2 familiar vector operations: the dot product1, and the cross product:

\[\boldsymbol{q}_1 \cdot \boldsymbol{q}_2 = \begin{bmatrix} \eta_1\eta_2 - \boldsymbol{\varepsilon}_1^T\boldsymbol{\varepsilon}_2 \\ \eta_1 \boldsymbol{\varepsilon}_2 + \eta_2\boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_1\times\boldsymbol{\varepsilon}_2 \end{bmatrix} \in\mathbb{H}. \tag{22}\]

To re-iterate, this is the closure property of $\mathbb{H}$. In fact, if the product of any 2 quaternions is another quaternion, then the associativity property follows:

\[\left(\boldsymbol{q}_1 \cdot \boldsymbol{q}_2\right) \cdot \boldsymbol{q}_3 = \boldsymbol{q}_1 \cdot \left(\boldsymbol{q}_2 \cdot \boldsymbol{q}_3\right). \tag{23}\]

Be careful though; since $\boldsymbol{\varepsilon}_1\times\boldsymbol{\varepsilon}_2 \ne \boldsymbol{\varepsilon}_2\times\boldsymbol{\varepsilon}_1$, it is also the case that $\boldsymbol{q}_1\cdot\boldsymbol{q}_2 \ne \boldsymbol{q}_2\cdot\boldsymbol{q}_1$.
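The vector form of Eqn. (22) is compact to implement, and the cross-product term makes the non-commutativity obvious. A sketch using NumPy, with the $(\eta, \boldsymbol{\varepsilon})$ layout stored as a 4-vector `[eta, e1, e2, e3]` (my own convention):

```python
import numpy as np

def quat_multiply(q1, q2):
    """Quaternion product in (eta, epsilon) form, per Eqn. (22)."""
    eta1, eps1 = q1[0], q1[1:]
    eta2, eps2 = q2[0], q2[1:]
    eta = eta1 * eta2 - eps1 @ eps2                          # dot product term
    eps = eta1 * eps2 + eta2 * eps1 + np.cross(eps1, eps2)   # cross product term
    return np.concatenate(([eta], eps))

q1 = np.array([0.5, 0.5, 0.5, 0.5])
q2 = np.array([0.0, 1.0, 0.0, 0.0])
# The cross product term makes the product order-dependent:
print(quat_multiply(q1, q2))
print(quat_multiply(q2, q1))
```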

The identity element for quaternions is the same as for $\mathbb{C}$: a unit real part and zero complex part:

\[\boldsymbol{\iota} = \begin{bmatrix} 1 \\ \mathbf{0} \end{bmatrix} \in\mathbb{H}~\Longrightarrow ~ \boldsymbol{q}\cdot\boldsymbol{\iota} = \boldsymbol{q}. \tag{24}\]

Now, for a complex number we obtain the conjugate by negating the complex component. The product of a complex number and its conjugate gives a purely real number:

\[\mathrm{z} = \mathrm{x} + i\cdot\mathrm{y}~,~\bar{\mathrm{z}} = \mathrm{x} - i\cdot\mathrm{y} \in\mathbb{C} ~\Longrightarrow~ \mathrm{z\bar{z}} = \mathrm{x}^2 + \mathrm{y}^2 \in\mathbb{R}. \tag{25}\]

The same is true of quaternions. We form the conjugate by negating the complex component. And when we multiply a quaternion with its conjugate we end up with a purely real number:

\[\bar{\boldsymbol{q}} = \begin{bmatrix} \phantom{-}\eta \\ -\boldsymbol{\varepsilon} \end{bmatrix} ~\Longrightarrow~ \boldsymbol{q}\cdot\bar{\boldsymbol{q}} = \begin{bmatrix} \eta^2 + \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} \\ \mathbf{0} \end{bmatrix}. \tag{26}\]

Can you see it? Eqn. (26) leads to the identity Eqn. (24) if, and only if:

\[\underbrace{\eta^2 + \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon}}_{\mathrm{w^2 + x^2 + y^2 + z^2}} = 1. \tag{27}\]

Parameters $(\eta, \boldsymbol{\varepsilon})$ satisfying this condition are known as the Euler-Rodrigues parameters. We already have a solution using the exponential quaternion Eqn. (20):

\[\boldsymbol{v} =e^{\tfrac{1}{2}\mathbf{a}} = \underbrace{\cos\left(\tfrac{1}{2}\alpha\right)}_{\eta} + \underbrace{\sin\left(\tfrac{1}{2}\alpha\right)\hat{\mathbf{a}}}_{\boldsymbol{\varepsilon}} \in \mathbb{S}^3\subset \mathbb{H} \tag{28}\]

where $\mathbf{a} = \alpha\cdot\hat{\mathbf{a}}$ (the angle-axis parameterisation). The reason for the half angle will be apparent later. A quaternion of unit norm is called a versor. Equation (27) implies that the versor is a point on the surface of a 4D sphere, hence $\mathbb{S}^3$ (a 3D surface embedded in 4D space).
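Constructing a versor from the angle-axis via Eqn. (28) is only a few lines. A minimal NumPy sketch (the function name `versor` is my own):

```python
import numpy as np

def versor(angle, axis):
    """Unit quaternion [eta, e1, e2, e3] from an angle (rad) and axis, Eqn. (28)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)    # ensure a unit axis
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

v = versor(np.pi / 3, [0.0, 0.0, 1.0])
# Eqn. (27): eta^2 + eps.T eps = 1, so the versor lies on the unit 3-sphere.
print(v @ v)   # 1.0
```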

So for a versor, the conjugate is the inverse element since:

\[\boldsymbol{v}\cdot\bar{\boldsymbol{v}} = \boldsymbol{\iota}. \tag{29}\]

We have now completed the Lie group properties; not for quaternions $\mathbb{H}$ per se, but for versors $\mathbb{S}^3\subset\mathbb{H}$.

  Group Properties for $\mathbb{S}^3\subset\mathbb{H}$
Closure: $\boldsymbol{v}_1,\boldsymbol{v}_2\in\mathbb{S}^3~:~\boldsymbol{v}_1\cdot\boldsymbol{v}_2\in\mathbb{S}^3$
Associativity: $\left(\boldsymbol{v}_1\cdot\boldsymbol{v}_2\right)\cdot\boldsymbol{v}_3 = \boldsymbol{v}_1\cdot\left(\boldsymbol{v}_2\cdot\boldsymbol{v}_3\right)$
Identity: $\boldsymbol{\iota} = \begin{bmatrix} 1 & \mathbf{0} \end{bmatrix}^T\in\mathbb{S}^3 ~:~ \boldsymbol{v}\cdot\boldsymbol{\iota} = \boldsymbol{v}$
Inverse: $\bar{\boldsymbol{v}} = \begin{bmatrix} \eta & -\boldsymbol{\varepsilon}^T\end{bmatrix}^T~:~ \boldsymbol{v}\cdot\bar{\boldsymbol{v}} = \boldsymbol{\iota}$

⬆️ Back to top.

Rotating Vectors

To rotate a vector $\mathbf{v}\in\mathbb{R}^3$ we:

  1. Treat it as a pure quaternion, and
  2. Sandwich it between a versor $\boldsymbol{v}\in\mathbb{S}^3$ and its conjugate $\bar{\boldsymbol{v}}$.

The result is:

\[\begin{bmatrix} 0 \\ \mathbf{u} \end{bmatrix} = \overbrace{ \begin{bmatrix} \eta \\ \boldsymbol{\varepsilon} \end{bmatrix} }^{\boldsymbol{v}} \cdot \begin{bmatrix} 0 \\ \mathbf{v} \end{bmatrix} \cdot \overbrace{ \begin{bmatrix} \phantom{-}\eta \\ -\boldsymbol{\varepsilon} \end{bmatrix} }^{\bar{\boldsymbol{v}}} = \begin{bmatrix} 0 \\ \mathbf{R}(\eta,\boldsymbol{\varepsilon})\mathbf{v} \end{bmatrix}. \tag{30}\]

First, we need the half-angle in Eqn. (28) so that, when we apply this left-side and right-side product, we end up with zero in the real part of the result. Without it, we wouldn’t have a pure quaternion (try it!).

Second, any rotation of a vector $\mathbf{v}\in\mathbb{R}^n \to \mathbf{u}\in\mathbb{R}^n$ that preserves its length is equivalent to applying a rotation matrix $\mathbf{R}\in\mathbb{SO}(n)$. If we were to expand Eqn. (30) we would find:

\[\mathbf{R}(\eta,\boldsymbol{\varepsilon}) = \begin{bmatrix} 1 - 2(\varepsilon_2^2 + \varepsilon_3^2) & 2(\varepsilon_1 \varepsilon_2 - \eta \varepsilon_3) & 2(\varepsilon_1 \varepsilon_3 + \eta \varepsilon_2) \\ 2(\varepsilon_1 \varepsilon_2 + \eta \varepsilon_3) & 1 - 2(\varepsilon_1^2 + \varepsilon_3^2) & 2(\varepsilon_2 \varepsilon_3 - \eta \varepsilon_1) \\ 2(\varepsilon_1 \varepsilon_3 - \eta \varepsilon_2) & 2(\varepsilon_2 \varepsilon_3 + \eta \varepsilon_1) & 1 - 2(\varepsilon_1^2 + \varepsilon_2^2) \end{bmatrix}\in\mathbb{SO}(3) \tag{31}\]

Now we have a short-hand for constructing a rotation matrix from a versor. This is more efficient because we can skip all the calculations that cancel to zero.

NOTE: $\boldsymbol{v}$ and $-\boldsymbol{v}$ represent the same orientation. This is because $\mathbf{u} = (-\boldsymbol{v})\cdot\mathbf{v}\cdot(-\bar{\boldsymbol{v}}) = \boldsymbol{v}\cdot\mathbf{v}\cdot\bar{\boldsymbol{v}}$. You can think of it like this: facing South and walking backwards is equivalent to facing North and walking forwards.
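We can confirm that the sandwich product of Eqn. (30) and the rotation matrix of Eqn. (31) give identical results. A sketch using NumPy, rotating the x-axis 90° about z (function names are my own):

```python
import numpy as np

def quat_multiply(q1, q2):
    """Quaternion product in (eta, epsilon) form, Eqn. (22)."""
    eta1, eps1 = q1[0], q1[1:]
    eta2, eps2 = q2[0], q2[1:]
    return np.concatenate(([eta1*eta2 - eps1 @ eps2],
                           eta1*eps2 + eta2*eps1 + np.cross(eps1, eps2)))

def rotation_matrix(v):
    """Rotation matrix from a versor v = [eta, e1, e2, e3], Eqn. (31)."""
    eta, (e1, e2, e3) = v[0], v[1:]
    return np.array([
        [1 - 2*(e2**2 + e3**2), 2*(e1*e2 - eta*e3),    2*(e1*e3 + eta*e2)],
        [2*(e1*e2 + eta*e3),    1 - 2*(e1**2 + e3**2), 2*(e2*e3 - eta*e1)],
        [2*(e1*e3 - eta*e2),    2*(e2*e3 + eta*e1),    1 - 2*(e1**2 + e2**2)],
    ])

alpha = np.pi / 2                          # 90 degrees about the z-axis
v = np.concatenate(([np.cos(alpha/2)], np.sin(alpha/2) * np.array([0.0, 0.0, 1.0])))
vec = np.array([1.0, 0.0, 0.0])

# Sandwich product of Eqn. (30): v * [0, vec] * conj(v)
pure = np.concatenate(([0.0], vec))
conj = np.concatenate(([v[0]], -v[1:]))
sandwich = quat_multiply(quat_multiply(v, pure), conj)

print(sandwich[1:])                  # ~[0, 1, 0]: the x-axis rotated on to the y-axis
print(rotation_matrix(v) @ vec)      # identical result via Eqn. (31)
```

The real part of `sandwich` comes out as zero, which is exactly the role of the half angle in Eqn. (28).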

⬆️ Back to top.

Advantages of Quaternions

Quaternions are used in animation, robotics, and aerospace. They require fewer floating point operations (FLOPs) when propagating rotations versus rotation matrices. However, they are more costly when rotating vectors. This can be reduced from 56 FLOPs to 39 FLOPs by first forming the rotation matrix, Eqn. (31), then performing the rotation.

Quaternions are also much more efficient for storing and transmitting data. They only require 4 parameters, versus 9 for rotation matrices. This is important when we have limited bandwidth, and limited storage space.

They are also more numerically stable. Successive rotations lead to an accumulation of floating point error, but we can easily re-normalise a versor to preserve Eqn. (27), whereas re-orthogonalising a rotation matrix is far more involved.

                                    $\mathbb{SO}(3)$   $\mathbb{S}^3\subset\mathbb{H}$
Parameters                                 9                      4
Closure          Multiplications          27                     16
                 Additions                18                     12
                 Total FLOPs              45                     28
Vector Rotation  Multiplications           9                  32 (23)
                 Additions                 6                  24 (16)
                 Total FLOPs              15                  56 (39)

(Values in parentheses are for rotating via the matrix of Eqn. (31).)
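The re-normalisation trick mentioned above is a single division. Below is a sketch, assuming a versor stored as `[eta, e1, e2, e3]` and a quaternion product as in Eqn. (22), that propagates many small rotations and restores the unit norm at every step:

```python
import numpy as np

def quat_multiply(q1, q2):
    """Quaternion product in (eta, epsilon) form, Eqn. (22)."""
    eta1, eps1 = q1[0], q1[1:]
    eta2, eps2 = q2[0], q2[1:]
    return np.concatenate(([eta1*eta2 - eps1 @ eps2],
                           eta1*eps2 + eta2*eps1 + np.cross(eps1, eps2)))

# Propagate many small rotations about z; floating point error would
# slowly violate Eqn. (27) without the cheap re-normalisation below.
step = np.concatenate(([np.cos(1e-3 / 2)],
                       np.sin(1e-3 / 2) * np.array([0.0, 0.0, 1.0])))
v = np.array([1.0, 0.0, 0.0, 0.0])       # start at the identity
for _ in range(10000):
    v = quat_multiply(step, v)
    v /= np.linalg.norm(v)               # one division restores the unit norm
print(abs(v @ v - 1.0))                  # ~0: still a valid versor
```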

⬆️ Back to top.

  1. For two vectors $\mathbf{a},\mathbf{b}\in\mathbb{R}^n$ the dot product $\mathbf{a}\bullet\mathbf{b} = \mathbf{a}^T\mathbf{b}$. 

]]>
<![CDATA[Quaternions are sophisticated mathematical objects that are used to represent orientation in 3D for robotics, animation, and aerospace. In this article I trace a logical sequence from using complex numbers as rotations toward the derivation of the quaternion itself. I then derive the Lie group properties for combining and inverting quaternions. Lastly, I show how they can be used to rotate vectors, and some of their advantages over rotation matrices.]]>
Orientation Control With Angle-Axis Representation2025-06-11T00:00:00+00:002025-06-11T00:00:00+00:00https://woolfrey.github.io/feedback/control/robot/orientation/2025/06/11/orientation-control-with-angle-axis-representation<![CDATA[

In this article I provide some basic definitions and proofs of identities for rotation matrices $\mathbf{R}\in\mathbb{SO}(3)$. I show that a rotation matrix can be represented as a matrix exponential. From this, Rodrigues’ formula follows which expresses the matrix in terms of the angle and axis of rotation. I then show how to reverse this formula to obtain the angle and axis from an arbitrary rotation matrix. Then using the exponential form, and the angle-axis, I derive a control law for the angular velocity to perform feedback control on orientation error.

📄 Download a PDF version.

🧭 Navigation

Euler’s Rotation Theorem

Euler’s rotation theorem states that any change in orientation of a rigid body can be described by:

  • A single rotation $\alpha$ (rad),
  • About an axis $\hat{\mathbf{a}}\in\mathbb{R}^3$ where $\hat{\mathbf{a}}$ is a unit vector such that $|\hat{\mathbf{a}}|^2 = \hat{\mathbf{a}}^T\hat{\mathbf{a}} = 1$.

The 3 combined rotations in the illustration below can be reduced to a single rotation about a single axis:


Any number of combined rotations can be expressed as a single rotation about a single axis.

Any transformation of a vector $\mathbf{v}\in\mathbb{R}^n\to\mathbf{u}\in\mathbb{R}^n$ that preserves its length can be expressed with a product involving a rotation matrix:

\[\mathbf{u} = \mathbf{Rv}. \tag{1}\]

This matrix belongs to the Special Orthogonal group:

\[\mathbb{SO}(n) = \Big\{\mathbf{R}\in\mathbb{R}^{n\times n} ~\Big|~ \mathbf{RR}^T = \mathbf{I}~,~ det(\mathbf{R}) = 1\Big\} \tag{2}\]

Given an arbitrary rotation matrix $\mathbf{R}\in\mathbb{SO}(3)$ we may be interested in finding the angle and axis of rotation. To do this, we need to define some other properties of $\mathbb{SO}(3)$ that we can exploit.

⬆️ Back to top.

Time Derivative & Exponential

If we take the time derivative of Eqn. (1), and assuming $\dot{\mathbf{v}} = \mathbf{0}$, then we arrive at:

\[\dot{\mathbf{u}} = \dot{\mathbf{R}}\mathbf{v}. \tag{3}\]

But in 3D, the time derivative of a vector is given by the cross product with the instantaneous angular velocity $\boldsymbol{\omega}\in\mathbb{R}^3$ (rad/s):

\[\mathbf{\dot{u}} = \boldsymbol{\omega}\times\mathbf{u} = S(\boldsymbol{\omega})\mathbf{u} \tag{4}\]

where $S(\cdot)$ is the skew-symmetric matrix operator:

\[S(\boldsymbol{\omega}) = \begin{bmatrix} \phantom{-}0 & -\omega_z & \phantom{-}\omega_y \\ \phantom{-}\omega_z & \phantom{-}0 & -\omega_x \\ -\omega_y & \phantom{-}\omega_x & \phantom{-}0 \end{bmatrix} \in\mathfrak{so}(3). \tag{5}\]

This is also the Lie algebra of $\mathbb{SO}(3)$. By equating Eqn. (3) with Eqn. (4), and substituting in Eqn. (1) we can see that the time derivative of the rotation matrix is “proportional” to itself:

\[\mathbf{\dot{R}} = S(\boldsymbol{\omega})\mathbf{R} ~\Longrightarrow \mathbf{R}(\mathrm{t}) = e^{S(\boldsymbol{\omega})\mathrm{t}}\mathbf{R}(0) \tag{6}\]

This is a first-order differential equation whose solution is a (matrix) exponential. But the integral of the angular velocity is simply the angle-axis vector at any given point in time:

\[\int_0^t \boldsymbol{\omega}~dt = \boldsymbol{\omega}t + const. = \alpha\cdot\hat{\mathbf{a}} = \mathbf{a} \tag{7}\]

(where $const. = 0$).

Assuming we start from zero rotation $(\mathbf{R}(0) = \mathbf{I})$, then the rotation matrix is equivalent to a matrix exponential containing the angle-axis:

\[\mathbf{R} = e^{S(\mathbf{a})}\in\mathbb{SO}(3). \tag{8}\]

From the definition of the exponential:

\[e^{S(\mathbf{a})} = \sum_{k=0}^\infty \frac{\alpha^{k}}{k!}S(\hat{\mathbf{a}})^{k} \tag{9}\]

we can reduce Eqn. (8) to Rodrigues’ formula which features the angle and axis as separate parameters:

\[\mathbf{R}(\alpha,\hat{\mathbf{a}}) = \mathbf{I} + \sin(\alpha)S(\hat{\mathbf{a}}) + (1-\cos(\alpha))S(\hat{\mathbf{a}})^2. \tag{10}\]
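Rodrigues' formula, Eqn. (10), translates directly into code. A minimal sketch using NumPy (the function names `skew` and `rodrigues` are my own):

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix S(a) of Eqn. (5)."""
    x, y, z = a
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])

def rodrigues(alpha, axis):
    """Rotation matrix from an angle (rad) and axis, per Eqn. (10)."""
    S = skew(np.asarray(axis) / np.linalg.norm(axis))
    return np.eye(3) + np.sin(alpha) * S + (1 - np.cos(alpha)) * (S @ S)

R = rodrigues(np.pi / 2, [0, 0, 1])      # 90 degrees about the z-axis
print(R @ np.array([1.0, 0.0, 0.0]))     # ~[0, 1, 0]
print(np.linalg.det(R))                  # 1.0: R satisfies Eqn. (2)
```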

⬆️ Back to top.

Angle & Axis from Rotation Matrix

Rodrigues’ formula, Eqn. (10), contains 3 matrices with a particular structure to their respective diagonal elements. If we take the trace (sum of diagonal elements) we can see that:

  • $trace(\mathbf{I}) = 3$,
  • $trace\left(S(\hat{\mathbf{a}})\right) = 0$, and
  • $trace\left(S(\hat{\mathbf{a}})^2\right) = -2$ since $\|\hat{\mathbf{a}}\| = 1.$

Hence the trace of a rotation matrix must be:

\[\begin{align} trace(\mathbf{R}) &= 3 - 2\cdot(1 - \cos(\alpha)) \tag{11a} \\ &= 1 + 2\cdot\cos(\alpha). \tag{11b} \end{align}\]

We can re-arrange this to solve for the angle of rotation:

\[\alpha = \cos^{-1}\left(\frac{trace(\mathbf{R}) - 1}{2}\right). \tag{12}\]

If the angle of rotation is zero $\alpha = 0$, then the axis of rotation is arbitrary since $0\cdot\hat{\mathbf{a}} = \mathbf{0}$.

The axis for a rotation matrix does not change $\mathbf{R}\hat{\mathbf{a}} = \hat{\mathbf{a}}$. This implies that it is an eigenvector whose corresponding eigenvalue $\lambda = 1$.1 For any arbitrary eigenvector of $\mathbf{R}$ it must hold that:

\[\mathbf{Rv = v}. \tag{13}\]

Multiplying this by the transpose of the rotation yields:

\[\begin{align} \overbrace{\mathbf{R}^T\mathbf{R}}^{\mathbf{I}}\mathbf{v} &= \mathbf{R}^T\mathbf{v} \tag{14a}\\ \mathbf{v} &= \mathbf{R}^T\mathbf{v}. \tag{14b} \end{align}\]

Equating Eqn. (13) and Eqn. (14b) we obtain:

\[\begin{align} \mathbf{Rv} &= \mathbf{R}^T\mathbf{v} \tag{15a} \\ \underbrace{\left(\mathbf{R} - \mathbf{R}^T\right)}_{S(\mathbf{v})}\mathbf{v} &= \mathbf{0}. \tag{15b} \end{align}\]

The matrix $\mathbf{R} - \mathbf{R}^T$ is skew-symmetric by construction, so it may be written as $S(\mathbf{v})$ for some vector $\mathbf{v}$; and since $\mathbf{v}\times\mathbf{v} = S(\mathbf{v})\mathbf{v} = \mathbf{0}$, that vector is precisely the eigenvector we seek. Expanding this we have:

\[\mathbf{R} - \mathbf{R}^T = \begin{bmatrix} 0 & r_{12} - r_{21} & r_{13} - r_{31} \\ r_{21} - r_{12} & 0 &r_{23} - r_{32} \\ r_{31} - r_{13} & r_{32} - r_{23} & 0 \end{bmatrix}. \tag{16}\]

Using what we know about the structure of skew-symmetric matrices, Eqn. (5), we can deduce that the eigenvector is:

\[\mathbf{v} = \begin{bmatrix} r_{32} - r_{23} \\ r_{13} - r_{31} \\ r_{21} - r_{12} \end{bmatrix}. \tag{17}\]

We can then normalise this vector to obtain the axis of rotation $\hat{\mathbf{a}}$:

\[\hat{\mathbf{a}} = \begin{cases} \frac{\mathbf{v}}{\|\mathbf{v}\|} & \text{if } \alpha \ne 0 \\ \text{trivial} & \text{otherwise.} \end{cases} \tag{18}\]

Note that if $\mathbf{R} = \mathbf{I}$ (i.e. no rotation), then $\mathbf{v} = \mathbf{0}$ and $\|\mathbf{v}\|^{-1}$ is undefined. In this case, we can assign any arbitrary value to the axis of rotation.
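Equations (12), (17), and (18) combine into a short extraction routine. A sketch using NumPy (the function name `angle_axis` is my own; note that the $\mathbf{v}$-based axis also degenerates when $\alpha = \pi$, where $\mathbf{R} = \mathbf{R}^T$):

```python
import numpy as np

def angle_axis(R, tol=1e-8):
    """Recover (alpha, axis) from R in SO(3) using Eqns. (12), (17), (18)."""
    # Clip guards against arccos domain errors from floating point noise.
    alpha = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    v = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]])
    if np.linalg.norm(v) < tol:                    # alpha ~ 0 (or pi): v vanishes
        return alpha, np.array([1.0, 0.0, 0.0])   # arbitrary axis
    return alpha, v / np.linalg.norm(v)

# 90 degrees about the z-axis:
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
alpha, axis = angle_axis(R)
print(alpha, axis)   # ~1.5708 [0, 0, 1]
```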

⬆️ Back to top.

Orientation Feedback Control

We can use the angle-axis vector to perform feedback on the orientation of an automated system. Suppose $\mathbf{R}_d\in\mathbb{SO}(3)$ is the desired orientation, and $\mathbf{R}\in\mathbb{SO}(3)$ is our actual orientation. We can define our orientation error as:

\[\mathbf{E} \triangleq \mathbf{R}_d\mathbf{R}^T = e^{S(\boldsymbol{\epsilon})}. \tag{19}\]

If $\mathbf{R} = \mathbf{R}_d$ then $\mathbf{E} = \mathbf{I}$, implying no difference between orientations. From Eqn. (6) the time derivative of our rotation error is:

\[\dot{\mathbf{E}} = S(\dot{\boldsymbol{\epsilon}})\mathbf{E}~,~\dot{\boldsymbol{\epsilon}} = \boldsymbol{\omega}_d -\boldsymbol{\omega}. \tag{20}\]

where:

  • $\boldsymbol{\omega}_d\in\mathbb{R}^3$ is the desired angular velocity (rad/s), and
  • $\boldsymbol{\omega}\in\mathbb{R}^3$ is the actual angular velocity (rad/s).

Assuming $\boldsymbol{\omega}$ is our control input, we can define the control law:

\[\boldsymbol{\omega} \triangleq \boldsymbol{\omega}_d + \mathbf{K}\boldsymbol{\epsilon} \tag{21}\]

where $\mathbf{K}\in\mathbb{R}^{3\times 3}$ is a positive-definite gain matrix (an easy choice here is a diagonal matrix with positive values). The desired angular velocity $\boldsymbol{\omega}_d$ becomes a feed-forward term, whereas $\mathbf{K}\boldsymbol{\epsilon}$ is a proportional feedback on the orientation error. In such cases where $\boldsymbol{\omega}_d$ is unavailable, then $\boldsymbol{\omega} = \mathbf{K}\boldsymbol{\epsilon}$ is sufficient.

If we substitute Eqn. (21) into Eqn. (20) we obtain:

\[\dot{\boldsymbol{\epsilon}} = -\mathbf{K}\boldsymbol{\epsilon} ~\Longrightarrow \boldsymbol{\epsilon}(t) = e^{-\mathbf{K}t}\boldsymbol{\epsilon}(0). \tag{22}\]

This form implies exponential decay. As the error vector approaches zero $\boldsymbol{\epsilon}\to \mathbf{0}$, the orientation error approaches the identity $\mathbf{E} = e^{S(\boldsymbol{\epsilon})}\to e^{S(\mathbf{0})} = \mathbf{I}$, such that $\mathbf{R}\to\mathbf{R}_d$.
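The full loop of Eqns. (19) and (21) can be simulated in a few lines. Below is a sketch, assuming a scalar gain, a simple regulation task with $\boldsymbol{\omega}_d = \mathbf{0}$, and explicit Euler integration of $\dot{\mathbf{R}} = S(\boldsymbol{\omega})\mathbf{R}$ via the matrix exponential; all function names are my own:

```python
import numpy as np

def skew(a):
    """Skew-symmetric matrix S(a) of Eqn. (5)."""
    x, y, z = a
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_so3(a):
    """Matrix exponential of S(a) via Rodrigues' formula, Eqn. (10)."""
    alpha = np.linalg.norm(a)
    if alpha < 1e-12:
        return np.eye(3)
    S = skew(a / alpha)
    return np.eye(3) + np.sin(alpha)*S + (1 - np.cos(alpha))*(S @ S)

def log_so3(R):
    """Angle-axis vector such that R = exp(S(.)), via Eqns. (12) & (17)."""
    alpha = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    v = np.array([R[2,1] - R[1,2], R[0,2] - R[2,0], R[1,0] - R[0,1]])
    n = np.linalg.norm(v)
    return np.zeros(3) if n < 1e-12 else alpha * v / n

# Regulate the orientation to Rd with omega = K * epsilon (no feed-forward).
Rd = exp_so3(np.array([0.0, 0.0, np.pi / 3]))   # desired orientation
R  = np.eye(3)                                  # actual orientation
K, dt = 5.0 * np.eye(3), 0.01
for _ in range(200):
    eps = log_so3(Rd @ R.T)          # orientation error, Eqn. (19)
    omega = K @ eps                  # control law, Eqn. (21)
    R = exp_so3(omega * dt) @ R      # integrate R' = S(omega) R
print(np.linalg.norm(log_so3(Rd @ R.T)))   # ~0: the error has decayed, Eqn. (22)
```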

Below is a video of the ergoCub robot rotating an object using the bimanual manipulation library that I wrote whilst working as a Postdoc at the Italian Institute of Technology. It uses this exact method for orientation feedback control.


The ergoCub is able to rotate an object with 2 hands using the angle-axis representation for orientation control.

⬆️ Back to top.

  1. For any arbitrary matrix $\mathbf{A}\in\mathbb{R}^{m\times m}$ the eigenvector $\mathbf{v}\in\mathbb{C}^m$ and eigenvalue $\lambda\in\mathbb{C}$ obey the identity $\mathbf{Av} = \lambda\mathbf{v}$. 

]]>
<![CDATA[In this article I provide some basic definitions and proofs of identities for rotation matrices $\mathbf{R}\in\mathbb{SO}(3)$. I show that a rotation matrix can be represented as a matrix exponential. From this, Rodrigues’ formula follows which expresses the matrix in terms of the angle and axis of rotation. I then show how to reverse this formula to obtain the angle and axis from an arbitrary rotation matrix. Then using the exponential form, and the angle-axis, I derive a control law for the angular velocity to perform feedback control on orientation error.]]>
Feedback Control With Lie Groups2025-06-05T00:00:00+00:002025-06-05T00:00:00+00:00https://woolfrey.github.io/feedback/control/robot/lie%20group/2025/06/05/feedback-control-with-lie-groups<![CDATA[

In this post I extend the concept of linear feedback control for scalars and vectors into the realm of Lie groups. Lie groups are mathematical objects with generalised properties for combining, inverting, and computing “differences”. They are used to represent orientation in 3D space in robotics and animation. By understanding their properties we can apply the same logic as linear systems and solve more sophisticated, nonlinear control problems.

📄 Download a PDF version.

🧭 Navigation

Linear Feedback Control

In a previous post I discussed the problem of solving feedback control for a linear system using a 3-step process. Given the current position $\mathbf{x}\in\mathbb{R}^m$ and the desired position $\mathbf{x}_d\in\mathbb{R}^{m}$, we:

1. Denote the error from the desired position:

\[\boldsymbol{\epsilon} = \mathbf{x}_d - \mathbf{x}. \tag{1}\]

2. Evaluate the time derivative:

\[\dot{\boldsymbol{\epsilon}} = \dot{\mathbf{x}}_d - \dot{\mathbf{x}} \tag{2}\]

3. Solve the input to force an exponential decay for the error:

\[\dot{\mathbf{x}} = \dot{\mathbf{x}}_d + \mathbf{K}\boldsymbol{\epsilon} ~\Longrightarrow~ \dot{\boldsymbol{\epsilon}} = -\mathbf{K}\boldsymbol{\epsilon} ~\Longrightarrow~ \boldsymbol{\epsilon}(t) = e^{-\mathbf{K}t}\boldsymbol{\epsilon}_0. \tag{3}\]

where $\mathbf{K}\in\mathbb{R}^{m\times m}$ is a positive definite matrix, such that $-\mathbf{K}$ has negative eigenvalues.
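The 3 steps above can be sketched in a few lines of NumPy. Below, a 2D point tracks a moving target using the control law of Eqn. (3) with explicit Euler integration; the gain and timestep values are arbitrary choices for illustration:

```python
import numpy as np

# Track a moving target with x' = xd' + K (xd - x), per Eqn. (3).
K, dt = 4.0 * np.eye(2), 0.01
x = np.zeros(2)                                      # actual position
for k in range(500):
    t = k * dt
    xd     = np.array([np.sin(t), np.cos(t)])        # desired position
    xd_dot = np.array([np.cos(t), -np.sin(t)])       # feed-forward velocity
    x = x + dt * (xd_dot + K @ (xd - x))             # explicit Euler step
print(np.linalg.norm(x - np.array([np.sin(5.0), np.cos(5.0)])))  # small: error decayed
```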

How do we perform feedback control for other types of mathematical structures?

For example, it is common to represent the orientation of a rigid body using a rotation matrix:

\[\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \in\mathbb{R}^{3\times 3}. \tag{4}\]

Each of the columns has unit norm:

\[r_{1i}^2 + r_{2i}^2 + r_{3i}^2 = 1 \quad \text{for } i \in\{1,2,3\} \tag{5}\]

and are orthogonal:

\[r_{1i}r_{1j} + r_{2i}r_{2j} + r_{3i}r_{3j} = 0 \quad \text{for } i, j \in\{1,2,3\} \text{ and } i \ne j \tag{6}\]

So we cannot add or subtract these matrices $\mathbf{R}_d - \mathbf{R}$ without violating these properties.
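This is easy to demonstrate numerically: the difference of 2 rotation matrices is not a rotation matrix. A quick NumPy sketch:

```python
import numpy as np

# Two rotations about the z-axis:
def Rz(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

D = Rz(0.3) - Rz(0.1)
print(np.linalg.det(D))   # 0.0, not 1: D violates the determinant condition
```

Here the third column of `D` is zero, so `D` is singular and cannot belong to $\mathbb{SO}(3)$.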

⬆️ Back to top.

Lie Groups

Lie groups are mathematical structures that satisfy 4 properties:

  1. Closure: combining 2 objects in the group remains in the group,
  2. Associativity: the order in which we cluster operations doesn’t matter, as long as the sequence remains the same,
  3. Identity: The element in the group that results in no change, and
  4. Inverse: The element that leads to the identity.

Vectors (over addition) form a Lie group.

  Vectors (Over Addition)
Closure: $\mathbf{x}_1,\mathbf{x}_2\in\mathbb{R}^n$ : $\mathbf{x}_1 + \mathbf{x}_2 \in\mathbb{R}^n$
Associativity: $\mathbf{x}_1 + \left(\mathbf{x}_2 + \mathbf{x}_3 \right) = \left(\mathbf{x}_1 + \mathbf{x}_2 \right) + \mathbf{x}_3$
Identity: $\mathbf{0} \in\mathbb{R}^n : \mathbf{x} + \mathbf{0} = \mathbf{x}$
Inverse: $-\mathbf{x} : \mathbf{x} + (-\mathbf{x}) = \mathbf{0}$

The closure and inverse properties were applied to define the position error, Eq. (1). In fact, we can see that when the desired position equals the actual position, we recover the identity:

\[\mathbf{x}_d = \mathbf{x} ~\Longrightarrow~ \mathbf{x}_d - \mathbf{x} = \mathbf{0}. \tag{7}\]

⬆️ Back to top.

Orientation Control with Rotation Matrices

The rotation matrix, Eq. (4), actually belongs to the Special Orthogonal $\mathbb{SO}$ group:

\[\mathbf{R}\in\mathbb{SO}(n) \triangleq \big\{\mathbf{R}\in\mathbb{R}^{n\times n} : \mathbf{RR}^T = \mathbf{I},~\det(\mathbf{R}) = 1 \big\}. \tag{8}\]

Importantly, the closure property is defined by matrix multiplication, and inverse by its transpose.

  Special Orthogonal Group
Closure: $\mathbf{R}_1,\mathbf{R}_2\in\mathbb{SO}(n)$ : $\mathbf{R}_1\mathbf{R}_2\in\mathbb{SO}(n)$
Associativity: $\mathbf{R}_1 \left(\mathbf{R}_2 \mathbf{R}_3 \right) = \left(\mathbf{R}_1 \mathbf{R}_2 \right) \mathbf{R}_3$
Identity: $\mathbf{I} \in\mathbb{SO}(n)\subset\mathbb{R}^{n\times n} : \mathbf{R}\mathbf{I} = \mathbf{R}$
Inverse: $\mathbf{R}^T : \mathbf{RR}^T = \mathbf{I}$

As with Eq. (1), we first apply the closure and inverse properties of $\mathbb{SO}(n)$ to define the rotation error as:

\[\mathbf{E} = \mathbf{R}_d\mathbf{R}^T. \tag{9}\]

The $\mathbb{SO}$ group can actually be written as a matrix exponential, so we can instead write Eq. (9) as:

\[\mathbf{E} = e^{S(\boldsymbol{\epsilon})} \tag{10}\]

where:

  • $\theta = |\boldsymbol{\epsilon}| \in [0, 2\pi]$ is the magnitude of the rotation error (rad), and
  • $\hat{\boldsymbol{\epsilon}} = \frac{\boldsymbol{\epsilon}}{|\boldsymbol{\epsilon}|} \in\mathbb{R}^3$ is the axis to rotate about,

and

\[S(\boldsymbol{\epsilon}) = \begin{bmatrix} \phantom{-}0 & -\epsilon_z & \phantom{-}\epsilon_y \\ \phantom{-}\epsilon_z & \phantom{-}0 & -\epsilon_x \\ -\epsilon_y & \phantom{-}\epsilon_x & \phantom{-}0 \end{bmatrix} \in\mathfrak{so}(3) \tag{11}\]

is the Lie algebra of $\mathbb{SO}(3)$ (a skew-symmetric matrix).

Second, we evaluate the time derivative which, from Eq. (10), becomes:

\[\dot{\mathbf{E}} = S(\dot{\boldsymbol{\epsilon}})\mathbf{E} ~,~\dot{\boldsymbol{\epsilon}} = \boldsymbol{\omega}_d - \boldsymbol{\omega}. \tag{12}\]

The time derivative of the Lie algebra is actually the difference between the desired angular velocity $\boldsymbol{\omega}_d\in\mathbb{R}^3$ (rad/s), and the actual angular velocity $\boldsymbol{\omega}\in\mathbb{R}^3$ (rad/s).

Now instead of operating over $\mathbb{SO}(3)$ or $\mathfrak{so}(3)$, we can apply what we already know about $\mathbb{R}^n$. If we define the input angular velocity as:

\[\boldsymbol{\omega} \triangleq \boldsymbol{\omega}_d + \mathbf{K}\boldsymbol{\epsilon} \tag{13}\]

for a matrix $\mathbf{K}\in\mathbb{R}^{3\times 3}$, then the error derivative becomes:

\[\dot{\boldsymbol{\epsilon}} = -\mathbf{K}\boldsymbol{\epsilon} ~\Longrightarrow~ \boldsymbol{\epsilon} = e^{-\mathbf{K}t}\boldsymbol{\epsilon}_0. \tag{14}\]

Likewise, the rotation error will decay to the identity:

\[\lim_{t\to\infty} \mathbf{E}(t) = \lim_{t\to\infty} e^{S\left(e^{-\mathbf{K}t}\boldsymbol{\epsilon}_0\right)} = e^{S(\mathbf{0})} = \mathbf{I}. \tag{15}\]
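The whole scheme fits in a short simulation: extract $\boldsymbol{\epsilon}$ from $\mathbf{E}$, apply Eqn. (13), and watch $\mathbf{E}$ decay to the identity. A sketch using NumPy, with the exponential/logarithm maps of $\mathbb{SO}(3)$ implemented via Rodrigues' formula (function names are my own, and the gain and timestep are arbitrary):

```python
import numpy as np

def skew(e):
    """Lie algebra element S(epsilon) of Eqn. (11)."""
    x, y, z = e
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_so3(e):
    """E = exp(S(epsilon)): the matrix exponential of Eqn. (10)."""
    a = np.linalg.norm(e)
    if a < 1e-12:
        return np.eye(3)
    S = skew(e / a)
    return np.eye(3) + np.sin(a)*S + (1 - np.cos(a))*(S @ S)

def log_so3(E):
    """epsilon such that E = exp(S(epsilon)); the inverse of the map above."""
    a = np.arccos(np.clip((np.trace(E) - 1.0) / 2.0, -1.0, 1.0))
    v = np.array([E[2,1] - E[1,2], E[0,2] - E[2,0], E[1,0] - E[0,1]])
    n = np.linalg.norm(v)
    return np.zeros(3) if n < 1e-12 else a * v / n

Rd = exp_so3(np.array([0.2, -0.4, 0.6]))   # desired orientation
R  = np.eye(3)                             # actual orientation
K, dt = 5.0 * np.eye(3), 0.01
for _ in range(300):
    eps = log_so3(Rd @ R.T)                # error, Eqns. (9) & (10)
    R = exp_so3((K @ eps) * dt) @ R        # omega = K eps, Eqn. (13)
E = Rd @ R.T
print(np.linalg.norm(E - np.eye(3)))       # ~0: E has decayed to I, Eqn. (15)
```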

Below is a simulation of the ergoCub where I used this principle to enable it to rotate an object when grasping with 2 hands.


We can use the underlying Lie algebra of the rotation matrix to control the orientation of a robot's hands.

⬆️ Back to top.

]]>
<![CDATA[In this post I extend the concept of linear feedback control for scalars and vectors in to the realm of Lie groups. Lie groups are mathematical objects with generalised properties for combining, inverting, and computing “differences”. They are used to represent orientation in 3D space in robotics and animation. By understanding their properties we can apply the same logic as linear systems and solve more sophisticated, nonlinear control problems.]]>