Chapter 17

Optical Matrices

\( \def\bmatrix#1{\begin{bmatrix}#1\end{bmatrix}} \)

In this chapter, we will begin using matrices and vectors to describe optical systems. By the end of it, we will have the matrix that describes a refracting surface, and we will have used it to prove that the thin lens equation is correct.

There is a lot of maths here, because one aim of this chapter is to prove what has previously been assumed to be true. However, the maths is mainly difficult because of the number of steps needed to do anything. Each step is fairly simple.

Writing a Light Ray as a Vector

The basic setup is shown in Figure 1. A baseline (which will be called the optic axis) runs through the entire part of the world that we care about. A single ray of light has been drawn above it, travelling from left to right. As the ray passes through a vertical plane, called a reference plane, it can be described by two numbers. The first is the height of the ray above the baseline, which we will call \( h \) (Figure 1(b)). The second is how much the ray is sloping up or down, which we will call \( s \) (Figure 1(c)).

fig1a
Figure 1(a). A light ray passes through a reference plane which is perpendicular to a baseline (or optic axis). The light ray is described by two numbers.
fig1b
Figure 1(b). The first number is the height \( h \) above or below the optic axis. If the ray hits the reference plane below the optic axis, the height is negative.
fig1c
Figure 1(c). The second number is the slope \( s \) of the ray. The slope is defined as the change in height divided by the distance travelled. If the ray is travelling upwards, the slope is positive. If travelling downwards, the slope is negative.

The slope says how quickly the ray increases (or decreases) its height as it travels along. If the ray rises \( 0.1\text{m} \) for every \( 0.5\text{m} \) it travels, the slope of the ray is \( s=0.1/0.5 = 0.2 \). If the ray falls \( 0.2\text{m} \) for every \( 3\text{m} \) it travels, the slope of the ray is \( s=-0.2/3 \approx -0.0667 \). In general (Figure 1(c)),

\[\text{ray slope} = \dfrac{\text{change in height}}{\text{distance travelled}} \]

So any ray passing through the reference plane can be described by two numbers \( h \) and \( s \), which we can put into a vector \( \begin{bmatrix} h \\ s\end{bmatrix} \).

The Matrix for a Gap

The simplest case we will look at, but a very useful one, is what happens to a ray as it travels through a gap of width \( d \) (Figure 2). At the start of the gap, the ray (as it passes through the first reference plane) can be described by a vector \( \begin{bmatrix} h_1 \\ s_1\end{bmatrix} \). At the end of the gap, the ray (as it passes through the second reference plane) can be described by another vector \( \begin{bmatrix} h_2 \\ s_2\end{bmatrix} \).

fig2
Figure 2. A light ray travelling through a gap. The gap is defined by two reference planes, separated by a distance \( d \). The height of the ray as it passes through the first plane is \( h_1 \), and the height of the ray as it passes through the second plane is \( h_2 \).

How are these two vectors related? First, it's pretty clear that the slope doesn't change as the ray travels through the gap, so

\[s_2 = s_1\tag{1}\]

However, the height of the ray does change, depending on the slope. The definition of slope can be rearranged as

\[\text{change in height} = \text{distance travelled}\times\text{ray slope} \]

That is, the change in height over a gap with width \( d \) is \( d\times s_1 \). The final height \( h_2 \) is the initial height \( h_1 \) plus the change in height, or

\[h_2 = h_1+d s_1\tag{2}\]

Putting Equations (1) and (2) together, we have:

\[ \begin{array}{rcrcl} h_2 &=& h_1 &+& d s_1 \\ s_2 &=& & & s_1 \end{array} \]

These equations can be put into matrix notation as:

\[\bmatrix{h_2 \\ s_2}=\bmatrix{1 & d \\ 0 & 1}\bmatrix{h_1 \\ s_1} \]
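As a quick numerical check of the gap matrix, the sketch below stores a ray as a plain Python tuple `(h, s)` and a matrix as nested tuples, so no libraries are needed. The helper names `gap_matrix` and `apply` are illustrative choices, not anything defined in the text, and distances are assumed to be in metres.

```python
def gap_matrix(d):
    """Optical matrix for a gap of width d."""
    return ((1.0, d),
            (0.0, 1.0))


def apply(matrix, ray):
    """Multiply a 2x2 optical matrix by a ray vector (h, s)."""
    (a, b), (c, e) = matrix
    h, s = ray
    return (a * h + b * s, c * h + e * s)


# A ray 0.1 m above the axis, rising with slope 0.2, crosses a 0.5 m gap.
h2, s2 = apply(gap_matrix(0.5), (0.1, 0.2))
print(h2, s2)  # height rises by d * s = 0.1; the slope is unchanged
```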

If this looks a lot like the matrices describing how a car travels that we worked through in Chapter 16, that's no coincidence. The slope of a ray tells us how fast the ray changes its height, so it acts a lot like the speed of the car (which tells us how fast a car changes its position) in the examples given in Chapter 16.

Two Gaps

What happens to a ray of light if it passes through two gaps, one after the other? Suppose the first gap has width \( d_1 \), and so the matrix describing that gap is

\[\bmatrix{1 & d_1 \\ 0 & 1} \]

The second gap has width \( d_2 \) and so the matrix describing that gap is

\[\bmatrix{1 & d_2 \\ 0 & 1} \]

The ray passes through the first gap, and then the second gap, so the matrix equation describing this is

\[\bmatrix{h_2 \\ s_2}=\bmatrix{1 & d_2 \\ 0 & 1} \left(\bmatrix{1 & d_1 \\ 0 & 1}\bmatrix{h_1 \\ s_1}\right) \]

Notice that the ray is first multiplied by the first gap matrix (in brackets) and the result of this is then multiplied by the second gap matrix. This is because the ray hits the first gap first, and then the resulting ray hits the second gap. But it means the matrices are written from right to left, rather than the left to right order of the gaps.

However, because matrix multiplication is associative, we can re-bracket the equation as follows:

\[\bmatrix{h_2 \\ s_2}=\left(\bmatrix{1 & d_2 \\ 0 & 1} \bmatrix{1 & d_1 \\ 0 & 1}\right)\bmatrix{h_1 \\ s_1} \]

We can multiply together the two gap matrices in the brackets to give a matrix for the overall gap:

\[\bmatrix{h_2 \\ s_2}=\bmatrix{1 & d_1+d_2 \\ 0 & 1}\bmatrix{h_1 \\ s_1} \]

So the matrix for a gap of width \( d_1 \) followed by another gap of width \( d_2 \) is simply the matrix for a gap of width \( d_1+d_2 \). This seems perfectly sensible, if not particularly amazing.
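The same result can be checked numerically. In this sketch the gap widths are arbitrary illustrative values and `matmul` is a small hypothetical helper for 2x2 matrices stored as nested tuples. Note that the second gap's matrix goes on the left, matching the right-to-left order described above.

```python
def gap_matrix(d):
    """Optical matrix for a gap of width d."""
    return ((1.0, d), (0.0, 1.0))


def matmul(m, n):
    """Product m @ n of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


d1, d2 = 0.3, 0.45
# The second gap's matrix is written on the LEFT because it acts second.
combined = matmul(gap_matrix(d2), gap_matrix(d1))
print(combined)  # top-right entry equals d1 + d2 (up to rounding)
```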

The Matrix for a Refracting Surface

Almost all optical systems are composed of refracting surfaces and gaps. We already know the matrix for a gap, so if we know the matrix for a refracting surface, we basically know all the matrices we will ever need for optics.

Figure 3(a) shows a light ray being refracted by a spherical surface. The centre of the spherical surface (a large white dot in the diagram) must be on the baseline, or optic axis. The refractive indices to the left and right of the surface are \( n_{in} \) and \( n_{out} \). We are interested in what happens to the ray as it passes through a reference plane that has been drawn through the point where the ray hits the surface.

fig3a
Figure 3 (a). A ray of light is refracted by a spherical surface. The centre of the sphere (large white dot) is placed on the horizontal baseline. A reference plane has been drawn where the ray is refracted by the surface.
fig3b
Figure 3 (b). The height of the ray before refraction \( h_1 \) and the height after \( h_2 \) have been labelled here. Clearly, they don't change.
fig3c
Figure 3 (c). The slope of the ray before refraction \( s_1 \) and the slope after \( s_2 \) have been marked by shaded triangles. The slopes change as the ray is refracted.

In Figure 3(b) the height of the ray before refraction and the height after have been labelled. Obviously, the heights aren't changed by refraction, so we can say

\[h_2 = h_1\tag{3}\]

In Figure 3(c) the slope of the ray before refraction and the slope after refraction have been marked out by shaded triangles. The slopes change; that is what refraction does. How the slope changes is complicated, and we will derive it at the end of this chapter; for now, we will simply state that the new slope is:

\[s_2 = \dfrac{-F_{surface}}{n_{out}} h_1 + \dfrac{n_{in}}{n_{out}} s_1 \tag{4}\]

where \( F_{surface} \) is the surface power \( (n_{out}-n_{in})/r \) and \( r \) is the radius of curvature of the surface. Putting Equations (3) and (4) together into a matrix equation gives

\[ \begin{bmatrix} h_2 \\ \vphantom{\dfrac{-F_{surface}}{n_{out}}} s_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{surface}}{n_{out}} & \dfrac{n_{in}}{n_{out}} \end{bmatrix} \begin{bmatrix} h_1 \\ \vphantom{\dfrac{-F_{surface}}{n_{out}}} s_1 \end{bmatrix} \]

The matrix in this equation is the matrix for a refracting surface.
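The surface matrix can be built directly from \( n_{in} \), \( n_{out} \) and \( r \). In this sketch the function names and the sample numbers (air into glass of index 1.5, with \( r=0.1\text{m} \)) are illustrative assumptions; \( r \) is taken in metres so that \( F_{surface} \) comes out in dioptres.

```python
def surface_power(n_in, n_out, r):
    """Surface power F = (n_out - n_in) / r (dioptres if r is in metres)."""
    return (n_out - n_in) / r


def surface_matrix(n_in, n_out, r):
    """Optical matrix for a spherical refracting surface (Equations 3 and 4)."""
    F = surface_power(n_in, n_out, r)
    return ((1.0, 0.0),
            (-F / n_out, n_in / n_out))


M = surface_matrix(1.0, 1.5, 0.1)   # F_surface = 0.5 / 0.1 = 5 D
h1, s1 = 0.01, 0.0                  # ray parallel to the axis, 1 cm up
h2 = M[0][0] * h1 + M[0][1] * s1
s2 = M[1][0] * h1 + M[1][1] * s1
print(h2, s2)  # height unchanged; slope becomes negative (bent towards the axis)
```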

The Matrix for a Thick Lens

A thick lens consists of two refracting surfaces separated by a gap. Since we already know the matrix for a refracting surface and the matrix for a gap, we can work out the matrix for a thick lens. The front and back (i.e. left and right) surfaces of the lens have powers \( F_{front} \) and \( F_{back} \). The lens has refractive index \( n \), and the refractive index of air is \( 1 \). The gap between the front and back surfaces is \( d \). Thus, the lens can be described by three matrices: the front surface (from air into the lens) \( \bmatrix{1 & 0 \\ -F_{front}/n & 1/n} \), the gap \( \bmatrix{1 & d \\ 0 & 1} \), and the back surface (from the lens into air) \( \bmatrix{1 & 0 \\ -F_{back}/1 & n/1} \).

A ray of light \( \bmatrix{h_1 \\ s_1} \) travelling through the lens and emerging as \( \bmatrix{h_2 \\ s_2} \) can be described by the following matrix equation:

\[ \begin{bmatrix}h_2 \\ \vphantom{\dfrac{-F_{back}}{1}} s_2\end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{back}}{1} & \dfrac{n}{1} \end{bmatrix} \left( \begin{bmatrix} 1 & d \\ \vphantom{\dfrac{-F_{back}}{1}} 0 & 1 \end{bmatrix} \left( \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{front}}{n} & \dfrac{1}{n} \end{bmatrix} \begin{bmatrix}h_1 \\ \vphantom{\dfrac{-F_{back}}{1}} s_1\end{bmatrix} \right) \right) \]

The brackets have been put in to show the order in which we'd naturally do the multiplication. First, we would multiply the ray vector by the front surface matrix to get another ray vector. We'd then multiply that by the gap matrix to get a second vector. Finally, we'd multiply that vector by the back surface matrix.

However, thanks to the associative law, we can rebracket the above equation as:

\[ \begin{bmatrix}h_2 \\ \vphantom{\dfrac{-F_{back}}{1}} s_2\end{bmatrix} = \left( \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{back}}{1} & \dfrac{n}{1} \end{bmatrix} \begin{bmatrix} 1 & d \\ \vphantom{\dfrac{-F_{back}}{1}} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{front}}{n} & \dfrac{1}{n} \end{bmatrix} \right) \begin{bmatrix}h_1 \\ \vphantom{\dfrac{-F_{back}}{1}} s_1\end{bmatrix} \]

and then multiply all the matrices in brackets together to get a single matrix for a lens. If we do this, we get

\[ \begin{bmatrix} \vphantom{\dfrac{d}{n}}h_2 \\ \vphantom{\dfrac{d}{n} F_{front}}s_2\end{bmatrix} = \begin{bmatrix} 1-\dfrac{d}{n} F_{front} & \dfrac{d}{n} \\ -\left(F_{front}+F_{back}-\dfrac{d}{n} F_{front} F_{back}\right) & 1-\dfrac{d}{n} F_{back} \end{bmatrix} \begin{bmatrix} \vphantom{\dfrac{d}{n}}h_1 \\ \vphantom{\dfrac{d}{n} F_{front}}s_1\end{bmatrix} \]

The matrix doesn't look all that simple, but once the values of \( d \), \( n \), \( F_{front} \) and \( F_{back} \) are known, its elements are easy to calculate, and the matrix can then be used to work out what happens to any ray passing through the lens.

The quantity in brackets in the lower left corner of the matrix, \( F_{front}+F_{back}-\dfrac{d}{n} F_{front} F_{back} \), is called the equivalent power, for reasons which will be made clear next.
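Multiplying the three matrices numerically is a useful check on the algebra. The sketch below does this for illustrative values (powers in dioptres, thickness in metres), which are assumptions rather than a lens from the text, and compares the result with the closed form above. Setting \( d=0 \) in the same function also gives the thin-lens matrix discussed below.

```python
def matmul(m, n):
    """Product m @ n of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def thick_lens_matrix(F_front, F_back, n, d):
    """Back surface @ gap @ front surface, for a lens of index n in air."""
    front = ((1.0, 0.0), (-F_front / n, 1.0 / n))   # air -> glass
    gap = ((1.0, d), (0.0, 1.0))
    back = ((1.0, 0.0), (-F_back / 1.0, n / 1.0))   # glass -> air
    return matmul(back, matmul(gap, front))         # rightmost acts first


def equivalent_power(F_front, F_back, n, d):
    """The quantity in the lower-left corner of the thick lens matrix."""
    return F_front + F_back - (d / n) * F_front * F_back


M = thick_lens_matrix(5.0, 3.0, 1.5, 0.006)
print(M)
print(-equivalent_power(5.0, 3.0, 1.5, 0.006))  # should match M[1][0]
```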

The Matrix for a Thin Lens

A thin lens is one where the thickness \( d \) is very small. At the extreme, \( d=0 \). In this case, the matrix for a thin lens is given by setting \( d \) to zero in the above lens matrix, to give:

\[\bmatrix{1 & 0\\ -\left(F_{front}+F_{back}\right) & 1}\]

For a thin lens, the power \( F \) is just the sum of the front and back powers, so we can write the thin lens matrix even more simply as:

\[\text{thin lens matrix} =\bmatrix{1 & 0\\ -F & 1} \]

Notice that the power of the thin lens is in the lower left corner of the matrix, in the same place as the equivalent power of the thick lens matrix. That's why \( F_{front}+F_{back}-(d/n)F_{front}F_{back} \) is called the equivalent power.

The Thin Lens Equation \( V_{in}+F=V_{out} \)

Suppose that an optical system can be described by the following matrix equation:

\[\bmatrix{h_2 \\ s_2}= \bmatrix{A & 0 \\ C & D}\bmatrix{h_1 \\ s_1} \]

Notice that here the final height \( h_2 \) depends only on the starting height \( h_1 \), since \( h_2=A h_1 \); the slope doesn't enter into the calculation of height. What does that mean?

There are two possibilities. The first (not very interesting) possibility is that the ray hasn't travelled any distance at all, like the ray in Figure 3(b). The second, more interesting situation is when the ray has travelled some distance.

This situation is shown in Figure 4. All the rays that start at height \( h_1 \) end up at height \( h_2 \), whatever their slopes. This means that all the rays diverging from a single point at the start converge to a single point at the end.

This is what happens when an object forms an image, so the matrix with a zero in the top right corner describes an image-forming optical system. The value of \( A \) in the matrix gives the linear magnification (i.e. how much bigger or smaller the image is compared to the object). To summarize,

Any optical matrix with a zero in the top right corner is describing a system where an object produces an image.
Figure 4. The two reference planes define the start and end of an optical system. At the first plane, a number of rays have been drawn with the same height \( h_1 \) but different slopes. If the optical matrix describing this system \( \begin{bmatrix}A & 0 \\ C & D \end{bmatrix} \) has a zero in the top right corner, all the rays leaving the point \( h_1 \) will end up at the same point \( h_2 = A h_1 \) on the second plane, regardless of their slope.

That is, rays diverging from the point \( h_1 \) end up converging on the point \( h_2 \).
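To see the zero in the top right corner at work, the sketch below sends several rays from the same starting height through a made-up image-forming matrix. The values of A, C and D are arbitrary assumptions chosen only to illustrate the point; every ray lands at the same height \( A h_1 \) whatever its starting slope.

```python
def apply(matrix, ray):
    """Multiply a 2x2 optical matrix by a ray vector (h, s)."""
    (a, b), (c, d) = matrix
    h, s = ray
    return (a * h + b * s, c * h + d * s)


A, C, D = -0.5, -3.0, -2.0          # arbitrary illustrative values
system = ((A, 0.0), (C, D))         # zero in the top right corner

h1 = 0.02
final_heights = []
for s1 in (-0.1, 0.0, 0.05, 0.2):
    h2, s2 = apply(system, (h1, s1))
    final_heights.append(h2)
    print(f"s1 = {s1:+.2f}  ->  h2 = {h2:+.4f}, s2 = {s2:+.4f}")
# Every h2 equals A * h1; only the final slopes differ.
```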

Suppose then that we have a thin lens set up so that an object produces an image. Let the distance from object to lens be \( u \), and the distance from lens to image be \( v \). (These are both positive distances, because we are measuring both from left to right, in the direction the light travels.) The gap \( u \), the lens and the gap \( v \) are described by the matrices \( \bmatrix{1 & u \\ 0 & 1} \), \( \bmatrix{1 & 0 \\ -F & 1} \) and \( \bmatrix{1 & v \\ 0 & 1} \) respectively.

Putting these together in a single matrix equation gives:

\[\bmatrix{h_2 \\ s_2}= \bmatrix{1 & v \\ 0 & 1}\bmatrix{1 & 0 \\ -F & 1}\bmatrix{1 & u \\ 0 & 1} \bmatrix{h_1 \\ s_1} \]

As always, we can multiply the matrices together to give an equation with a single matrix:

\[\bmatrix{h_2 \\ s_2} = \bmatrix{1-v F & u+v - u v F \\ -F & 1 - u F} \bmatrix{h_1 \\ s_1} \]
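The matrix product above is easy to get wrong by hand, so here is a numerical spot-check. The values of \( u \), \( v \) and \( F \) are arbitrary assumptions (distances in metres, power in dioptres); the point is only that the multiplied-out matrix matches the closed form entry by entry.

```python
def matmul(m, n):
    """Product m @ n of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def gap_matrix(d):
    return ((1.0, d), (0.0, 1.0))


def thin_lens_matrix(F):
    return ((1.0, 0.0), (-F, 1.0))


u, v, F = 0.4, 0.7, 2.5
# Rightmost matrix acts first: gap u, then the lens, then gap v.
product = matmul(gap_matrix(v), matmul(thin_lens_matrix(F), gap_matrix(u)))
closed_form = ((1 - v * F, u + v - u * v * F),
               (-F, 1 - u * F))
for row_p, row_c in zip(product, closed_form):
    print(row_p, row_c)
```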

Although we are thinking of \( u \) and \( v \) as the object and image distances, this matrix actually describes any system made of a gap, a thin lens and another gap. For \( u \) and \( v \) to really be the object and image distances, the top right corner of this matrix must be zero. That is,

\[ u+v - u v F = 0 \]

We can get to the thin lens equation by following these steps:

Begin with:
\[u+v - u v F = 0\]
Divide both sides by \( u \):
\[1+\frac{v}{u} - v F = 0\]
Divide both sides by \( v \):
\[\frac{1}{v}+\frac{1}{u} - F = 0\]
Multiply by \( -1 \):
\[-\frac{1}{v} - \frac{1}{u} + F = 0\]
Add \( 1/v \) to both sides:
\[- \frac{1}{u} + F = \frac{1}{v} \]
Tidy up the \( 1/u \):
\[\frac{1}{-u} + F = \frac{1}{v} \]
Notice that \( 1/v \) is the same as \( V_{out} \):
\[\frac{1}{-u} + F = V_{out} \]
Notice that \( 1/(-u) \) is the same as \( V_{in} \):
\[V_{in} + F = V_{out} \]

So, here, we've actually proved the thin lens equation is correct.
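Here is a short worked example of the thin lens equation, checked against the matrix condition. The object distance \( u=0.5\text{m} \) and power \( F=5\text{D} \) are illustrative assumptions, and `image_distance` is a hypothetical helper name. The sketch also evaluates the top-left element \( A=1-vF \), which gives the magnification.

```python
def image_distance(u, F):
    """Solve V_in + F = V_out for v, using V_in = 1/(-u) and V_out = 1/v.

    u is the positive object-to-lens distance in metres and F the power in
    dioptres. Assumes V_out != 0 (otherwise the image is at infinity).
    """
    V_in = 1.0 / (-u)
    V_out = V_in + F
    return 1.0 / V_out


u, F = 0.5, 5.0
v = image_distance(u, F)        # V_in = -2 D, V_out = 3 D, so v = 1/3 m
A = 1 - v * F                   # top-left element: linear magnification
B = u + v - u * v * F           # top-right element: zero for an image
print(f"v = {v:.4f} m, A = {A:.4f}, B = {B:.1e}")
```

Since \( F = 1/u + 1/v \) here, \( A = 1 - vF \) works out to \( -v/u = -2/3 \): the image is inverted and two-thirds the size of the object.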

Deriving the Matrix for a Refracting Surface

The only thing we haven't proven so far is Equation (4), which says how the slope of the ray changes as it passes through a refracting surface. We will do that here. The strategy for proving Equation (4) is to first build up a set of facts about the slopes and angles of a ray that passes through a spherical surface, and then assemble them into the equation.

The facts first. Figure 5(a) shows the slope of a ray passing through a spherical refracting surface (the same as Figure 3(c)). An extra horizontal line has been drawn where the ray intersects the surface, as this will be useful in working out some angles.

fig5a
Figure 5(a). This repeats Figure 3(c) except a horizontal line (parallel to the optic axis) has been drawn where the ray hits the refracting surface.
fig5b
Figure 5(b). The angle \( a \) is the angle between the incident ray and the added horizontal line. It is also the angle inside the triangle whose height and length define the ray slope \( s_1 \). From the diagram, \( \tan{a}=s_1 \).
fig5c
Figure 5(c). The angle \( b \) is the angle between the refracted ray and the added horizontal line. It is also the angle inside the triangle whose height and length define the ray slope \( s_2 \), because the long side of the triangle is also horizontal. From the diagram, \( \tan{b}=-s_2 \). The minus sign is needed because we are treating all angles as positive (so their tan is positive), but \( s_2 \) is negative in the diagram.

From Figure 5(b), the angle \( a \) between the incoming ray and the horizontal line is the same as the angle \( a \) inside the triangle marking the slope of the ray. The change in height is opposite the angle, and the distance travelled is adjacent to the angle in the triangle, so

\[ \tan{(a)}=s_1 \]

From Figure 5(c), the angle \( b \) between the refracted ray and the horizontal line is the same as the angle \( b \) inside the triangle marking the slope of the ray. The change in height is opposite the angle, and the distance travelled is adjacent to the angle in the triangle, so

\[ \tan{(b)}= -s_2 \]

(The negative sign here is because the slope \( s_2 \) is negative, since the ray is travelling downwards, but we're treating all angles as positive.) Finally, to make the algebra coming up a little easier, we will use the small-angle approximation, \( \theta\approx \sin{(\theta)} \approx \tan{(\theta)} \), which is a good approximation when the angle \( \theta \) is small and measured in radians. Using this approximation, the two facts above can be summarized as:

\[ \begin{align} a &\approx s_1 \\ b &\approx -s_2 \end{align} \]
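To get a feel for how small "small" has to be, this sketch compares \( \theta \) with \( \sin\theta \) and \( \tan\theta \) for a few arbitrarily chosen angles. From the Taylor series, the relative error of \( \tan\theta\approx\theta \) grows roughly like \( \theta^2/3 \) and that of \( \sin\theta\approx\theta \) like \( \theta^2/6 \), so at \( 10^\circ \) the errors are about 1% and 0.5% respectively.

```python
import math


def small_angle_errors(degrees):
    """Relative errors of sin(theta) ~ theta and tan(theta) ~ theta."""
    theta = math.radians(degrees)
    sin_err = abs(math.sin(theta) - theta) / theta
    tan_err = abs(math.tan(theta) - theta) / theta
    return sin_err, tan_err


for deg in (1, 5, 10, 20):
    sin_err, tan_err = small_angle_errors(deg)
    print(f"{deg:>2} deg: sin error {sin_err:.2%}, tan error {tan_err:.2%}")
```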

Our next set of facts uses Snell's Law, or the small-angle approximation to it. Figure 6(a) shows the angle of incidence \( \theta_{in} \) and the angle of refraction \( \theta_{out} \). The surface normal is the line passing from the centre of the sphere through the surface. Snell's Law says

\[ n_{in}\sin{(\theta_{in})} = n_{out}\sin{(\theta_{out})} \]

Using the small-angle approximation, this can be written as

\[ n_{in}\theta_{in} \approx n_{out}\theta_{out} \]
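The paraxial form of Snell's law can be compared with the exact one numerically. This sketch assumes light passing from air (\( n_{in}=1 \)) into glass (\( n_{out}=1.5 \)); the angles are in radians and chosen only for illustration.

```python
import math


def refraction_angles(theta_in, n_in=1.0, n_out=1.5):
    """Return (exact, paraxial) angle of refraction, both in radians."""
    exact = math.asin(n_in * math.sin(theta_in) / n_out)
    paraxial = n_in * theta_in / n_out
    return exact, paraxial


for theta_in in (0.01, 0.05, 0.2, 0.5):
    exact, paraxial = refraction_angles(theta_in)
    print(f"theta_in = {theta_in:.2f}: exact {exact:.5f}, paraxial {paraxial:.5f}")
```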
fig6a
Figure 6(a). The red line from the centre of the sphere to the edge is at right angles to the sphere's surface; that is, it is a surface normal. The angles of incidence and refraction \( \theta_{in} \) and \( \theta_{out} \) between the rays and the normal have been marked.
fig6b
Figure 6(b). Here, the angles \( a \) and \( b \) from Figure 5 have been added in.
fig6c
Figure 6(c). We define a new angle \( \alpha \) between the optic axis and the red radial line. All the angles equal to \( \alpha \) have been drawn in.
fig6d
Figure 6(d). Finally, we look at how \( \alpha \) relates to some distances in the diagram. The tangent of \( \alpha \) is the height of the shaded triangle divided by the base. The height of the triangle is \( h_1 \). When \( \alpha \) is small (much smaller than here), the base of the triangle is approximately the same as the radius \( r \) of the sphere. Thus, when \( \alpha \) is small, \( \tan{(\alpha)}\approx h_1/r \).

We would like to connect \( \theta_{in} \) and \( \theta_{out} \) to the angles \( a \) and \( b \), which have been added in Figure 6(b). In Figure 6(c), an angle \( \alpha \) has been added, which is the angle between the normal line and the optic axis. From Figure 6(c), we can say:

\[ \begin{align} \theta_{in} &= \alpha+a \\ \theta_{out} &= \alpha-b \end{align} \]

The last fact we need is shown in Figure 6(d): \( \tan{\alpha} \approx h_1/r \), because \( r \) is approximately the length of the adjacent side of the triangle shown. Using the small-angle approximation again (\( \tan{\theta}\approx\theta \)), we have our last fact:

\[ \alpha \approx \dfrac{h_1}{r} \]

Here \( r \) is a signed distance, measured from the surface to the centre of the sphere, just as in the surface power \( F_{surface}=(n_{out}-n_{in})/r \). The angle relations \( \theta_{in}=\alpha+a \) and \( \theta_{out}=\alpha-b \) describe the case where the normal, continued past the surface, runs down to the centre of the sphere on the right-hand side. There \( r \) is positive by the sign convention, \( h_1 \) is positive, and so \( \alpha\approx h_1/r \) is positive, like all the other angles. No extra minus sign is needed.

To summarize, from Figures 5 and 6, we have the following facts:

\[ \begin{align} a &\approx s_1 \\ b &\approx -s_2 \\ \theta_{in} &= \alpha+a \\ \theta_{out} &= \alpha -b \\ \alpha &\approx \dfrac{h_1}{r} \end{align} \]

And from Snell's law,

\[ n_{in}\theta_{in} \approx n_{out}\theta_{out} \]

These facts involve \( s_1 \), \( s_2 \), and \( h_1 \), but they also involve other quantities. We want to put them together so we can calculate \( s_2 \) from \( s_1 \) and \( h_1 \), and nothing else. The following steps will do it:

Begin with
\[n_{in}\theta_{in} \approx n_{out}\theta_{out}\]
Substitute \( \theta_{in} = \alpha+a \)
\[n_{in}(\alpha+a) \approx n_{out}\theta_{out}\]
Substitute \( \theta_{out} = \alpha-b \)
\[n_{in}(\alpha+a) \approx n_{out}(\alpha-b)\]
Notice that we've got rid of \( \theta_{in} \) and \( \theta_{out} \)
Substitute \( a\approx s_1 \)
\[n_{in}(\alpha+s_1) \approx n_{out}(\alpha-b)\]
Substitute \( b\approx -s_2 \), or equivalently, \( -b\approx s_2 \)
\[n_{in}(\alpha+s_1) \approx n_{out}(\alpha+s_2)\]
Now we've also got rid of \( a \) and \( b \)
Expand
\[n_{in}\alpha+n_{in}s_1 \approx n_{out}\alpha+n_{out}s_2\]
Gather terms in \( \alpha \)
\[(n_{in}-n_{out})\alpha+n_{in}s_1 \approx n_{out}s_2\]
Divide both sides by \( n_{out} \)
\[\dfrac{n_{in}-n_{out}}{n_{out}}\alpha+\dfrac{n_{in}}{n_{out}}s_1 \approx s_2\]
Substitute \( \alpha\approx h_1/r \)
\[\dfrac{n_{in}-n_{out}}{n_{out}}\dfrac{h_1}{r}+\dfrac{n_{in}}{n_{out}}s_1 \approx s_2\]
Now we've got rid of \( \alpha \)
Note that \( (n_{in}-n_{out})/r=-(n_{out}-n_{in})/r=-F_{surface} \)
\[\dfrac{-F_{surface}}{n_{out}}h_1+\dfrac{n_{in}}{n_{out}}s_1 \approx s_2\]

That is Equation (4) and we're finished.
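Because Equation (4) rests on several small-angle approximations, it is reassuring to compare it with an exact ray trace. The sketch below uses the vector form of Snell's law to refract a single ray at a spherical surface and compares the outgoing slope with Equation (4). Its assumptions: the surface vertex sits at \( x=0 \) and the centre of the sphere at \( (r, 0) \), so positive \( r \) puts the centre to the right, consistent with \( F_{surface}=(n_{out}-n_{in})/r \). The function names and sample numbers are hypothetical.

```python
import math


def trace_spherical_surface(h, s1, r, n_in, n_out):
    """Exactly refract one ray at a spherical surface (hypothetical helper).

    The surface vertex is at x = 0 and the centre of the sphere at (r, 0),
    so r > 0 puts the centre to the right. The incoming ray has slope s1 and
    height h at x = 0. Returns (height at the actual hit point, exact s2).
    """
    # Start the ray a little to the left of the vertex plane.
    x0 = -0.05
    px, py = x0, h + x0 * s1
    length = math.hypot(1.0, s1)
    dx, dy = 1.0 / length, s1 / length       # unit direction of travel

    # Intersect with the sphere: |P + t*D - C|^2 = R^2, with D a unit vector.
    R = abs(r)
    ox, oy = px - r, py
    b = dx * ox + dy * oy
    c = ox * ox + oy * oy - R * R
    disc = b * b - c
    if disc < 0:
        raise ValueError("ray misses the sphere")
    roots = (-b - math.sqrt(disc), -b + math.sqrt(disc))
    # Of the two intersections, keep the one nearest the vertex plane x = 0.
    t = min(roots, key=lambda root: abs(px + root * dx))
    hx, hy = px + t * dx, py + t * dy

    # Unit surface normal at the hit point, oriented along the direction of travel.
    nx, ny = (hx - r) / R, hy / R
    if nx * dx + ny * dy < 0:
        nx, ny = -nx, -ny

    # Vector form of Snell's law: T = mu*D + (cos_t - mu*cos_i)*N.
    mu = n_in / n_out
    cos_i = nx * dx + ny * dy
    k = 1.0 - mu * mu * (1.0 - cos_i * cos_i)
    if k < 0:
        raise ValueError("total internal reflection")
    cos_t = math.sqrt(k)
    tx = mu * dx + (cos_t - mu * cos_i) * nx
    ty = mu * dy + (cos_t - mu * cos_i) * ny
    return hy, ty / tx


def matrix_slope(h1, s1, r, n_in, n_out):
    """Equation (4), with F_surface = (n_out - n_in) / r."""
    F = (n_out - n_in) / r
    return -F / n_out * h1 + n_in / n_out * s1


# Air into glass with the centre on the right, then glass into air with it on the left.
for r, n_in, n_out in [(0.1, 1.0, 1.5), (-0.1, 1.5, 1.0)]:
    h_hit, s2_exact = trace_spherical_surface(0.001, 0.002, r, n_in, n_out)
    s2_matrix = matrix_slope(h_hit, 0.002, r, n_in, n_out)
    print(f"r = {r:+.1f} m: exact s2 = {s2_exact:.6f}, Equation (4) s2 = {s2_matrix:.6f}")
```

For rays this close to the axis the exact and matrix slopes agree closely; increasing `h` or `s1` shows the small-angle approximation gradually breaking down.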