Chapter 17: Optical Matrices

In this chapter, we will begin using matrices and vectors to describe optical systems. The end results will be the matrix that describes a refracting surface, and a proof that the thin lens equation is correct.

There is a lot of maths here, because one aim of this chapter is to prove what has previously been assumed to be true. However, the maths is mainly difficult because of the number of steps needed to do anything. Each step is fairly simple.

Writing a Light Ray as a Vector

The basic setup is shown in Figure 1. A baseline (which will be called the optic axis) runs through the entire part of the world that we care about. A single ray of light has been drawn above it, travelling from left to right. As the ray passes through a vertical plane, called a reference plane, it can be described by two numbers. The first is the height of the ray above the baseline, which we will call \( h \) (Figure 1(b)). The second is how much the ray is sloping up or down, which we will call \( s \) (Figure 1(c)).

The slope says how quickly the ray increases (or decreases) its height as it travels along. If the ray rises \( 0.1\text{ m} \) for every \( 0.5\text{ m} \) it travels, the slope of the ray is \( s=0.1/0.5 = 0.2 \). If the ray falls \( 0.2\text{ m} \) for every \( 3\text{ m} \) it travels, the slope of the ray is \( s=-0.2/3 \approx -0.0667 \). In general (Figure 1(c)),

\[\text{ray slope} = \dfrac{\text{change in height}}{\text{distance travelled}} \]

So any ray passing through the reference plane can be described by two numbers, \( h \) and \( s \). We can put these two numbers into a vector \( \begin{bmatrix} h \\ s\end{bmatrix} \).
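As a minimal sketch of this idea, the two slope examples above can be computed in Python and a ray stored as a two-element list. The function names `ray_slope` and `ray_vector`, and the height of 0.02 m, are illustrative assumptions and not part of the chapter.

```python
def ray_slope(change_in_height, distance_travelled):
    """Slope of a ray: change in height divided by distance travelled."""
    return change_in_height / distance_travelled


def ray_vector(h, s):
    """A ray at a reference plane, stored as the column vector [h, s]."""
    return [h, s]


rising = ray_slope(0.1, 0.5)    # rises 0.1 m for every 0.5 m travelled
falling = ray_slope(-0.2, 3.0)  # falls 0.2 m for every 3 m travelled

# Hypothetical ray: 0.02 m above the optic axis, sloping upwards.
ray = ray_vector(0.02, rising)

print(rising, round(falling, 4), ray)
```

A falling ray simply has a negative slope, so no separate case is needed for rays travelling downwards.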

The Matrix for a Gap

The simplest case we will look at, and a very useful one, is what happens to a ray as it travels through a gap of width \( d \) (Figure 2). At the start of the gap, the ray (as it passes through the first reference plane) can be described by a vector \( \begin{bmatrix} h_1 \\ s_1\end{bmatrix} \). At the end of the gap, the ray (as it passes through the second reference plane) can be described by another vector \( \begin{bmatrix} h_2 \\ s_2\end{bmatrix} \).

How are these two vectors related? First, it's pretty clear that the slope doesn't change as the ray travels through the gap, so

\[s_2 = s_1\tag{1}\]

However, the height of the ray does change, depending on the slope. The definition of slope can be rewritten to be

\[\text{change in height} = \text{distance travelled}\times\text{ray slope} \]

That is, the change in height over a gap with width \( d \) is \( d\times s_1 \). The final height \( h_2 \) is the initial height \( h_1 \) plus the change in height, or

\[h_2 = h_1+d s_1\tag{2}\]

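Equations (1) and (2) can be written together, lined up so that the \( h_1 \) terms and the \( s_1 \) terms sit in columns:

\[ \begin{array}{rcrcr} h_2 &=& h_1 &+& d s_1 \\ s_2 &=& & & s_1 \end{array} \]

This is the same as the matrix equation

\[\bmatrix{h_2 \\ s_2}=\bmatrix{1 & d \\ 0 & 1}\bmatrix{h_1 \\ s_1} \]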
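The gap matrix can be tried out numerically. The sketch below uses plain Python lists for the matrix and the ray; the helper names `gap_matrix` and `apply`, and the numbers chosen, are assumptions made for illustration.

```python
def gap_matrix(d):
    """Matrix for a gap of width d: height changes by d * slope, slope is unchanged."""
    return [[1.0, d],
            [0.0, 1.0]]


def apply(matrix, ray):
    """Multiply a 2x2 matrix by a ray vector [h, s]."""
    h, s = ray
    return [matrix[0][0] * h + matrix[0][1] * s,
            matrix[1][0] * h + matrix[1][1] * s]


# Hypothetical ray: 0.01 m above the axis with slope 0.2, crossing a 0.5 m gap.
ray_in = [0.01, 0.2]
ray_out = apply(gap_matrix(0.5), ray_in)

# The height grows by 0.5 * 0.2 = 0.1 m, and the slope stays at 0.2.
print(ray_out)
```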

If this looks a lot like the matrices describing how a car travels that we worked through in Chapter 16, that's no coincidence. The slope of a ray tells us how fast the ray changes its height, so it acts a lot like the speed of the car (which tells us how fast a car changes its position) in the examples given in Chapter 16.

Two Gaps

What happens to a ray of light if it passes through two gaps, one after the other? Suppose the first gap has width \( d_1 \), so the matrix describing that gap is

\[\bmatrix{1 & d_1 \\ 0 & 1} \]

and the second gap has width \( d_2 \), so its matrix is

\[\bmatrix{1 & d_2 \\ 0 & 1} \]

The ray passes through the first gap, and then the second gap, so the matrix equation describing this is

\[\bmatrix{h_2 \\ s_2}=\bmatrix{1 & d_2 \\ 0 & 1} \left(\bmatrix{1 & d_1 \\ 0 & 1}\bmatrix{h_1 \\ s_1}\right) \]

Notice that the ray is first multiplied by the first gap matrix (in brackets) and the result of this is then multiplied by the second gap matrix. This is because the ray hits the first gap first, and then the resulting ray hits the second gap. But it means the matrices are written from right to left, rather than the left to right order of the gaps.

However, because matrix multiplication is associative, we can re-bracket the equation as follows:

\[\bmatrix{h_2 \\ s_2}=\left(\bmatrix{1 & d_2 \\ 0 & 1} \bmatrix{1 & d_1 \\ 0 & 1}\right)\bmatrix{h_1 \\ s_1} \]

We can multiply together the two gap matrices in the brackets to give a matrix for the overall gap:

\[\bmatrix{h_2 \\ s_2}=\bmatrix{1 & d_1+d_2 \\ 0 & 1}\bmatrix{h_1 \\ s_1} \]

So the matrix for a gap of width \( d_1 \) followed by another gap of width \( d_2 \) is simply the matrix for a gap of width \( d_1+d_2 \). This seems perfectly sensible, if not particularly amazing.
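This result is easy to check numerically. The following sketch multiplies two gap matrices, written right to left as described above, and compares the product with the matrix for a single gap of the combined width. The helper `matmul` and the widths are illustrative assumptions.

```python
def gap_matrix(d):
    """Matrix for a gap of width d."""
    return [[1.0, d],
            [0.0, 1.0]]


def matmul(a, b):
    """Product a @ b of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


d1, d2 = 0.3, 0.2  # hypothetical gap widths in metres

# The ray meets gap 1 first, so its matrix sits on the right of the product.
two_gaps = matmul(gap_matrix(d2), gap_matrix(d1))
one_gap = gap_matrix(d1 + d2)

print(two_gaps)
print(one_gap)
```

Gap matrices happen to give the same product in either order, but that is not true of matrices in general, so it is worth keeping to the right-to-left habit.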

The Matrix for a Refracting Surface

Almost all optical systems are composed of refracting surfaces and gaps. We already know the matrix for a gap, so if we know the matrix for a refracting surface, we basically know all the matrices we will ever need for optics.

Figure 3(a) shows a light ray being refracted by a spherical surface. The centre of the spherical surface (a large white dot in the diagram) must be on the baseline, or optic axis. The refractive indices to the left and right of the surface are \( n_{in} \) and \( n_{out} \). We are interested in what happens to the ray as it passes through a reference plane that has been drawn through the point where the ray hits the surface.

In Figure 3(b) the height of the ray before refraction and the height after have been labelled. Obviously, the heights aren't changed by refraction, so we can say

\[h_2 = h_1\tag{3}\]

In Figure 3(c) the slope of the ray before refraction and the slope after refraction have been marked out by shaded triangles. The slopes change; that is what refraction does. How the slope changes is complicated, and we will derive it at the end of this chapter, but for now we will simply state that the slope after refraction is:

\[s_2 = \dfrac{-F_{surface}}{n_{out}} h_1 + \dfrac{n_{in}}{n_{out}} s_1 \tag{4}\]

where \( F_{surface} \) is the surface power \( (n_{out}-n_{in})/r \), and \( r \) is the radius of curvature of the surface. Putting Equations (3) and (4) together into a matrix equation gives

\[ \begin{bmatrix} h_2 \\ \vphantom{\dfrac{-F_{surface}}{n_{out}}} s_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{surface}}{n_{out}} & \dfrac{n_{in}}{n_{out}} \end{bmatrix} \begin{bmatrix} h_1 \\ \vphantom{\dfrac{-F_{surface}}{n_{out}}} s_1 \end{bmatrix} \]
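To see the refraction matrix in action, here is a small sketch that builds it from the two refractive indices and the radius of curvature, then applies it to a ray travelling parallel to the optic axis. The function names and the values (air to glass with index 1.5, radius 0.1 m taken as positive) are assumptions chosen for illustration.

```python
def surface_power(n_in, n_out, r):
    """Surface power F = (n_out - n_in) / r, with r in metres."""
    return (n_out - n_in) / r


def refraction_matrix(n_in, n_out, r):
    """Matrix for a refracting surface: height unchanged, slope follows Equation (4)."""
    F = surface_power(n_in, n_out, r)
    return [[1.0, 0.0],
            [-F / n_out, n_in / n_out]]


def apply(matrix, ray):
    """Multiply a 2x2 matrix by a ray vector [h, s]."""
    h, s = ray
    return [matrix[0][0] * h + matrix[0][1] * s,
            matrix[1][0] * h + matrix[1][1] * s]


# Hypothetical surface: air (n = 1.0) to glass (n = 1.5), radius 0.1 m.
F = surface_power(1.0, 1.5, 0.1)  # about 5 dioptres
ray_in = [0.01, 0.0]              # parallel to the axis, 0.01 m above it
ray_out = apply(refraction_matrix(1.0, 1.5, 0.1), ray_in)

# Height is unchanged; the slope becomes -F * h / n_out, so the ray bends
# down towards the axis.
print(F, ray_out)
```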

The Matrix for a Thick Lens

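A thick lens consists of two surfaces separated by a gap. Since we already know the matrix for a refracting surface, and the matrix for a gap, we can work out the matrix for a thick lens. The front and back (i.e. left and right) surfaces of the lens have power \( F_{front} \) and \( F_{back} \). The lens has refractive index \( n \) and the refractive index of air is \( 1 \). The gap between the front and back surfaces is \( d \). Thus, the lens can be described by three matrices:

Front surface (air to lens):	\[ \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{front}}{n} & \dfrac{1}{n} \end{bmatrix} \]
Gap of width \( d \):	\[ \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \]
Back surface (lens to air):	\[ \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{back}}{1} & \dfrac{n}{1} \end{bmatrix} \]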

A ray of light \( \bmatrix{h_1 \\ s_1} \) travelling through the lens and emerging as \( \bmatrix{h_2 \\ s_2} \) can be described by the following matrix equation:

\[ \begin{bmatrix}h_2 \\ \vphantom{\dfrac{-F_{back}}{1}} s_2\end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{back}}{1} & \dfrac{n}{1} \end{bmatrix} \left( \begin{bmatrix} 1 & d \\ \vphantom{\dfrac{-F_{back}}{1}} 0 & 1 \end{bmatrix} \left( \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{front}}{n} & \dfrac{1}{n} \end{bmatrix} \begin{bmatrix}h_1 \\ \vphantom{\dfrac{-F_{back}}{1}} s_1\end{bmatrix} \right) \right) \]

The brackets have been put in to show the order in which we'd naturally do the multiplication. First, we would multiply the ray vector by the front surface matrix to get another ray vector. We'd then multiply that by the gap matrix to get a second vector. Finally, we'd multiply that vector by the back surface matrix. However, because matrix multiplication is associative, we can instead re-bracket the equation as

\[ \begin{bmatrix}h_2 \\ \vphantom{\dfrac{-F_{back}}{1}} s_2\end{bmatrix} = \left( \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{back}}{1} & \dfrac{n}{1} \end{bmatrix} \begin{bmatrix} 1 & d \\ \vphantom{\dfrac{-F_{back}}{1}} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \dfrac{-F_{front}}{n} & \dfrac{1}{n} \end{bmatrix} \right) \begin{bmatrix}h_1 \\ \vphantom{\dfrac{-F_{back}}{1}} s_1\end{bmatrix} \]

and then multiply all the matrices in brackets together to get a single matrix for a lens. If we do this, we get

\[ \begin{bmatrix} \vphantom{F_{front}}h_2 \\ \vphantom{\dfrac{d}{n} F_{front}}s_2\end{bmatrix} = \begin{bmatrix} 1-\dfrac{d}{n} F_{front} & \dfrac{d}{n} \\ -\left(F_{front}+F_{back}-\dfrac{d}{n} F_{front} F_{back}\right) & 1-\dfrac{d}{n} F_{back} \end{bmatrix} \begin{bmatrix} \vphantom{F_{front}}h_1 \\ \vphantom{\dfrac{d}{n} F_{front}}s_1\end{bmatrix} \]

The matrix doesn't look all that simple, but once the various values \( d \), \( n \), \( F_{front} \) , and \( F_{back} \) are known, the elements of the matrix can be calculated quite simply, and then we could use the matrix to figure out what happens to any ray hitting it.

The quantity in brackets in the lower left corner of the matrix, \( F_{front}+F_{back}-\dfrac{d}{n} F_{front} F_{back} \), is called the equivalent power, for reasons which will be made clear next.
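The thick lens matrix can be checked by multiplying the three component matrices numerically and comparing the result with the closed form above. This is only a sketch: the helper names and the lens values (index 1.5, thickness 0.01 m, surface powers 5 and 3) are hypothetical.

```python
def matmul(a, b):
    """Product a @ b of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def gap_matrix(d):
    return [[1.0, d], [0.0, 1.0]]


def refraction_matrix(F, n_in, n_out):
    """Refracting surface of power F between media of index n_in and n_out."""
    return [[1.0, 0.0], [-F / n_out, n_in / n_out]]


# Hypothetical lens: index 1.5, thickness 0.01 m, surface powers 5 and 3.
n, d = 1.5, 0.01
F_front, F_back = 5.0, 3.0

front = refraction_matrix(F_front, 1.0, n)  # air to lens
back = refraction_matrix(F_back, n, 1.0)    # lens to air

# The ray meets the front surface first, so that matrix is furthest right.
product = matmul(back, matmul(gap_matrix(d), front))

# Closed form of the thick lens matrix, with the equivalent power.
F_equiv = F_front + F_back - (d / n) * F_front * F_back
closed_form = [[1 - (d / n) * F_front, d / n],
               [-F_equiv, 1 - (d / n) * F_back]]

print(product)
print(closed_form)
```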

The Matrix for a Thin Lens

A thin lens is one where the thickness \( d \) is very small. At the extreme, \( d=0 \). In this case, the matrix for a thin lens is given by setting \( d \) to zero in the above lens matrix, to give:

\[\bmatrix{1 & 0\\ -\left(F_{front}+F_{back}\right) & 1}\]

For a thin lens, the power \( F \) is just the sum of the front and back powers, so we can write the thin lens matrix even more simply as:

\[\text{thin lens matrix} =\bmatrix{1 & 0\\ -F & 1} \]

Notice that the power of the thin lens is in the lower left corner of the matrix, in the same place as the equivalent power of the thick lens matrix. That's why \( F_{front}+F_{back}-(d/n)F_{front}F_{back} \) is called the equivalent power.
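A quick numerical sketch shows what the thin lens matrix does to a ray. It sends a ray parallel to the optic axis through a thin lens and then through a gap equal to the focal length, using the standard relation that the focal length of a thin lens in air is \( 1/F \) (a fact not stated in this chapter). The lens power and ray height are hypothetical.

```python
def apply(matrix, ray):
    """Multiply a 2x2 matrix by a ray vector [h, s]."""
    h, s = ray
    return [matrix[0][0] * h + matrix[0][1] * s,
            matrix[1][0] * h + matrix[1][1] * s]


F = 5.0  # hypothetical thin lens power

thin_lens = [[1.0, 0.0],
             [-F, 1.0]]
gap_to_focus = [[1.0, 1.0 / F],  # gap of width 1/F, the focal length
                [0.0, 1.0]]

ray_in = [0.01, 0.0]  # parallel to the axis, 0.01 m above it
after_lens = apply(thin_lens, ray_in)
at_focus = apply(gap_to_focus, after_lens)

# After the lens the height is unchanged and the slope is -F * h.
# After travelling a distance 1/F the ray has come down to the axis.
print(after_lens, at_focus)
```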

The Thin Lens Equation \( V_{in}+F=V_{out} \)

Suppose that an optical system can be described by the following matrix equation:

\[\bmatrix{h_2 \\ s_2}= \bmatrix{A & 0 \\ C & D}\bmatrix{h_1 \\ s_1} \]

Notice that here the final height \( h_2 \) only depends on the height at the start \( h_1 \), since \( h_2=A h_1 \) , and the slope doesn't enter into the calculation of height. What does that mean?

There are two possibilities. The first (not very interesting) possibility is that the ray hasn't travelled any distance at all, like the ray in Figure 3(b). The second, more interesting situation is when the ray has travelled some distance.

This situation is shown in Figure 4. All the rays of light at the start, with height \( h_1 \), end up at the end with height \( h_2 \), regardless of the slopes at the start and the end. This means that all the rays at the start that diverge from a single point are converged to a single point at the end.

This is what happens when an object forms an image, so the matrix with a zero in the top right corner describes an image-forming optical system. The value of \( A \) in the matrix gives the linear magnification (i.e. how much bigger or smaller the image is compared to the object). To summarize: a zero in the top right corner means that an image is formed, and the top left corner then gives the magnification.

Suppose then that we have a thin lens set up so that an object produces an image. Let the distance from object to lens be \( u \), and the distance from lens to image be \( v \). (These are both positive distances, because we are measuring both from left to right in the direction of the light travel.) The distances \( u \) and \( v \) and the lens can all be described by matrices:

\[\bmatrix{h_2 \\ s_2}= \bmatrix{1 & v \\ 0 & 1}\bmatrix{1 & 0 \\ -F & 1}\bmatrix{1 & u \\ 0 & 1} \bmatrix{h_1 \\ s_1} \]

As always, we can multiply the matrices together to give an equation with a single matrix:

\[\bmatrix{h_2 \\ s_2} = \bmatrix{1-v F & u+v - u v F \\ -F & 1 - u F} \bmatrix{h_1 \\ s_1} \]
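As a numerical sketch of this system matrix, the example below builds it for hypothetical values of \( u \) and \( F \), and chooses \( v \) so that \( 1/v = F - 1/u \), which is the image distance that the thin lens equation in this section's title predicts. With that choice the top right element comes out as zero, and rays leaving the same height with different slopes all arrive at the same height. The helper names and numbers are assumptions.

```python
def matmul(a, b):
    """Product a @ b of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(matrix, ray):
    """Multiply a 2x2 matrix by a ray vector [h, s]."""
    h, s = ray
    return [matrix[0][0] * h + matrix[0][1] * s,
            matrix[1][0] * h + matrix[1][1] * s]


def gap_matrix(d):
    return [[1.0, d], [0.0, 1.0]]


def thin_lens_matrix(F):
    return [[1.0, 0.0], [-F, 1.0]]


u, F = 0.5, 5.0          # hypothetical object distance (m) and lens power
v = 1.0 / (F - 1.0 / u)  # image distance from 1/(-u) + F = 1/v

# Gap u comes first for the ray, so it sits furthest right in the product.
system = matmul(gap_matrix(v), matmul(thin_lens_matrix(F), gap_matrix(u)))

# Rays leaving height 0.01 m with different slopes.
heights = [apply(system, [0.01, s])[0] for s in (-0.02, 0.0, 0.02)]

print(system)
print(heights)
```

The top left element in this example equals \( -v/u \), so the image is upside down and two thirds the size of the object.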

Although we are thinking of \( u \) and \( v \) as being object and image distances, this matrix is actually for a system with any two gaps and a lens. To make sure that the gaps \( u \) and \( v \) are in fact the object and image distances, the top right corner of this matrix must be zero. That is,

\[ u+v - u v F = 0 \]

This condition can be rearranged into the thin lens equation:

Begin with:	\[u+v - u v F = 0\]
Divide both sides by \( u \):	\[1+\frac{v}{u} - v F = 0\]
Divide both sides by \( v \):	\[\frac{1}{v}+\frac{1}{u} - F = 0\]
Multiply by \( -1 \):	\[-\frac{1}{v} - \frac{1}{u} + F = 0\]
Add \( 1/v \) to both sides:	\[- \frac{1}{u} + F = \frac{1}{v} \]
Tidy up the \( 1/u \):	\[\frac{1}{-u} + F = \frac{1}{v} \]
Notice that \( 1/v \) is the same as \( V_{out} \):	\[\frac{1}{-u} + F = V_{out} \]
Notice that \( 1/(-u) \) is the same as \( V_{in} \):	\[V_{in} + F = V_{out} \]

This is the thin lens equation.

Deriving the Matrix for a Refracting Surface

The only thing we haven't proven so far is Equation (4), which says how the slope of the ray changes as it passes through a refracting surface. We will do that here. The strategy for proving Equation (4) is to first build up a set of facts about the slopes and angles of a ray that passes through a spherical surface, and then assemble them into the equation.

The facts first. Figure 5(a) shows the slope of a ray passing through a spherical refracting surface (the same as Figure 3(c)). An extra horizontal line has been drawn where the ray intersects the surface, as this will be useful in working out some angles.

From Figure 5(b), the angle \( a \) between the incoming ray and the horizontal line is the same as the angle \( a \) inside the triangle marking the slope of the ray. The change in height is opposite the angle, and the distance travelled is adjacent to the angle in the triangle, so

\[ \tan{(a)}= s_1 \]

From Figure 5(c), the angle \( b \) between the outgoing ray and the horizontal line is the same as the angle \( b \) inside the triangle marking the slope of the ray. The change in height is opposite the angle, and the distance travelled is adjacent to the angle in the triangle, so

\[ \tan{(b)}= -s_2 \]

(The negative sign here is because the slope \( s_2 \) is negative, since the ray is travelling downwards, but we're treating all angles as positive.) Finally, to make the algebra coming up a little easier, we will use the small-angle approximation, which is \( \theta\approx \sin{(\theta)} \approx \tan{(\theta)} \). This is a good approximation when the angle \( \theta \) is small and measured in radians. Using this approximation, the above two facts can be summarized as:

\[ \begin{align} a &\approx s_1 \\ b &\approx -s_2 \end{align} \]

Our next set of facts uses Snell's Law, or the small-angle approximation to it. Figure 6(a) shows the angle of incidence \( \theta_{in} \) and the angle of refraction \( \theta_{out} \). The surface normal is the line passing from the centre of the sphere through the surface. Snell's Law says

\[ n_{in}\sin{(\theta_{in})} = n_{out}\sin{(\theta_{out})} \]

which, using the small-angle approximation, becomes

\[ n_{in}\theta_{in} \approx n_{out}\theta_{out} \]

We would like to connect \( \theta_{in} \) and \( \theta_{out} \) to the ray angles \( a \) and \( b \). These have been added in Figure 6(b). In Figure 6(c), an angle \( \alpha \) has been added, which is the angle between the normal line and the optic axis. From Figure 6(c), we can say:

\[ \begin{align} \theta_{in} &= \alpha+a \\ \theta_{out} &= \alpha-b \end{align} \]

The last fact we need is shown in Figure 6(d): \( \tan{\alpha} \approx h_1/r \), because \( r \) is approximately the length of the adjacent side of the triangle shown. Using the small-angle approximation again (\( \tan{\theta}\approx\theta \)), we have our last fact:

\[ \alpha \approx \dfrac{h_1}{r} \]

(This takes \( r \) to be positive, which by the sign convention means that the centre of the sphere lies to the right of the surface. The angle \( \alpha \) is then positive, like all the other angles.)

To summarize, the facts we have collected are:

\[ \begin{align} a &\approx s_1 \\ b &\approx -s_2 \\ \theta_{in} &= \alpha+a \\ \theta_{out} &= \alpha -b \\ \alpha &\approx \dfrac{h_1}{r} \\ \end{align} \]

together with the small-angle form of Snell's Law:

\[ n_{in}\theta_{in} \approx n_{out}\theta_{out} \]

These facts involve \( s_1 \), \( s_2 \), and \( h_1 \), but they also involve other quantities. We want to put them together so we can calculate \( s_2 \) from \( s_1 \) and \( h_1 \), and nothing else. The following steps will do it:

Begin with	\[n_{in}\theta_{in} \approx n_{out}\theta_{out}\]
Substitute \( \theta_{in} = \alpha+a \)	\[n_{in}(\alpha+a) \approx n_{out}\theta_{out}\]
Substitute \( \theta_{out} = \alpha-b \)	\[n_{in}(\alpha+a) \approx n_{out}(\alpha-b)\]
Notice that we've got rid of \( \theta_{in} \) and \( \theta_{out} \)
Substitute \( a\approx s_1 \)	\[n_{in}(\alpha+s_1) \approx n_{out}(\alpha-b)\]
Substitute \( b\approx -s_2 \), or equivalently, \( -b\approx s_2 \)	\[n_{in}(\alpha+s_1) \approx n_{out}(\alpha+s_2)\]
Now we've also got rid of \( a \) and \( b \)
Expand	\[n_{in}\alpha+n_{in}s_1 \approx n_{out}\alpha+n_{out}s_2\]
Gather terms in \( \alpha \)	\[(n_{in}-n_{out})\alpha+n_{in}s_1 \approx n_{out}s_2\]
Divide both sides by \( n_{out} \)	\[\dfrac{n_{in}-n_{out}}{n_{out}}\alpha+\dfrac{n_{in}}{n_{out}}s_1 \approx s_2\]
Substitute \( \alpha\approx h_1/r \)	\[\dfrac{n_{in}-n_{out}}{n_{out}}\dfrac{h_1}{r}+\dfrac{n_{in}}{n_{out}}s_1 \approx s_2\]
Now we've got rid of \( \alpha \)
Note that \( (n_{in}-n_{out})/r=-F_{surface} \)	\[\dfrac{-F_{surface}}{n_{out}}h_1+\dfrac{n_{in}}{n_{out}}s_1 \approx s_2\]

This is Equation (4).
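One way to gain confidence in Equation (4) is to compare it with a ray traced using the exact form of Snell's Law. The sketch below does this under stated assumptions: the surface is taken to be convex towards the incoming light, with its centre of curvature on the optic axis to the right (so \( r \) is positive), the angle of the normal is taken as \( \sin{(\alpha)} = h_1/r \), and the angle relations \( \theta_{in} = \alpha + a \) and \( \theta_{out} = \alpha - b \) are used exactly. The function names and numbers are illustrative.

```python
import math


def exact_slope_out(h, s_in, r, n_in, n_out):
    """Outgoing slope from exact Snell's Law (assumes centre of curvature to the right)."""
    alpha = math.asin(h / r)   # angle between the normal and the optic axis
    a = math.atan(s_in)        # angle of the incoming ray
    theta_in = alpha + a       # angle of incidence
    theta_out = math.asin(n_in * math.sin(theta_in) / n_out)
    b = alpha - theta_out      # angle of the outgoing ray
    return -math.tan(b)


def paraxial_slope_out(h, s_in, r, n_in, n_out):
    """Outgoing slope from Equation (4), the small-angle result."""
    F_surface = (n_out - n_in) / r
    return -F_surface / n_out * h + n_in / n_out * s_in


# Hypothetical surface: air to glass of index 1.5, radius 0.1 m.
n_in, n_out, r = 1.0, 1.5, 0.1

# A ray close to the axis: the two results nearly agree.
small_exact = exact_slope_out(0.002, 0.005, r, n_in, n_out)
small_paraxial = paraxial_slope_out(0.002, 0.005, r, n_in, n_out)
small_error = abs(small_exact - small_paraxial)

# A ray far from the axis: the small-angle approximation is less accurate.
large_exact = exact_slope_out(0.03, 0.0, r, n_in, n_out)
large_paraxial = paraxial_slope_out(0.03, 0.0, r, n_in, n_out)
large_error = abs(large_exact - large_paraxial)

print(small_exact, small_paraxial, small_error)
print(large_exact, large_paraxial, large_error)
```

The agreement for the ray near the axis, and the growing error further out, is what we would expect from a result that rests on the small-angle approximation.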