

1.2 The Lorentz Transformation

The Lorentz transformation describes how measurements of the position and time of events change from one observer to the next. It includes Lorentz-Fitzgerald contraction and time dilation as special cases.


1.2.1 The transformation formulae

This subsection explains how the position and time coordinates of events differ from one observer to the next.

Figure 1.2: Coordinate systems for the Lorentz transformation.

Consider two observers A and B that are in motion relative to each other with a relative speed $V$. To keep things as simple as possible, it will be assumed that the relative motion is along the line through the two observers.

As the left side of figure 1.2 shows, observer A can believe herself to be at rest and see observer B moving away from her at speed $V$; similarly, observer B can believe himself to be at rest and see observer A moving away from him at speed $V$, as in the right side of the figure. The principle of relativity says that both views are equally valid; there is no physical measurement that can find a fundamental difference between the two observers. That also implies that both observers must agree on the same magnitude of the relative velocity $V$ between them.

It will further be assumed that both observers use coordinate systems with themselves at the origin to describe the locations and times of events. In addition, they both take their $x$ axes along the line of their relative motion. They also align their $y$ and $z$ axes. And they take their times to be zero at the instant that they meet.

In that case the Lorentz transformation says that the relation between positions and times of events as perceived by the two observers is, {D.4}:

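\begin{displaymath}
\fbox{$\displaystyle
c t_{\rm{B}} = \frac{c t_{\rm{A}} - (V/c) x_{\rm{A}}}{\sqrt{1-(V/c)^2}}
\qquad
x_{\rm{B}} = \frac{x_{\rm{A}} - (V/c) c t_{\rm{A}}}{\sqrt{1-(V/c)^2}}
\qquad
y_{\rm{B}} = y_{\rm{A}}
\qquad
z_{\rm{B}} = z_{\rm{A}}
$} %
\end{displaymath} (1.6)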

To get the transformation of the coordinates of B into those of A, just swap A and B and replace $V$ by $-V$. Indeed, if observer B is moving in the positive $x$-direction with speed $V$ compared to observer A, then observer A is moving in the negative $x$-direction with speed $V$ compared to observer B, as in figure 1.2. In the limit that the speed of light $c$ becomes infinite, the Lorentz transformation becomes the nonrelativistic “Galilean transformation” in which $t_{\rm {B}}$ is simply $t_{\rm {A}}$ and $x_{\rm {B}}=x_{\rm {A}}-Vt$, i.e. $x_{\rm {B}}$ equals $x_{\rm {A}}$ except for a shift of magnitude $Vt$.
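
As a quick numerical illustration of (1.6), take the assumed values $V=0.6c$, so that $\sqrt{1-(V/c)^2}=0.8$, and an event that observer A records at $ct_{\rm {A}}=5$ and $x_{\rm {A}}=3$ in some unit of length. These numbers are chosen for convenience only; they do not come from the text above.

\begin{displaymath}
c t_{\rm {B}} = \frac{5 - 0.6\times 3}{0.8} = 4
\qquad
x_{\rm {B}} = \frac{3 - 0.6\times 5}{0.8} = 0
\end{displaymath}

This event has $x_{\rm {A}}=Vt_{\rm {A}}$, so it occurs at the location of observer B. Accordingly, B finds it at his own origin, $x_{\rm {B}}=0$.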

The assumptions made are that A and B are at the origins of their coordinate systems, that their spatial coordinate systems are aligned, that their relative motion is along the $x$ axes, and that they take the zero of time to be the instant that they meet. These simplifying assumptions may look very restrictive, but they are not. A different observer A' at rest relative to A can still use any coordinate system he wants, with any arbitrary orientation, origin, and zero of time. Since A' is at rest relative to A, the two fundamentally agree about space and time. So whatever coordinates and times A' uses for events are easily converted to those that A uses in the classical way, {A.3}. Similarly an observer B' at rest compared to B can still use any arbitrary coordinate system that she wants.

The coordinates and times of events observed by the arbitrary observers A' and B' can then be related to each other in stages. First relate the coordinates of A' to those of A in the classical way. Next use the Lorentz transformation as given above to relate those to the coordinates of B. Then relate those in the classical way to the coordinates of B'. In this way, the coordinates and times of any two observers in relative motion to each other, using arbitrary coordinate systems and zeros of time, can be related. The simple Lorentz transformation above describes the nontrivial part of how the observations of different observers relate.

Time dilation is one special case of the Lorentz transformation. Assume that two events 1 and 2 happen at the same location $x_{\rm {A}},y_{\rm {A}},z_{\rm {A}}$ in system A. Then the first Lorentz transformation formula (1.6) gives

\begin{displaymath}
t_{2,\rm {B}}-t_{1,\rm {B}} = \frac{t_{2,\rm {A}}-t_{1,\rm {A}}}{\sqrt{1-(V/c)^2}}
\end{displaymath}

So observer B finds that the time difference between the events is larger. The same is of course true vice versa; just use the inverse formulae.
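
The step from (1.6) to the time dilation formula can be written out as follows; this is only a sketch of the substitution. Taking the difference of the first formula in (1.6) between the two events, and dividing by $c$, gives

\begin{displaymath}
t_{2,\rm {B}}-t_{1,\rm {B}} = \frac{(t_{2,\rm {A}}-t_{1,\rm {A}}) - (V/c^2)(x_{2,\rm {A}}-x_{1,\rm {A}})}{\sqrt{1-(V/c)^2}}
\end{displaymath}

The second term in the numerator vanishes because both events have the same $x_{\rm {A}}$. As an illustration with an assumed speed $V=0.6c$, a time difference of 1 s in system A becomes $1/0.8=1.25$ s in system B.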

Lorentz-Fitzgerald contraction is another special case of the Lorentz transformation. Assume that two stationary locations in system B are a distance $x_{2,\rm {B}}-x_{1,\rm {B}}$ apart in the direction of relative motion. The second Lorentz transformation formula (1.6) then says how far these points are apart in system A at any given time $t_{\rm {A}}$:

\begin{displaymath}
x_{2,\rm {B}}-x_{1,\rm {B}} = \frac{x_{2,\rm {A}}-x_{1,\rm {A}}}{\sqrt{1-(V/c)^2}}
\end{displaymath}

Taking the square root to the other side gives the contraction.
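
Written out, and again only as a sketch: the difference of the second formula in (1.6) between the two points is

\begin{displaymath}
x_{2,\rm {B}}-x_{1,\rm {B}} = \frac{(x_{2,\rm {A}}-x_{1,\rm {A}}) - V(t_{2,\rm {A}}-t_{1,\rm {A}})}{\sqrt{1-(V/c)^2}}
\end{displaymath}

in which the time term drops out because observer A takes both positions at the same time $t_{\rm {A}}$. Solving for the distance seen by A,

\begin{displaymath}
x_{2,\rm {A}}-x_{1,\rm {A}} = \sqrt{1-(V/c)^2}\; (x_{2,\rm {B}}-x_{1,\rm {B}})
\end{displaymath}

For an assumed speed $V=0.6c$, a rod of length 1 m at rest in system B is therefore measured as 0.8 m long by observer A.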

As a result of the Lorentz transformation, measured velocities are related as

\begin{displaymath}
v_{x,B} = \frac{v_{x,A}-V}{1-(V/c^2)v_{x,A}} \quad
v_{y,B} = \frac{v_{y,A}\sqrt{1-(V/c)^2}}{1-(V/c^2)v_{x,A}} \quad
v_{z,B}= \frac{v_{z,A}\sqrt{1-(V/c)^2}}{1-(V/c^2)v_{x,A}} %
\end{displaymath} (1.7)

Note that $v_x,v_y,v_z$ refer here to the perceived velocity components of some moving object; they are not components of the velocity difference $V$ between the coordinate systems.
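
One way to see where (1.7) comes from, sketched here for the $x$-component only, is to divide the differentials of the first two formulae in (1.6):

\begin{displaymath}
v_{x,B} = \frac{{\rm d}x_{\rm {B}}}{{\rm d}t_{\rm {B}}}
= \frac{{\rm d}x_{\rm {A}} - V {\rm d}t_{\rm {A}}}{{\rm d}t_{\rm {A}} - (V/c^2) {\rm d}x_{\rm {A}}}
= \frac{v_{x,A}-V}{1-(V/c^2)v_{x,A}}
\end{displaymath}

A useful check is the case $v_{x,A}=c$, a light signal moving in the $x$-direction:

\begin{displaymath}
v_{x,B} = \frac{c-V}{1-(V/c)} = c
\end{displaymath}

So both observers find the same speed for the light signal, whatever the value of $V$.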


1.2.2 Proper time and distance

In classical Newtonian mechanics, time is absolute. All observers agree about the difference in time $\Delta{t}$ between any two events:

\begin{displaymath}
\mbox{nonrelativistic: $\Delta{t}$\ is independent of the observer}
\end{displaymath}

The time difference is an invariant; it is the same for all observers.

All observers, regardless of how their spatial coordinate systems are oriented, also agree about the distance $\vert\Delta{\skew0\vec r}\vert$ between two events that occur at the same time:

\begin{displaymath}
\mbox{nonrelativistic:
$\vert\Delta{\skew0\vec r}\vert$\ is independent of the observer if $\Delta t = 0$}
\end{displaymath}

Here the distance between any two points 1 and 2 is found as

\begin{displaymath}
\vert\Delta{\skew0\vec r}\vert
\equiv \sqrt{(\Delta{\skew0\vec r})\cdot(\Delta{\skew0\vec r})}
= \sqrt{(\Delta x)^2+(\Delta y)^2+(\Delta z)^2}
\qquad
\Delta {\skew0\vec r}\equiv {\skew0\vec r}_2-{\skew0\vec r}_1
\end{displaymath}

The fact that the distance may be expressed as the square root of the sum of the squares of the components is known as the “Pythagorean theorem.”

Relativity messes all these things up big time. As time dilation shows, the time between events now depends on who is doing the observing. And as Lorentz-Fitzgerald contraction shows, distances now depend on who is doing the observing. For example, consider a moving ticking clock. Not only do different observers disagree about the distance $\vert\Delta{\skew0\vec r}\vert$ traveled between ticks (as they would nonrelativistically), but they also disagree about the time difference $\Delta{t}$ between ticks (which they would not nonrelativistically).

However, there is one thing that all observers can agree on. They do agree on how much time between ticks an observer moving along with the clock would measure. That time difference is called the “proper time” difference. (The word proper is a wrongly translated French propre, which here means own. So proper time really means the clock’s own time.) The time difference $\Delta{t}$ that an observer actually perceives is longer than the proper time difference $\Delta{t}_0$ due to the time dilation:

\begin{displaymath}
\Delta t = \frac{\Delta t_0}{\sqrt{1-(v/c)^2}}
\end{displaymath}

Here $v$ is the velocity of the clock as perceived by the observer.

To clean this up, take the square root to the other side and write $v$ as the distance $\vert\Delta{\skew0\vec r}\vert$ traveled by the clock divided by $\Delta{t}$. That gives the proper time difference $\Delta{t}_0$ between two events, like the ticks of a clock here, as

\begin{displaymath}
\fbox{$\displaystyle
\Delta t_0 = \Delta t
\sqrt{1 - \frac{(\Delta x)^2+(\Delta y)^2+(\Delta z)^2}{(c\Delta t)^2}}
$} %
\end{displaymath} (1.8)

The numerator in the ratio is the square of the distance between the events.

Note however that the proper time difference is imaginary if the quantity under the square root is negative. For example, if an observer perceives two events as happening simultaneously at two different locations, then the proper time difference between those two events is imaginary. To avoid dealing with complex numbers, it is then more convenient to define the “proper distance” $\Delta{s}$ between the two events as

\begin{displaymath}
\fbox{$\displaystyle
\Delta s = \sqrt{(\Delta x)^2+(\Delta y)^2+(\Delta z)^2 - (c \Delta t)^2}
$} %
\end{displaymath} (1.9)

Note that this is the ordinary distance between the two events if they are at the same time, i.e. $\Delta{t}=0$. The proper distance is different from the proper time difference by a factor $\sqrt{-c^2}$. Because of the minus sign under the square root, this factor is imaginary. As a result, $\Delta{s}$ is imaginary if $\Delta{t}_0$ is real and vice-versa.

All observers agree about the values of the proper time difference $\Delta{t}_0$ and the proper distance $\Delta{s}$ for any two given events.
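
As a numerical illustration under assumed values: let observer B move at $V=0.6c$ relative to A, and consider the pair of events consisting of the two observers meeting (at the origin and at time zero for both) and a later event that A records at $c\Delta t_{\rm {A}}=5$, $\Delta x_{\rm {A}}=3$ in some unit of length. The Lorentz transformation (1.6) gives $c\Delta t_{\rm {B}}=4$, $\Delta x_{\rm {B}}=0$ for the same pair of events. Using the proper time formula in the equivalent form $c\Delta t_0=\sqrt{(c\Delta t)^2-(\Delta x)^2-(\Delta y)^2-(\Delta z)^2}$ with $\Delta y=\Delta z=0$,

\begin{displaymath}
\mbox{A: } c\Delta t_0 = \sqrt{5^2-3^2} = 4
\qquad
\mbox{B: } c\Delta t_0 = \sqrt{4^2-0^2} = 4
\end{displaymath}

The two observers disagree about both $\Delta t$ and $\Delta x$, yet they find the same proper time difference.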

Physicists define the square of the proper distance to be the “space-time interval” $I$. The term is obviously confusing, as a dictionary defines an interval as a difference in time or space, not as the square of such a difference. To add more confusion, some physicists change sign in the definition, and others divide by the square of the speed of light. And some rightly define the interval to be $\Delta{s}$ without the square, unfortunately causing still more confusion.

If the interval, defined as $(\Delta{s})^2$, is positive, then the proper distance $\Delta{s}$ between the two events is real. Such an interval is called “space-like.” On the other hand, if the interval is negative, then the proper distance is imaginary. In that case it is the proper time difference between the events that is real. Such an interval is called “time-like.”

If the proper time difference is real, the earlier event can affect, or even cause, the later event. If the proper time difference is imaginary however, then the effects of either event cannot reach the other event even if traveling at the speed of light. It follows that the sign of the interval is directly related to “causality,” to what can cause what. Since all observers agree about the value of the proper time difference, they all agree about what can cause what.

For small differences in time and location, all differences $\Delta$ above become differentials ${\rm d}$.


1.2.3 Subluminal and superluminal effects

Suppose you stepped off the curb at the wrong moment and are now in the hospital. The pain is agonizing, so you contact one of the telecommunications microchips buzzing in the sky overhead. These chips are capable of sending out a “superluminal” beam; a beam that propagates with a speed greater than the speed of light. The factor with which the speed of the beam exceeds the speed of light is called the “warp factor” $w$. A beam with a high warp factor is great for rapid communication with space ships at distant locations in the solar system and beyond. A beam with a warp factor of 10 allows ten times quicker communication than those old-fashioned radio waves that propagate at the speed of light. And these chips have other very helpful uses, like for your predicament.

You select a microchip that is moving at high speed away from the location where the accident occurred. The microchip sends out its superluminal beam. In its coordinate system, the beam reaches the location of the accident at a time $t_m$, at which time the beam has traveled a distance $x_m$ equal to $wct_m$. According to the Lorentz transformation (1.6), in the coordinate system fixed to the earth, the beam reaches the location of the accident at a position and time equal to

\begin{displaymath}
t = \frac{1 - (wV/c)}{\sqrt{1-(V/c)^2}} t_m
\qquad
x = \frac{wc - V}{\sqrt{1-(V/c)^2}} t_m
\end{displaymath}

Because of the high speed $V$ of the microchip and the additional warp factor, the time that the beam reaches the location of the accident is negative; the beam has entered into the past. Not far enough in the past however, so another microchip picks up the message and beams it back, achieving another reduction in time. After a few more bounces, the message is beamed to your cell phone. It reaches you just when you are about to step off the curb. The message will warn you of the approaching car, but it is not really needed. The mere distraction of your buzzing cell phone causes you to pause for just a second, and the car rushes past safely. So the accident never happens; you are no longer in agony in the hospital, but on your Bermuda vacation as planned. And these microchips are great for investing in the stock market too.
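
The formulae for $t$ and $x$ given above follow by substituting $x_m=wct_m$ into (1.6), with the microchip in the role of observer A and the earth in the role of observer B. That assignment of roles is an interpretation; the text does not spell it out. For the time,

\begin{displaymath}
t = \frac{t_m - (V/c^2)\, w c t_m}{\sqrt{1-(V/c)^2}}
= \frac{1 - (wV/c)}{\sqrt{1-(V/c)^2}}\; t_m
\end{displaymath}

The arrival time is negative whenever $wV/c>1$. For example, with the assumed values $w=10$ and $V=0.2c$,

\begin{displaymath}
t = \frac{1 - 2}{\sqrt{1-0.04}}\; t_m \approx -1.02\, t_m
\end{displaymath}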

Sounds good, does it not? Unfortunately, there is a hitch. Physicists refuse to work on the underlying physics to enable this technology. They claim it will not be workable, since it will force them to think up answers to tough questions like: “if you did not end up in the hospital after all, then why did you still send the message?” Until they change their mind, our reality will be that observable matter or radiation cannot propagate faster than the speed of light.

Therefore, manipulating the past is not possible. An event can only affect later events. Even more specifically, an event can only affect a later event if the location of that later event is sufficiently close that it can be reached with a speed of no more than the speed of light. A look at the definition of the proper time interval then shows that this means that the proper time interval between the events must be real, or time-like. And while different observers may disagree about the location and time of the events, they all agree about the proper time interval. So all observers, regardless of their velocity, agree on whether an event can affect another event. And they also all agree on which event is the earlier one, because before the time interval $\Delta{t}$ could change sign for some observer speeds, it would have to pass through zero. At that point (1.8) would make the proper time interval imaginary instead of real. That cannot happen, because the proper time interval must be the same for all observers. Relativity maintains a single reality, even though observers may disagree about precise times and locations.

A more visual interpretation of those concepts can also be given. Imagine a hypothetical spherical wave front spreading out from the earlier event with the speed of light. Then a later event can be affected by the earlier event only if that later event is within or on that spherical wave front. If you restrict attention to events in the $x,y$ plane, you can use the $z$-coordinate to plot the values of time. In such a plot, the expanding circular wave front becomes a cone, called the “light-cone.” Only events within this light cone can be affected. Similarly in three dimensions and time, an event can only be affected if it is within the light cone in four-dimensional space-time. But of course, a cone in four dimensions is hard to visualize geometrically.


1.2.4 Four-vectors

The Lorentz transformation mixes up the space and time coordinates badly. In relativity, it is therefore best to think of the spatial coordinates and time as coordinates in a four-dimensional “space-time.”

Since you would surely like all components in a vector to have the same units, you probably want to multiply time by the speed of light, because $ct$ has units of length. So the four-dimensional “position vector” can logically be defined to be $(ct,x,y,z)$; $ct$ is the zeroth component of the vector, while $x$, $y$, and $z$ are components number 1, 2, and 3 as usual. This four-dimensional position vector will be indicated by

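\begin{displaymath}
\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt
\equiv
\left(\begin{array}{c}ct\\ x\\ y\\ z\end{array}\right)
\equiv
\left(\begin{array}{c}r_0\\ r_1\\ r_2\\ r_3\end{array}\right)
\end{displaymath} (1.10)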

The hook on the arrow indicates that time has been hooked into it.

How about the important dot product between vectors? In three dimensional space this produces such important quantities as the length of vectors and the angle between vectors. Moreover, the dot product between two vectors is the same regardless of the orientation of the coordinate system in which it is viewed.

It turns out that the proper way to define the dot product for four-vectors reverses the sign of the contribution of the time components:

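\begin{displaymath}
\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt_1
\cdot
\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt_2 \equiv -c^2t_1t_2 + x_1 x_2 + y_1 y_2 + z_1 z_2
\end{displaymath} (1.11)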

It can be checked by simple substitution that the Lorentz transformation (1.6) preserves this dot product. In more expensive words, this inner product is invariant under the Lorentz transformation. Different observers may disagree about the individual components of four-vectors, but not about their dot products.
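
The substitution can be sketched as follows. The abbreviations $\beta\equiv V/c$ and $\gamma\equiv1/\sqrt{1-\beta^2}$ are used here for brevity, and the subscripts A and B indicate the observer. Only the time and $x$ components need to be checked, since (1.6) leaves $y$ and $z$ unchanged:

\begin{displaymath}
\begin{array}{rcl}
-c^2t_{1,\rm {B}}t_{2,\rm {B}} + x_{1,\rm {B}}x_{2,\rm {B}}
& = & \gamma^2\left[-(ct_{1,\rm {A}}-\beta x_{1,\rm {A}})(ct_{2,\rm {A}}-\beta x_{2,\rm {A}})
+ (x_{1,\rm {A}}-\beta ct_{1,\rm {A}})(x_{2,\rm {A}}-\beta ct_{2,\rm {A}})\right] \\
& = & \gamma^2(1-\beta^2)\left(-c^2t_{1,\rm {A}}t_{2,\rm {A}} + x_{1,\rm {A}}x_{2,\rm {A}}\right) \\
& = & -c^2t_{1,\rm {A}}t_{2,\rm {A}} + x_{1,\rm {A}}x_{2,\rm {A}}
\end{array}
\end{displaymath}

The cross terms proportional to $\beta$ cancel in pairs, and $\gamma^2(1-\beta^2)=1$ by the definition of $\gamma$.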

The difference between the four-vector positions of two events has a proper length equal to the proper distance between the events

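\begin{displaymath}
\Delta s = \sqrt{(\Delta\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt)\cdot(\Delta\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt)}
\end{displaymath} (1.12)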

So, the fact that all observers agree about proper distance can be seen as a consequence of the fact that they all agree about dot products.

It should be pointed out that many physicists reverse the sign of the spatial components instead of the time in their inner product. Obviously, this is completely inconsistent with the nonrelativistic analysis, which is often still a valid approximation. And this inconsistent sign convention seems to be becoming the dominant one too. Count on physicists to argue for more than a century about a sign convention and get it all wrong in the end. One very notable exception is [48]; you can see why he would end up with a Nobel Prize in physics.

Some physicists also like to point out that if time is replaced by ${{\rm i}}t$, then the above dot product becomes the normal one. The Lorentz transformation can then be considered as a mere rotation of the coordinate system in this four-dimensional space-time. Gee, thanks physicists! This will be very helpful when examining what happens in universes in which time is imaginary, unlike our own universe, in which it is real. The good thing you can say about these physicists is that they define the dot product the right way: the ${\rm i}^2$ takes care of the minus sign on the zeroth component.

Returning to our own universe, the proper length of a four-vector can be imaginary, and a zero proper length does not imply that the four-vector is zero as it does in normal three-dimensional space. In fact, a zero proper length merely indicates that it requires motion at the speed of light to go from the start point of the four-vector to its end point.
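
For instance, take the four-vector from the emission of a light flash at the origin to its arrival at a point on the positive $x$-axis a distance $\ell$ away; the symbol $\ell$ is introduced for this example only. The flash needs a time $\ell/c$, so the components of the four-vector are $(\ell,\ell,0,0)$, and the square of its proper length is

\begin{displaymath}
-\ell^2 + \ell^2 + 0 + 0 = 0
\end{displaymath}

even though the four-vector itself is clearly nonzero.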


1.2.5 Index notation

The notations used in the previous subsection are not standard. In the literature, you will almost invariably find the four-vectors and the Lorentz transform written out in index notation. Fortunately, it does not require courses in linear algebra and tensor algebra to make some basic sense out of it.

First of all, in the nonrelativistic case position vectors are normally indicated by ${\skew0\vec r}$. The three components of this vector are commonly indicated by $x$, $y$, and $z$, or using indices, by $r_1$, $r_2$, and $r_3$. To handle space-time, physicists do not simply add a zeroth component $r_0$ equal to $ct$. That would make the meaning too easy to guess. Instead physicists like to indicate the components of four-vectors by $x^0,x^1,x^2,x^3$. It is harder to guess correctly what that means, especially since the letter $x$ is already greatly over-used as it is. A generic component may be denoted as $x^\mu$. An entire four-vector can then be indicated by $\{x^\mu\}$ where the brackets indicate the set of all four components. Needless to say, most physicists forget about the brackets, because using a component where a vector is required can have hilarious consequences.

In short,

\begin{displaymath}
\left(\begin{array}{c}ct\\ x\\ y\\ z\end{array}\right) \equiv
\left(\begin{array}{c}r_0\\ r_1\\ r_2\\ r_3\end{array}\right) \equiv
\left(\begin{array}{c}x^0\\ x^1\\ x^2\\ x^3\end{array}\right)
\end{displaymath}

shows this book’s common sense notation to the left and the index notation commonly used in physics to the right.

Recall now the Lorentz transformation (1.6). It described the relationship between the positions and times of events as observed by two different observers A and B. These observers were in motion compared to each other with a relative speed $V$. Physicists like to put the coefficients of such a Lorentz transformation into a table, as follows:

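\begin{displaymath}
\Lambda
\equiv
\left(
\begin{array}{cccc}
\lambda^0{}_0 & \lambda^0{}_1 & \lambda^0{}_2 & \lambda^0{}_3 \\
\lambda^1{}_0 & \lambda^1{}_1 & \lambda^1{}_2 & \lambda^1{}_3 \\
\lambda^2{}_0 & \lambda^2{}_1 & \lambda^2{}_2 & \lambda^2{}_3 \\
\lambda^3{}_0 & \lambda^3{}_1 & \lambda^3{}_2 & \lambda^3{}_3
\end{array}
\right)
\equiv
\left(
\begin{array}{cccc}
\gamma & -\beta\gamma & 0 & 0\\
-\beta\gamma & \gamma & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{array}
\right) %
\end{displaymath} (1.13)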

where

\begin{displaymath}
\gamma\equiv \frac{1}{\sqrt{1-(V/c)^2}} \qquad
\beta \equiv \frac{V}{c} \qquad
\gamma^2 - \beta^2\gamma^2 = 1
\end{displaymath}

A table like $\Lambda$ is called a matrix or second-order tensor. The individual entries in the matrix are indicated by $\lambda^\mu{}_\nu$ where $\mu$ is the number of the row and $\nu$ the number of the column. Note also the convention of showing the first index as a superscript. That is a tensor algebra convention. In linear algebra, you normally make all indices subscripts.

(Different sources use different letters for the Lorentz matrix and its entries. Some common examples are $\Lambda^\mu{}_\nu$ and $a^\mu{}_\nu$. The name Lorentz starts with L and the Greek letter for L is $\Lambda$. And Lorentz was Dutch, which makes him a European just like the Greeks. Therefore $\Lambda$ is a good choice for the name of the Lorentz matrix, and $\Lambda$ or lower case $\lambda$ for the entries of the matrix. An $L$ for the matrix and $l$ for its entries would be just too easy to guess. Also, $\lambda$ is the standard name for the eigenvalues of matrices and $\Lambda$ for the matrix of those eigenvalues. So there is some potential for hilarious confusion there. An $a$ for the Lorentz matrix is good too: the name Lorentz consists of roman letters and $a$ is the first letter of the roman alphabet.)

The values of the entries $\lambda^\mu{}_\nu$ may vary. The ones shown in the final matrix in (1.13) above apply only in the simplest nontrivial case. In particular, they require that the relative motion of the observers is aligned with the $x$ axes as in figure 1.2. If that is not the case, the values become a lot more messy.

In terms of the above notations, the Lorentz transformation (1.6) can be written as

\begin{displaymath}
x^\mu_{\rm B} = \sum_{\nu=0}^3 \lambda^\mu{}_\nu x^\nu_{\rm A}
\qquad\mbox{for all four values $\mu=0,1,2,3$}
\end{displaymath}

That is obviously a lot more concise than (1.6). Some further shorthand notation is now used. In particular, the “Einstein summation convention” is to leave out the summation symbol $\sum$. So, you will likely find the Lorentz transformation written more concisely as

\begin{displaymath}
x^\mu_{\rm B} = \lambda^\mu{}_\nu x^\nu_{\rm A}
\end{displaymath}

Whenever an index like $\nu$ appears twice in an expression, summation over that index is to be understood. In other words, you yourself are supposed to mentally add back the missing summation over all four values of $\nu$ to the expression above. Also, if an index appears only once, like $\mu$ above, it is to be understood that the equation is valid for all four values of that index.
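
To make the convention concrete, here is the sum written out for the simple case of relative motion along the $x$ axes, with the matrix entries read off from (1.6) in terms of $\beta$ and $\gamma$. This merely restates (1.6). For $\mu=0$ and $\mu=1$,

\begin{displaymath}
\begin{array}{rcl}
x^0_{\rm B} & = & \lambda^0{}_0 x^0_{\rm A} + \lambda^0{}_1 x^1_{\rm A} + \lambda^0{}_2 x^2_{\rm A} + \lambda^0{}_3 x^3_{\rm A}
= \gamma x^0_{\rm A} - \beta\gamma x^1_{\rm A} \\
x^1_{\rm B} & = & \lambda^1{}_0 x^0_{\rm A} + \lambda^1{}_1 x^1_{\rm A} + \lambda^1{}_2 x^2_{\rm A} + \lambda^1{}_3 x^3_{\rm A}
= -\beta\gamma x^0_{\rm A} + \gamma x^1_{\rm A}
\end{array}
\end{displaymath}

With $x^0=ct$ and $x^1=x$, these are the first two formulae in (1.6).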

It should be noted that mathematicians call the matrix $\Lambda$ the transformation matrix from B to A, even though it produces the coordinates of B from those of A. However, after you have read some more in this book, insane notation will no longer surprise you. Just that in this case it comes from mathematicians.

In understanding tensor algebra, it is essential to recognize one thing. It is that a quantity like a position differential transforms differently from a quantity like a gradient:

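\begin{displaymath}
{\rm d}x^\mu_{\rm B} =
\frac{\partial x^\mu_{\rm B}}{\partial x^\nu_{\rm A}} {\rm d}x^\nu_{\rm A}
\qquad
\frac{\partial f}{\partial x^\mu_{\rm B}} =
\frac{\partial f}{\partial x^\nu_{\rm A}}
\frac{\partial x^\nu_{\rm A}}{\partial x^\mu_{\rm B}}
\end{displaymath}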

In the first expression, the partial derivatives are by definition the entries of the Lorentz matrix $\Lambda$,

\begin{displaymath}
\frac{\partial x^\mu_{\rm B}}{\partial x^\nu_{\rm A}} \equiv \lambda^\mu{}_\nu
\end{displaymath}

In the second expression, the corresponding partial derivatives will be indicated by

\begin{displaymath}
\frac{\partial x^\mu_{\rm A}}{\partial x^\nu_{\rm B}} \equiv
(\lambda^{-1}){}^\mu{}_\nu
\end{displaymath}

The entries $(\lambda^{-1}){}^\mu{}_\nu$ form the so-called inverse Lorentz matrix $\Lambda^{-1}$. If the Lorentz transformation describes a transformation from an observer A to an observer B, then the inverse transformation describes the transformation from B to A.

Assuming that the Lorentz transformation matrix is the simple one to the right in (1.13), the inverse matrix $\Lambda^{-1}$ looks exactly the same as $\Lambda$ except that $-\beta$ gets replaced by $+\beta$. The reason is fairly simple. The quantity $\beta$ is the velocity between the observers scaled with the speed of light. And the relative velocity of B seen by A is the opposite of the one of A seen by B, if their coordinate systems are aligned.
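
This can be checked quickly by multiplying the two matrices. The sketch below shows only the nontrivial upper-left $2\times2$ blocks; the remaining block is the unit matrix in both:

\begin{displaymath}
\left(\begin{array}{cc}\gamma & \beta\gamma\\ \beta\gamma & \gamma\end{array}\right)
\left(\begin{array}{cc}\gamma & -\beta\gamma\\ -\beta\gamma & \gamma\end{array}\right)
=
\left(\begin{array}{cc}\gamma^2-\beta^2\gamma^2 & 0\\ 0 & \gamma^2-\beta^2\gamma^2\end{array}\right)
=
\left(\begin{array}{cc}1 & 0\\ 0 & 1\end{array}\right)
\end{displaymath}

The last step uses the identity $\gamma^2-\beta^2\gamma^2=1$.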

Consider now the reason why tensor analysis raises some indices. Physicists use a superscript index on a vector if it transforms using the normal Lorentz transformation matrix $\Lambda$. Such a vector is called a “contravariant” vector for reasons not worth describing. As an example, a position differential is a contravariant vector. So the components of a position differential are indicated by ${\rm d}{x}^\mu$ with a superscript index.

If a vector transforms using the inverse matrix $\Lambda^{-1}$, it is called a “covariant” vector. In that case subscript indices are used. For example, the gradient of a function $f$ is a covariant vector. So a component $\partial{f}/\partial{x}^\mu$ is commonly indicated by $\partial_{\mu}f$.

Now suppose that you flip over the sign of the zeroth, time, component of a four-vector like a position or a position differential. It turns out that the resulting four-vector then transforms using the inverse Lorentz transformation matrix. That means that it has become a covariant vector. (You can easily verify this in case of the simple Lorentz transform above.) Therefore lower indices are used for the flipped-over vector:

\begin{displaymath}
\{x_\mu\} \equiv (-ct,x,y,z) \equiv (x_0,x_1,x_2,x_3)
\end{displaymath}

The convention of showing covariant vectors as rows instead of columns comes from linear algebra. Tensor notation by itself does not have such a graphical interpretation.
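
The verification for the simple Lorentz transform can be sketched as follows, for the two components that are affected. The notation $(x_\mu)_{\rm B}$ for the flipped-over components as seen by observer B is introduced here for the example. With $x_0=-ct$ and $x_1=x$, the formulae (1.6) give

\begin{displaymath}
\begin{array}{rcl}
(x_0)_{\rm B} & = & -ct_{\rm B} = -\gamma ct_{\rm A} + \beta\gamma x_{\rm A}
= \gamma (x_0)_{\rm A} + \beta\gamma (x_1)_{\rm A} \\
(x_1)_{\rm B} & = & x_{\rm B} = -\beta\gamma ct_{\rm A} + \gamma x_{\rm A}
= \beta\gamma (x_0)_{\rm A} + \gamma (x_1)_{\rm A}
\end{array}
\end{displaymath}

The coefficients are those of $\Lambda$ with $-\beta$ replaced by $+\beta$, which is the inverse matrix.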

Keep one important thing in mind though. If you flip the sign of a component of a vector, you get a fundamentally different vector. The vector $\{x_\mu\}$ is not just a somewhat different way to write the position four-vector $\{x^\mu\}$ of the space-time point that you are interested in. Now normally if you define some new vector that is different from a vector that you are already using, you change the name. For example, you might change the name from $x$ to $y$ or to $x^{\rm {L}}$ say. Tensor algebra does not do that. Therefore the golden rule is:

The names of tensors are only correct if the indices are at the right height.

If you remember that, tensor algebra becomes a lot less confusing. The expression $\{x^\mu\}$ is only your space-time location named $x$ if the index is a superscript as shown. The four-vector $\{x_\mu\}$ is simply a different animal. How do you know what is the right height? You just have to remember, you know.

Now consider two different contravariant four-vectors, call them $\{x^\mu\}$ and $\{y^\mu\}$. The dot product between these two four-vectors can be written as

\begin{displaymath}
x_\mu y^\mu
\end{displaymath}

To see why, recall that since the index $\mu$ appears twice, summation over that index is understood. Also, the lowered index of $x_\mu$ indicates that the sign of the zeroth component is flipped over. That produces the required minus sign on the product of the time components in the dot product.
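
Written out in full, with the understood summation restored, the expression reads

\begin{displaymath}
x_\mu y^\mu = x_0 y^0 + x_1 y^1 + x_2 y^2 + x_3 y^3
= - x^0 y^0 + x^1 y^1 + x^2 y^2 + x^3 y^3
\end{displaymath}

Since $x^0$ and $y^0$ are the values of $ct$ for the two four-vectors, the first term is the product of the time components with the minus sign that the four-vector dot product requires.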

Note also from the above examples that summation indices appear once as a subscript and once as a superscript. That is characteristic of tensor algebra.

Addendum {A.4} gives a more extensive description of the most important tensor algebra formulae for those with a good knowledge of linear algebra.


1.2.6 Group property

The derivation of the Lorentz transformation as given earlier examined two observers A and B. But now assume that a third observer C is in motion compared to observer B. The coordinates of an event as perceived by observer C may then be computed from those of B using the corresponding Lorentz transformation, and the coordinates of B may in turn be computed from those of A using that Lorentz transformation. Schematically,

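\begin{displaymath}
\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt_{\rm {C}}
\longleftarrow
\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt_{\rm {B}}
\longleftarrow
\kern-1pt{\buildrel\raisebox{-1.5pt}[0pt][0pt]{\hbox{$\scriptstyle\hookrightarrow$\hspace{0pt}}}\over r}
\kern-1.3pt_{\rm {A}}
\end{displaymath}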

But if everything is OK, that means that the Lorentz transformation from A to B followed by the Lorentz transformation from B to C must be the same as the Lorentz transformation from A directly to C. In other words, the combination of two Lorentz transformations must be another Lorentz transformation.

Mathematicians say that Lorentz transformations must form a group. It is much like rotations of a coordinate system in three spatial dimensions: a rotation followed by another one is equivalent to a single rotation over some combined angle. In fact, such spatial rotations are Lorentz transformations; just between coordinate systems that do not move compared to each other.

Using a lot of linear algebra, it may be verified that indeed the Lorentz transformations form a group, {D.5}.
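
For the special case that both relative motions are along the $x$-axis, the check is short enough to sketch here. The subscripts 1 (for the transformation from A to B) and 2 (from B to C) are labels introduced for this example. Multiplying the nontrivial $2\times2$ blocks of two matrices of the form (1.13) gives

\begin{displaymath}
\left(\begin{array}{cc}\gamma_2 & -\beta_2\gamma_2\\ -\beta_2\gamma_2 & \gamma_2\end{array}\right)
\left(\begin{array}{cc}\gamma_1 & -\beta_1\gamma_1\\ -\beta_1\gamma_1 & \gamma_1\end{array}\right)
=
\gamma_1\gamma_2(1+\beta_1\beta_2)
\left(\begin{array}{cc}1 & -\beta\\ -\beta & 1\end{array}\right)
\qquad
\beta = \frac{\beta_1+\beta_2}{1+\beta_1\beta_2}
\end{displaymath}

This is again of the form (1.13), with $\gamma=\gamma_1\gamma_2(1+\beta_1\beta_2)=1/\sqrt{1-\beta^2}$. For example, $\beta_1=\beta_2=0.5$ combine to $\beta=0.8$, not to the nonrelativistic sum of 1.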