

1.2 The Lorentz Trans­for­ma­tion

The Lorentz trans­for­ma­tion de­scribes how mea­sure­ments of the po­si­tion and time of events change from one ob­server to the next. It in­cludes Lorentz-Fitzger­ald con­trac­tion and time di­la­tion as spe­cial cases.


1.2.1 The trans­for­ma­tion for­mu­lae

This sub­sec­tion ex­plains how the po­si­tion and time co­or­di­nates of events dif­fer from one ob­server to the next.

Figure 1.2: Coordinate systems for the Lorentz transformation. [Picture not reproduced: observers A and B with their coordinate axes, with A at rest on the left side and B at rest on the right side.]

Consider two observers A and B that are in motion compared to each other with a relative speed $V$. To make things as simple as possible, it will be assumed that the relative motion is along the line through the two observers.

As the left side of figure 1.2 shows, observer A can believe herself to be at rest and see observer B moving away from her at speed $V$; similarly, observer B can believe himself to be at rest and see observer A moving away from him at speed $V$, as in the right side of the figure. The principle of relativity says that both views are equally valid; there is no physical measurement that can find a fundamental difference between the two observers. That implies that both observers must agree on the same magnitude of the relative velocity $V$ between them. And it implies that they need to agree on the speed $c$ at which light moves.

It will fur­ther be as­sumed that both ob­servers use co­or­di­nate sys­tems with them­selves at the ori­gin to de­scribe the lo­ca­tions and times of events. In ad­di­tion, they both take their $x$ axes along the line of their rel­a­tive mo­tion. They also align their $y$ and $z$ axes. And they both de­fine time to be zero at the in­stant that they meet.

In that case the Lorentz trans­for­ma­tion says that the re­la­tion be­tween po­si­tions and times of events as per­ceived by the two ob­servers is, {D.4}:

\begin{displaymath}
\fbox{$\displaystyle
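c t_{\rm{B}} = \frac{c t_{\rm{A}} - (V/c) x_{\rm{A}}}{\sqrt{1-(V/c)^2}}
\qquad
x_{\rm{B}} = \frac{x_{\rm{A}} - (V/c) c t_{\rm{A}}}{\sqrt{1-(V/c)^2}}
\qquad
y_{\rm{B}} = y_{\rm{A}}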
\qquad
z_{\rm{B}} = z_{\rm{A}}
$} %
\end{displaymath} (1.6)

To get the transformation of the coordinates of B into those of A, just swap A and B and replace $V$ by $-V$. Indeed, if observer B is moving in the positive $x$-direction with speed $V$ compared to observer A, then observer A is moving in the negative $x$-direction with speed $V$ compared to observer B, as in figure 1.2. In the limit that the speed of light $c$ becomes infinite, the Lorentz transformation becomes the nonrelativistic “Galilean transformation” in which $t_{\rm {B}}$ is simply $t_{\rm {A}}$ and $x_{\rm {B}} = x_{\rm {A}}-Vt$, i.e. $x_{\rm {B}}$ equals $x_{\rm {A}}$ except for a shift of magnitude $Vt$.
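As a quick numerical sanity check, the formulae (1.6) can be put into a few lines of Python. This is only a sketch: the function names `lorentz_transform` and `inverse_lorentz_transform`, the choice to pass $V/c$ as `beta` and time as $ct$, and the sample event are all assumptions made here for illustration.

```python
import math

def lorentz_transform(ct_a, x_a, y_a, z_a, beta):
    """Map the coordinates of an event for observer A to those for B, per (1.6).

    beta is V/c; time is passed as c*t so that all inputs are lengths.
    """
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    ct_b = gamma * (ct_a - beta * x_a)
    x_b = gamma * (x_a - beta * ct_a)
    return ct_b, x_b, y_a, z_a

def inverse_lorentz_transform(ct_b, x_b, y_b, z_b, beta):
    """Map B's coordinates back to A's: the same formulae with V replaced by -V."""
    return lorentz_transform(ct_b, x_b, y_b, z_b, -beta)

event_a = (5.0, 3.0, 1.0, -2.0)                  # (ct, x, y, z) seen by A
event_b = lorentz_transform(*event_a, beta=0.6)  # the same event seen by B
round_trip = inverse_lorentz_transform(*event_b, beta=0.6)
```

With these sample numbers the event ends up at $x_{\rm B}$ = 0 and $ct_{\rm B}$ = 4, up to rounding, and `round_trip` reproduces `event_a`.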

The assumptions made are that A and B are at the origins of their coordinate systems, that their spatial coordinate systems are aligned, that their relative motion is along the $x$ axes, and that they take the zero of time to be the instant that they meet. These simplifying assumptions may look very restrictive, but they are not. A different observer A' at rest relative to A can still use any coordinate system he wants, with any arbitrary orientation, origin, and zero of time. Since A' is at rest relative to A, the two fundamentally agree about space and time. So whatever coordinates and times A' uses for events are easily converted to those that A uses in the classical way, {A.3}. Similarly an observer B' at rest compared to B can still use any arbitrary coordinate system that she wants. The coordinates and times of events observed by the arbitrary observers A' and B' can then be related to each other in stages. First relate the coordinates of A' to those of A in the classical way. Next use the Lorentz transformation as given above to relate those to the coordinates of B. Then relate those in the classical way to the coordinates of B'. In this way, the coordinates and times of any two observers in relative motion to each other, using arbitrary coordinate systems and zeros of time, can be related. The simple Lorentz transformation above describes the nontrivial part of how the observations of different observers relate.

Time di­la­tion is one spe­cial case of the Lorentz trans­for­ma­tion. As­sume that two events 1 and 2 hap­pen at the same lo­ca­tion $x_{\rm {A}},y_{\rm {A}},z_{\rm {A}}$ in sys­tem A. Then the first Lorentz trans­for­ma­tion for­mula (1.6) gives

\begin{displaymath}
t_{2,\rm {B}}-t_{1,\rm {B}} = \frac{t_{2,\rm {A}}-t_{1,\rm {A}}}{\sqrt{1-(V/c)^2}}
\end{displaymath}

So ob­server B finds that the time dif­fer­ence be­tween the events is larger. The same is of course true vice-versa, just use the in­verse for­mu­lae.
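To see the time dilation come out of the first formula of (1.6) with actual numbers, here is a small Python sketch. The helper name `time_in_b` and the sample values are assumptions for illustration only; time is expressed as $ct$ and the speed as $\beta = V/c$.

```python
import math

def time_in_b(ct_a, x_a, beta):
    """First formula of (1.6): c*t_B from c*t_A and x_A, with beta = V/c."""
    return (ct_a - beta * x_a) / math.sqrt(1.0 - beta**2)

beta = 0.8
x_a = 7.0                    # both events happen at this same place in system A
ct1_a, ct2_a = 2.0, 5.0      # c times the two event times in system A

# Time difference (times c) between the events as seen by B.
dilated = time_in_b(ct2_a, x_a, beta) - time_in_b(ct1_a, x_a, beta)

# The time dilation formula applied directly to A's time difference.
expected = (ct2_a - ct1_a) / math.sqrt(1.0 - beta**2)
```

Both `dilated` and `expected` come out as 5 (up to rounding), compared to 3 for observer A. The common position `x_a` drops out of the difference, as it should.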

Lorentz-Fitzgerald contraction is another special case of the Lorentz transformation. Assume that two stationary locations in system B are a distance $x_{2,\rm {B}}-x_{1,\rm {B}}$ apart in the direction of relative motion. The second Lorentz transformation formula (1.6) then says how far these points are apart in system A at any given time $t_{\rm {A}}$:

\begin{displaymath}
x_{2,\rm {B}}-x_{1,\rm {B}} = \frac{x_{2,\rm {A}}-x_{1,\rm {A}}}{\sqrt{1-(V/c)^2}}
\end{displaymath}

Tak­ing the square root to the other side gives the con­trac­tion.
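The same kind of check works for the contraction. The sketch below evaluates the second formula of (1.6) for two points at one and the same time in system A; the helper name `position_in_b` and the sample numbers are assumptions made for illustration.

```python
import math

def position_in_b(ct_a, x_a, beta):
    """Second formula of (1.6): x_B from c*t_A and x_A, with beta = V/c."""
    return (x_a - beta * ct_a) / math.sqrt(1.0 - beta**2)

beta = 0.6
ct_a = 4.0                   # one given time in system A
x1_a, x2_a = 1.0, 9.0        # where A sees the two points at that time

# Distance between the points in system B, where they are stationary.
length_b = position_in_b(ct_a, x2_a, beta) - position_in_b(ct_a, x1_a, beta)

# Distance seen by A, and the same distance from the contraction formula.
length_a = x2_a - x1_a
contracted = length_b * math.sqrt(1.0 - beta**2)
```

Here `length_b` is 10 while `length_a` is only 8: the observer who sees the points moving finds them closer together.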

As a re­sult of the Lorentz trans­for­ma­tion, mea­sured ve­loc­i­ties are re­lated as

\begin{displaymath}
v_{x,B} = \frac{v_{x,A}-V}{1-(V/c^2)v_{x,A}} \quad
v_{y,B}= \frac{v_{y,A}\sqrt{1-(V/c)^2}}{1-(V/c^2)v_{x,A}} \quad
v_{z,B}= \frac{v_{z,A}\sqrt{1-(V/c)^2}}{1-(V/c^2)v_{x,A}} %
\end{displaymath} (1.7)

Note that $v_x,v_y,v_z$ re­fer here to the per­ceived ve­loc­ity com­po­nents of some mov­ing ob­ject; they are not com­po­nents of the ve­loc­ity dif­fer­ence $V$ be­tween the co­or­di­nate sys­tems.
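A short Python sketch of (1.7) makes it easy to try some numbers, in particular to check that something moving at the speed of light for A also moves at the speed of light for B. The function name `transform_velocity` and the choice of units with $c$ = 1 are assumptions made here for illustration.

```python
import math

def transform_velocity(v_a, V, c=1.0):
    """Velocity components of an object for observer B, from those for A, per (1.7)."""
    vx, vy, vz = v_a
    root = math.sqrt(1.0 - (V / c) ** 2)
    denom = 1.0 - (V / c**2) * vx
    return ((vx - V) / denom, vy * root / denom, vz * root / denom)

def speed(v):
    """Magnitude of a velocity given as a tuple of three components."""
    return math.sqrt(sum(component**2 for component in v))

# Light moving along the y axis of A, seen by an observer B with V = 0.5 c.
light_b = transform_velocity((0.0, 1.0, 0.0), V=0.5)

# An object moving at 0.5 c for A, seen by an observer B moving the other way.
object_b = transform_velocity((0.5, 0.0, 0.0), V=-0.5)
```

The light is no longer moving along the $y$ axis for B, but `speed(light_b)` is still 1. The object moves at 0.8 $c$ for B, not at the 1.0 $c$ that simply adding the speeds would suggest.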


1.2.2 Proper time and dis­tance

In clas­si­cal New­ton­ian me­chan­ics, time is ab­solute. All ob­servers agree about the dif­fer­ence in time $\Delta{t}$ be­tween any two events:

\begin{displaymath}
\mbox{nonrelativistic: $\Delta{t}$\ is independent of the observer}
\end{displaymath}

The time dif­fer­ence is an in­vari­ant; it is the same for all ob­servers.

All ob­servers, re­gard­less of how their spa­tial co­or­di­nate sys­tems are ori­ented, also agree over the dis­tance $\vert\Delta{\skew0\vec r}\vert$ be­tween two events that oc­cur at the same time:

\begin{displaymath}
\mbox{nonrelativistic:
$\vert\Delta{\skew0\vec r}\vert$\ is independent of the observer if $\Delta t = 0$}
\end{displaymath}

Here the dis­tance be­tween any two points 1 and 2 is found as

\begin{displaymath}
\vert\Delta{\skew0\vec r}\vert
\equiv \sqrt{(\Delta{\skew0\vec r})^2}
= \sqrt{(\Delta x)^2+(\Delta y)^2+(\Delta z)^2}
\qquad \Delta {\skew0\vec r}\equiv {\skew0\vec r}_2-{\skew0\vec r}_1
\end{displaymath}

The fact that the dis­tance may be ex­pressed as a square root of the sum of the square com­po­nents is known as the “Pythagorean the­o­rem.”

Relativity messes all these things up big time. As time dilation shows, the time between events now depends on who is doing the observing. And as Lorentz-Fitzgerald contraction shows, distances now depend on who is doing the observing. For example, consider a moving ticking clock. Not only do different observers disagree over the distance $\vert\Delta{\skew0\vec r}\vert$ traveled between ticks (as they would do nonrelativistically), but they also disagree about the time difference $\Delta{t}$ between ticks (which they would not do nonrelativistically).

How­ever, there is one thing that all ob­servers can agree on. They do agree on how much time be­tween ticks an ob­server mov­ing along with the clock would mea­sure. That time dif­fer­ence is called the “proper time” dif­fer­ence. (The word proper is a wrongly trans­lated French pro­pre, which here means own. So proper time re­ally means the clock’s own time.) The time dif­fer­ence $\Delta{t}$ that an ob­server ac­tu­ally per­ceives is longer than the proper time dif­fer­ence $\Delta{t}_0$ due to the time di­la­tion:

\begin{displaymath}
\Delta t = \frac{\Delta t_0}{\sqrt{1-(v/c)^2}}
\end{displaymath}

Here $v$ is the ve­loc­ity of the clock as per­ceived by the ob­server.

To clean this up, take the square root to the other side and write $v$ as the dis­tance $\vert\Delta{\skew0\vec r}\vert$ trav­eled by the clock di­vided by $\Delta{t}$. That gives the proper time dif­fer­ence $\Delta{t}_0$ be­tween two events, like the ticks of a clock here, as

\begin{displaymath}
\fbox{$\displaystyle
\Delta t_0 = \Delta t
\sqrt{1 - \frac{(\Delta x)^2+(\Delta y)^2+(\Delta z)^2}{(c\Delta t)^2}}
$} %
\end{displaymath} (1.8)

The nu­mer­a­tor in the ra­tio is the square of the dis­tance be­tween the events.

Note how­ever that the proper time dif­fer­ence is imag­i­nary if the quan­tity un­der the square root is neg­a­tive. For ex­am­ple, if an ob­server per­ceives two events as hap­pen­ing si­mul­ta­ne­ously at two dif­fer­ent lo­ca­tions, then the proper time dif­fer­ence be­tween those two events is imag­i­nary. To avoid deal­ing with com­plex num­bers, it is then more con­ve­nient to de­fine the “proper dis­tance” $\Delta{s}$ be­tween the two events as

\begin{displaymath}
\fbox{$\displaystyle
\Delta s = \sqrt{(\Delta x)^2+(\Delta y)^2+(\Delta z)^2 - (c \Delta t)^2}
$} %
\end{displaymath} (1.9)

Note that this is the ordinary distance between the two events if they are at the same time, i.e. $\Delta{t} = 0$. The proper distance is different from the proper time difference by a factor $\sqrt{-c^2}$. Because of the minus sign under the square root, this factor is imaginary. As a result, $\Delta{s}$ is imaginary if $\Delta{t}_0$ is real and vice-versa.

All ob­servers agree about the val­ues of the proper time dif­fer­ence $\Delta{t}_0$ and the proper dis­tance $\Delta{s}$ for any two given events.
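That agreement can be checked numerically for the simple transformation (1.6). The Python sketch below is an illustration under assumptions: the function names are made up here, time differences are passed as $c\Delta t$, and it uses the fact that, since (1.6) is linear, coordinate differences transform exactly like the coordinates themselves.

```python
import math

def lorentz_transform(c_dt, dx, dy, dz, beta):
    """Apply (1.6) to the coordinate differences between two events."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (c_dt - beta * dx), gamma * (dx - beta * c_dt), dy, dz

def proper_time(c_dt, dx, dy, dz):
    """c times the proper time difference (1.8); real only if the root is real."""
    return c_dt * math.sqrt(1.0 - (dx**2 + dy**2 + dz**2) / c_dt**2)

def proper_distance(c_dt, dx, dy, dz):
    """The proper distance (1.9); real only if the root is real."""
    return math.sqrt(dx**2 + dy**2 + dz**2 - c_dt**2)

# A pair of events with a real proper time difference, seen by A and by B.
pair1_a = (5.0, 3.0, 0.0, 0.0)
pair1_b = lorentz_transform(*pair1_a, beta=0.6)
tau_a, tau_b = proper_time(*pair1_a), proper_time(*pair1_b)

# A pair of events with a real proper distance, seen by A and by B.
pair2_a = (3.0, 5.0, 0.0, 0.0)
pair2_b = lorentz_transform(*pair2_a, beta=0.6)
s_a, s_b = proper_distance(*pair2_a), proper_distance(*pair2_b)
```

Both observers find 4 for $c\Delta t_0$ of the first pair and 4 for $\Delta s$ of the second pair, even though their individual time and position differences are quite different.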

Physi­cists de­fine the square of the proper dis­tance to be the “space-time in­ter­val” $I$. The term is ob­vi­ously con­fus­ing, as a dic­tio­nary de­fines an in­ter­val as a dif­fer­ence in time or space, not as the square of such a dif­fer­ence. To add more con­fu­sion, some physi­cists change sign in the de­f­i­n­i­tion, and oth­ers di­vide by the square speed of light. And some rightly de­fine the in­ter­val to be $\Delta{s}$ with­out the square, un­for­tu­nately caus­ing still more con­fu­sion.

If the in­ter­val, de­fined as $(\Delta{s})^2$, is pos­i­tive, then the proper dis­tance $\Delta{s}$ be­tween the two events is real. Such an in­ter­val is called “space-like.” On the other hand, if the in­ter­val is neg­a­tive, then the proper dis­tance is imag­i­nary. In that case it is the proper time dif­fer­ence be­tween the events that is real. Such an in­ter­val is called “time-like.”

If the proper time dif­fer­ence is real, the ear­lier event can af­fect, or even cause, the later event. If the proper time dif­fer­ence is imag­i­nary how­ever, then the ef­fects of ei­ther event can­not reach the other event even if trav­el­ing at the speed of light. It fol­lows that the sign of the in­ter­val is di­rectly re­lated to “causal­ity,” to what can cause what. Since all ob­servers agree about the value of the proper time dif­fer­ence, they all agree about what can cause what.
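In code, the classification comes down to the sign of $(\Delta s)^2$. The following Python sketch is only an illustration: the function name `classify_interval` is made up here, and the label "light-like" for the borderline case of a zero interval is a common name that is an addition to the terminology used above.

```python
def classify_interval(c_dt, dx, dy, dz):
    """Classify the interval (Delta s)^2 between two events by its sign."""
    interval = dx**2 + dy**2 + dz**2 - c_dt**2
    if interval > 0:
        return "space-like"   # proper distance real: neither event can affect the other
    if interval < 0:
        return "time-like"    # proper time real: the earlier event can affect the later
    return "light-like"       # borderline: reachable only at exactly the speed of light

kinds = [
    classify_interval(5.0, 3.0, 0.0, 0.0),
    classify_interval(3.0, 5.0, 0.0, 0.0),
    classify_interval(5.0, 3.0, 4.0, 0.0),
]
```

For these three sample pairs of events, `kinds` is `["time-like", "space-like", "light-like"]`.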

For small dif­fer­ences in time and lo­ca­tion, all dif­fer­ences $\Delta$ above be­come dif­fer­en­tials ${\rm d}$.


1.2.3 Sub­lu­mi­nal and su­per­lu­mi­nal ef­fects

Sup­pose you stepped off the curb at the wrong mo­ment and are now in the hos­pi­tal. The pain is ag­o­niz­ing, so you con­tact one of the telecom­mu­ni­ca­tions mi­crochips buzzing in the sky over­head. These chips are ca­pa­ble of send­ing out a “su­per­lu­mi­nal” beam; a beam that prop­a­gates with a speed greater than the speed of light. The fac­tor with which the speed of the beam ex­ceeds the speed of light is called the “warp fac­tor” $w$. A beam with a high warp fac­tor is great for rapid com­mu­ni­ca­tion with space ships at dis­tant lo­ca­tions in the so­lar sys­tem and be­yond. A beam with a warp fac­tor of 10 al­lows ten times quicker com­mu­ni­ca­tion than those old-fash­ioned ra­dio waves that prop­a­gate at the speed of light. And these chips have other very help­ful uses, like for your predica­ment.

You se­lect a mi­crochip that is mov­ing at high speed away from the lo­ca­tion where the ac­ci­dent oc­curred. The mi­crochip sends out its su­per­lu­mi­nal beam. In its co­or­di­nate sys­tem, the beam reaches the lo­ca­tion of the ac­ci­dent at a time $t_m$, at which time the beam has trav­eled a dis­tance $x_m$ equal to $wct_m$. Ac­cord­ing to the Lorentz trans­for­ma­tion (1.6), in the co­or­di­nate sys­tem fixed to the earth, the beam reaches the lo­ca­tion of the ac­ci­dent at a po­si­tion and time equal to

\begin{displaymath}
t = \frac{1 - (wV/c)}{\sqrt{1-(V/c)^2}} t_m
\qquad
x = \frac{wc - V}{\sqrt{1-(V/c)^2}} t_m
\end{displaymath}

Be­cause of the high speed $V$ of the mi­crochip and the ad­di­tional warp fac­tor, the time that the beam reaches the lo­ca­tion of the ac­ci­dent is neg­a­tive; the beam has en­tered into the past. Not far enough in the past how­ever, so an­other mi­crochip picks up the mes­sage and beams it back, achiev­ing an­other re­duc­tion in time. Af­ter a few more bounces, the mes­sage is beamed to your cell phone. It reaches you just when you are about to step off the curb. The mes­sage will warn you of the ap­proach­ing car, but it is not re­ally needed. The mere dis­trac­tion of your buzzing cell phone causes you to pause for just a sec­ond, and the car rushes past safely. So the ac­ci­dent never hap­pens; you are no longer in agony in the hos­pi­tal, but on your Bermuda va­ca­tion as planned. And these mi­crochips are great for in­vest­ing in the stock mar­ket too.

Sounds good, does it not? Un­for­tu­nately, there is a hitch. Physi­cists refuse to work on the un­der­ly­ing physics to en­able this tech­nol­ogy. They claim it will not be work­able, since it will force them to think up an­swers to tough ques­tions like: “if you did not end up in the hos­pi­tal af­ter all, then why did you still send the mes­sage?” Un­til they change their mind, our re­al­ity will be that ob­serv­able mat­ter or ra­di­a­tion can­not prop­a­gate faster than the speed of light.

Therefore, manipulating the past is not possible. An event can only affect later events. Even more specifically, an event can only affect a later event if the location of that later event is sufficiently close that it can be reached with a speed of no more than the speed of light. A look at the definition of the proper time interval then shows that this means that the proper time interval between the events must be real, or time-like. And while different observers may disagree about the location and time of the events, they all agree about the proper time interval. So all observers, regardless of their velocity, agree on whether an event can affect another event. And they also all agree on which event is the earlier one, because before the time difference $\Delta{t}$ could change sign for some observer speeds, it would have to pass through zero. It cannot: at $\Delta{t} = 0$ the square of the proper distance (1.9) could not be negative, while for time-like separated events it is negative for every observer. Relativity maintains a single reality, even though observers may disagree about precise times and locations.
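The claim about the order of events is easy to probe numerically. The Python sketch below scans observer speeds for one time-like and one space-like pair of events, using the first formula of (1.6) for the time difference. The helper name `c_dt_in_b`, the sample separations, and the scanned range of speeds are all assumptions for illustration.

```python
import math

def c_dt_in_b(c_dt_a, dx_a, beta):
    """c times the time difference for B, from the first formula of (1.6)."""
    return (c_dt_a - beta * dx_a) / math.sqrt(1.0 - beta**2)

# Observer speeds V/c from -0.99 to 0.99 in steps of 0.01.
betas = [i / 100.0 for i in range(-99, 100)]

# Time-like pair: c*dt = 2 and dx = 1 for A. Is event 2 later for every B?
timelike_order = {c_dt_in_b(2.0, 1.0, beta) > 0 for beta in betas}

# Space-like pair: c*dt = 1 and dx = 2 for A. Is event 2 later for every B?
spacelike_order = {c_dt_in_b(1.0, 2.0, beta) > 0 for beta in betas}
```

For the time-like pair, `timelike_order` only contains `True`: every observer in the scan sees the same event first. For the space-like pair the set contains both `True` and `False`, so the order depends on the observer. That is harmless, because neither of those two events can affect the other.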

A more vi­sual in­ter­pre­ta­tion of those con­cepts can also be given. Imag­ine a hy­po­thet­i­cal spher­i­cal wave front spread­ing out from the ear­lier event with the speed of light. Then a later event can be af­fected by the ear­lier event only if that later event is within or on that spher­i­cal wave front. If you re­strict at­ten­tion to events in the $x,y$ plane, you can use the $z$-​co­or­di­nate to plot the val­ues of time. In such a plot, the ex­pand­ing cir­cu­lar wave front be­comes a cone, called the “light-cone.” Only events within this light cone can be af­fected. Sim­i­larly in three di­men­sions and time, an event can only be af­fected if it is within the light cone in four-di­men­sion­al space-time. But of course, a cone in four di­men­sions is hard to vi­su­al­ize geo­met­ri­cally.


1.2.4 Four-vec­tors

The Lorentz trans­for­ma­tion mixes up the space and time co­or­di­nates badly. In rel­a­tiv­ity, it is there­fore best to think of the spa­tial co­or­di­nates and time as co­or­di­nates in a four-di­men­sion­al “space-time.”

Since you would surely like all components in a vector to have the same units, you probably want to multiply time by the speed of light, because $ct$ has units of length. So the four-dimensional “position vector” can logically be defined to be $(ct,x,y,z)$; $ct$ is the zeroth component of the vector, while $x$, $y$, and $z$ are components number 1, 2, and 3 as usual. This four-dimensional position vector will be indicated by

\begin{displaymath}
{\buildrel\hookrightarrow\over r}
\equiv \left(\begin{array}{c}ct\\ x\\ y\\ z\end{array}\right)
\equiv \left(\begin{array}{c}r_0\\ r_1\\ r_2\\ r_3\end{array}\right)
\end{displaymath} (1.10)

The hook on the ar­row in­di­cates that time has been hooked into it.

How about the im­por­tant dot prod­uct be­tween vec­tors? In three di­men­sional space this pro­duces such im­por­tant quan­ti­ties as the length of vec­tors and the an­gle be­tween vec­tors. More­over, the dot prod­uct be­tween two vec­tors is the same re­gard­less of the ori­en­ta­tion of the co­or­di­nate sys­tem in which it is viewed.

It turns out that the proper way to de­fine the dot prod­uct for four-vec­tors re­verses the sign of the con­tri­bu­tion of the time com­po­nents:

\begin{displaymath}
{\buildrel\hookrightarrow\over r}_1 \cdot {\buildrel\hookrightarrow\over r}_2
\equiv -c^2t_1t_2 + x_1 x_2 + y_1 y_2 + z_1 z_2
\end{displaymath} (1.11)

It can be checked by sim­ple sub­sti­tu­tion that the Lorentz trans­for­ma­tion (1.6) pre­serves this dot prod­uct. In more ex­pen­sive words, this in­ner prod­uct is in­vari­ant un­der the Lorentz trans­for­ma­tion. Dif­fer­ent ob­servers may dis­agree about the in­di­vid­ual com­po­nents of four-vec­tors, but not about their dot prod­ucts.
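If substitution by hand is not appealing, the check can be done numerically instead. In the Python sketch below, the names `four_dot` and `lorentz_transform` and the two sample four-vectors are assumptions for illustration; the zeroth component of each vector is $ct$.

```python
import math

def lorentz_transform(r, beta):
    """Apply (1.6) to a four-vector r = (ct, x, y, z), with beta = V/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    ct, x, y, z = r
    return (gamma * (ct - beta * x), gamma * (x - beta * ct), y, z)

def four_dot(r1, r2):
    """Dot product (1.11): the product of the time components gets a minus sign."""
    return -r1[0] * r2[0] + r1[1] * r2[1] + r1[2] * r2[2] + r1[3] * r2[3]

r1_a = (5.0, 3.0, 1.0, -2.0)
r2_a = (2.0, -1.0, 4.0, 0.5)
r1_b = lorentz_transform(r1_a, 0.6)
r2_b = lorentz_transform(r2_a, 0.6)

dot_a = four_dot(r1_a, r2_a)   # as computed by observer A
dot_b = four_dot(r1_b, r2_b)   # as computed by observer B
```

Both `dot_a` and `dot_b` equal $-10$ up to rounding, although none of the individual time and $x$ components agree.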

The dif­fer­ence be­tween the four-vec­tor po­si­tions of two events has a proper length equal to the proper dis­tance be­tween the events

\begin{displaymath}
\Delta s = \sqrt{(\Delta{\buildrel\hookrightarrow\over r})
\cdot (\Delta{\buildrel\hookrightarrow\over r})}
\end{displaymath} (1.12)

So, the fact that all ob­servers agree about proper dis­tance can be seen as a con­se­quence of the fact that they all agree about dot prod­ucts.

It should be pointed out that many physicists reverse the sign of the spatial components instead of the time in their inner product. Obviously, this is completely inconsistent with the nonrelativistic analysis, which is often still a valid approximation. And this inconsistent sign convention seems to be becoming the dominant one too. Count on physicists to argue for more than a century about a sign convention and end up getting it all wrong in the end. One very notable exception is [48]; you can see why he would end up with a Nobel Prize in physics.

Some physi­cists also like to point out that if time is re­placed by ${{\rm i}}t$, then the above dot prod­uct be­comes the nor­mal one. The Lorentz trans­for­ma­tion can then be con­sid­ered as a mere ro­ta­tion of the co­or­di­nate sys­tem in this four-di­men­sion­al space-time. Gee, thanks physi­cists! This will be very help­ful when ex­am­in­ing what hap­pens in uni­verses in which time is imag­i­nary, un­like our own uni­verse, in which it is real. The good thing you can say about these physi­cists is that they de­fine the dot prod­uct the right way: the ${\rm i}^2$ takes care of the mi­nus sign on the ze­roth com­po­nent.

Re­turn­ing to our own uni­verse, the proper length of a four-vec­tor can be imag­i­nary, and a zero proper length does not im­ply that the four-vec­tor is zero as it does in nor­mal three-di­men­sion­al space. In fact, a zero proper length merely in­di­cates that it re­quires mo­tion at the speed of light to go from the start point of the four-vec­tor to its end point.


1.2.5 In­dex no­ta­tion

The notations used in the previous subsection are not standard. In the literature, you will almost invariably find the four-vectors and the Lorentz transform written out in index notation. Fortunately, it does not require courses in linear algebra and tensor algebra to make some basic sense out of it.

First of all, in the non­rel­a­tivis­tic case po­si­tion vec­tors are nor­mally in­di­cated by ${\skew0\vec r}$. The three com­po­nents of this vec­tor are com­monly in­di­cated by $x$, $y$, and $z$, or us­ing in­dices, by $r_1$, $r_2$, and $r_3$. To han­dle space-time, physi­cists do not sim­ply add a ze­roth com­po­nent $r_0$ equal to $ct$. That would make the mean­ing too easy to guess. In­stead physi­cists like to in­di­cate the com­po­nents of four-vec­tors by $x^0,x^1,x^2,x^3$. It is harder to guess cor­rectly what that means, es­pe­cially since the let­ter $x$ is al­ready greatly over-used as it is. A generic com­po­nent may be de­noted as $x^\mu$. An en­tire four-vec­tor can then be in­di­cated by $\{x^\mu\}$ where the brack­ets in­di­cate the set of all four com­po­nents. Need­less to say, most physi­cists for­get about the brack­ets, be­cause us­ing a com­po­nent where a vec­tor is re­quired can have hi­lar­i­ous con­se­quences.

In short,

\begin{displaymath}
\left(\begin{array}{c}ct\\ x\\ y\\ z\end{array}\right)
\equiv \left(\begin{array}{c}r_0\\ r_1\\ r_2\\ r_3\end{array}\right)
\equiv \left(\begin{array}{c}x^0\\ x^1\\ x^2\\ x^3\end{array}\right)
\end{displaymath}

shows this book’s com­mon sense no­ta­tion to the left and the in­dex no­ta­tion com­monly used in physics to the right.

Re­call now the Lorentz trans­for­ma­tion (1.6). It de­scribed the re­la­tion­ship be­tween the po­si­tions and times of events as ob­served by two dif­fer­ent ob­servers A and B. These ob­servers were in mo­tion com­pared to each other with a rel­a­tive speed $V$. Physi­cists like to put the co­ef­fi­cients of such a Lorentz trans­for­ma­tion into a ta­ble, as fol­lows:

\begin{displaymath}
\Lambda
\equiv
\left(
\begin{array}{cccc}
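\lambda^0{}_0 & \lambda^0{}_1 & \lambda^0{}_2 & \lambda^0{}_3 \\
\lambda^1{}_0 & \lambda^1{}_1 & \lambda^1{}_2 & \lambda^1{}_3 \\
\lambda^2{}_0 & \lambda^2{}_1 & \lambda^2{}_2 & \lambda^2{}_3 \\
\lambda^3{}_0 & \lambda^3{}_1 & \lambda^3{}_2 & \lambda^3{}_3
\end{array} \right)
=
\left(
\begin{array}{cccc}
\gamma & -\beta\gamma & 0 & 0\\
-\beta\gamma & \gamma & 0 & 0\\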
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{array} \right) %
\end{displaymath} (1.13)

where

\begin{displaymath}
\gamma\equiv \frac{1}{\sqrt{1-(V/c)^2}} \qquad
\beta \equiv \frac{V}{c} \qquad
\gamma^2 - \beta^2\gamma^2 = 1
\end{displaymath}

A ta­ble like $\Lambda$ is called a ma­trix or sec­ond-or­der ten­sor. The in­di­vid­ual en­tries in the ma­trix are in­di­cated by $\lambda^\mu{}_\nu$ where $\mu$ is the num­ber of the row and $\nu$ the num­ber of the col­umn. Note also the con­ven­tion of show­ing the first in­dex as a su­per­script. That is a ten­sor al­ge­bra con­ven­tion. In lin­ear al­ge­bra, you nor­mally make all in­dices sub­scripts.

(Dif­fer­ent sources use dif­fer­ent let­ters for the Lorentz ma­trix and its en­tries. Some com­mon ex­am­ples are $\Lambda^\mu{}_\nu$ and $a^\mu{}_\nu$. The name Lorentz starts with L and the Greek let­ter for L is $\Lambda$. And Lorentz was Dutch, which makes him a Eu­ro­pean just like the Greek. There­fore $\Lambda$ is a good choice for the name of the Lorentz ma­trix, and $\Lambda$ or lower case $\lambda$ for the en­tries of the ma­trix. An $L$ for the ma­trix and $l$ for its en­tries would be just too easy to guess. Also, $\lambda$ is the stan­dard name for the eigen­val­ues of ma­tri­ces and $\Lambda$ for the ma­trix of those eigen­val­ues. So there is some po­ten­tial for hi­lar­i­ous con­fu­sion there. An a for the Lorentz ma­trix is good too: the name Lorentz con­sists of ro­man let­ters and a is the first let­ter of the ro­man al­pha­bet.)

The val­ues of the en­tries $\lambda^\mu{}_\nu$ may vary. The ones shown in the fi­nal ma­trix in (1.13) above ap­ply only in the sim­plest non­triv­ial case. In par­tic­u­lar, they re­quire that the rel­a­tive mo­tion of the ob­servers is aligned with the $x$ axes as in fig­ure 1.2. If that is not the case, the val­ues be­come a lot more messy.

In terms of the above no­ta­tions, the Lorentz trans­for­ma­tion (1.6) can be writ­ten as

\begin{displaymath}
x^\mu_{\rm B} = \sum_{\nu=0}^3 \lambda^\mu{}_\nu x^\nu_{\rm A}
\qquad\mbox{for all four values $\mu=0,1,2,3$}
\end{displaymath}

That is ob­vi­ously a lot more con­cise than (1.6). Some fur­ther short­hand no­ta­tion is now used. In par­tic­u­lar, the “Ein­stein sum­ma­tion con­ven­tion” is to leave away the sum­ma­tion sym­bol $\sum$. So, you will likely find the Lorentz trans­for­ma­tion writ­ten more con­cisely as

\begin{displaymath}
x^\mu_{\rm B} = \lambda^\mu{}_\nu x^\nu_{\rm A}
\end{displaymath}

When­ever an in­dex like $\nu$ ap­pears twice in an ex­pres­sion, sum­ma­tion over that in­dex is to be un­der­stood. In other words, you your­self are sup­posed to men­tally add back the miss­ing sum­ma­tion over all four val­ues of $\nu$ to the ex­pres­sion above. Also, if an in­dex ap­pears only once, like $\mu$ above, it is to be un­der­stood that the equa­tion is valid for all four val­ues of that in­dex.
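Spelled out in Python, the implied summation is just a loop over $\nu$ for each value of $\mu$. The sketch below builds the simple matrix of (1.13) and applies it; the function names `lorentz_matrix` and `apply` and the sample event are assumptions for illustration.

```python
import math

def lorentz_matrix(beta):
    """The simple Lorentz matrix of (1.13), for relative motion along the x axes."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [gamma, -beta * gamma, 0.0, 0.0],
        [-beta * gamma, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(matrix, vector):
    """x_B^mu = sum over nu of lambda^mu_nu x_A^nu, for each of mu = 0, 1, 2, 3."""
    return [
        sum(matrix[mu][nu] * vector[nu] for nu in range(4))
        for mu in range(4)
    ]

x_a = [5.0, 3.0, 1.0, -2.0]              # (x^0, x^1, x^2, x^3) = (ct, x, y, z) for A
x_b = apply(lorentz_matrix(0.6), x_a)    # the same event for B
```

The result `x_b` is (4, 0, 1, $-2$) up to rounding, the same as what the written-out formulae (1.6) give for this event.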

It should be noted that math­e­mati­cians call the ma­trix $\Lambda$ the trans­for­ma­tion ma­trix from B to A, even though it pro­duces the co­or­di­nates of B from those of A. How­ever, af­ter you have read some more in this book, in­sane no­ta­tion will no longer sur­prise you. Just that in this case it comes from math­e­mati­cians.

In understanding tensor algebra, it is essential to recognize one thing. It is that a quantity like a position differential transforms differently from a quantity like a gradient:

\begin{displaymath}
{\rm d}x^\mu_{\rm B} =
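\frac{\partial x^\mu_{\rm B}}{\partial x^\nu_{\rm A}} {\rm d}x^\nu_{\rm A}
\qquad
\frac{\partial f}{\partial x^\mu_{\rm B}} =
\frac{\partial f}{\partial x^\nu_{\rm A}}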
\frac{\partial x^\nu_{\rm A}}{\partial x^\mu_{\rm B}}
\end{displaymath}

In the first ex­pres­sion, the par­tial de­riv­a­tives are by de­f­i­n­i­tion the en­tries of the Lorentz ma­trix $\Lambda$,

\begin{displaymath}
\frac{\partial x^\mu_{\rm B}}{\partial x^\nu_{\rm A}} \equiv \lambda^\mu{}_\nu
\end{displaymath}

In the sec­ond ex­pres­sion, the cor­re­spond­ing par­tial de­riv­a­tives will be in­di­cated by

\begin{displaymath}
\frac{\partial x^\mu_{\rm A}}{\partial x^\nu_{\rm B}} \equiv
(\lambda^{-1}){}^\mu{}_\nu
\end{displaymath}

The en­tries $(\lambda^{-1}){}^\mu{}_\nu$ form the so-called in­verse Lorentz ma­trix $\Lambda^{-1}$. If the Lorentz trans­for­ma­tion de­scribes a trans­for­ma­tion from an ob­server A to an ob­server B, then the in­verse trans­for­ma­tion de­scribes the trans­for­ma­tion from B to A.

As­sum­ing that the Lorentz trans­for­ma­tion ma­trix is the sim­ple one to the right in (1.13), the in­verse ma­trix $\Lambda^{-1}$ looks ex­actly the same as $\Lambda$ ex­cept that $-\beta$ gets re­placed by $+\beta$. The rea­son is fairly sim­ple. The quan­tity $\beta$ is the ve­loc­ity be­tween the ob­servers scaled with the speed of light. And the rel­a­tive ve­loc­ity of B seen by A is the op­po­site of the one of A seen by B, if their co­or­di­nate sys­tems are aligned.
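That the matrix with $+\beta$ really undoes the one with $-\beta$ can be verified by multiplying the two. The Python sketch below does so for the simple matrix of (1.13); the function names `lorentz_matrix` and `matmul` are assumptions for illustration.

```python
import math

def lorentz_matrix(beta):
    """The simple Lorentz matrix of (1.13), for relative motion along the x axes."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [gamma, -beta * gamma, 0.0, 0.0],
        [-beta * gamma, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul(a, b):
    """Product of two 4 by 4 matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]

forward = lorentz_matrix(0.6)      # from A to B
backward = lorentz_matrix(-0.6)    # same matrix with beta replaced by -beta
product = matmul(backward, forward)
```

Up to rounding, `product` is the unit matrix. The diagonal entries come out as $\gamma^2-\beta^2\gamma^2$, which is 1, and the off-diagonal ones cancel.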

Con­sider now the rea­son why ten­sor analy­sis raises some in­dices. Physi­cists use a su­per­script in­dex on a vec­tor if it trans­forms us­ing the nor­mal Lorentz trans­for­ma­tion ma­trix $\Lambda$. Such a vec­tor is called a “con­travari­ant” vec­tor for rea­sons not worth de­scrib­ing. As an ex­am­ple, a po­si­tion dif­fer­en­tial is a con­travari­ant vec­tor. So the com­po­nents of a po­si­tion dif­fer­en­tial are in­di­cated by ${\rm d}{x}^\mu$ with a su­per­script in­dex.

If a vector transforms using the inverse matrix $\Lambda^{-1}$, it is called a “covariant” vector. In that case subscript indices are used. For example, the gradient of a function $f$ is a covariant vector. So a component $\partial{f}/\partial{x}^\mu$ is commonly indicated by $\partial_{\mu}f$.

Now sup­pose that you flip over the sign of the ze­roth, time, com­po­nent of a four-vec­tor like a po­si­tion or a po­si­tion dif­fer­en­tial. It turns out that the re­sult­ing four-vec­tor then trans­forms us­ing the in­verse Lorentz trans­for­ma­tion ma­trix. That means that it has be­come a co­vari­ant vec­tor. (You can eas­ily ver­ify this in case of the sim­ple Lorentz trans­form above.) There­fore lower in­dices are used for the flipped-over vec­tor:

\begin{displaymath}
\{x_\mu\} \equiv (-ct,x,y,z) \equiv (x_0,x_1,x_2,x_3)
\end{displaymath}

The con­ven­tion of show­ing co­vari­ant vec­tors as rows in­stead of columns comes from lin­ear al­ge­bra. Ten­sor no­ta­tion by it­self does not have such a graph­i­cal in­ter­pre­ta­tion.
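The verification for the simple Lorentz transform can also be left to the computer. The Python sketch below flips the sign of the zeroth component and compares the two routes from A to B. The function names are assumptions for illustration. Note also that the simple matrix of (1.13) is symmetric, so the sketch does not need to distinguish between the inverse matrix and its transpose; that shortcut is specific to this simple case.

```python
import math

def lorentz_matrix(beta):
    """The simple Lorentz matrix of (1.13), for relative motion along the x axes."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [gamma, -beta * gamma, 0.0, 0.0],
        [-beta * gamma, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply(matrix, vector):
    """Multiply a 4 by 4 matrix, stored as a list of rows, into a four-vector."""
    return [sum(matrix[mu][nu] * vector[nu] for nu in range(4)) for mu in range(4)]

def lower(vector):
    """Flip the sign of the zeroth, time, component."""
    return [-vector[0]] + list(vector[1:])

beta = 0.6
x_a = [5.0, 3.0, 1.0, -2.0]                          # contravariant components for A

# Route 1: transform with the normal matrix, then flip the time component.
flipped_b = lower(apply(lorentz_matrix(beta), x_a))

# Route 2: flip the time component, then transform with the inverse matrix.
via_inverse = apply(lorentz_matrix(-beta), lower(x_a))
```

Both routes give the same components, so the flipped-over vector indeed transforms with the inverse matrix.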

Keep one im­por­tant thing in mind though. If you flip the sign of a com­po­nent of a vec­tor, you get a fun­da­men­tally dif­fer­ent vec­tor. The vec­tor $\{x_\mu\}$ is not just a some­what dif­fer­ent way to write the po­si­tion four-vec­tor $\{x^\mu\}$ of the space-time point that you are in­ter­ested in. Now nor­mally if you de­fine some new vec­tor that is dif­fer­ent from a vec­tor that you are al­ready us­ing, you change the name. For ex­am­ple, you might change the name from $x$ to $y$ or to $x^{\rm {L}}$ say. Ten­sor al­ge­bra does not do that. There­fore the golden rule is:

The names of tensors are only correct if the indices are at the right height.

If you remember that, tensor algebra becomes a lot less confusing. The expression $\{x^\mu\}$ is only your space-time location named $x$ if the index is a superscript as shown. The four-vector $\{x_\mu\}$ is simply a different animal. How do you know what is the right height? You just have to remember, you know.

Now con­sider two dif­fer­ent con­travari­ant four-vec­tors, call them $\{x^\mu\}$ and $\{y^\mu\}$. The dot prod­uct be­tween these two four-vec­tors can be writ­ten as

\begin{displaymath}
x_\mu y^\mu
\end{displaymath}

To see why, re­call that since the in­dex $\mu$ ap­pears twice, sum­ma­tion over that in­dex is un­der­stood. Also, the low­ered in­dex of $x_\mu$ in­di­cates that the sign of the ze­roth com­po­nent is flipped over. That pro­duces the re­quired mi­nus sign on the prod­uct of the time com­po­nents in the dot prod­uct.
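In Python terms, the expression $x_\mu y^\mu$ is a lowering of the index followed by a plain sum. The sketch below compares that with the dot product written out as in (1.11); the function names `lower` and `contract` and the sample vectors are assumptions for illustration, with $ct$ as the zeroth component.

```python
def lower(vector):
    """Lower the index: flip the sign of the zeroth, time, component."""
    return [-vector[0]] + list(vector[1:])

def contract(x_lower, y_upper):
    """Sum over the repeated index mu in x_mu y^mu."""
    return sum(x_lower[mu] * y_upper[mu] for mu in range(4))

x = [5.0, 3.0, 1.0, -2.0]     # contravariant components x^mu
y = [2.0, -1.0, 4.0, 0.5]     # contravariant components y^mu

index_form = contract(lower(x), y)

# The dot product written out with the minus sign on the time components.
written_out = -x[0] * y[0] + x[1] * y[1] + x[2] * y[2] + x[3] * y[3]
```

Both `index_form` and `written_out` are $-10$ for these sample vectors.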

Note also from the above ex­am­ples that sum­ma­tion in­dices ap­pear once as a sub­script and once as a su­per­script. That is char­ac­ter­is­tic of ten­sor al­ge­bra.

Ad­den­dum {A.4} gives a more ex­ten­sive de­scrip­tion of the most im­por­tant ten­sor al­ge­bra for­mu­lae for those with a good knowl­edge of lin­ear al­ge­bra.


1.2.6 Group prop­erty

The de­riva­tion of the Lorentz trans­for­ma­tion as given ear­lier ex­am­ined two ob­servers A and B. But now as­sume that a third ob­server C is in mo­tion com­pared to ob­server B. The co­or­di­nates of an event as per­ceived by ob­server C may then be com­puted from those of B us­ing the cor­re­spond­ing Lorentz trans­for­ma­tion, and the co­or­di­nates of B may in turn be com­puted from those of A us­ing that Lorentz trans­for­ma­tion. Schemat­i­cally,

\begin{displaymath}
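{\buildrel\hookrightarrow\over r}_{\rm {C}}
= \Lambda_{{\rm C}\leftarrow{\rm B}} {\buildrel\hookrightarrow\over r}_{\rm {B}}
= \Lambda_{{\rm C}\leftarrow{\rm B}} \Lambda_{{\rm B}\leftarrow{\rm A}}
{\buildrel\hookrightarrow\over r}_{\rm {A}}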
\end{displaymath}

But if everything is OK, that means that the Lorentz transformation from A to B followed by the Lorentz transformation from B to C must be the same as the Lorentz transformation from A directly to C. In other words, the combination of two Lorentz transformations must be another Lorentz transformation.

Math­e­mati­cians say that Lorentz trans­for­ma­tions must form a group. It is much like ro­ta­tions of a co­or­di­nate sys­tem in three spa­tial di­men­sions: a ro­ta­tion fol­lowed by an­other one is equiv­a­lent to a sin­gle ro­ta­tion over some com­bined an­gle. In fact, such spa­tial ro­ta­tions are Lorentz trans­for­ma­tions; just be­tween co­or­di­nate sys­tems that do not move com­pared to each other.

Us­ing a lot of lin­ear al­ge­bra, it may be ver­i­fied that in­deed the Lorentz trans­for­ma­tions form a group, {D.5}.
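For the simple case that all relative motion is along the $x$ axes, the group property can at least be spot-checked numerically without the linear algebra. The Python sketch below multiplies two simple Lorentz matrices of the form (1.13) and compares the product with a single such matrix. The speed used for that single matrix is the relativistic combination of the two speeds, as follows from (1.7). The function names and sample speeds are assumptions for illustration, and a check of one case is of course not a proof.

```python
import math

def lorentz_matrix(beta):
    """The simple Lorentz matrix of (1.13), for relative motion along the x axes."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [
        [gamma, -beta * gamma, 0.0, 0.0],
        [-beta * gamma, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul(a, b):
    """Product of two 4 by 4 matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]

beta_ab = 0.5    # speed of B relative to A, in units of c
beta_bc = 0.5    # speed of C relative to B, in units of c

# From A to B first, then from B to C: the matrix applied last goes on the left.
two_steps = matmul(lorentz_matrix(beta_bc), lorentz_matrix(beta_ab))

# From A directly to C, with the relativistically combined speed.
beta_ac = (beta_ab + beta_bc) / (1.0 + beta_ab * beta_bc)
one_step = lorentz_matrix(beta_ac)
```

Here `beta_ac` is 0.8, not 1.0, and `two_steps` agrees with `one_step` entry by entry up to rounding.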