HPS 0410 Einstein for Everyone


Origins of Special Relativity

John D. Norton
Department of History and Philosophy of Science
University of Pittsburgh

Background reading: J. Schwartz and M. McGuinness, Einstein for Beginners. New York: Pantheon. pp. 1-82.

We now take Einstein's special theory of relativity for granted. The evidence in its favor is quite massive, so that there is little license for skepticism. Our real task is to learn the theory, and there are many textbooks that develop it in an easy-to-understand fashion.

In 1905, however, when Einstein first introduced it, it was a strange and even shocking theory. Einstein then did not have the luxury of a simple textbook on special relativity from which he could learn the theory. Somehow he had to see that such a theory was needed. And then he had to devise the theory and know it was not crazy speculation. How did he do it? That is the present topic: the history of Einstein's discovery of special relativity. We shall see that Einstein had no crystal ball. He worked with resources and methods available to everyone. That is the fascination of the episode. We shall see how he took the same pieces everyone had and assembled a masterpiece where everyone else faltered.

Before we look at Einstein's deliberations, we need to see what came before. That provided Einstein with the foundation upon which he could build the special theory of relativity.

To foreshadow what is to come, we will find that there was no good experimental foundation for special relativity prior to Einstein's time. That is, we needed reliable results on things that move very fast before relativity theory could have solid experimental foundation. That foundation came with the electromagnetic theory of Maxwell and others in the nineteenth century. It gave the first reliable account of how some very rapidly moving things behave, including, most notably, light itself. Before Einstein's time, relativity theory could not properly emerge.

Once electromagnetic theory had been developed, there was a sense that relativity theory already lived within the theory. H. A. Lorentz had already discovered the basic equations Einstein would later use within special relativity. It had become inevitable that relativity theory or something like it would emerge. It was really only a question of who would have the ingenuity and flexibility of mind to find the theory first. That person proved to be Einstein.

Origins of the Principle of Relativity

The principle of relativity tells us that we cannot detect our uniform motion. That idea became important to physics in the seventeenth century. After Copernicus, it gradually became accepted that the earth was not motionless at the center of the universe. Instead it spun on its axis and orbited the sun. Yet, as the ancient Greeks were quick to point out, if the earth moved, why didn't we have some sensation of the movement?
Figure: Nicholas Copernicus. Geocentric system (Earth in the center, Sun orbits) replaced by heliocentric system (Sun in the center, Earth orbits).

Figure: Isaac Newton
If Copernicus' idea was to survive, physics would have to be renewed so that one's own motion would be undetectable; that is, so that it satisfied a principle of relativity. As far as observable things were concerned, the physics Newton developed in the seventeenth century satisfied this principle. For example, he associated forces with acceleration and not simply motion. So, no matter how fast a body moved, as long as it was not accelerating, no force acted on it.


Figure: Newton splits light into its component colors.
What altered this happy arrangement in the nineteenth century were advances in the theory of light. Newton had supposed that light consisted of rapidly moving corpuscles; they obeyed the principle of relativity as much as anything else in his universe. Following the work of Fresnel and others early in the nineteenth century, this account was replaced by one of light as a propagating wave.

One of the most important indications that light was a wavelike process was the discovery of interference, shown below in Thomas Young's famous two-slit experiment. Two light sources produce the characteristic interference patterns familiar to anyone who has thrown two pebbles into a calm pond.


If light was a wave, it was assumed that the wave must be carried by some medium, just as sound waves are carried by air and water waves are carried by water. How else could the peak and the trough of two waves annihilate one another to produce the interference patterns if the wave was not a displacement in some medium? That medium was known as the luminiferous (=light bearing) ether. The moving earth was now supposed to be moving through a medium that must stream past the earth, much as water streams past a boat moving through the ocean.

Ether Current Experiments Fail

Figure: ether wind
This ether now made it plausible that our planet's absolute motion might be detectable by experiments on the earth. All we had to do was to seek to see the current of ether flowing past. It proved quite easy to devise experiments to do this. Recall that the ether carries light waves, much as air carries sound waves or water carries water waves. So if the ether is flowing past us, that flow ought to be revealed in measurements on light.
A series of experiments was devised in the 19th century to detect this ether current. They were experiments on light. Typically they involved passing light through a combination of prisms, lenses and the like, creating interference fringes and then looking for an effect in these fringes. The striking result of all these experiments was that the flow of ether had no effect on optical experiments. In that sense, all the experiments failed. Curiously, it was as though the earth just happened to be at perfect rest in the ether. In retrospect, this is a puzzling outcome. At the time, however, there was nothing like the sense of crisis you might expect. Rather it had become a simple regularity of experiment that the ether drift was invisible to us.
The experiments could be catalogued according to the size of the effect they hoped to detect and, as a result, the sensitivity of the instruments needed. The largest effects were "first order" effects. They needed the least sensitive instruments and were easiest to conduct. Many of these first order experiments were undertaken and all failed to demonstrate an ether current.

Fresnel Ether Drag

That all first order experiments failed to reveal the earth's motion should, you might expect, have been very puzzling. However, it soon ceased to be mysterious. It could be explained by a single hypothesis, the Fresnel "ether drag" hypothesis. It supposed that the ether was dragged partially by optically dense media (the lenses and other media used in optical experiments) by an amount tuned directly to the medium's refractive index. It turned out that amount could be selected so that it would exactly cancel out any possible first order effect of an ether current.

What is the refractive index? When light enters a dense optical medium like glass, it slows down. The refractive index measures the amount of slowing. A refractive index of 1.5, a common figure for ordinary glass, means that light moves at 1/1.5 = 2/3 as fast as light in a vacuum. The greater the refractive index, the more the light is slowed and, as a result, the more the light is bent when it enters the medium.
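The arithmetic in this paragraph can be sketched in a few lines of Python. This is only an illustration: the function name speed_in_medium is hypothetical, and the value used for the speed of light in a vacuum is the standard defined one, which the text itself does not quote.

```python
# Speed of light in a medium of refractive index n: v = c / n.
C_VACUUM = 299_792_458.0  # speed of light in a vacuum, m/s (not quoted in the text)


def speed_in_medium(n: float) -> float:
    """Return the speed of light (m/s) in a medium of refractive index n."""
    if n < 1.0:
        raise ValueError("this sketch assumes a refractive index of at least 1")
    return C_VACUUM / n


glass_speed = speed_in_medium(1.5)       # ordinary glass, as in the text
fraction_of_c = glass_speed / C_VACUUM   # 1/1.5, that is, two thirds
print(f"light in glass moves at {fraction_of_c:.3f} of its vacuum speed")
```

A larger refractive index gives a smaller speed, which is the "slowing" the refractive index measures.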

Here's how the drag hypothesis worked. Light waves are carried by the medium of the ether, just as water waves are carried by water and sound waves by air. If the water or the air is moved at some speed, then that speed will be added to the speed of the water or sound waves. The same would be expected in the case of light if the ether is moved. The motion of the ether must be added to the motion of the light it carried.

But what does it take to move the ether? Consider a glass block. Since light waves pass through it, there must be ether inside it to carry the waves. If the block moves, does the ether move with it? The simplest case is that it does not. Then, it is as if the glass block is a perfectly porous sieve that lets the ether flow freely through it.

This is the case of no ether drag illustrated opposite. A light wave propagates in the ether of empty space horizontally from the left towards the block, which is moving vertically. The light passes through the block without any deflection from the vertical motion of the block. That is because the ether is undragged; it is left behind fully by the moving block and takes on none of the block's motion.
Figure: no drag
Figure: full drag
Now take the opposite case. It arises when the ether is fully trapped by the glass block and moves with it, much as air trapped inside a closed car moves with the car. In this case, the ether moves vertically with the glass block, with the same speed as the glass block. As a result, the horizontal light wave is deflected vertically with the full motion of the glass block. This is full ether drag.
Finally, there are a myriad of intermediate cases, in which the ether is only partially dragged by the glass block. In these cases, the glass block acts as a more or less porous sieve communicating less or more of its motion to the ether. These are the cases of partial ether drag. In these cases, the light wave is only partially deflected from its horizontal motion.

Assuming just the right amount of partial drag tuned exactly to the glass' refractive index was enough to eradicate any positive sign of our apparatus' motion through the ether in first order experiments.
Figure: partial ether drag
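The three cases of no drag, partial drag and full drag can be compared with a small classical sketch. It assumes a horizontal ray crossing a block that moves vertically, and it works only to first order: the ray is taken to cross the block at speed c/n while the ether inside carries it sideways at a fraction of the block's speed. The function name vertical_shift and all the numbers (block speed, thickness, the 0.5 used for "partial") are hypothetical choices for illustration, not values from the text.

```python
C_VACUUM = 299_792_458.0  # speed of light in a vacuum, m/s


def vertical_shift(drag_fraction: float, block_speed: float,
                   thickness: float, n: float) -> float:
    """Vertical displacement (m), in the ether frame, of a horizontal ray
    while it crosses a block moving vertically at block_speed (m/s)."""
    if not 0.0 <= drag_fraction <= 1.0:
        raise ValueError("drag_fraction must lie between 0 and 1")
    transit_time = thickness * n / C_VACUUM      # ray crosses the block at c/n
    ether_speed = drag_fraction * block_speed    # ether's share of the block's motion
    return ether_speed * transit_time


# Hypothetical illustrative values.
BLOCK_SPEED = 30_000.0   # m/s
THICKNESS = 0.10         # m
N_GLASS = 1.5

shifts = {
    label: vertical_shift(fraction, BLOCK_SPEED, THICKNESS, N_GLASS)
    for label, fraction in (("no drag", 0.0), ("partial drag", 0.5), ("full drag", 1.0))
}
for label, shift in shifts.items():
    print(f"{label:>12}: ray carried {shift:.3e} m vertically")
```

With no drag the ray is not carried along at all; with full drag it is carried exactly as far as the block itself moves during the crossing; partial drag falls in between.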

Tuning the Fresnel Ether Drag

But what is just the right amount of partial drag? And why should it be tuned so precisely to the refractive index of the optical medium? We can see how this comes about if we pursue just one simple experiment that we might try to use to detect the earth's motion through the ether. It is just one experiment. However things work out the same in many other experiments.

To begin, imagine that we are on an earth that is perfectly at rest in the ether and that we receive light from a distant star that is exactly overhead. That starlight would penetrate a glass block as shown in the figure. The light would descend vertically and keep moving vertically in the block.
Figure: aberration 1
Figure: aberration 2
Now take the same case but add the fact that the earth we are standing on moves horizontally.

In the ether frame of reference, the light will continue to descend vertically towards the block. But what happens to the light when it enters the moving block? The possible effects of the motion of the block on the propagation of the light in the block are shown in the figure. The light in the block may be either undragged, partially dragged or fully dragged. Which trajectory the light follows depends on the amount of ether drag.
Now transform our viewpoint to that of the observer moving with the block. The figure shows the same system, just redescribed by the moving observer. The three possible effects of the block's motion on the light are shown again.

There is a second effect. If we change our point of view to one that moves with the block, there is a corresponding alteration in the light ray outside the block. The vertically propagating light acquires an extra motion opposite to that of our motion. The light that descended vertically in the ether is now found to be descending obliquely as a result of this acquired horizontal motion. This effect is widely recognized in astronomy and was observed in starlight in the 18th century. It is known as "stellar aberration" and is manifested in a slight angular shift in the apparent positions of stars, in coordination with the earth's motion.

The effect is familiar. Imagine rain falling vertically. If you drive through the rain in a car, the vertically falling rain will acquire a component of horizontal motion towards you and splash onto the windscreen.
Figure: aberration 3

The pressing question is whether we can use this effect of stellar aberration to determine that we on earth are moving in the ether. That is, can we distinguish this case from one in which we are at rest in the ether and the star is moving towards us with the same relative velocity? We could use this effect to determine our absolute motion in the ether if the incident ray of light differed in any behavior from a ray of light arriving obliquely at the glass block when the block is at rest in the ether.

The behavior of a light ray obliquely incident onto a glass block is well understood from the study of refraction in elementary optics. The incident ray is bent towards a line perpendicular to the block's surface. The amount the refracted ray is bent depends upon the refractive index of the glass according to Snell's law. The greater the refractive index, the greater the deflection.
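Snell's law can be written as sin(incidence) = n sin(refraction) for light entering glass of refractive index n from empty space. The short Python sketch below evaluates it for a few refractive indices. The function name refraction_angle, the 30 degree angle of incidence and the three index values are illustrative choices, not taken from the text.

```python
import math


def refraction_angle(incidence_deg: float, n_glass: float) -> float:
    """Angle (degrees) between the refracted ray and the perpendicular to the
    surface, for light entering glass from empty space (Snell's law)."""
    sin_refracted = math.sin(math.radians(incidence_deg)) / n_glass
    return math.degrees(math.asin(sin_refracted))


INCIDENCE = 30.0  # degrees from the perpendicular; an arbitrary illustrative value
for n in (1.3, 1.5, 1.7):
    refracted = refraction_angle(INCIDENCE, n)
    bend = INCIDENCE - refracted  # how far the ray is turned towards the perpendicular
    print(f"n = {n}: refracted at {refracted:.2f} degrees, bent by {bend:.2f} degrees")
```

The bend grows with n, which is the sense in which a greater refractive index gives a greater deflection.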

We cannot infer our motion through the ether from the light striking a moving glass block, as long as the light incident on the moving block bends in just the same way as incident light is refracted by a block at rest in the ether. That means that the partial drag of the ether must simulate this refractive effect exactly, so that the partially dragged ray above must be bent through just the same angle as it is in ordinary refraction.

This is how the Fresnel drag has to be tuned exactly to the refractive index of the optical medium. The greater the refractive index, the more the refracted ray is bent and, as a result, the greater the amount of ether drag needed to simulate it.

For those of you who have to know the formula that specifies the tuning, it is just this. The amount of drag is the velocity of the optical medium in the ether multiplied by (1 - 1/n²), where n is the refractive index.
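For those who also want to check the tuning numerically, here is a minimal Python sketch under classical assumptions. It compares two angles: the one given by ordinary refraction (Snell's law) when the ray arrives tilted by the aberration angle at a block at rest, and the one that results inside a moving block when the ether within it is dragged by the fraction 1 - 1/n². The function names and the speed of 30 km/s (roughly the earth's orbital speed) are choices made for this illustration, and the agreement holds only to first order in v/c.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in a vacuum, m/s


def fresnel_drag_coefficient(n: float) -> float:
    """Fraction of the medium's velocity imparted to the ether inside it."""
    return 1.0 - 1.0 / n**2


def angle_from_snell(v: float, n: float) -> float:
    """Block at rest: the ray arrives tilted by the aberration angle atan(v/c)
    and is refracted according to Snell's law. Result in radians."""
    incidence = math.atan(v / C_VACUUM)
    return math.asin(math.sin(incidence) / n)


def angle_from_drag(v: float, n: float, drag: float) -> float:
    """Block moving sideways at v with the ether inside dragged at drag * v.
    Returns the ray's angle from the vertical as judged from the block."""
    sideways = (1.0 - drag) * v   # ray's sideways speed relative to the block
    downward = C_VACUUM / n       # ray's speed down through the glass
    return math.atan(sideways / downward)


V_EARTH = 30_000.0  # m/s, assumed illustrative speed through the ether
N_GLASS = 1.5

snell = angle_from_snell(V_EARTH, N_GLASS)
tuned = angle_from_drag(V_EARTH, N_GLASS, fresnel_drag_coefficient(N_GLASS))
undragged = angle_from_drag(V_EARTH, N_GLASS, 0.0)
print(f"Snell: {snell:.6e} rad, Fresnel drag: {tuned:.6e} rad, no drag: {undragged:.6e} rad")
```

With the Fresnel value the two angles agree to within the tiny second order corrections; with no drag at all the angle inside the block is noticeably different, which is what an experimenter could have hoped to detect.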

Figure: aberration 4

We see here for the first time something that we will see again. We have an experiment that we first expect to be able to reveal the earth's motion through the ether. We might expect that the light of distant stars would behave differently in optical media that move in the ether. However a second effect arises, partial ether drag, and it exists in exactly the amount needed to cancel out any positive result that would affirm motion in the ether.

Image: http://en.wikipedia.org/wiki/File:Refraction.jpg

There was a complication. A widely known property of glass is that it refracts light differently for different colors. That is, its refractive index varies with the frequency of the light. This is what enables a prism to split light into its different colors and is responsible for the chromatic aberration of lenses that lens designers try so hard to avoid. The odd outcome of this fact is that light of different frequencies will be associated with different amounts of ether drag, according to Fresnel's formula. In effect that means that each frequency of light has its own ether. That was a troubling thought even in the 19th century.
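The complication is easy to see numerically. The Python sketch below applies Fresnel's factor (1 - 1/n²) to two refractive indices. The index values are assumptions chosen only to be of the size typical for glass, with the blue index slightly larger than the red; they are not measurements and do not come from the text.

```python
def fresnel_drag_coefficient(n: float) -> float:
    """Fraction of the medium's velocity imparted to the ether inside it."""
    return 1.0 - 1.0 / n**2


# Assumed, illustrative refractive indices for one piece of glass.
illustrative_indices = {"red": 1.51, "blue": 1.53}

drag_by_color = {
    color: fresnel_drag_coefficient(n) for color, n in illustrative_indices.items()
}
for color, drag in drag_by_color.items():
    print(f"{color:>4}: ether dragged at {drag:.4f} of the glass's speed")
```

The same block of glass would have to drag the ether by two different amounts at once, one for each color.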

Figure: dispersion prism
Image: http://en.wikipedia.org/wiki/File:Dispersion_prism.jpg

Michelson Morley Experiment

After first order experiments came second order experiments. These sought to measure the very much smaller second order effects. They needed instruments that were a great deal more sensitive to ether currents; for they had to be able to detect the residual second order effects that might remain after the Fresnel drag had protected first order effects from detection. This added sensitivity meant that second order experiments were a great deal harder to carry out. There was only one successfully executed in the 19th century, the celebrated experiment of Albert A. Michelson and Edward W. Morley of 1887 that completed Michelson's earlier efforts at such an experiment. Indeed the experiment was so difficult that Michelson won the Nobel prize principally for his highly sensitive optical interferometer used in the experiment.
Figure: Pages from Michelson and Morley's paper (title page, figure 1 and figure 2).

The basic idea of the experiment is that light moves differently on a moving earth according to whether it propagates transverse to the direction of the earth's motion or parallel to the direction of the earth's motion. In the first case the ether current flows across the propagating light, slowing it a little. In the second case, it provides a kind of head wind that slows the light more or a tail wind that speeds it up.

Here is a schematic picture of the way the experiment sought to look for these differences.


A light source sends a beam of light to a half silvered mirror that splits the beam in two. One half continues in the same direction; the other is sent off at 90 degrees. They both strike mirrors at equal distances which reflect them back to a place where they can be viewed. That the mirrors are placed at equal distances from the half-silvered mirror is represented by the two rods of equal length in the figure that connect them.

You can grasp the way the experiment works most simply if you imagine not a beam of light, but merely a pulse of light, as shown in the figure. Since the distances to the two mirrors are the same, the two pulses will require the same time to traverse the distance out and back and they will be detected at the same time.

In practice, pulses are not used. A steady light beam is used. However, the basic analysis remains the same. Each individual peak and trough in the light beam behaves like a single pulse. Any difference in propagation time will be manifested by the peaks and troughs of the waves misaligning when they are combined at the detecting screen. The combining of these two waves produces interference fringes at the detecting screen. Any change in the alignment of the peaks and troughs is revealed as a change in the interference fringes.

In use, the apparatus is turned very slowly so that the ether current passes over it from successively different directions. During this turning, the ether current affects the light traveling in the two directions differently and these changes are expected to be manifested as changes in the observed interference patterns.

Imagine, for example, that the horizontal direction in the figure below aligns with the direction of motion of the earth in the ether. Then, thinking classically, we expect the ether current to lengthen the travel time of a light pulse making the round trip in the direction transverse to the ether current. The net effect of the ether current on the pulse that makes the round trip parallel to the ether current is an even greater delay. So, as the figure shows, by the time the transverse pulse reaches the detector, the longitudinal pulse is still traversing the apparatus.

These differences in arrival times will change as the apparatus rotates and they will be manifested as changes in the observable interference fringes.
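The classical expectation can be put into numbers with a short Python sketch. It uses the standard classical round trip times: 2L/sqrt(c² - v²) for the transverse arm and L/(c - v) + L/(c + v) for the longitudinal arm. The arm length, the earth's speed through the ether and the wavelength of the light are all assumed, illustrative values that the text does not supply, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in the ether, m/s


def transverse_round_trip(arm: float, v: float) -> float:
    """Classical time (s) out and back across the ether current."""
    return 2.0 * arm / math.sqrt(C**2 - v**2)


def longitudinal_round_trip(arm: float, v: float) -> float:
    """Classical time (s) out against the ether current and back with it."""
    return arm / (C - v) + arm / (C + v)


# Assumed, illustrative values (not from the text).
ARM = 11.0            # effective optical path length of each arm, m
V_EARTH = 30_000.0    # speed through the ether, m/s
WAVELENGTH = 5.5e-7   # wavelength of the light, m

delay = longitudinal_round_trip(ARM, V_EARTH) - transverse_round_trip(ARM, V_EARTH)

# Turning the apparatus through 90 degrees swaps the roles of the two arms,
# so the fringe pattern should move by twice the delay, measured in wavelengths.
fringe_shift = 2.0 * C * delay / WAVELENGTH

print(f"longitudinal pulse lags by {delay:.3e} s")
print(f"expected shift on rotation: about {fringe_shift:.2f} of a fringe")
```

The delay is a second order effect: it scales with the square of v/c, which is why the instrument had to be so much more sensitive than those used in first order experiments.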

The result was negative. Michelson and Morley found shifts in the interference fringes, but they were very much smaller than the size of the effect expected from the known orbital motion of the earth.

The Failures are Explained by H. A. Lorentz

The outcome of the 19th century tradition of experiments aimed at detecting the ether current was negative. The wave theory of light of the 19th century depended upon this ether. It was what carried the light wave, just as air carries sound waves. Yet no experiment could show the direction or magnitude of the ether current.

The puzzle was deepened and broadened by the end of the 19th century through the assimilation of optics into Maxwell's theory of electric and magnetic fields. In the 1860's, Maxwell showed that a light wave is really a wave of electric and magnetic fields, an electromagnetic wave. So now the luminiferous ether was also the ether that carried these fields.


How is it possible for Maxwell's electrodynamics to be based fundamentally upon the notion of an ether, yet no experiment can reveal the magnitude and direction of the ether current? This was the problem taken up and solved brilliantly by the great Dutch physicist H. A. Lorentz.


Lorentz first simplified Maxwell's theory into the form that it is routinely taught today. All matter, he proposed, simply consists of electric charges (called "ions" or "electrons") in the empty space of the ether. He then proceeded to show how electrodynamical theory could explain the failure of the experiments to produce a result.

If an optical medium just consists of such charges, Lorentz could show that an electromagnetic wave propagating through it would be affected in exactly the way Fresnel's ether drag hypothesis required. The ether was not really dragged in Lorentz's account. His was a fixed, immobile ether. Rather the charges that made up the medium were excited by the light wave as it passed through. They absorbed energy from the light and re-emitted it. When the incident and re-emitted light were combined, the net effect was a slowing of the propagation of light that matched exactly the effect of Fresnel's hypothesis. The ether was not dragged; it just looked like it was. The amount that light slowed in media in Fresnel's hypothesis was no longer a supposition but a demonstrated result in electrodynamics. That explained why all first order experiments failed.
The second order Michelson Morley experiment was a little harder. There was a solution suggested by the fact that classically light needs more time to make the longitudinal round trip than the transverse one. So what if the apparatus contracted in length longitudinally? Then the longitudinal pulses would need less time to make the round trip and the negative result could be recovered. The result would look something like this:
What Lorentz was able to show was that Maxwell's theory of electromagnetism predicted precisely this much longitudinal contraction. To get this result, Lorentz modeled the matter composing a body as a large collection of electric charges, all held together in equilibrium by electric and magnetic forces.
Figure: lattice at rest
Figure: lattice moving
The equilibrium was disturbed if the entire object was set in motion. Moving electric charges create magnetic fields that in turn act back on electric charges. All these changes settle out into a new equilibrium configuration. What Lorentz could show was that the new configuration consists in a contraction of the body in the direction of motion in just the amount needed to eradicate a possible result from the Michelson Morley experiment.
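How much contraction is "just the amount needed"? The text does not state the factor, but the standard one is sqrt(1 - v²/c²), applied to lengths along the direction of motion. The Python sketch below checks that shortening the longitudinal arm by this factor makes the two classical round trip times equal. The arm length and speed are assumed, illustrative values, and the function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in the ether, m/s


def contraction_factor(v: float) -> float:
    """Standard longitudinal contraction factor sqrt(1 - v^2/c^2)."""
    return math.sqrt(1.0 - (v / C) ** 2)


def transverse_round_trip(arm: float, v: float) -> float:
    """Classical time (s) out and back across the ether current."""
    return 2.0 * arm / math.sqrt(C**2 - v**2)


def longitudinal_round_trip(arm: float, v: float, contracted: bool) -> float:
    """Classical time (s) out and back along the ether current, with the arm
    optionally shortened by the contraction factor."""
    length = arm * contraction_factor(v) if contracted else arm
    return length / (C - v) + length / (C + v)


# Assumed, illustrative values (not from the text).
ARM = 11.0           # m
V_EARTH = 30_000.0   # m/s

t_transverse = transverse_round_trip(ARM, V_EARTH)
t_uncontracted = longitudinal_round_trip(ARM, V_EARTH, contracted=False)
t_contracted = longitudinal_round_trip(ARM, V_EARTH, contracted=True)

print(f"delay without contraction: {t_uncontracted - t_transverse:.3e} s")
print(f"delay with contraction:    {t_contracted - t_transverse:.3e} s")
```

Without the contraction the longitudinal pulse lags; with it, the lag vanishes up to rounding error, so turning the apparatus would produce no shift in the fringes.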
The catch was that matter probably couldn't consist just of electric charges held by electric and magnetic forces. There had to be other forces as well. They had to be there, for example, to prevent Lorentz's electrons blowing themselves apart under the mutual repulsion of the like charges in different parts of an electron. So Lorentz simply supposed that these other forces would behave just like electric and magnetic forces and yield the same result.

The 20th century opened with the Maxwell-Lorentz theory of electrodynamics as the most successful physical theory of the era. While that theory was based essentially on the existence of an ether, the failure to detect ether currents was no longer a puzzle, but a prediction of the theory. Lorentz showed that the theory entailed effects whose combined import was to make the ether current invisible and the absolute motion of the earth undetectable by us. We might be moving through the ether at some definite speed and in some definite direction. But the physics of electrodynamics conspired to prevent us ever measuring that speed and direction.

At the time this seemed like a perfectly satisfactory resolution of the puzzle of the failure of all ether drift experiments. It is only if you know what is coming next that you find the resolution awkward. Or, if you are Einstein, you see more in the resolution than others then did.

A final remark: the schematic drawing of the Michelson Morley experiment above may seem oddly familiar. In fact we have already seen its essential content before. The two arms of the apparatus are light clocks. You will recall that we computed the relativistic contraction effect from the condition that moving light clocks, one transverse to and one parallel to the direction of motion, must tick at the same rate. This is the same contraction that figures in Lorentz's account.

Two light clocks


Copyright John D. Norton. January 2001, September 2002; July 2006; January 2, 2007; January 21, February 4, 2008; January 15, 17, 27, 2010; May 15, 2011; January 28, 2013.