Friday - Sunday, 15-17 October 2010
::: Philosophy of Scientific Experimentation
::: More photos
Allan Franklin is a defining figure in the philosophy of scientific experimentation. Few discussions of the topic proceed very far without mentioning his many books on experiment or his Stanford Encyclopedia of Philosophy article, "Experiment in Physics."
When it came to organizing a conference on scientific experimentation, Slobodan Perovic, the program chair, and I had settled quickly on him. It seemed quite fitting that Allan would serve on its program committee, helping us wade through the flood of proposals for papers; and that he would now be the opening keynote speaker.
Moments ago, as I sat in his talk, it was clear yet again how Allan had secured his position. Only a few minutes into the presentation, he was recounting a splendid instance of how those without an experimental orientation can miss something very important.
We are used to the idea that experiments return results. It is easy to imagine that the getting of the results is a mechanical operation. It is not so different, we easily imagine, from working a problem set in a textbook. All the interesting work is done in working on the problems. Eventually we need to check our answers. We do that by looking in the back of the book. That last step is analogous to experiment. We get an authoritative answer and there's not much point in thinking too hard about how we got it. How much is there to say about flipping to the back of the book?
The reality of it is a little more subtle and a lot more interesting. The results recovered in high energy particle physics come through some occurrence that rises above the background noise. In the 1950s and 60s, Allan recalled, identifying them was simple. He flashed a graph onto the screen. There was a curve identifying background levels. On one side was a big spike. That was the result.
These were days of happiness and innocence, for they were untroubled by the worry that such spikes can happen purely by chance. The background noise was just a lot of little randomly created spikes, all thrown together. A big one that we might mistake for a result could happen, but with low probability. We do need to guard against even a low probability source of error. So physicists began to demand that a spike could count only if it passed a test assuring us it was not noise.
Spikes coming from random noise are governed by Gaussian probability distributions. That makes the assessment of the probabilities easy. If we measure the size of the spikes in units of the standard deviation (SD), we automatically have a probability for how likely they are. Spikes of one SD or less occur with probability 0.68. Those of two SD or less occur with probability 0.95; and so on.
In the earliest efforts, a spike would count as a result if it was two standard deviations above the background, or even as little as one and a half SD. While this is a standard often used in statistics, it seemed a little permissive. So by the early 1970s, the criterion stabilized at three SD. That means that there is only a probability of 0.0027 that a result is spurious, an artifact of noise.
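A quick check of these numbers: for a Gaussian, the two-sided probability of a fluctuation beyond k standard deviations is 1 - erf(k/sqrt(2)). Here is a minimal sketch in Python (the function name is my own, not from the talk):

```python
import math

def tail_prob(k):
    """Two-sided probability that pure Gaussian noise throws a spike beyond k SD."""
    return 1.0 - math.erf(k / math.sqrt(2.0))

print(f"within 1 SD: {1 - tail_prob(1):.2f}")   # 0.68
print(f"within 2 SD: {1 - tail_prob(2):.2f}")   # 0.95
print(f"beyond 3 SD: {tail_prob(3):.4f}")       # 0.0027
```

The three figures in the text (0.68, 0.95 and 0.0027) all drop out of this one formula.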
You might think that this was stringent enough. For, as the efforts to avoid misidentifying noise become more stringent, we must worry that we are mistakenly discounting good results as noise. A 0.0027 probability of error seems like overkill.
Alas, it became clear by the mid 1970s that even this level was not stringent enough. A three SD criterion is attractive only as long as we do not consider just how many experiments are being done and how many "bins" they supply us, in which we might seek a result. If we work with a three SD criterion and we merely look in 1,000 bins, we are very likely to find at least one spurious result. The chance of that is 0.93.
So a higher standard is needed: four SD was chosen. It has a probability of erroneously passing a spurious result of merely 0.000064. If we look in 1000 bins, we've now reduced our chance of finding at least one spurious result to 0.06.
That must be the end, I thought. But no--by the 1990s, the level had risen to five SD. The chance of misidentification in one experiment is now the minuscule 0.00000057 and of a spurious result in 1000 bins of merely 0.0006. This is the standard now enforced by the premier journals Physical Review and Physical Review Letters.
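The look-elsewhere arithmetic behind the last three paragraphs is simple: if each of N independent bins has a single-bin false-alarm probability p, the chance of at least one spurious spike is 1 - (1 - p)^N. A sketch of the calculation, assuming 1,000 independent bins as in the talk (function names are my own):

```python
import math

def tail_prob(k):
    # Two-sided Gaussian tail: chance that one bin fluctuates beyond k SD
    return 1.0 - math.erf(k / math.sqrt(2.0))

def any_spurious(k, bins=1000):
    # Chance that at least one of `bins` independent bins passes the k-SD cut on noise alone
    return 1.0 - (1.0 - tail_prob(k)) ** bins

for k in (3, 4, 5):
    print(f"{k} SD: one bin {tail_prob(k):.2e}; 1000 bins {any_spurious(k):.4f}")
```

Running it reproduces the ladder in the text: roughly 0.93 at three SD, 0.06 at four, and 0.0006 at five.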
It is a demanding regime. The discovery of the top quark had not risen to that level. It puts individual experimenters in an odd position. They run their experiment and get results at the three or four SD level. These are results they could be quite confident did not come from noise. Yet the new rules prohibited them from claiming their success. Now, Allan reported, the new regime was producing its own artifacts. Results are being reported that come in magically at exactly the five standard deviation limit.
This example was a splendid start to the conference. Before recounting it, Allan had provided the framing the conference needed. He recalled how things were in philosophy of science when he started his work.
The field was, he remarked, too concerned with ornithology. He delivered the bait perfectly. It was irresistible. Before we could stop ourselves, we were all wondering, what have you got against ornithology?! And then he hooked us. In those earlier days, he continued, we were very worried about whether all swans are white and whether a red tablecloth does confirm that all ravens are black. This last worry was Hempel's "paradox of the ravens." It is hard to imagine now that it had consumed us only a few decades ago.
This sort of philosophical ornithology was distant from Allan's experience as an experimental physicist. So he set out to write works in history and philosophy of physics that responded to the experimental practice he knew. That was when I first met Allan. He was a Visiting Fellow in the Center in the 1980s, immersed in his book project.
When Allan's first book came out, Bob Ackermann reviewed it in 1988, calling it part of the "new experimentalism." Allan listed the tenets of this new experimentalism. I was not fast enough to jot them all down. They included the idea that there are many roles for experiment and that experimental results enjoy a persistence that allows them to endure through theory change. Here's a snap of the screen:
I'll now admit that our framing of the conference was a response to this idea of a new experimentalism. That movement was a healthy reaction to a remote analysis of science whose primary focus was theories, impossibly idealized as sets of sentences in first order predicate logic, closed under deductive implication. We were asking for papers that made good on the promised corrective.
It was an unrealistic request, I now saw all too clearly. We are well past that era. Philosophy of scientific experimentation is well-developed. It is maturing into a field in its own right. It is developing issues, divisions and debates of its own. It no longer needs to be defined as a reaction to something else.
Allan hit the nail on the head when he quipped, "It really should be called the middle-aged experimentalism!"
There was much more in Allan's talk. And there will be much more in this conference. (I started writing these words during the lunch break of the first day.) We have a truly impressive program. I apologize to the many later speakers and discussants for not entering their contributions and thoughts into this narrative. My excuse is that, this far down in the narrative, there are no readers left. (Well, perhaps one--you! I'm happy you stayed!)
However, I'll mention two of the themes that emerged.
One is relevant to my interests in inductive inference. This community of philosophers of experiment is concerned centrally with issues of evidence. That word came up in almost every talk. Yet no one made any use of our field's best developed theory of evidence, Bayesianism. Indeed one talk at the end went out of its way to deprecate the approach.
This, I thought, is further evidence of the fragmentation of our field. The Bayesians are now pretty much isolated in the fortress of "formal epistemology," where they speak only to each other. Is the separation of sub-fields a result of the withdrawal of the Bayesians into their fortress? Or is it also an indication that philosophers of science working on topics like experiment are unable to find their approach useful? Is Bayesianism today's version of philosophical ornithology?
The second theme is this. The new experimentalism was born from frustration at the dominance of an over-idealized, purified notion of theory. Experiment has, to use Ian Hacking's famous phrase, "a life of its own."
That dictum is now a tightening collar that is starting to chafe. A repeated theme in the many other talks was that experiment is not so free of theory. I am sensing a growing impatience with the idea of the autonomy of experiment. Is this the dogma that will give rise to tomorrow's new who-knows-what-ism?
John D. Norton