I was taught about the scientific method in high school, but it was presented as a rather dry set of rules. I don't think I really understood science until I read the classic book, Fads and Fallacies in the Name of Science, by Martin Gardner. Gardner was already known to me as the author of the "Mathematical Games" column in Scientific American magazine. A picture of my copy of the book can be seen to the left. This is a Dover paperback edition, copyright 1957. Note 1 I'm pretty sure I read the book while I was still in high school, from which I graduated in 1959. It was this book that made me appreciate one of the most important aspects of the scientific method: it needs to guard against experimenter self-deception. That's the point of the quote at the top of this entry from the brilliant physicist Richard Feynman, which I only heard many decades later.

Although Gardner's book primarily addresses pseudo-science, along the way he gives examples of well-intentioned scientific experiments that went wrong. Reading the book, I was very impressed by how easy it is for a scientist to inadvertently deceive himself. One chapter covers experiments on extrasensory perception ("ESP") done by the Duke University psychologist J. B. Rhine in the 1930's. Researchers continue to make claims in that field to this day, when in fact, as someone put it, "ESP is a field in which 200 years of continuous research has yet to produce a single experiment that can be regularly replicated by skeptics." Note 2

Medicine is an important area in which we see the need for careful research. The gold standard for evidence in medicine is the controlled, double-blind experiment with a large number of subjects. Gardner discusses a number of quack medical cures in his book, but it's easy for even careful experimenters to go wrong. Note 3 For example, in a set of trials done in Lanarkshire, Scotland, a large number of students were randomly assigned to the experimental and control groups. The experimental group received a nutritional milk supplement, and the control group did not. However, some well-meaning researcher allowed teachers to swap students between groups "if the assignment seems unbalanced". Quite a few teachers moved needier students into the group receiving the supplement, so the experimental and control groups were no longer comparable. That one sentence in the instructions rendered the results of the expensive experiment totally useless.

These days, computer programs are frequently used to keep track of the experimental subjects and to randomize their assignment to either the experimental or the control group. But in the past, that job had to be done by hand.
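Just to make the idea concrete, here is a minimal sketch (in Python, with invented subject IDs and group codes - it is not taken from any particular trial-management system) of what a computerized, blinded assignment might look like:

```python
import random

def assign_subjects(subject_ids, seed=None):
    """Randomly split subjects into two equal-sized groups.

    Returns a dict mapping each subject to an opaque code ("A" or "B").
    Which code means "active medication" and which means "placebo" lives
    in a separate key, kept away from the subjects and from the doctors
    evaluating them until the trial is over.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {s: ("A" if i < half else "B") for i, s in enumerate(shuffled)}

# Hypothetical usage with made-up subject IDs:
assignments = assign_subjects(["S001", "S002", "S003", "S004"], seed=42)
group_key = {"A": "active medication", "B": "placebo"}  # locked away until unblinding
print(assignments)  # each subject maps to a blinded group code
```

The whole point of the opaque codes is the one the following anecdote illustrates: whoever holds the key must have no contact with the subjects or with the people evaluating them.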
In one experiment set up carefully to be double-blind, that task was assigned to a particular assistant in the doctor's office where the experiment was being carried out. Each subject received either the active medication or an identical inert pill (a "placebo"), without knowing which they were getting. The doctors evaluating their progress were also ignorant of which patients were getting the active medication. The experimental results showed the medicine to be beneficial. Unfortunately, subsequent testing by other experimenters ultimately showed the medication to have no beneficial effect at all. The original researchers, their reputations at stake, returned to their records to find out where they might have gone wrong.

It turned out that the assistant who had participated in assigning the patients to either the experimental group or the control group occasionally served as a receptionist in the office. She knew which patients were receiving the treatment and which were not, and she was a strong believer in the efficacy of the treatment. That breach of the double-blind protocol was what had caused the experiment to report an illusory benefit. It's important to point out that the assistant was careful never to break confidentiality explicitly. That is, she didn't reveal to either the patients or the doctors who was getting the treatment and who was not. Nevertheless, one can easily imagine how her contact with the patients might have affected the results. Imagine a Mrs. Smith coming into the waiting room and being greeted by that receptionist, who knows (in the back of her mind) that Mrs. Smith is getting the medication. The receptionist might cheerily say, "Oh, Mrs. Smith, you're looking rather well today." Mrs. Smith carries this optimism into her subsequent interaction with the doctor, who reports that she seems to be improved. On the other hand, the receptionist might ask Mrs. Jones (a member of the control group) if she's feeling a bit tired. When information is passed, in any manner, that violates the double-blind protocol, this is called "leakage". Note 4

A particularly sad case of self-deception that I hadn't known about before reading Gardner's book occurred in 1903, when Prosper-René Blondlot, a reputable French physicist, thought he had found what he called "N rays". Their detection depended upon seeing a fairly subtle visual effect, and they ultimately turned out to be illusory. The exposure of his error led to Blondlot's madness and death.

Gardner's book prepared me to better understand similar cases that occurred in my own lifetime. One of these was "polywater", a hypothetical polymerized form of water reported by Russian researchers in the late 1960's. The supposed unusual properties of this material turned out to be entirely illusory. Similarly, more recently, a mechanism was proposed for what came to be called "cold fusion". This provoked quite a bit of excitement, but ultimately turned out to be nothing.

The smallest error in an experimental design can destroy a carefully constructed "blind" experiment. In one case described by Gardner, a German researcher probed for a subtle visual effect. He opened or closed a shutter controlling the stimulus, and the subject reported seeing it or not, without knowing whether the shutter was open or closed. At each trial, the experimenter wrote the state of the shutter in a notebook as he opened or closed it, writing an "o" (for "offen", open) or a "g" (for "geschlossen", closed). The experiment found the effect to be real - the subjects could detect the stimulus at greater than chance levels. However, it turned out that the subjects were unconsciously hearing the scratching of the experimenter's pen, and detecting the different amount of time it took to write a "g" as opposed to an "o". When the experiment was later changed to eliminate that possibility, the effect disappeared. Information "leakage" of this sort often takes place at a subconscious level - the subjects don't realize what is actually giving them the correct answer.
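To put a number on "greater than chance levels": if the shutter is open on half the trials, a subject who is simply guessing will be right about half the time, and one can ask how surprising the observed number of correct calls would be under pure guessing. Here is a small sketch of that calculation; the trial counts below are invented for illustration and are not Gardner's figures:

```python
from math import comb

def p_at_least(hits, trials, p_chance=0.5):
    """Probability of at least `hits` correct calls in `trials` independent
    trials, if each trial is a pure guess with success probability p_chance."""
    return sum(comb(trials, k) * p_chance**k * (1 - p_chance)**(trials - k)
               for k in range(hits, trials + 1))

# Invented numbers: 140 correct calls out of 250 trials.
print(p_at_least(140, 250))  # about 0.03 - unlikely under pure guessing
```

Of course, the arithmetic is only as good as the protocol: if the subjects can hear the pen scratching, the above-chance hit rate is perfectly real, but the explanation isn't what the experimenter thinks it is.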
Gardner and others later wrote about an interesting experiment in "remote viewing", a branch of ESP research. People called "Senders" were sent out to view randomly selected (and secret) sites, mentally concentrating on their visual impressions at previously specified times. "Receivers" were isolated in a room, and wrote down impressions of what they were mentally "receiving" at the specified times. Note 5 The transcripts written by the Senders and Receivers were sealed in envelopes, and independent judges later tried to match them up, to see if the Senders had been able to mentally transmit their images to the Receivers. The judges did indeed prove able to match the transcripts at far greater than chance levels - that is, the experiment appeared to confirm the transmission of information via a psychic connection.

As is done in science, the experiment was repeated by other researchers. They also scrutinized the methodology of the original researchers, and discovered a possible source of leakage. The transcripts of both the Senders and Receivers contained clues to the order in which they were written. For example, a Receiver might write, "I'm not getting a clear image, perhaps because this is the last session of a long day, and I'm tired." A Sender might write, "I have a strong image of long shadows, due to the setting sun". These clues could help the judges match these two transcripts. There were many other clues in the transcripts that hinted at the order in which they were written. Thus, when the experiment was repeated, an additional phase was added. The transcripts first passed through a set of independent editors, whose job it was to remove all hints of the order in which the transcripts had been written. Only then were the transcripts passed on to the judges to be matched. When this was done, the judges became unable to match the transcripts of the Senders and Receivers at greater than chance levels.

The experimenters noted an interesting effect. At the end of each day, after the transcripts had been sealed so they could not be altered, the Senders and Receivers were brought together to discuss the day's work. The purpose of this was to allow them to strengthen any actual psychic connection, which was after all what the experiment was looking for. During these meetings, the Senders and Receivers (all strong believers in psychic phenomena) were absolutely ecstatic about each day's results. Looking at their transcripts, they thought that the judges would certainly be able to match almost every single one with great accuracy. They were later stunned when the judges were unable to match the transcripts at greater than chance levels, and they demanded to talk to the judges to understand why the matches had been done the way they were. They would say things to the judges like, "Can't you see that the impression of long horizontal lines in this Receiver's transcript matches the long shadows in this Sender's transcript?". But the judge might reply, "Yes, I can see that, but I thought it much better matched the impression sent from the hall with a checkerboard pattern on the floor." The strong sense of matching perceived by the Senders and Receivers, who after all knew the correct answers, is called "subjective validation", sometimes considered a variation of "confirmation bias".

So far, it appears that experimental results showing ESP invariably disappear when the experimental protocols are improved.
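It's worth noting what "chance level" means for the matching task itself: if a judge pairs N Receiver transcripts with N Sender transcripts completely at random, the expected number of correct pairings is 1, no matter how large N is. A quick simulation (purely illustrative) shows that baseline:

```python
import random

def expected_correct_matches(n, trials=100_000, seed=0):
    """Average number of correct pairings when n transcripts are matched at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)  # a random matching of Receivers to Senders
        total += sum(1 for i, j in enumerate(perm) if i == j)
    return total / trials

print(expected_correct_matches(8))  # close to 1.0, and it stays near 1 for any n
```

Matching consistently above that baseline is what the original study reported - and what disappeared once the order clues had been edited out of the transcripts.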
It was Gardner's book which made me really appreciate the difficulties of doing scientific experiments properly. I hadn't gotten that depth of understanding by being taught the scientific method in high school. And despite engineering being based on science (as an engineer, I view myself as a sort of "applied scientist"), I wasn't taught these things at MIT either. In fact, I think this is something that most people never learn, and our public discourse about science suffers because of it. I've come to a conclusion that many people seem to find controversial: science is the only source we have of genuinely reliable information; there is no other. More about that next week.
Note 1: Fads and Fallacies in the Name of Science, copyright 1957 by Martin Gardner, originally published under the title In the Name of Science, copyright 1952. Dover Publications, Inc., New York. Although copies of this book are still available from Amazon, books in those days didn't have an ISBN (International Standard Book Number). [return to text]

Note 2: I wish I could remember the source of this quote - I can't find it on the internet. [return to text]

Note 3: "Controlled" means there's both an experimental group of subjects who receive the treatment being tested, and a "control" group that does not. "Double-blind" means that neither the subjects nor the experimenters know who is in the experimental group and who is in the control group. This implies that the control group receives a "sham" treatment which is indistinguishable from the actual treatment - if the treatment is a medication, the sham medication is called a "placebo". Note that it's important that even the experimenters don't know who's getting the treatment and who is not. Even trained scientists, professional researchers, are subject to bias if they are not kept in the dark. [return to text]

Note 4: On March 26, 2014, Boston.com's "Science in Mind" blog carried an article by Carolyn Johnson entitled "ALS institute highlights irreproducible results, a major problem in science". It notes that a team could successfully repeat only 6 of 53 landmark research studies that had been cited hundreds of times: "The scientists contacted some of the original teams that produced the results and asked them to repeat the experiments, with one simple change: that the scientists conducting the experiment were blinded to which group received the intervention and which had not. With [that] simple change, they could not reproduce their results." That's appalling! To perform an experiment that could easily have been double-blind in the first place, but wasn't, is, to my mind, incompetence. And that incompetence is compounded by any journal that would accept a paper under those conditions. The scientific method only works if researchers actually use it. [return to text]
Note 5: Although I found the link in this paragraph to a Wikipedia article on the work I discuss here, I don't remember where I saw it originally described, and it's possible there are errors in my recollection. [return to text]