Defending Animal Cognition Research, and Keeping Clever Hans in the Barn
By Michael J. Beran
As has no doubt happened with many of you, my email inbox has filled these past few weeks with messages from colleagues and non-scientist friends asking about misconduct and animal cognition research. Frustrating as it is to respond to those questions, more concerning is the media speculation about whether animal cognition research will be damaged by recent events. With funding as tight as it is for comparative psychologists and animal cognition researchers, the last thing we need is the perception that our methodologies are weak, that our analyses are flawed, or that our interpretations are exaggerated. It is one thing to reassure everyone that the field remains strong, but another to convince them of that fact. This set me to thinking about current practices in animal cognition research, and about what we might improve so that the field's reputation remains strong. Obviously, ethical conduct in research is critical. This goes without saying, but it also goes without saying that most in our field already conduct themselves ethically. So, I am not going to discuss that issue. Instead, I want to focus on the mistakes that even highly ethical researchers can still make, and on how those mistakes can threaten our science. Much of this is old news for those of you reading this article, but it is offered as a call to arms to renew our commitment to the ideas below, and to promote them to any of our colleagues who do animal cognition research but who may not have a background in experimental and comparative psychology. As members of Division 6, we bear the burden of protecting the integrity of behavioral, neuropsychological, and cognitive research with animals, and I believe this requires addressing the concerns I raise below.
One thing that has slowly begun to rear its ugly head again in animal cognition research is the potential for inadvertent cuing of the animals that we test. So-called Clever Hans effects have been front-and-center in the consciousness of comparative psychologists for the last century, but it seems that recent years have led to more lax attempts to control for those effects in the broader animal cognition community. Although I am sure nearly all of you know the story, to remind you, Clever Hans was a horse with apparently amazing skills for judging quantities, performing math problems, and other complicated feats. His performance was extraordinarily high, and initially was confirmed by many respected comparative psychologists. However, more careful examination of exactly what Hans was doing showed that his cleverness was not his arithmetic, but rather his acute behavioral observations of the people around him. For example, he modified his behavior on the basis of how people responded to his foot tapping. When they knew the answer, and gave even the smallest of cues (through postural shifts or other often unconscious movements), he used those to change his behavior and produce what appeared to be intelligent responses. But, if the examiner did not know the answer, no cues were present, and Clever Hans failed.
The moral of the Clever Hans story is twofold. The main point, and one that bears repeating loudly and frequently, is that we must carefully design today's paradigms for assessing animal behavior and cognition so that they do not offer the same kinds of cues to animals that were there for Hans. Frankly, some researchers seem to have forgotten (or never realized) this important point. Papers now appear more and more frequently in which investigators present animals with discrete choices, and the investigators record the choices themselves. Some attempts are made to avoid the most blatant cues, such as wearing hats or sunglasses, or never making eye contact with the animals. This is good, but it is not enough. If the person scoring the response knows the correct answer, and the animal subject can see that person, cuing is possible, and I would argue that the data cannot be trusted completely. It is crucial that at least some trials be given in which the person scoring the response cannot know whether it is correct, or in which the animal making the response cannot see an experimenter who does know the correct choice and records the animal's response. Short of this, concerns could be raised about the data, and reviewers should be more stringent in requiring these controls. But, more and more frequently, they are not. Papers are published without these controls, and even more concerning is that some students seem not even to be aware of this need. I recently experienced this multiple times when listening to students outline their plans for research projects containing no controls for the potential cuing of animals. And, even worse, students sometimes struggle to understand the need for such controls, making the mistake of assuming that if someone else also codes the video-recorded trials for reliability, cuing must not have been occurring.
Having someone else score video-recorded test sessions is a good thing for ensuring that we are correctly interpreting responses made by animals, but it offers no protection from the cuing of those animals and should not be mistaken for such a control.
The second and perhaps less considered lesson from Clever Hans is the need for replication. As I mentioned, Hans initially was given the stamp of approval from a panel of experts who agreed that his performance was real. It took a second examination, by Oskar Pfungst, to show the world what was really going on with Hans, and why he was really "clever." Today, replication is sometimes undervalued in the publication record and certainly by the funding agencies. When big events such as retractions occur, it may suddenly be viewed as highly valuable for others to go back and re-test previous findings, but in general this is not true. At best, we value giving the same tests to new species, in an effort to broaden our phylogenetic map for some behavioral/cognitive capacity. But testing the same species in a new lab, with roughly the same methodology, often is viewed as a less valuable contribution to science. Even worse, if the replication fails, it may not be published, or if it is, it is published in a different (and usually less prestigious) journal, where it may be ignored as the original work continues to be cited as evidence that "species X does this." There are, of course, exceptions to this, with some areas generating years-long efforts by multiple labs to better understand a species' capacity for a given behavior, largely driven by failures to replicate each other's work. But, if one were to look through a year's worth of articles on animal cognition in any journal, and make a list of what capacities were claimed, only a minority of those claims would be tested subsequently by new teams in new labs. There are good "practical" reasons for this, of course, not the least of which is that such replications are highly unlikely to be part of a funded research program, and are also costly because of their lowered "value" in terms of innovativeness.
What we should be focusing on is the idea of conceptual replication. A conceptual replication involves repeating a previous study in an attempt to show the same capacity using a different methodology. This is necessary because replicating a potentially flawed study using the identical methodology does not increase confidence in the interpretation of those data. I would argue that conceptual replications become more important the more complex the cognitive process under investigation is. For example, in human cognition, the idea of "false memories" is strongly supported by the fact that they can be generated in so many different circumstances (e.g., eyewitness testimony, serial list memory, etc.). We should strive for the same outcome in our field, in that new methods for probing complex cognitive abilities such as theory of mind, metacognition, inequity aversion, rational decision-making, self-control, causal understanding, and others would show consistency across testing paradigms. For example, if rhesus macaques are metacognitive, they should be so in multiple kinds of tests, not just one. And small changes to a procedure that is presented as a test for a higher-order cognitive capacity should not lead to large changes in performance. If they do, we should reconsider the mechanisms supporting the observed behaviors. The hope, however, is that conceptual replications can provide converging evidence of the cognitive processes that are being sought in animal behavior.
This brings me to one final issue: the increased sense that animal cognition research should strive to confirm that animals are cognitive and not simply associative in their responding. This feels like a goal dangerously close to assuming that we prove the alternative hypothesis by rejecting the null hypothesis. To argue that evidence of animal cognition discounts associative processes is misguided. Our goal should be to ask what capacities animals might have that are not accounted for by associative learning alone, but this does not mean that animals never learn associatively. Of course they do, as do we. The problem is that we sometimes design clever experiments and collect exciting data, but then bypass a critical assessment of those data in terms of what mechanisms might have produced them. Instead, we either ascribe those data to some empty construct that allows us to generate the most eye-catching title for our paper, or we skip any consideration of simpler explanations before moving on to more complicated ones. It may seem passé in some corners of animal cognition research to invoke the law of parsimony, in large part because that law has come to be incorrectly assumed to be synonymous with "associative" explanations rather than cognitive ones. But this completely misses the point of the law. It has no theoretical perspective, it takes no a priori position on what animals can or cannot do, and it does not require that, whenever associative and cognitive explanations compete, the associative explanation win out. To the contrary, if associative, non-cognitive models of some behavior are exceedingly complex whereas the cognitive processes are not, the law of parsimony demands that the cognitive explanation be taken as the more likely one. But much work must be done to get to this point; methodologies must control for simpler associative processes that could otherwise explain away results that one promotes as reflecting cognition.
Here, too, we must be careful and err on the side of caution, especially as we improve our efforts at minimizing cuing and as we increase our efforts at exact and conceptual replications. It will be damaging for our field if the broader scientific community and the public at large lose confidence in our objectivity and conservatism as they observe failed replications and confused efforts to classify the mechanisms that underlie animal behavior. A return to parsimony, more careful methodological design, and conservatism in promoting cognitive capacities (prior to conceptual replications that support such promotion) will strengthen our field, and demonstrate our resolve to remain objective, ethical scientists. I call on members of Division 6 to take the lead in these efforts.