Despite evidence to the contrary,1 references to “validated” and “reliable” provocative canine behavior evaluation instruments continue to appear in the literature and in conversations among shelter personnel, researchers, and other stakeholders. In a 2019 article in the Journal of Veterinary Behavior, Patronek, Bradley, and Arps demonstrate that no canine behavior evaluation used for shelter dogs meets accepted scientific criteria that would justify routine use in shelters, and argue for “…a moratorium on any uses of… [formal behavior] evaluations as the sole determinant of a dog’s fate.”2
Drawing on widely accepted, species-neutral principles used in medicine and psychology for validating diagnostic tests that rely on subjective assessments, the authors evaluated more than 25 years of research on the topic through an in-depth review of 17 published studies that assessed the validity or reliability of test-battery evaluations “used or intended for screening shelter dogs for behavior labeled aggressive and/or for adoption suitability.”2 As the paper’s Figure and Table show, while a few specific measures of reliability and validity have reached statistical significance within individual experiments, often in populations of owned dogs, the studies ultimately present no evidence that any canine behavior evaluation has come close to meeting accepted standards for reliability and validity.2 “It would also not be appropriate,” the authors point out, “to daisy chain the sporadic statistically significant results from inconclusive studies to argue that the collective body of work is somehow greater than the sum of the parts, thereby disregarding the bulk of the evidence.”2
These findings2 build upon those of a 2016 paper in the same journal which demonstrated mathematically why, even if there were a scientifically validated behavior evaluation, “for any plausible combination of sensitivity, specificity, and prevalence of biting and warning behaviors, a positive test would at best be not much better than flipping a coin, and often be much worse, because many of the dogs who test positive will be false positives.”3,4,5
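The arithmetic behind that claim follows directly from Bayes’ rule: the usefulness of a “positive” result depends not only on a test’s sensitivity and specificity but also on how common the behavior actually is in the population being tested. The short sketch below is illustrative only; the sensitivity, specificity, and prevalence values are assumptions chosen for demonstration, not figures taken from the cited papers.

```python
# Illustrative only: how sensitivity, specificity, and prevalence combine
# (via Bayes' rule) to determine what fraction of dogs flagged "positive"
# are actually false positives. The numbers below are assumed, not drawn
# from Patronek, Bradley & Arps or the 2016 paper.

def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a dog who 'fails' the test truly shows the target behavior."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# A hypothetical test that looks respectable on paper (80% sensitivity,
# 80% specificity), applied at progressively lower prevalence levels.
for prevalence in (0.30, 0.10, 0.05):
    ppv = positive_predictive_value(0.80, 0.80, prevalence)
    print(f"prevalence {prevalence:.0%}: "
          f"PPV {ppv:.0%}, false positives among flagged dogs {1 - ppv:.0%}")
```

With these assumed numbers, the share of flagged dogs who are false positives rises from roughly a third at 30% prevalence to over 80% at 5% prevalence, which is the sense in which a positive result can be little better than a coin flip, and often much worse.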
The 2019 paper confirms that, in the populations studied in these experiments, some of which specifically attempted to enroll dogs with problematic behaviors, the reported and/or calculated false-positive error rates ranged from 11.8% to 53.7%.2 These errors were estimated to be much larger (28.8% to 84%) if the evaluations were applied to shelter populations with the typically low prevalence of adoption-preventing behaviors.2
Patronek, Bradley & Arps argue that one reason (among many) that behavior evaluations have performed so poorly in evaluation studies is that they lack what is known as “Face validity” – in other words, that the very premise that the provocations used at a single time during a dog’s stressful experience in a shelter will predict future behavior at a different time and place may be fatally flawed.2
Finally, the authors question the logic of repeated calls for more research. Prior efforts were conducted by skilled investigators under good to ideal conditions, far more meticulously controlled than could ever be expected across the broad population of animal shelters in daily life, and still fell short; given that track record, the authors contend, further studies are not likely to be any more successful.2
The ethical issues are profound in a real-life setting. As has been noted in a leading textbook about human diagnostic test evaluation, “A scale that places a ‘normal’ person in the abnormal range (a false-positive result) or misses someone who actually does have some disorder (false-negative result) is not too much of an issue when the scale is being used solely within the context of research. It will lead to errors in the findings, but that is not an issue for the individual completing the scale. It is a major consideration, though, if the results of the scale are used to make a decision regarding that person – a diagnosis or admission into a programme. In these circumstances, the issue of erroneous results is a major one, and should be one of the top criteria determining whether or not a scale should be used.”6
Read the Article in Full Here:
https://www.sciencedirect.com/science/article/pii/S1558787819300012