To best understand this article in the context of the behavior evaluation literature, please see National Canine Research Council’s complete analysis here.

Article Citation:

Bollen, K. S. & Horowitz, J. (2008). Behavioral evaluation and demographic information in the assessment of aggressiveness in shelter dogs. Applied Animal Behaviour Science, 112, 120-135. doi: 10.1016/j.applanim.2007.07.007

National Canine Research Council Summary and Analysis:

Bollen and Horowitz’s (2008) study is an interesting attempt to combine retrospective and prospective approaches to validating a behavior evaluation. They tested owner-relinquished dogs, comparing owner-reported behavior histories with subsequent evaluation results and then with follow-up reports from adopters. The authors hoped that, in conjunction with behavior histories and demographic data, the evaluation would prove useful in predicting aggression and in-home behaviors. While the approach was novel compared to other validity studies, the study falls short: adequate controls were missing, and the conclusions reach beyond the reported data.

Over 2,000 dogs were subjects in this two-year longitudinal study in New England, USA. The majority of the dogs in the sample were owner-relinquished. Upon intake, relinquishing owners were asked to complete a survey with open-ended questions regarding the dog’s behavioral history. There were several stages of data collection, beginning when dogs entered the shelter. Demographic variables (breed, age, sex), behavior history, behavior evaluation results (Sue Sternberg’s Assess-A-Pet), follow-up phone calls, and returns of the dog within six months of adoption were recorded and analyzed using multiple statistical models. For analyses, all data were grouped categorically.

Though breed was recorded as a variable, it was determined based on morphology; because extensive research has shown that visual breed identification is unreliable, the breed comparisons are not meaningful and thus not further discussed in the present review.

The behavior evaluation consisted of nine parts and was conducted by one of the study authors. Descriptions of behavior during each test were recorded on a standard form. The tests included components such as teeth examination, arousal during play, food and toy possession (assessed with a rubber hand), and on-leash introduction to another dog, among others. Each of the nine tests was judged as pass or fail. “Stiffening” and “slight growling” were deemed mildly aggressive. Dogs who failed three or more of the nine tests were classified as failing the entire evaluation. Dogs who exhibited behaviors deemed serious aggression, such as attempting to bite or lunging at the evaluator while growling or snarling, failed the evaluation on that basis alone, regardless of how many component tests the behavior appeared in. (It’s worth noting that the study’s definition of aggression was “overt behavior or intent by an organism to injure or otherwise inflict noxious stimulation towards another organism.” Stiffening does not seem to fit this definition. Moreover, most behaviorists would agree that defining a behavior by the perceived intent of the animal is problematic; we do not know the animal’s intent, thus only overt, observable behavior should be recorded and analyzed.)
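The scoring rule described above can be written out as a short sketch. This is only an illustration of the rule as this review summarizes it, not the authors' actual scoring form; the function name, argument names, and constants are hypothetical.

```python
from typing import Sequence

# Values taken from the summary above; the names themselves are illustrative.
N_COMPONENTS = 9      # the evaluation had nine component tests
FAIL_THRESHOLD = 3    # failing three or more components failed the whole evaluation


def fails_evaluation(component_failed: Sequence[bool], serious_aggression: bool) -> bool:
    """Return True if a dog fails the overall evaluation under the summarized rule.

    component_failed: one pass/fail flag per component test (True = failed).
    serious_aggression: True if behavior deemed serious aggression (e.g. an
        attempted bite) was seen in any component test.
    """
    if len(component_failed) != N_COMPONENTS:
        raise ValueError(f"expected {N_COMPONENTS} component results")
    # Serious aggression fails the dog regardless of the component count.
    if serious_aggression:
        return True
    # Otherwise the outcome depends only on how many components were failed.
    return sum(component_failed) >= FAIL_THRESHOLD
```

Under this reading, a dog that stiffened or growled slightly in two component tests would pass overall, while the same behavior recorded in a third test would tip the dog into the failing group.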

Thirty-nine percent of the subjects (n = 796) failed their behavior evaluation, of whom 759 were not placed for adoption and were euthanized. As a result, no post-adoption comparison could be conducted for dogs who had failed behavior evaluations. Because this large share of the study group was eliminated from the prospective part of the validation, sensitivity, specificity, and positive or negative predictive value cannot be calculated. We therefore set those findings aside for the remainder of this review and focus on the retrospective part of the study: the comparison of evaluation results with owner surrender reports. Of the 217 subjects whose intake history indicated aggressive behaviors in the past, 195 failed at least one component of the evaluation, though it is unclear how many of this subset failed the evaluation entirely.
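To make the point about sensitivity and specificity concrete, the sketch below works through the arithmetic using only the counts quoted in this review. The helper functions and variable names are hypothetical, and treating the in-home outcomes of failed dogs as unknown (None) is our reading of the study design, not a figure from the paper.

```python
from typing import Optional

# Counts quoted in this review.
FAILED = 796                         # dogs that failed the evaluation
EUTHANIZED_AFTER_FAIL = 759          # failed dogs not placed and euthanized
HISTORY_AGGRESSIVE = 217             # intake history indicated past aggression
HISTORY_AGGRESSIVE_FAILED_ANY = 195  # of those, failed at least one component


def ratio(numerator: Optional[int], denominator: Optional[int]) -> Optional[float]:
    """Return numerator / denominator, or None if either count is unknown or the denominator is 0."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def sensitivity(tp: Optional[int], fn: Optional[int]) -> Optional[float]:
    """TP / (TP + FN); None when either cell was never observed."""
    if tp is None or fn is None:
        return None
    return ratio(tp, tp + fn)


def specificity(tn: Optional[int], fp: Optional[int]) -> Optional[float]:
    """TN / (TN + FP); None when either cell was never observed."""
    if tn is None or fp is None:
        return None
    return ratio(tn, tn + fp)


# Share of failed dogs that were euthanized (roughly 95%).
share_euthanized = ratio(EUTHANIZED_AFTER_FAIL, FAILED)

# Share of dogs with an aggressive intake history that failed at least one
# component (roughly 90%). This is not an overall failure rate.
share_history_failed_any = ratio(HISTORY_AGGRESSIVE_FAILED_ANY, HISTORY_AGGRESSIVE)

# Failed dogs had essentially no in-home follow-up, so the "test positive"
# cells of the 2x2 table are unobserved. Placeholders only, not study data.
true_positives: Optional[int] = None
false_positives: Optional[int] = None
```

With both "test positive" cells unknown, `sensitivity(true_positives, fn)` and `specificity(tn, false_positives)` return None for any observed `fn` or `tn`. Follow-up data on dogs who passed cannot fill in the missing half of the table.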

From a design perspective, the study lacked some experimental controls. The evaluator knew which dogs had been relinquished for “aggression.” The authors argue that standard scoring sheets were used to prevent bias, but unconscious biases can have profound effects on the data. For example, the researcher might be more tense when working with a dog that was deemed aggressive by a former owner, which could in turn affect the dog’s behavior. This knowledge could also bias the evaluator when documenting behaviors. Furthermore, only one evaluator recorded the dogs’ behavior; there is no documentation of a second rater or a check for inter-rater reliability.


Some of the conclusions drawn in this study are not supported by the data. For example, the authors note that returns for aggression, as well as overall return rates, decreased following implementation of the behavior evaluation. They imply a causal relationship between the introduction of behavior evaluations and a slight reduction in returns (from 19% to 14%), but they do not account for potential confounding variables such as community education programs, disease, weather, or the economy, or for simple statistically normal fluctuation. Any number of history effects could be responsible. To rule these out, the design would need at least one additional treatment manipulation (i.e., removal of behavior evaluations) to see whether return rates reverted to the baseline measure.

A puzzling statement was made regarding subjects who failed the evaluation. Bollen and Horowitz suggest that the dogs who were euthanized due to failing the evaluation would have likely been aggressive in the home because the majority failed “multiple component tests.” However, we cannot say what the euthanized dogs would have or would not have done in the home. That is the crux of the problem not just in this study, but with behavior evaluations in general; for dogs who fail the evaluation, we do not know whether it was a true or false positive because the dog is killed before in-home behavior can ever be assessed.

Finally, the exceptions made in this study are revealing: six dogs who failed the evaluation were nonetheless placed for adoption. This suggests that shelter staff and researchers may lack faith in the predictive power of the tests they are evaluating and doubt that dogs who fail truly represent a danger to the community.

Abstract and Link to Purchase Full Text of the Original Article: