Summary & Analysis: Comparison of SAFER behavior assessment results in shelter dogs at intake and after a 3-day acclimation period

Overview

This 2015 study by Bennett et al. offers one of the few direct investigations into the test-retest reliability of a widely used shelter behavior evaluation, the SAFER assessment, and raises serious concerns about its consistency and practical value. By administering the test to the same dogs just three days apart, the researchers found that scores often shifted substantially, sometimes in ways that could alter adoption or euthanasia decisions. While some subtests showed moderate agreement, others—especially those intended to detect aggression or fear—had poor reliability. Behavioral changes were neither consistently in one direction nor clearly attributable to stress reduction, therefore undermining assumptions that behavior stabilizes with time in the shelter. These findings reinforce the conclusion that behavior evaluations like SAFER lack the consistency needed for high-stakes decision-making. In practice, this means that the outcome of a dog’s life may depend more on the timing of the test than the dog’s actual personality, thereby highlighting the need to rethink reliance on standardized behavior assessments in sheltering contexts.

Summary and Analysis

To best understand this article in the context of the behavior evaluation literature, please see National Canine Research Council’s complete analysis here.

Article Citation:

Bennett, S. L., Weng, H., Walker, S. L., Placer, M. & Litster, A. (2015). Comparison of SAFER behavior assessment results in shelter dogs at intake and after a 3-day acclimation period. Journal of Applied Animal Welfare Science, 18(2), 153-168. doi: 10.1080/10888705.2014.999916

National Canine Research Council Summary and Analysis:

This article is included as one of the few studies that examined test-retest reliability. Based on previous studies that showed changes in stress over time in shelters, the researchers hypothesized that behavior evaluation results would reflect parallel changes. If so, this would raise the question of which results are more representative of the animal’s typical and likely behavior in a home, and it would have negative implications overall for the validity of behavior testing. Bennett, Weng, Walker, Placer, and Litster (2015) investigated this possibility by administering the Meet Your Match Safety Assessment for Evaluating Rehoming (SAFER) behavior evaluation twice to 33 shelter dogs.

The authors chose to use SAFER because it is commonly used in shelters throughout the United States. Interestingly, it has not been validated in the peer-reviewed literature, but because of its association with the ASPCA it is well regarded and widely implemented. This choice bolsters the study’s external validity.

Seven subtests were administered. First, the experimenter held the dog’s head and gazed into its eyes (“Look”). The second subtest involved gently grasping fur and skin along the dog’s body (“Sensitivity”). The third subtest was an attempt to initiate play by speaking excitedly and lightly poking the dog (“Tag”). During the fourth subtest the evaluator said “squeeze” and then gently squeezed the dog’s leg and paw (“Squeeze”). This was repeated to see if the dog would respond to the vocal cue. The fifth and sixth subtests used a plastic hand to take away food and toy items, respectively (“Food Behavior” and “Toy Behavior”). Finally, in the seventh subtest the subject was led into a room occupied by a second, passive dog. Initial approach behavior was recorded, but the dogs were not allowed to touch or interact further (“Dog to Dog Behavior”).

The same certified evaluator was used on both days, as was the same assessor. The same helper dog was used on day 0 and day 3 for half (17) of the dogs tested. The remaining 16 dogs experienced different dogs on days 0 and 3.

For each subtest, dogs received a score from 1 to 5, with higher numbers indicating escalating aggressive behaviors. Specific behaviors were not listed for every score, but a 3 might indicate “signs of fear, high arousal, or inhibited aggression,” and a 5 includes growling, lunging, or attempting to bite. According to the assessment’s creator, a score of 3 should be interpreted as a recommendation that the dog might benefit from behavior management or modification. For a more detailed description of the scoring procedure, see the summary and analysis of Bennett et al. (2012).

When analyzing the data, the researchers were particularly interested in cases where scores changed by at least 2 points from day 0 to day 3; these differences were used to calculate percent discordance, or the percent of the sample for each subtest in which scores changed at least 2 points between the two tests. They felt this was of practical importance because differences of this magnitude could conceivably result in different recommendations and vastly different outcomes for dogs (i.e., life or death).
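The percent discordance calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function name `percent_discordance`, the handling of missing scores, and the example scores are all assumptions made for demonstration and are not data from the study.

```python
def percent_discordance(day0, day3, threshold=2):
    """Percent of dogs whose subtest score changed by at least `threshold` points.

    `day0` and `day3` are equal-length lists of scores (1-5) for the same dogs
    in the same order. A score of None marks a dog with no data on that day;
    such dogs are excluded (an assumption made for this sketch).
    """
    if len(day0) != len(day3):
        raise ValueError("day0 and day3 must list the same dogs")
    # Keep only dogs that were scored on both days.
    pairs = [(a, b) for a, b in zip(day0, day3) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no dogs with scores on both days")
    discordant = sum(1 for a, b in pairs if abs(a - b) >= threshold)
    return 100.0 * discordant / len(pairs)


# Hypothetical scores for ten dogs on one subtest (not data from the study).
day0_scores = [1, 2, 5, 1, 3, 2, 1, 2, 4, 1]
day3_scores = [1, 2, 1, 2, 3, 5, 1, 1, 3, 1]
print(percent_discordance(day0_scores, day3_scores))
```

In this made-up example, two of the ten dogs changed by 2 or more points, and they moved in opposite directions (5 to 1 and 2 to 5). Both count toward discordance because the measure looks only at the size of the change.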

For this sample, agreement between days 0 and 3 was at best moderate for 3 out of 4 of the test results studied. There was little agreement between days 0 and 3 for the first subtest (“Look”); discordance was 15% and weighted kappa was 0.28. Moderate agreement was found for the “Sensitivity” and “Tag” subtests (4% discordance for both, and kappa equal to 0.59 and 0.41, respectively). The first “Squeeze” test showed poor to moderate agreement (8% discordance, kappa of 0.22), while the second “Squeeze” test showed no discordance (kappa = 0.78). Ninety-two percent of the tested dogs scored a 1 or 2 for both “Squeeze” assessments. More than half of the subjects did not have data on both days for the “Food Behavior” subtest due to lack of interest in the food. For the dogs that did have both data points, agreement was poor; discordance was 18% and kappa was 0.50. There was excellent agreement (no discordance) for the “Toy Behavior” subtest, with all dogs except one scoring a 1 on both assessments. The remaining dog scored a 1 on day 0 and a 2 on day 3. Finally, moderate agreement across assessments was reported for the “Dog to Dog Behavior” subtest; discordance was 3% and kappa = 0.33.
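For readers unfamiliar with weighted kappa, the statistic reported alongside discordance above, the following Python sketch shows one common way to compute it for two sets of ordinal scores. It is an illustration under stated assumptions: this summary does not say which weighting scheme the authors used, so the linear weights here are an assumption, and the function name `weighted_kappa` is hypothetical.

```python
def weighted_kappa(day0, day3, categories=(1, 2, 3, 4, 5), weights="linear"):
    """Cohen's weighted kappa for two lists of ordinal scores on the same dogs.

    Disagreement weights grow with the distance between the two scores:
    linearly by default (an assumption), or quadratically for any other
    value of `weights`.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(day0)
    # Observed proportion of dogs in each (day 0 score, day 3 score) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(day0, day3):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            weighted_observed += w * observed[i][j]
            # Expected proportion if the two days were independent.
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all scores fall in one category")
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 indicates perfect agreement, and 0 indicates agreement no better than chance. When every score falls in a single category, chance-expected disagreement is zero and kappa cannot be computed at all, which this sketch reports as an error.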

It is interesting to note that for the subtest with the highest agreement (“Toy Behavior”), the topography of behavior was on the lower end of the scale. At higher behavior scores, discordance increased and kappa decreased. Moreover, when behavior did change between assessments, it did not change in a consistent direction. For example, for the “Look” subtest, two dogs scored 5 on day 0 but scored 1 and 2, respectively, on day 3, while two other dogs scored 2 on day 0 but 5 on day 3. There is no clear directional change in behavior.

The most important finding from this study is that even over a period as short as 3 days, dogs’ behaviors can change drastically, which may result in vastly different recommendations when these are based on a behavior assessment. This points to a fundamental lack of reliability and external validity for this behavior assessment, which again raises the question of whether these types of evaluations should be used to determine a dog’s fate. The authors recommend avoiding testing when dogs are particularly stressed, as well as seriously considering the dog’s welfare when determining the time of the test, and they attributed the differences in results to changes in stress level due to acclimatization to the shelter environment. However, nothing in the data demonstrates that the change was anything other than simple unreliability of the test.

Abstract and Link to Purchase Full Text of the Original Article:

https://www.ncbi.nlm.nih.gov/pubmed/25603466
