A Hard Test of Individual Heterogeneity in Response Scale Usage: Evidence From Qatar
Abstract
A common approach to correcting for interpersonal differences in response category
thresholds in surveys is the use of anchoring vignettes. Here we present results from
the first applications of anchoring vignettes in Qatar and, to our knowledge, the Arab
world. We extend previous findings both geographically and substantively to show
that a range of social and demographic variables account for important variation in
response scale use in the domains of economic well-being and political efficacy, and
that this variation leads to substantively misleading conclusions when not appropriately
modeled. Qatar’s exceptionally homogeneous citizenry presents a uniquely hard
test of response scale heterogeneity, and our results suggest that potentially obfuscating
differences in individual reporting styles are even more ubiquitous than previously
known.
When surveys are used to measure complex concepts, the issue of
interpersonal incomparability must be addressed. Individuals understand concepts
and questions differently: A Yemeni’s moderate economy may appear as
destitution to a Kuwaiti; the political openness of a typical Latvian may be
illiberal to most Swedes. The question of how to account for such interpersonal
heterogeneity in response scale usage, also called differential item functioning
(DIF), continues to attract substantial scholarly attention (e.g., Aldrich
& McKelvey, 1977; Alvarez & Nagler, 2004; Brady, 1985; King, Murray,
Salomon, & Tandon, 2004; King & Wand, 2007; Stegmueller, 2011).
Since its introduction in political science more than a decade ago by King
and colleagues (2004), the use of anchoring vignettes to correct for DIF in survey responses has spread to diverse areas of social, economic, and health
research (e.g., Bratton, 2010; Chevalier & Fielding, 2011; Hopkins & King,
2010; Kapteyn, Smith, & van Soest, 2007; King & Wand, 2007; Kristensen &
Johansson, 2008; Paccagnella, 2013; Rice, Robone, & Smith, 2011; Salomon,
Tandon, & Murray, 2004; Wand, 2013). The approach measures and controls
for individual differences in response scale use by first asking general self-assessment
questions. These self-assessments are then supplemented with follow-up
vignettes that portray relevant aspects of the lives of hypothetical individuals,
which respondents rate on the same scale. Because every respondent rates the
same concrete cases, whose actual level is fixed by the researcher, variability
in the vignette ratings can be attributed directly to differences in the subjective
scales used by respondents, offering both an individual-level measure of, and
method of correction for, DIF.
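As one illustration of how vignette ratings can correct a self-assessment, the sketch below implements the nonparametric recode associated with King et al. (2004): each respondent's self-rating is re-expressed relative to that same respondent's ratings of J vignettes, yielding a value from 1 to 2J + 1. This is a minimal sketch, not necessarily the estimator used in this study. The function name `nonparametric_c` is hypothetical, and the sketch assumes the respondent rated the vignettes in strictly increasing order. The full method also handles ties and order violations, which this sketch rejects.

```python
from typing import Sequence


def nonparametric_c(self_rating: int, vignette_ratings: Sequence[int]) -> int:
    """Recode a self-assessment relative to the respondent's own vignette ratings.

    Hypothetical helper for illustration. Assumes vignette_ratings are listed
    from the lowest to the highest vignette level and that the respondent rated
    them in strictly increasing order (no ties, no order violations).
    Returns a value in 1..2J+1, where J = len(vignette_ratings).
    """
    z = list(vignette_ratings)
    # This sketch only covers consistently ordered, untied vignette ratings.
    if any(a >= b for a, b in zip(z, z[1:])):
        raise ValueError("sketch handles only strictly increasing vignette ratings")
    c = 1
    for zj in z:
        if self_rating < zj:   # self-rating falls below this vignette
            return c
        if self_rating == zj:  # self-rating ties this vignette
            return c + 1
        c += 2                 # self-rating lies above this vignette; move on
    return c                   # self-rating lies above every vignette: 2J + 1


# Two respondents give the same raw self-rating (3) but use the scale differently.
print(nonparametric_c(3, [1, 2]))  # above both vignettes
print(nonparametric_c(3, [3, 5]))  # equal to the lower vignette
```

The same raw answer of 3 maps to 5 for the first respondent and to 2 for the second. Differences in scale use that the raw self-assessments conceal become visible once each answer is read against the respondent's own vignette ratings.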
Given obvious cross-country disparities in social, economic, and political
experiences, much of the resulting research agenda has used anchoring vignettes
as a way to adjust for differences in understanding concepts across
distinct national populations. Such cross-group comparison is expected to
introduce DIF on account of often unspecified underlying differences in “culture,”
frequently operationalized as a simple dummy variable. A common
outcome is that, after accounting for variability in response category thresholds,
anomalous or curious findings—for instance, higher self-ratings of political
efficacy among citizens of a nondemocratic state compared with those of
a democracy (King et al., 2004)—are shown to be spurious.
In practice, then, the original theoretical concern with interpersonal incomparability
bias in surveys has largely been pursued instead as an investigation of intergroup
incomparability, understating or ignoring the effects of response scale
heterogeneity within culturally cohesive populations, that is, among individuals
qua individuals. Yet more recent anchoring vignette applications, most
notably in the area of health, have demonstrated that even basic demographic
factors such as sex, education, and work status can impact how people use
survey response scales, and that this individual-level heterogeneity can lead to
misleading conclusions if not modeled appropriately (Angelini, Cavapozzi, &
Paccagnella, 2012; Grol-Prokopczyk, 2014; Grol-Prokopczyk, Freese, &
Hauser, 2011). There is also evidence, again from the field of health, that
scale use may vary over time among individuals (Angelini, Cavapozzi, &
Paccagnella, 2011). However, the extent to which these findings apply to
other, nonhealth domains of research, or outside the United States and
Western Europe, where existing studies have been conducted, remains
unclear.
Here we extend the analysis of the individual sources of DIF through the
first application of anchoring vignettes in the Persian Gulf emirate of Qatar
(and, to our knowledge, the Arab Middle East), administered in three original
and nationally representative surveys conducted during 2013–2015. Beyond geographical extension to a new world region, we also expand thematically on
the list of topics studied with a view to understanding inter-individual differences
in survey scale usage, examining for the first time the issue of self-assessed
economic well-being in addition to feelings of political efficacy. We
find that a range of social and demographic variables—age, sex, education, and
social class—account for important variation in response scale use even within
the cohesive cultural group represented by Qatari nationals. The exceptional
homogeneity of Qatar’s citizen population thus presents a uniquely hard test of individual
heterogeneity in response behavior, and our positive findings demonstrate that
demographically linked scale use differences are even more ubiquitous than
previously known.