This is the second of two articles detailing a research experiment that Â鶹´«Ã½AV undertook in 2023 to identify and address potential data quality issues in nonprobability opt-in web panels.
In the first article, we compared data quality across six panel providers and found that not all opt-in panels are alike. This second article describes the results of several approaches we investigated for handling “careless responders” -- respondents who fail to read questions or pay sufficient attention to them, and who provide random, dishonest or inattentive answers.
WASHINGTON, D.C. -- While opt-in panels can provide valuable research opportunities, they also present unique methodological challenges, partly because they can garner higher rates of careless responding compared with probability-based sampling approaches.
To address these challenges, researchers first need to flag careless responders, for example by embedding questions designed to catch inattentive respondents or by using survey response times to catch those speeding through items. Then, researchers need to decide how to handle the flagged cases. This is not a simple matter of removing all flagged cases, because doing so could inadvertently exclude valid respondents who fail a flag for reasons unrelated to data quality, such as misunderstanding a specific question or encountering technical issues.
Additionally, some flags are subjective. For instance, the literature disagrees on how to determine the appropriate threshold for speeding, and straightlining (selecting the same answer repeatedly on a series of questions to which varying responses would be expected) may sometimes reflect genuine, thoughtful responses. Overly strict removal criteria also risk introducing bias because they may disproportionately affect certain demographic groups, which could ultimately skew results.
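To make these detection flags concrete, the sketch below shows how three common flags (speeding, a failed instructed attention check, and longstring straightlining) might be computed per respondent. This is a minimal illustration only: the column names, the attention-check wording and the thresholds (one-third of the median completion time, a run of 10 identical answers) are assumptions for the example, not the specific criteria used in this study.

```python
import pandas as pd

def compute_careless_flags(df: pd.DataFrame, grid_cols: list[str]) -> pd.DataFrame:
    """Illustrative careless-responding flags; column names and thresholds are assumptions."""
    flags = pd.DataFrame(index=df.index)

    # Speeding: completion time far below the sample median (arbitrary cutoff).
    median_secs = df["duration_secs"].median()
    flags["speeding"] = df["duration_secs"] < median_secs / 3

    # Instructed attention check, e.g., "Please select 'Strongly agree'."
    flags["failed_attention_check"] = df["attention_check"] != "Strongly agree"

    # Maximum longstring: longest run of identical answers across a grid of
    # items where varied answers would be expected.
    def max_run(row: pd.Series) -> int:
        longest = run = 1
        values = row.tolist()
        for prev, cur in zip(values, values[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        return longest

    flags["straightlining"] = df[grid_cols].apply(max_run, axis=1) >= 10

    # Total number of flags per respondent, used later for removal decisions.
    flags["n_flags"] = flags[["speeding", "failed_attention_check", "straightlining"]].sum(axis=1)
    return flags
```

The study described here used 20 such flags; the same counting logic extends to any number of detectors.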
In this article, we evaluate several strategies for managing careless responses in opt-in panel data. These results come from the same study described in Part 1 of this series, but the data are collapsed across the six opt-in vendors to form one combined dataset (N=6,178).
Key Findings
Commitment statements do little to prevent careless responding.
Much of the extant research[1] related to careless responding involves indexes designed to detect it after a participant completes a survey, such as flags for failing attention checks or for speeding through items. Survey design strategies that prevent careless responding in the first place have been studied less.
Some researchers have suggested that asking respondents to make a commitment to providing high-quality responses before taking the survey may lead to better response quality, because it encourages them to reflect on the importance of thoughtful participation and primes them to approach the survey more conscientiously.
Â鶹´«Ã½AV tested the utility of presurvey commitment statements in preventing careless responding. Participants were randomly assigned to one of three groups: a control group in which no commitment statement was provided (n=2,084), or one of two test groups in which participants were asked to agree to answer the survey either honestly[2] (“This survey is for research purposes only. It is important that you answer each question honestly. There are no right or wrong answers. Do you commit to being honest when answering the questions in this survey?”; n=2,035) or attentively[3] (“The quality of the answers that you provide will be checked for accuracy using sophisticated statistical control methods. Do you commit to paying attention when answering the questions in this survey?”; n=2,059).
The results show that neither commitment statement significantly reduced the prevalence of careless responding compared with the control group, suggesting that this commonly used approach may not justify the additional space and time it occupies in surveys. These results were consistent when each of the 20 possible careless-responding flags was analyzed individually and when each of the six panels was examined individually.
Removing careless responders marginally affects data quality.
The survey research field lacks consensus on how strict data-cleaning procedures ought to be to improve data quality in opt-in panels. While it is common to remove respondents with multiple flags, determining how many flags are too many, and whether all flags are equally important, is subjective.
Therefore, we compared our results after using four different participant-removal strategies, based on the 20 possible careless-responding flags (a simplified sketch of how these cutoffs translate into filtering rules follows the list):
- Lenient (flagged 10 or more times; removed 1.02% of the total sample)
- Moderate (flagged six or more times; removed 9.79% of the total sample)
- Strict (flagged two or more times; removed 43.62% of the total sample)
- Custom (flagged for at least one of seven key[4] flags; removed 45.89% of the total sample)
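As referenced above, here is a rough sketch of how these four cutoffs translate into filtering rules, assuming a per-respondent table of flags like the one sketched earlier (with boolean flag columns and an "n_flags" count). The particular set of "key" flag names is a placeholder, not the study's exact list.

```python
import pandas as pd

# Placeholder subset standing in for the seven key flags described in note [4].
KEY_FLAGS = ["failed_attention_check", "speeding", "straightlining"]

def keep_mask(flags: pd.DataFrame, strategy: str) -> pd.Series:
    """Return a boolean mask of respondents to keep under a removal strategy."""
    if strategy == "lenient":
        return flags["n_flags"] < 10            # remove those flagged 10+ times
    if strategy == "moderate":
        return flags["n_flags"] < 6             # remove those flagged 6+ times
    if strategy == "strict":
        return flags["n_flags"] < 2             # remove those flagged 2+ times
    if strategy == "custom":
        return ~flags[KEY_FLAGS].any(axis=1)    # remove anyone with a key flag
    raise ValueError(f"Unknown strategy: {strategy}")

# Example: report the share of the sample each strategy would remove.
# for s in ("lenient", "moderate", "strict", "custom"):
#     removed = ~keep_mask(flags, s)
#     print(s, f"{removed.mean():.2%} of respondents removed")
```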
The results revealed that revised estimates for the total sample did not change much under any of these strategies. For instance, the accompanying table depicts the average absolute percentage-point difference between observed results and 24 benchmark items (for each item, the absolute difference between the observed result and its benchmark value, averaged across all 24 items; see the PDF link at the end of Part 1 for more information on these benchmark items). While the two most stringent strategies for removing careless responders produced data that were closer to established benchmarks, these marginal improvements came at the expense of removing almost half of the collected sample in each case, which may be untenable in most research studies.
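For readers who want the metric spelled out, here is a minimal sketch of the calculation, assuming two equal-length lists of percentages (observed estimates and their benchmarks); the example values are invented for illustration and are not figures from the study.

```python
def mean_absolute_difference(observed: list[float], benchmarks: list[float]) -> float:
    """Average absolute percentage-point gap between estimates and benchmarks."""
    diffs = [abs(o - b) for o, b in zip(observed, benchmarks)]
    return sum(diffs) / len(diffs)

# Hypothetical three-item example (the study used 24 benchmark items):
# mean_absolute_difference([52.0, 18.5, 71.0], [49.0, 21.0, 70.0])
# -> (3.0 + 2.5 + 1.0) / 3, or about 2.17 percentage points
```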
While this pattern was consistent across all panels, the magnitude of improvement differed by panel. Removing careless responders was more helpful for panels with poorer data quality (such as panels C and E; see Part 1). However, even then, removing careless responders had a relatively minor impact.
Poor-quality data can come from any demographic group.
We found that for all four removal strategies, the excluded participants skewed toward those who self-reported being younger, having lower levels of education, being Hispanic, or living in the South. It is unclear why these patterns may be more common in some groups than others. They may stem from certain respondent characteristics, such as cultural differences in how people perceive or respond to questions, or from the conditions of the opt-in panels themselves, such as people from certain groups receiving greater incentives or more survey opportunities.
Regardless, researchers should always check demographic distributions when removing cases and make sure they can replace respondents who will be removed, in order to align with quotas and/or avoid demographically biasing their sample.
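One simple way to run that check is to compare the demographic composition of the sample before and after removals. The sketch below is illustrative; the data frame and column names (for example, "age_group") are assumptions.

```python
import pandas as pd

def composition_shift(df: pd.DataFrame, keep: pd.Series, demo_col: str) -> pd.DataFrame:
    """Compare a demographic distribution before and after removing flagged cases."""
    before = df[demo_col].value_counts(normalize=True)
    after = df.loc[keep, demo_col].value_counts(normalize=True)
    out = pd.DataFrame({"before": before, "after": after}).fillna(0.0)
    out["shift_pct_pts"] = (out["after"] - out["before"]) * 100
    return out.sort_values("shift_pct_pts")

# Example: composition_shift(survey_df, keep_mask(flags, "moderate"), "age_group")
```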
To further explore whether poor-quality data were more prevalent within certain demographic groups, we first looked at careless-responding rates. Although there were some patterns by age and ethnicity, we found that such rates were not always consistent or in the expected directions, especially when additional demographic variables were considered together. For instance, while Hispanic respondents were flagged more often than other ethnicities overall, once age was also considered, that result only held for middle-aged participants in our sample.
We also compared each demographic group’s proximity to demographic-specific benchmarks where available (without any removals). Here, we found that each demographic group’s proximity to its respective benchmark was also frequently inconsistent; in other words, deviations were not isolated to any one group.
For example, compared with a probability-based sample of the same ethnicities, Hispanic opt-in respondents provided responses that were 14.6 percentage points higher, on average, to a question about whether they had smoked at least 100 cigarettes in their entire life, while non-Hispanic White participants provided responses only 3.9 points higher, on average, to the same question.
However, these Hispanic participants also provided responses that were only 0.1 points higher, on average, to a question about having healthcare coverage, while non-Hispanic White participants provided responses that were 7.2 points lower to the same question.
Across every demographic-specific benchmark we could find among our 24 benchmark items, we observed some trends that warrant careful consideration, such as larger deviations from many benchmarks for participants of color. However, it would be problematic to assume a priori that certain underrepresented groups in survey research will provide such poor-quality data that they ought not to be trusted through any opt-in provider. We offer the more balanced (and perhaps less satisfying) conclusion that data quality issues can be observed within any demographic subgroup and within any panel, and should therefore be considered on a case-by-case basis.
Conclusions
This study’s findings highlight several key insights and recommendations for how to address potential measurement-quality issues in opt-in panels.
First, rather than relying on commitment statements to reduce careless responding, researchers should focus on the basics of creating engaging survey designs. This includes ensuring that all items and instructions are clear and interpretable to participants, avoiding leading questions, and not making screeners so obvious that respondents might be enticed to answer untruthfully in order to screen in and earn rewards.
However, we also acknowledge it is difficult to mask certain screeners. For example, some respondents may be motivated to falsely claim they have certain demographic characteristics to gain larger incentives or more survey opportunities, and it is possible that respondents who are willing to lie about their demographics are also more likely to carelessly respond to surveys. This tendency may vary across opt-in providers, based on differences in their methods.
Second, we recommend relying on a few easily scalable and interpretable careless-responding detection methods and removing only the most flagrant careless responders via a multiple-hurdle approach. There is no silver-bullet approach that fits all research agendas and reliably, substantially improves data quality, so it is essential to be flexible and to report the methodology used for all data-cleaning steps in every study.
While we found that the flagged careless responders in our study tended to be from underrepresented groups, we also found indicators of high and low data quality across and within all demographic groups. High-quality research can use opt-in samples, but we must be transparent about the various data quality issues that may arise on a study-by-study basis, including potential skews in the reliability of estimates across demographics.
Addressing data quality issues due to careless responders in opt-in panels is a multifaceted and evolving challenge that requires a flexible, nuanced approach. While detection methods, prevention strategies and removal strategies can help identify and eliminate some issues, they are not a panacea. Researchers must carefully select their data providers, design engaging surveys and transparently report their methodologies to ensure analytical integrity. By continuing to explore and refine these strategies, researchers can enhance the reliability and validity of data they obtain from opt-in panels.
[1] e.g., Abbey & Meloy, 2017; Arthur et al., 2021; Bowling et al., 2023; Curran, 2016; DeSimone et al., 2015; Huang & Wang, 2021; Meade & Craig, 2012; Schroeders et al., 2022; Vecchio et al., 2020; Ward & Meade, 2023.
[2] Adapted from Ward, M. K., & Pond, S. B. (2015). Using virtual presence and survey instructions to minimize careless responding on Internet-based surveys. Computers in Human Behavior, 48, 554–568. https://doi.org/10.1016/j.chb.2015.01.070
[3] Adapted from:
Paas, L. J., Dolnicar, S., & Karlsson, L. (2018). Instructional manipulation checks: A longitudinal analysis with implications for MTurk. International Journal of Research in Marketing, 35(2), 258–269. https://doi.org/10.1016/j.ijresmar.2018.01.003
and
Huang, J. L., Curran, P. G., Keeney, J., Poposki, E. M., & DeShon, R. P. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8
[4] These include the following flags, which often appear as reliable indicators of careless responding in the literature: instructed item (difficult), low-incidence items, postsurvey honesty item, postsurvey attentive item, postsurvey UseMe item, speeding, and maximum longstring straightlining.