AMAZON MTURK (& OTHERS) vs. QUALTRICS PANELS vs. STUDENT PARTICIPANTS
Crowdsourced data (e.g., using CrowdFlower, Clickworker, or Amazon Mechanical Turk as sampling pools) is a fascinating
topic. As an early adopter of crowdsourced data collection methods, I have been delighted to speak about it in a
variety of venues. For example, I was invited to present on crowdsourced data collection during the 2017 Marketing and Public Policy
preconference and again last fall during a doctoral seminar. Recent developments in crowdsourced data (e.g., foreign and
domestic “click farms” and self-selection bias) threaten researchers’ ability to use theory-driven samples (Goodman and
Paolacci 2017). While some research has examined the quality of various Internet-based samples (e.g., MTurk, Qualtrics,
Prolific), there is a gap in our understanding of (A) participants’ motivations to misrepresent themselves and (B) measures
that prevent misrepresentation. Misrepresentation is an especially pressing issue in studies that use screeners. For example, in a
study designed to test the effects of nutrition label formats on consumers with diet-related diseases, MTurk
participants can easily become “imposters” by claiming (through self-report measures) to have diabetes or hypertension.
Related to this discussion of crowdsourced data is the concept of participant investment, a less common but positive
outcome of using online marketplaces for research. A review of the literature yielded no prior work on this phenomenon. My
interest in the topic came from recent experiences with MTurk participants. To provide my undergraduate students with
anonymous reviews of their team-created ads, I created a study in which MTurk participants were shown 15 (or more) ads.
Student-selected measures were included, and participants could add their own comments after each ad was shown.
Surprisingly, the average survey duration exceeded one hour despite a very modest payment of $0.25 (I was using my
personal funds). Participants were careful in their responses and, in most cases, provided qualitative feedback for each ad.
Moreover, the average response included 738 words (≈ 2 ½ pages). If you’re interested in this topic, download my
presentation (below) or contact me to discuss my current findings.
WORRIED ABOUT DATA QUALITY FROM ONLINE SAMPLING POOLS?
In the summer of 2018, a number of posts appeared in a variety of social science outlets (e.g., the Psychological Methods
Discussion Facebook group, blogs, TurkPrime, and Twitter) about the quality of responses collected from online
sources (e.g., MTurk, Positly). The primary concerns about this type of data center on (A) click farms, (B) non-
human responses (survey bots), (C) foreign participants (i.e., people using a VPN to bypass U.S. geolocation survey
restrictions), and (D) low involvement/motivation (e.g., speeders). While I continue to strongly support the use of
crowdsourced data (I use Connect), there is undoubtedly a need to communicate to reviewers/AEs/editors the thorough data-
cleansing process we employ (and to require a 99% approval rating!). High-quality data are imperative to our research.
Fortunately, there are a number of helpful tools designed to make the data-cleansing process easier. Reference the updated
“Issues with Crowdsourcing Data” PDF (above) and the following:
(1) Attention Checks: these can be useful, but incorporate them into your survey with regard for the participant’s survey
experience. MTurk workers, for example, dislike feeling “tricked” out of their compensation when a survey is inundated with
attention checks/speed-bump questions (e.g., “select none of the above”). Also, I recommend using attention checks that are
nondiscriminatory. For example, the traditional Stroop Test may discriminate against those who are colorblind. I’ve created a
Numeric Stroop Test that elicits a similar cognitive load yet doesn’t require color-based answers. For an alternative
opinion about attention checks, read this Qualtrics blog post from Dr. David L. Vannette (June 2017).
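If it helps, here is a minimal Python (pandas) sketch of the post-collection screening step for a single attention check. The column name, expected answer, and file name are placeholders for illustration only, not features of any particular survey.

```python
import pandas as pd

# Minimal sketch: flag (rather than silently drop) anyone who missed the
# attention check. "attn_check_1", the expected answer, and the file name
# are hypothetical; match them to your own survey export.
responses = pd.read_csv("survey_export.csv")
responses["failed_attention_check"] = (
    responses["attn_check_1"].fillna("") != "None of the above"
)
print(responses["failed_attention_check"].value_counts())
```

Flagging first (instead of deleting) lets you report exactly how many participants were excluded and why, which is the information reviewers and AEs want to see.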
(2) Honeypot Questions: these are questions that bots (web crawlers) can read but that are not visible to human
participants. This suggestion is a personal favorite of mine. Honeypot questions lure garbage responses from bots (often text
copied and pasted from Google search queries), which makes bot submissions easy to identify. You’ll
need JavaScript to code the question properly (e.g., in Qualtrics); I can e-mail the script if you’d like a copy.
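The JavaScript handles hiding the item inside Qualtrics; the post-collection check itself is simple. Here is a minimal Python sketch, assuming the hidden item is exported as a column called “honeypot” (a hypothetical name):

```python
import pandas as pd

# Minimal sketch: human participants never see the hidden honeypot item, so
# any non-blank value in that column is a strong signal of a bot submission.
# "honeypot" and the file name are hypothetical; use your own export's names.
responses = pd.read_csv("survey_export.csv")
responses["likely_bot"] = (
    responses["honeypot"].fillna("").astype(str).str.strip() != ""
)
print(f"{responses['likely_bot'].sum()} of {len(responses)} responses flagged as likely bots")
```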
(3) Image-Based Responses: I hadn’t heard about this novel idea for survey responses until I read this thread on Twitter.
The idea is simple: have participants respond to a prompt [which may be related to your manipulations, for example, or serve as a
fun way to reduce “bubble hell” (survey monotony)] with a self-provided image. A computer vision API can then be used to
code the images. Oriol J. Bosch and his colleagues have a useful article on this topic (see below).
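Coding the uploaded images does not have to be manual. As one possible illustration (not necessarily the API used in the article below), here is a minimal sketch using Google’s Cloud Vision client library for Python; the file name is a placeholder, and in practice you would loop over your folder of participant images.

```python
from google.cloud import vision  # pip install google-cloud-vision (requires API credentials)

# Minimal sketch: request descriptive labels for one participant-provided
# image. "participant_photo.jpg" is a hypothetical file name.
client = vision.ImageAnnotatorClient()

with open("participant_photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")  # e.g., "Food: 0.93"
```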
(4) Identify Suspicious ISPs and GPS Coordinates: use the tools embedded in the TurkPrime platform (e.g., make sure the
“Block Suspicious Geocode Locations” and “Block Universal Exclude List Workers” boxes are checked on Tab 6). Inexpensive
Pro features also include “Block Duplicate IP Addresses” and “Block Duplicate Geolocation”. Additionally, you can
import a .CSV file into this site with columns that include the IP address, latitude, and/or longitude. Click the ‘Analyze’ button
and the site will retrieve the ISPs and/or analyze your geolocation data. Afterwards, click the ‘Download
Results’ button to get the rows with flagged GPS coordinates and ISPs.
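If you would rather run the duplicate checks locally before (or instead of) uploading a .CSV, they are easy to reproduce. Here is a minimal Python sketch; the column names follow a typical Qualtrics export (IPAddress, LocationLatitude, LocationLongitude) and the file name is a placeholder, so adjust both to your data.

```python
import pandas as pd

# Minimal sketch: flag duplicate IP addresses and duplicate (rounded) GPS
# coordinates. Column and file names are assumptions; adjust to your export.
df = pd.read_csv("survey_export.csv")

df["dup_ip"] = df.duplicated("IPAddress", keep=False)

# Rounding to two decimal places (~1 km) clusters near-identical coordinates,
# the "repeated GPS coordinates" pattern described by Bai (2018).
lat = pd.to_numeric(df["LocationLatitude"], errors="coerce").round(2)
lon = pd.to_numeric(df["LocationLongitude"], errors="coerce").round(2)
df["dup_geo"] = (lat.astype(str) + "," + lon.astype(str)).duplicated(keep=False)

print(df[["dup_ip", "dup_geo"]].sum())
```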
(5) Seriousness Checks: these are simple checks where research participants are asked about the seriousness of their
participation. Researchers can then exclude self-declared “non-serious” participants from analysis (see the screening sketch after item 6 below).
(6) Commitment Requests: an experimental study in 2022 with nearly 4,000 participants showed that a simple question,
framed as a commitment request, resulted in the fewest quality-related issues: “We care about the quality of our survey data.
For us to get the most accurate measures of your opinions, it is important that you provide thoughtful responses to each
question in this survey. Do you commit to providing thoughtful answers to questions in this survey?” Choices include: “I can’t
promise either way,” “Yes, I will” and “No, I will not.”
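Scoring both of these screening items amounts to a simple filter. Here is a minimal Python sketch covering items (5) and (6); the column names and the seriousness-check wording are hypothetical, so match them to your own items and export.

```python
import pandas as pd

# Minimal sketch: keep only participants who said they took part seriously
# (item 5) and who answered "Yes, I will" to the commitment request (item 6).
# "serious", "commitment", and the seriousness wording are hypothetical.
df = pd.read_csv("survey_export.csv")

keep = (df["serious"] == "I took part seriously") & (df["commitment"] == "Yes, I will")
cleaned = df[keep].copy()

print(f"Kept {len(cleaned)} of {len(df)} responses after the seriousness and commitment screens")
```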
REFERENCES
Aust, F., Diedenhofen, B., Ullrich, S. and Musch, J. (2013), “Seriousness checks are useful to improve data validity in online
research,” Behavior Research Methods, 45 (2), 527-535. (Link)
Bai, H. (2018), “Evidence that A Large Amount of Low Quality Responses on MTurk Can Be Detected with Repeated GPS
Coordinates.” (Link)
Byrd, N. (2023), “Reflection-Philosophy Order Effects and Correlations: Aggregating and comparing results from mTurk,
CloudResearch, Prolific, and undergraduate samples.” (Preprint Link)
Bosch, O. J., Revilla, M., & Paura, E. (2018), “Answering Mobile Surveys with Images: An Exploration Using a Computer Vision
API,” Social Science Computer Review. (Link)
Chandler, J., Rosenzweig, C., Moss, A. J., Robinson, J., & Litman, L. (2019), "Online panels in social science research: Expanding
sampling methods beyond Mechanical Turk," Behavior Research Methods, 51(5), 2022-2038. (Link)
Chandler, J., Sisso, I., & Shapiro, D. (2020), “Participant carelessness and fraud: Consequences for clinical research and
potential solutions,” Journal of Abnormal Psychology, 129(1), 49. (Link)
CloudResearch. (2018), “After the Bot Scare: Understanding What's Been Happening with Data Collection on MTurk and How
to Stop it.” (Link) **ALL of the CloudResearch blog posts are useful. I recommend reading them regularly!**
CloudResearch. (2021), “4 Strategies to Improve Participant Retention In Online Longitudinal Studies.” (Link)
Curran, P. G., & Hauser, K. A. (2019), "I’m paid biweekly, just not by leprechauns: Evaluating valid-but-incorrect response rates
to attention check items," Journal of Research in Personality, 82, 103849. (Link)
DeSimone, J.A., Harms, P.D. and DeSimone, A.J. (2015), “Best practice recommendations for data screening,” Journal of
Organizational Behavior, 36 (2), 171-181. (Link)
Dennis, S. A., Goodson, B. M., & Pearson, C. (2018), “MTurk Workers' Use of Low-Cost 'Virtual Private Servers' to Circumvent
Screening Methods: A Research Note.” (Link)
Goodman, J. K., & Paolacci, G. (2017), “Crowdsourcing Consumer Research,” Journal of Consumer Research, 44(1), 196-210.
(Link) (JCR Tutorials in Consumer Research)
Hauser, D.J. and Schwarz, N., (2016), “Attentive Turkers: MTurk participants perform better on online attention checks than
do subject pool participants,” Behavior Research Methods, 48 (1), 400-407. (Link)
Jaeger, S. R., & Cardello, A. V. (2022), “Factors affecting data quality of online questionnaires: Issues and metrics for sensory
and consumer research,” Food Quality and Preference, 102, 104676. (Link)
Kees, J., Berry, C., Burton, S., and Sheehan, K. (2017), "An analysis of data quality: Professional panels, student subject pools,
and Amazon's Mechanical Turk," Journal of Advertising, 46(1), 141-155. (Link)
Kim, D., McCabe, C., Yamasaki, B., Louie, K. & King, K., (2018), “Detecting random responders with infrequency scales using
an error-balancing threshold,” Behavior Research Methods, 50 (5), 1960-1970. (Link)
Kostyk, A., Zhou, W., & Hyman, M. R. (2019), "Using surveytainment to counter declining survey data quality," Journal of
Business Research, 95, 211-219. (Link)
Litman, L., Moss, A., Rosenzweig, C., and Robinson, J. (2021), "Reply to MTurk, Prolific or panels? Choosing the right audience
for online research," SSRN Preprint. (Link)
Matherly, T. (2019), “A panel for lemons? Positivity bias, reputation systems and data quality on MTurk,” European Journal of
Marketing, 53 (2), 195-223. (Link)
Paolacci, G., & Chandler, J. (2014), “Inside the Turk: Understanding Mechanical Turk as a participant pool,” Current Directions
in Psychological Science, 23(3), 184-188. (Link)
Prims, J. P., Sisso, I., & Bai, H. (2018), “Suspicious IP Online Flagging Tool.” (Link)
Salehi, Niloufar, and Michael S. Bernstein (2018), “Ink: Increasing Worker Agency to Reduce Friction in Hiring Crowd
Workers,” ACM Transactions on Computer-Human Interaction (TOCHI), 25 (2), 10-27. (Link)
Sharpe Wessling, K., Huber, J. and Netzer, O. (2017), “MTurk Character Misrepresentation: Assessment and Solutions,”
Journal of Consumer Research, 44(1), 211-230. (Link)