AMAZON MTURK (& OTHERS) vs. QUALTRICS PANELS vs. STUDENT PARTICIPANTS
Crowdsourced data (e.g., using CrowdFlower, Clickworker, or Amazon Mechanical Turk as sampling pools) is a fascinating
topic. As an early adopter of crowdsourced data collection methods, I have been delighted to speak about it in a
variety of venues. For example, I was invited to present on crowdsourced data collection during the 2017 Marketing and Public Policy
preconference and again last fall during a doctoral seminar. Recent developments in crowdsourced data (e.g., foreign and
domestic “click farms” and self-selection bias) threaten researchers’ ability to use theory-driven samples (Goodman and
Paolacci 2017). While some research has examined the quality of various Internet-based samples (e.g., MTurk, Qualtrics,
Prolific), there is a gap in our understanding of (A) participants’ motivations to misrepresent themselves and (B) measures
that prevent misrepresentation. Misrepresentation is an especially pressing issue in studies that use screeners. For example, in a
study designed to test the effects of nutrition label formats on consumers with diet-related diseases, MTurk
participants can easily become “imposters” by claiming (through self-report measures) to have diabetes or hypertension.
Related to this discussion of crowdsourced data is the concept of participant investment, a less common but positive
outcome of using online marketplaces for research. A review of the literature yielded no prior work on this phenomenon. My
interest in the topic came from recent experiences with MTurk participants. To provide my undergraduate students with
anonymous reviews of their team-created ads, I created a study in which MTurk participants were shown 15 (or more) ads.
Student-selected measures were included, and participants could add their own comments after each ad was shown.
Surprisingly, the average survey duration exceeded one hour despite a very modest payment of $0.25 (I was using my
personal funds). Participants were careful in their responses and, in most cases, provided qualitative feedback for each ad.
Moreover, the average response included 738 words (≈ 2 ½ pages). If you’re interested in this topic, download my
presentation (below) or contact me to discuss my current findings.
WORRIED ABOUT DATA QUALITY FROM ONLINE SAMPLING POOLS?
In the summer of 2018, a number of posts appeared in a variety of social science outlets (e.g., the Psychological Methods
Discussion Facebook group, blogs, TurkPrime, and Twitter) about the quality of responses collected from online
sources (e.g., MTurk, Positly). The primary concerns about this type of data center on (A) click farms, (B) non-
human responses (survey bots), (C) foreign participants (i.e., people using a VPN to bypass U.S. geolocation survey
restrictions), and (D) low involvement/motivation (e.g., speeders). While I continue to strongly support the use of
crowdsourced data (I use Connect), there is undoubtedly a need to communicate to reviewers/AEs/editors the thorough data-
cleansing process we employ (and to require a 99% approval rating!). High-quality data are imperative to our research.
Fortunately, there are a number of helpful tools designed to make the data-cleansing process easier. Reference the updated
“Issues with Crowdsourcing Data” PDF (above) and the following:
(1) Attention Checks: these can be useful, but incorporate them into your survey with regard for the participant’s survey
experience. MTurk workers, for example, dislike feeling “tricked” out of their compensation when a survey is inundated with
attention checks/speed-bump questions (e.g., “select none of the above”). Also, I recommend using attention checks that are
nondiscriminatory. For example, the traditional Stroop Test may discriminate against those who are colorblind. I’ve created a
Numeric Stroop Test that elicits a similar cognitive load yet doesn’t require color-based answers. For an alternative
opinion about attention checks, read this Qualtrics blog post from Dr. David L. Vannette (June 2017).
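If it helps, here is a minimal Python (pandas) sketch of the post-collection screening step for a single attention check. The column name, expected answer, and file name are placeholders for illustration only, not features of any particular survey.

```python
import pandas as pd

# Minimal sketch: flag (rather than silently drop) anyone who missed the
# attention check. "attn_check_1", the expected answer, and the file name
# are hypothetical; match them to your own survey export.
responses = pd.read_csv("survey_export.csv")
responses["failed_attention_check"] = (
    responses["attn_check_1"].fillna("") != "None of the above"
)
print(responses["failed_attention_check"].value_counts())
```

Flagging first (instead of deleting) lets you report exactly how many participants were excluded and why, which is the information reviewers and AEs want to see.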
(2) Honeypot Questions: these are questions that bots (web crawlers) can read but that are not visible to human
participants. This suggestion is a personal favorite of mine. Honeypot questions lure garbage responses from bots (often text
copied and pasted from Google search queries), which makes bot submissions easy to identify. You’ll
need JavaScript to code the question properly (e.g., in Qualtrics); I can e-mail the script if you’d like a copy.
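The JavaScript handles hiding the item inside Qualtrics; the post-collection check itself is simple. Here is a minimal Python sketch, assuming the hidden item is exported as a column called “honeypot” (a hypothetical name):

```python
import pandas as pd

# Minimal sketch: human participants never see the hidden honeypot item, so
# any non-blank value in that column is a strong signal of a bot submission.
# "honeypot" and the file name are hypothetical; use your own export's names.
responses = pd.read_csv("survey_export.csv")
responses["likely_bot"] = (
    responses["honeypot"].fillna("").astype(str).str.strip() != ""
)
print(f"{responses['likely_bot'].sum()} of {len(responses)} responses flagged as likely bots")
```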
(3) Image-Based Responses: I hadn’t heard about this novel idea for survey responses until I read this thread on Twitter.
The idea is simple: have participants respond to a prompt [which may be related to your manipulations, for example, or serve as a
fun way to reduce “bubble hell” (survey monotony)] with a self-provided image. A computer vision API can then be used to
code the images. Oriol J. Bosch and his colleagues have a useful article on this topic (see below).
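Coding the uploaded images does not have to be manual. As one possible illustration (not necessarily the API used in the article below), here is a minimal sketch using Google’s Cloud Vision client library for Python; the file name is a placeholder, and in practice you would loop over your folder of participant images.

```python
from google.cloud import vision  # pip install google-cloud-vision (requires API credentials)

# Minimal sketch: request descriptive labels for one participant-provided
# image. "participant_photo.jpg" is a hypothetical file name.
client = vision.ImageAnnotatorClient()

with open("participant_photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")  # e.g., "Food: 0.93"
```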
(4) Identify Suspicious ISPs and GPS Coordinates: use the tools embedded in the TurkPrime platform (e.g., make sure the
“Block Suspicious Geocode Locations” and “Block Universal Exclude List Workers” boxes are checked on Tab 6). Inexpensive
Pro features also include “Block Duplicate IP Addresses” and “Block Duplicate Geolocation”. Additionally, you can
import a .CSV file into this site with columns that include the IP address, latitude, and/or longitude. Click the ‘Analyze’ button
and the site will retrieve the ISPs and/or analyze your geolocation data. Afterwards, click the ‘Download
Results’ button to get the rows with flagged GPS coordinates and ISPs.
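If you would rather run the duplicate checks locally before (or instead of) uploading a .CSV, they are easy to reproduce. Here is a minimal Python sketch; the column names follow a typical Qualtrics export (IPAddress, LocationLatitude, LocationLongitude) and the file name is a placeholder, so adjust both to your data.

```python
import pandas as pd

# Minimal sketch: flag duplicate IP addresses and duplicate (rounded) GPS
# coordinates. Column and file names are assumptions; adjust to your export.
df = pd.read_csv("survey_export.csv")

df["dup_ip"] = df.duplicated("IPAddress", keep=False)

# Rounding to two decimal places (~1 km) clusters near-identical coordinates,
# the "repeated GPS coordinates" pattern described by Bai (2018).
lat = pd.to_numeric(df["LocationLatitude"], errors="coerce").round(2)
lon = pd.to_numeric(df["LocationLongitude"], errors="coerce").round(2)
df["dup_geo"] = (lat.astype(str) + "," + lon.astype(str)).duplicated(keep=False)

print(df[["dup_ip", "dup_geo"]].sum())
```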
(5) Seriousness Checks: these are simple checks where research participants are asked about the seriousness of their
participation. Researchers can then exclude self-declared “non-serious” participants from analysis (see the screening sketch after item 6 below).
(6) Commitment Requests: an experimental study in 2022 with nearly 4,000 participants showed that a simple question,
framed as a commitment request, resulted in the fewest quality-related issues: “We care about the quality of our survey data.
For us to get the most accurate measures of your opinions, it is important that you provide thoughtful responses to each
question in this survey. Do you commit to providing thoughtful answers to questions in this survey?” Choices include: “I can’t
promise either way,” “Yes, I will” and “No, I will not.”
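Scoring both of these screening items amounts to a simple filter. Here is a minimal Python sketch covering items (5) and (6); the column names and the seriousness-check wording are hypothetical, so match them to your own items and export.

```python
import pandas as pd

# Minimal sketch: keep only participants who said they took part seriously
# (item 5) and who answered "Yes, I will" to the commitment request (item 6).
# "serious", "commitment", and the seriousness wording are hypothetical.
df = pd.read_csv("survey_export.csv")

keep = (df["serious"] == "I took part seriously") & (df["commitment"] == "Yes, I will")
cleaned = df[keep].copy()

print(f"Kept {len(cleaned)} of {len(df)} responses after the seriousness and commitment screens")
```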
REFERENCES
Aust, F., Diedenhofen, B., Ullrich, S. and Musch, J. (2013), “Seriousness checks are useful to improve data validity in online
research,” Behavior Research Methods, 45 (2), 527-535. (Link)
Bai, H. (2018), “Evidence that A Large Amount of Low Quality Responses on MTurk Can Be Detected with Repeated GPS
Coordinates.” (Link)
Byrd, N. (2023), “Reflection-Philosophy Order Effects and Correlations: Aggregating and comparing results from mTurk,
CloudResearch, Prolific, and undergraduate samples.” (Preprint Link)
Bosch, O. J., Revilla, M., & Paura, E. (2018), “Answering Mobile Surveys with Images: An Exploration Using a Computer Vision
API,” Social Science Computer Review. (Link)
Chandler, J., Rosenzweig, C., Moss, A. J., Robinson, J., & Litman, L. (2019), "Online panels in social science research: Expanding
sampling methods beyond Mechanical Turk," Behavior Research Methods, 51(5), 2022-2038. (Link)
Chandler, J., Sisso, I., & Shapiro, D. (2020), “Participant carelessness and fraud: Consequences for clinical research and
potential solutions,” Journal of Abnormal Psychology, 129(1), 49. (Link)
CloudResearch. (2018), “After the Bot Scare: Understanding What's Been Happening with Data Collection on MTurk and How
to Stop it.” (Link) **ALL of the CloudResearch blog posts are useful. I recommend reading them regularly!**
CloudResearch. (2021), “4 Strategies to Improve Participant Retention In Online Longitudinal Studies.” (Link)
Curran, P. G., & Hauser, K. A. (2019), "I’m paid biweekly, just not by leprechauns: Evaluating valid-but-incorrect response rates
to attention check items," Journal of Research in Personality, 82, 103849. (Link)
DeSimone, J.A., Harms, P.D. and DeSimone, A.J. (2015), “Best practice recommendations for data screening,” Journal of
Organizational Behavior, 36 (2), 171-181. (Link)
Dennis, S. A., Goodson, B. M., & Pearson, C. (2018), “MTurk Workers' Use of Low-Cost 'Virtual Private Servers' to Circumvent
Screening Methods: A Research Note.” (Link)
Goodman, J. K., & Paolacci, G. (2017), “Crowdsourcing Consumer Research,” Journal of Consumer Research, 44(1), 196-210.
(Link) (JCR Tutorials in Consumer Research)
Hauser, D.J. and Schwarz, N., (2016), “Attentive Turkers: MTurk participants perform better on online attention checks than
do subject pool participants,” Behavior Research Methods, 48 (1), 400-407. (Link)
Jaeger, S. R., & Cardello, A. V. (2022), “Factors affecting data quality of online questionnaires: Issues and metrics for sensory
and consumer research,” Food Quality and Preference, 102, 104676. (Link)
Kees, J., Berry, C., Burton, S., and Sheehan, K. (2017), "An analysis of data quality: Professional panels, student subject pools,
and Amazon's Mechanical Turk," Journal of Advertising, 46(1), 141-155. (Link)
Kim, D., McCabe, C., Yamasaki, B., Louie, K. & King, K., (2018), “Detecting random responders with infrequency scales using
an error-balancing threshold,” Behavior Research Methods, 50 (5), 1960-1970. (Link)
Kostyk, A., Zhou, W., & Hyman, M. R. (2019), "Using surveytainment to counter declining survey data quality," Journal of
Business Research, 95, 211-219. (Link)
Litman, L., Moss, A., Rosenzweig, C., and Robinson, J. (2021), "Reply to MTurk, Prolific or panels? Choosing the right audience
for online research," SSRN Preprint. (Link)
Matherly, T. (2019), “A panel for lemons? Positivity bias, reputation systems and data quality on MTurk,” European Journal of
Marketing, 53 (2), 195-223. (Link)
Paolacci, G., & Chandler, J. (2014), “Inside the Turk: Understanding Mechanical Turk as a participant pool,” Current Directions
in Psychological Science, 23(3), 184-188. (Link)
Prims, J. P., Sisso, I., & Bai, H. (2018), “Suspicious IP Online Flagging Tool.” (Link)
Salehi, Niloufar, and Michael S. Bernstein (2018), “Ink: Increasing Worker Agency to Reduce Friction in Hiring Crowd
Workers,” ACM Transactions on Computer-Human Interaction (TOCHI), 25 (2), 10-27. (Link)
Sharpe Wessling, K., Huber, J. and Netzer, O. (2017), “MTurk Character Misrepresentation: Assessment and Solutions,”
Journal of Consumer Research, 44(1), 211-230. (Link)