In This Issue

Using Amazon's Mechanical Turk (MTurk) to Recruit Military Veterans: Issues and Suggestions

Amazon's Mechanical Turk: How useful is it for military research?

By Brigid Mary-Donnell Lynn and Jessica Kelley Morgan

Access to the military veteran population for research purposes is rightly limited and can be difficult to obtain. Amazon's Mechanical Turk (MTurk) provides an avenue of access outside of health care and reduces some of the risks associated with conducting research within that context. However, it raises its own considerations and requires precautions, both for participant protection and for the quality of the data collected. A multiphased study was conducted to determine the feasibility of using MTurk to recruit military veterans. In this article, we first provide a brief background of MTurk and summarize suggestions from the civilian literature. We then provide an overview of the present multiphased study and detail our development of a protocol for using this platform to recruit military veterans. Finally, we outline conclusions and describe future work.

MTurk is a crowdsourcing website designed to assist with task completion, allowing data collection to be framed within the context of an open online marketplace in which MTurk provides the workforce (Buhrmester, Kwang, & Gosling, 2011; Chandler & Shapiro, 2016). Researchers comparing traditional online data collection with MTurk data collection procedures found that MTurk is a time- and cost-effective way to gather high-quality data (Buhrmester et al., 2011; Chandler & Shapiro, 2016; Mason & Suri, 2012). A study examining the use of MTurk in psychology found that MTurk participants are slightly more demographically diverse than standard Internet samples and are significantly more diverse than typical American college samples (Buhrmester et al., 2011). In addition, participants can be recruited rapidly and inexpensively, and realistic compensation rates do not affect data quality (Buhrmester et al., 2011). Recent research suggests that MTurk is a viable source of data, with data quality comparable to or greater than that of more traditional methods (Bartneck, Duenser, Moltchanova, & Zawieska, 2015; Buhrmester et al., 2011; Casler, Bickel, & Hackett, 2013; Chandler & Shapiro, 2016).

The civilian literature offers several suggestions for using MTurk: include screening questions that gauge attention, avoid questions with factual answers, and consider how individual differences in financial and social domains may influence the results (Goodman, Cryder, & Cheema, 2013).

Present Study

The present multiphased study sought to develop a protocol for the recruitment of military veterans using the MTurk platform. Based on suggestions in previous research, a series of self-screening criteria, veteran status screening questions and attention gauge questions was developed to assist with verifying veteran status and determining the quality of the data collected (Berinsky, Margolis, & Sances, 2014; Buhrmester et al., 2011). The protocol was developed in two phases. In Phase 1, we conducted a focus group with military veterans to determine appropriate screening questions, and then conducted a study recruiting veteran and nonveteran samples using these screeners (Lynn, 2014). In Phase 2, we built on and adapted the protocol for the recruitment of veterans in another study, and recruited two samples—one with and one without the protocol measures in place (Morgan, 2015).

Protocol Development 

Phase 1

Focus group. Nine veterans completed an online survey and four participated in the follow-up focus group. The primary purpose was to discuss the effectiveness of screening and attention gauge questions. Based on the information gathered from the focus group, changes were made before the launch of Study 1.

Initially, there was one question about putting officer ranks in order. During the focus group, a participant expressed concern that enlisted personnel might take offense that enlisted ranks were not included. Therefore, a question asking which branch of the military the participant served in was added and linked to a branch-specific enlisted-ranks ordering question, followed by the officer-rank question. Participants also expressed concern about missing the veteran status screening questions. A participant who served 20 years in the military missed Question 1 (below). There are various reasons why a veteran may not have attended basic training (for example, if they went to a military academy or did Reserve Officers' Training Corps [ROTC]). This question was kept in the survey, and the decision was made that participants' data would be removed if they missed three or more of the five veteran status questions (Berinsky et al., 2014).

Veteran Status Screening Questions
  • What is the acronym for the locations where final physicals are taken prior to shipping off for basic training? (four letters)
  • What is the acronym for the generic term the military uses for various job fields? (three letters)
  • Please put these officer ranks in order: (participants were given visual insignia to rank order).
  • Please put these enlisted ranks in order: (contextualized branch-specific question; participants were given visual insignia to rank order)
  • In which state is your basic training base located? (contextualized branch-specific question)
Attention Gauge Questions
  • What was this study about? (instructions asked participants to select “Other” and type in “Decision Making”)
  • Please answer “Strongly agree” for this question.

Study 1. A convenience sample of veterans and nonveterans was recruited via MTurk as two separate jobs, each with different parameters set by the requester. The study was designed to reach two specific populations: veterans who had experienced deployments during Operation Enduring Freedom/Operation Iraqi Freedom (OEF/OIF) and a nonveteran comparison group. All participants were MTurk “workers” with an approval rating of at least 98 percent who had completed at least one other “job.” Additionally, the user location was limited to the United States. Veterans were required to have had deployment experience during OEF/OIF, defined as any assignment that led to deployment in support of OEF/OIF. Because of the underrepresentation of females in the military as a whole, this study was limited to males.

Workers were notified by MTurk of the available Human Intelligence Task (HIT). A HIT includes a brief description of the task, the estimated time required to complete it and the compensation rate. The MTurk job description and survey informed consent invited participants to complete an anonymous, voluntary survey about well-being. The use of MTurk allows for anonymity; participants were redirected to a Qualtrics survey to simplify informed consent and additional screening. The Qualtrics settings were such that no identifying information (including users' IP addresses) was collected. Upon completion of the survey, participants were given a completion code and redirected back to MTurk to submit the task and receive compensation.
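For researchers who want to script this step, the listing below is a minimal sketch of posting such a HIT with the boto3 MTurk client. The survey URL, reward, time limits, sample-size target and keywords are placeholders rather than the study's actual settings; the qualification requirements simply mirror the Study 1 criteria described above, and the qualification type IDs are MTurk's built-in system qualifications as we understand them.

```python
import boto3

# Sketch only: placeholder values, not the study's actual requester settings.
mturk = boto3.client("mturk", region_name="us-east-1")

SURVEY_URL = "https://example.qualtrics.com/jfe/form/SV_placeholder"  # placeholder

# Display the external Qualtrics survey inside the MTurk worker interface.
external_question = f"""
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>{SURVEY_URL}</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

response = mturk.create_hit(
    Title="Anonymous, voluntary survey about well-being",
    Description="Complete a short survey; submit the completion code shown at the end.",
    Keywords="survey, military, veteran",   # placeholder keywords
    Reward="1.00",                          # placeholder compensation rate
    MaxAssignments=200,                     # placeholder sample-size target
    LifetimeInSeconds=7 * 24 * 3600,        # HIT visible for one week
    AssignmentDurationInSeconds=3600,       # one hour to finish the survey
    Question=external_question,
    QualificationRequirements=[
        {   # Approval rating of at least 98 percent (built-in qualification)
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [98],
        },
        {   # At least one previously approved HIT (built-in qualification)
            "QualificationTypeId": "00000000000000000040",
            "Comparator": "GreaterThan",
            "IntegerValues": [0],
        },
        {   # Worker location restricted to the United States
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": "US"}],
        },
    ],
)
print(response["HIT"]["HITId"])
```

An alternative to the ExternalQuestion wrapper is to show workers only a link to the survey and a text box for the completion code; either way, the code submitted on MTurk is what later ties an anonymous survey response to a payable assignment.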

Upon consent, participants were asked a series of demographic questions to verify that they met the outlined criteria, as well as questions about military experiences if they identified as veterans. These were forced-response questions but were not used as screening questions (as outlined below in Study 2). They were followed by the veteran status screening questions, which served as a validation check on participants' self-reported veteran status. Based on the focus group results, if participants missed three or more of the five veteran status questions, their data were removed. Two questions to gauge attention were also included, one at midsurvey and one near the end, to confirm that participants were reading the questions. Participants who miss multiple attention gauge questions are those paying the least attention to the survey, and most researchers remove the associated data (Goodman et al., 2013); however, missing one screener question out of several does not necessarily predict data quality (Berinsky et al., 2014). Therefore, data were removed if participants missed both attention gauge questions but were kept if they missed only one. Figure 1 shows the flow of participant recruitment.
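As an illustration of these exclusion rules, the short data-cleaning sketch below assumes a pandas DataFrame with hypothetical column names (vet_q1 through vet_q5 for the five veteran status questions and attn_1/attn_2 for the two attention gauges, each scored 1 for correct and 0 for missed); it is not the authors' analysis code.

```python
import pandas as pd

# Hypothetical column names; scored 1 = answered correctly, 0 = missed.
VET_COLS = ["vet_q1", "vet_q2", "vet_q3", "vet_q4", "vet_q5"]
ATTN_COLS = ["attn_1", "attn_2"]

def apply_exclusion_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that fail the veteran-status or attention-check criteria."""
    vet_missed = (df[VET_COLS] == 0).sum(axis=1)
    attn_missed = (df[ATTN_COLS] == 0).sum(axis=1)

    # Remove data if three or more of the five veteran status questions were missed.
    fail_vet = vet_missed >= 3
    # Remove data only if BOTH attention gauge questions were missed;
    # missing a single screener is tolerated (Berinsky et al., 2014).
    fail_attn = attn_missed == 2

    return df[~(fail_vet | fail_attn)].copy()
```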

Figure 1. Participant flow through veteran checks. HIT = Human Intelligence Task; bio = biological; Grp = group; OEF/OIF = Operation Enduring Freedom/Operation Iraqi Freedom.

Finally, participants received a completion code at the end of the survey, which they had to submit via MTurk to receive compensation. Because the survey was anonymous, the screener questions could not be used as grounds for rejecting MTurk workers' HITs. However, if workers did not provide the correct completion code, their work was rejected and they did not earn compensation. Rejection of work can lower a worker's MTurk approval rating (a score documenting the worker's work history), although workers are allowed to inquire why their work was not approved. A total of six MTurk participants had their work rejected; none reached out to inquire about the rejection.
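The completion-code check can also be automated once the survey responses are exported. The sketch below is illustrative only: it assumes the submitted codes have already been parsed out of each assignment's answer XML (a step omitted here) and that the requester has the set of codes actually issued by Qualtrics; it uses boto3's standard list_assignments_for_hit, approve_assignment and reject_assignment calls.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

def review_assignments(hit_id: str, submitted_codes: dict, valid_codes: set) -> None:
    """Approve or reject submitted assignments based on the completion code.

    submitted_codes: assignment_id -> code parsed from the assignment answer XML
                     (parsing omitted in this sketch).
    valid_codes:     completion codes actually issued by the Qualtrics survey.
    """
    # Single page of results for brevity; pagination (NextToken) not handled.
    result = mturk.list_assignments_for_hit(
        HITId=hit_id, AssignmentStatuses=["Submitted"]
    )
    for assignment in result["Assignments"]:
        assignment_id = assignment["AssignmentId"]
        code = submitted_codes.get(assignment_id, "")
        if code in valid_codes:
            mturk.approve_assignment(AssignmentId=assignment_id)
        else:
            mturk.reject_assignment(
                AssignmentId=assignment_id,
                RequesterFeedback="The completion code did not match our records.",
            )
```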

The mean age of the sample was 34 years (including one 21-year-old veteran and one 73-year-old nonveteran). The majority of participants (82 percent) were aged 25 to 41 years, reported being non-Hispanic (92 percent) and identified as European American/White (82 percent). Fifty-five percent of the participants reported being single and 38 percent reported being married. The reported locations of military experience were Iraq (41 percent), Afghanistan (37 percent), another overseas country (16 percent), and within the United States (6 percent). One hundred thirty-five of the military participants (79 percent) reported that their deployment was a combat deployment. The average time served in the military was seven years, the average time since separation from the military was four years, and the average time since last deployment was five years. The majority of the participants served in the Army (55 percent), followed by the Air Force (18 percent) and Marine Corps (15 percent).

To determine the effectiveness of the veteran screening questions, three of the five questions were also asked of the nonveteran sample (ordering military officer rank insignia and answering the questions about military occupational specialty [MOS] and military entrance processing station [MEPS]).

Pearson chi-square analyses were conducted to test for differences in whether the questions were answered correctly; results indicated significant differences, with veterans more likely than nonveterans to answer correctly: officer rank, χ2(1) = 194.08, p < .01, φ = .59; MOS, χ2(2) = 177.12, p < .01, φ = .57; and MEPS, χ2(2) = 148.37, p < .01, φ = .52. Most veterans answered the questions correctly (66 percent for officer rank, 84 percent for MOS and 83 percent for MEPS). Most nonveterans did not answer the questions correctly (9 percent for officer rank, 25 percent for MOS and 27.5 percent for MEPS).
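For readers who want to run the same kind of comparison, the sketch below shows how a chi-square statistic and phi coefficient for a 2 × 2 correct/incorrect by veteran/nonveteran table can be computed with SciPy. The counts are hypothetical and are not the study's data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts, NOT the study's data:
# rows = veteran, nonveteran; columns = answered correctly, answered incorrectly
table = np.array([
    [144,  27],   # veterans
    [ 20, 180],   # nonveterans
])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
phi = np.sqrt(chi2 / table.sum())   # effect size for a 2 x 2 table

print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}, phi = {phi:.2f}")
```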

Figure 2. Study 2 participant flow through veteran checks.

Phase 2

For Phase 2, we completed two separate launches of a survey about adversity to determine the effects of the veteran status check questions. For Launch 1, no exclusion criteria were listed in the HIT, and the keywords included “survey,” “military” and “veteran.” There was also no restriction on IP location. The mean age of participants for Launch 1 (without veteran check questions) was 35.83 years (SD = 10.61, range = 17–65). The sample was 86 percent male and 14 percent female. Participants were predominantly European American/Caucasian/White (54 percent), 35 percent were Asian, 6 percent reported being American Indian/Alaska Native, 3 percent were African American/Black and 2 percent were biracial/multiracial. The majority of respondents (92 percent) were not of Hispanic/Latino ethnicity. Participants accessed the survey at a rate of approximately 15 per hour. This launch produced an overrepresentation of Asian participants, likely due to the lack of an IP address location restriction.

For Launch 2, the same keywords were used. A HIT approval rate of 95 percent or greater was required, and the number of HITs approved was required to be greater than zero. The final requirement was that the user's location be in the United States. Several validation questions were included. Perhaps most importantly, at each validation question participants were given the option to respond “I am not a veteran” instead of being forced to answer. This allowed the majority of nonveterans to self-select out without the researcher having to discard their data. The first question asked respondents to choose which criterion they met for veteran status. The options were based on a definition provided on the Office of Personnel Management's FAQ site, and participants could also select “I am not a veteran.” These options may change depending on the veteran criteria a researcher intends to use; we allowed everyone who chose “other” to remain in the study. Three validation checks from Study 1 were used (Questions 1, 2 and 5).

Again, each question also included an option of “I am not a veteran.” One attention check (“I have lied a lot on this survey” [true or false]) was also included in the middle of the survey. Figure 2 shows the flow of participant recruitment.

To receive payment, participants needed an MTurk validation code obtained at the end of the survey. When people self-select out by not consenting or by saying “I am not a veteran,” the survey simply ends. Because those who answer that they are not veterans receive a “Thank you” screen, no additional data are collected and there is no need to compensate them or separate out their data later. Participants who went through every question had to be compensated, per institutional review board requirements. In this study, we paid 17 people who left all scales blank and answered only the questions required for validation.

For Launch 2 (with veteran checks), the mean age of participants was 36.01 years (SD = 10.94, range = 21–71). The sample was 69.4 percent male and 30.6 percent female, an overrepresentation of female veterans compared with the national population of veterans (9 percent; National Center for Veteran Analysis and Statistics, 2014). This is not surprising, given that 70 percent of American “Turkers” are female (Ipeirotis, 2014). Participants were predominantly European American/Caucasian/White (82.7 percent, n = 163), while 8.6 percent (n = 17) were African American/Black, 5.1 percent (n = 10) were biracial/multiracial, 2 percent (n = 4) were Asian, one respondent was American Indian/Alaska Native and one respondent was Native Hawaiian/Pacific Islander. The majority of respondents (n = 171, 89.5 percent) were not Hispanic/Latino, and 10.5 percent (n = 20) reported Hispanic/Latino ethnicity.

Respondents were from all five branches of the military, with the majority being Army veterans (47.7 percent, n = 94), followed by Air Force (19.3 percent, n = 38), Navy (16.2 percent, n = 32), Marines (12.7 percent, n = 25) and Coast Guard (4.1 percent, n = 8). The majority also reported serving on active duty at the time of service (71.9 percent, n = 138), compared with the Reserves (19.3 percent, n = 37) and National Guard (8.9 percent, n = 17). Participants accessed the survey at a rate of approximately three per hour, but this launch produced a much more representative sample of veterans, with the exception of the overrepresentation of females.

Conclusions

MTurk is a convenient and fast data collection tool and, with proper considerations and precautions, can be effective for reaching target populations that have historically been difficult to reach. Researchers looking to use MTurk should consider the type of research being conducted; clearly, not all research is suited to data collection via MTurk (Chandler & Shapiro, 2016). These considerations include the risks involved, the subject matter, the target sample and how well established the research is in the area of interest. If MTurk is appropriate for a particular research question, there are steps that can be taken to ensure the target population is being reached and to increase the potential for high-quality data. Taken together, the results of our multiphased study suggest that veteran status screening and attention gauge questions may be effective for determining whether an MTurk worker is a veteran.

Future research should continue to strengthen the protocol for using MTurk to survey veteran populations. We suggest including several veteran status questions tailored to the target population, as well as options to self-select out of the survey. We also suggest including a statement after the consent that reads, “I understand that it is illegal to impersonate a veteran for the purpose of obtaining financial gains (e.g., monetary incentives from MTurk).” With the information and tools available for online crowdsourcing, MTurk may provide a cost- and time-effective mechanism for obtaining high-quality data from difficult-to-reach populations.


Footnote

For further information, please contact: Brigid Mary-Donnell Lynn.

References

Bartneck, C., Duenser, A., Moltchanova, E., & Zawieska, K. (2015). Comparing the similarity of responses received from studies in Amazon's Mechanical Turk to studies conducted online and with direct recruitment. PLoS One, 10, e0121595. 10.1371/journal.pone.0121595

Berinsky, A. J., Margolis, M. F., & Sances, M. W. (2014). Separating the shirkers from the workers? Making sure respondents pay attention on self-administered surveys. American Journal of Political Science, 58, 739–753. 10.1111/ajps.12081

Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5. 10.1177/1745691610393980

Casler, K., Bickel, L., & Hackett, E. (2013). Separate but equal? A comparison of participants and data gathered via Amazon's MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior, 29, 2156–2160. 10.1016/j.chb.2013.05.009

Chandler, J., & Shapiro, D. (2016). Conducting clinical research using crowdsourced convenience samples. Annual Review of Clinical Psychology, 12, 53–81. 10.1146/annurev-clinpsy-021815-093623

Goodman, J. K., Cryder, C. E., & Cheema, A. (2013). Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making, 26, 213–224. 10.1002/bdm.1753

Ipeirotis, P. (2014). Demographics of Mechanical Turk. Department of Information, Operations, and Management Sciences, Leonard N. Stern School of Business, New York University, New York, NY.

Lynn, B. M.-D. (2014). Shared sense of purpose and well-being among Veterans and non-Veterans (Doctoral dissertation). North Carolina State University, Raleigh, NC. Retrieved from http://catalog.lib.ncsu.edu/record/NCSU3130893

Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon's Mechanical Turk. Behavior Research Methods, 44, 1–23. 10.3758/s13428-011-0124-6

Morgan, J. K. (2015). Examining growth outcomes in military Veterans: Posttraumatic growth, core beliefs, and temporality (Unpublished master's thesis). North Carolina State University, Raleigh, NC.

National Center for Veteran Analysis and Statistics. (2014). Veteran population. United States Department of Veterans Affairs. Retrieved from http://www.va.gov/vetdata/veteran_population.asp