A “mystery shopper survey” can assess trainees’ skills on the job, providing insight to strengthen employment readiness programs.
In communities with high youth unemployment, technical training programs can seem like an effective path to prepare young people for the workplace. But it can be difficult to know how effective these programs actually are, especially when it comes to tracking young people after they graduate and assessing their performance on the job. This blog outlines one approach to assessing trainees once they are employed, yielding insights that can strengthen technical training programs¹.
Example scenario: John² was an East African youth who was neither in school nor employed. He decided to enter a train-to-work program at his local technical institute. He went through an intensive training course and was hired by a retail employer. The training institute he attended knew that John did well in the course, but wanted to understand how his performance would translate to the workplace. Would he be able to interact with customers? Would he have the “soft skills” necessary to perform under pressure?
There are several ways to assess an employee’s performance. As evaluators, we could ask employees to grade themselves on the characteristics emphasized in their training. If we’re worried about biased answers, we might instead ask supervisors to conduct this grading. But this can be time-intensive when there are large numbers of employees to assess and characteristics to investigate; in addition, it may be hard for supervisors to remember salient details. We might also worry about social desirability bias: supervisors could feel pressure to grade their employees positively as proof that they are effective managers. Alternatively, we could survey customers about their experience as they leave businesses. While these exit surveys would allow us to obtain multiple snapshots of an employee’s performance, we’ve found it incredibly challenging to link a customer’s experience to individual employees, especially in large retail stores.
Our answer: the “mystery shopper survey.”
In a mystery shopper survey, enumerators (surveyors) pose as customers and have standardized interactions with the employee. After each interaction, the enumerator completes a brief questionnaire assessing the employee’s performance along several previously defined criteria.
Mystery shopper surveys are one way to obtain independent assessments of employees’ performance and their adherence to training guidelines or company regulations. This information allows technical training institutes to tailor their programs and to identify employees who could benefit from retraining, helping employees improve their performance and increase customer satisfaction.
Mystery shopper surveys overcome most of the challenges that other assessments present. For example, we’re able to link a customer’s experience to individual employees, we obtain assessments that are as objective as possible from standardized interactions, and we can (in theory) obtain snapshots of performance in different situations. In our case, we opted to obtain a single performance snapshot from a mystery shopper interaction and assessed this alongside a supervisor’s ratings.
We learned that having enumerators pose as customers isn’t as simple as it sounds. Our goal was to obtain an objective measure of performance, with employees assuming they were interacting with real customers rather than knowing they were being assessed. This requires conceptualizing representative customer interactions and having a solid plan for enumerators to carry them out. Here, we share key learnings from our measurement approach and survey logistics.
As a first step, you need to decide which performance dimensions to measure. In our case, the client had a well-developed theory of which characteristics the training should impart and what “good performance” along these characteristics would look like. We used this to determine the types of employee behaviors we wanted to capture (e.g., was the employee proactive in greeting the customer? Was the employee knowledgeable about the products in the store?). In total, we focused on six key behaviors³.
We decided to focus on a few salient behaviors to keep the interaction short and the questionnaire centered on key decisions the organization could make to improve its training. Brevity matters because enumerators must complete the survey form entirely from recall, immediately after they exit the store.
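To make the recall-based scoring concrete, here is a minimal sketch of how such a questionnaire might be scored. The six behavior names and the 1–5 rating scale are hypothetical placeholders, not the actual criteria or scale used in this study.

```python
# Hypothetical behavior checklist for a mystery shopper questionnaire.
# These names are illustrative placeholders, not the study's real criteria.
BEHAVIORS = [
    "proactive_greeting",
    "product_knowledge",
    "offered_assistance",
    "composure_under_pressure",
    "politeness",
    "suggested_alternatives",
]

def score_interaction(ratings: dict) -> float:
    """Average the 1-5 ratings an enumerator records from recall after a visit."""
    missing = [b for b in BEHAVIORS if b not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {missing}")
    return sum(ratings[b] for b in BEHAVIORS) / len(BEHAVIORS)

# Example: one enumerator's ratings for a single store visit (made-up values)
ratings = {
    "proactive_greeting": 4,
    "product_knowledge": 5,
    "offered_assistance": 4,
    "composure_under_pressure": 3,
    "politeness": 5,
    "suggested_alternatives": 2,
}
composite = score_interaction(ratings)  # simple unweighted average
```

Keeping the checklist to a handful of behaviors, as the text recommends, is what makes accurate recall feasible once the enumerator leaves the store.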
Next, you need to standardize the interaction between employee and customer as much as possible from the customer’s side. That way, you can be more confident that heterogeneity in performance ratings reflects legitimate performance differences among employees rather than differences in the employee-customer interaction. We attempted to do this in several ways.
First, we developed a scripted interaction that enumerators had to strictly follow. This involved greeting the employee, asking to be shown a specific product, and asking for that product in a lower price range. We developed this script through pilot visits to local retail shops: we determined which products were often purchased, observed typical customer-employee behavior, and explored possible employee responses through simulated interactions. For example, it was important for us to assess whether an employee would maintain composure under pressure. Through piloting, we realized that it would not be possible to simulate this through an “angry customer” without causing noticeable disruption in the store. Consequently, we opted to turn up the pressure by attempting to rush the employee to find something, which proved to be more realistic.
Standardizing interactions also relied heavily on enumerator training. We spent many hours having the enumerators role-play the script again and again. Furthermore, we provided enumerators guidance on what good and bad customer service behavior looks like for each question. For example, an employee who expressed annoyance when asked for assistance would receive a poor score, whereas one who offered to take the customer to the aisle would receive a high score.
One of the key challenges we faced was how to identify employees. As a first step, we needed to work directly with HR managers to get lists of all entry-level retail staff for our sample. This proved challenging given the employers’ capacity constraints in providing relevant data, as well as sensitivities around sharing personally identifiable data. As such, it’s important to allow ample time for this process, both to adequately brief employers on the goals of the research and to actually gather the data you need to identify and interact with employees. It might also be helpful to consider the value that the research can provide the employers themselves as a way to generate buy-in. For instance, we offered employers a brief memo on the average performance across their employees, benchmarked against the performance of other employees in our sample⁴.
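The benchmarking memo described above can be sketched as a simple aggregation: each employer sees only the average across its own employees next to the sample-wide average, never an individual score. The employer names, IDs, and scores below are made-up illustrations, not study data.

```python
# Sketch of the benchmarking memo computation. All values are invented
# for illustration; no individual-level score leaves this function.
from collections import defaultdict
from statistics import mean

# (employer, anonymized employee id, composite score)
records = [
    ("Employer A", "emp-01", 3.5),
    ("Employer A", "emp-02", 4.2),
    ("Employer B", "emp-03", 2.8),
    ("Employer B", "emp-04", 3.9),
    ("Employer B", "emp-05", 4.1),
]

def benchmark(records):
    """Return each employer's average score plus the sample-wide benchmark."""
    by_employer = defaultdict(list)
    for employer, _, score in records:
        by_employer[employer].append(score)
    sample_avg = mean(score for _, _, score in records)
    # Only aggregates are shared back, so supervisors cannot
    # reconstruct any single employee's rating.
    return {emp: mean(scores) for emp, scores in by_employer.items()}, sample_avg

per_employer, overall = benchmark(records)
```

Aggregating before sharing is what lets this memo serve as a buy-in incentive without raising the consent issues discussed in footnote 4.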
Second, for sampled employees, you’ll want to find out when you’re likely to encounter them at work. Then, once you are in the store, you need to be able to identify the correct employee. For both steps, we found it extremely helpful to engage with the employees’ supervisors. Supervisors generously provided information on their employees’ shift schedules ahead of time, which we used to structure our data collection schedule. We also found that many of the employees in our sample did not wear name tags, which complicated our identification process. As such, the enumerator would discreetly find the supervisor upon arriving in the store, and the supervisor would then signal the correct employee.
Finally, retaining anonymity and plausibility is key. We had to design our data collection timeline so that any enumerator would visit a given branch only once, as repeat customers would likely arouse suspicion among the employees. Furthermore, enumerators needed to take care not to carry any identifying equipment such as name tags, notebooks, or tablets. It was also important that the enumerator only fill out the questionnaire once s/he was out of sight of the store. To stay in character, we matched enumerator demographics and appearance with likely customer demographics and appearance when appropriate, for example, by sending only female enumerators to stores that sell cosmetics or asking enumerators to dress well when entering high-end clothing stores.
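The one-visit-per-branch constraint above can be enforced mechanically when building the data collection schedule. Here is a minimal sketch, assuming one visit per branch and hypothetical branch and enumerator names; a real schedule would also incorporate the shift information supervisors provide.

```python
# Sketch: round-robin branches across enumerators so that no enumerator
# ever enters the same branch twice. Names are illustrative only.
from itertools import cycle

branches = ["Branch 1", "Branch 2", "Branch 3",
            "Branch 4", "Branch 5", "Branch 6"]
enumerators = ["Enumerator A", "Enumerator B", "Enumerator C"]

def assign_visits(branches, enumerators):
    """Assign each branch one enumerator; each (branch, enumerator)
    pairing occurs at most once across the whole schedule."""
    pool = cycle(enumerators)
    return [(branch, next(pool)) for branch in branches]

schedule = assign_visits(branches, enumerators)

# Sanity check: no enumerator is sent to the same branch twice.
assert len(set(schedule)) == len(schedule)
```

Spreading visits this way also balances the workload, since each enumerator covers roughly the same number of branches.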
While not easy to implement, mystery shopper surveys result in valuable, individual-level performance assessments that allow you to further refine and tailor your training programs.
Conducting mystery shopper surveys requires extra effort in both the conceptualization and implementation phases. That said, we feel this added effort is well worth it, especially when training institutes intend to use objective assessments of individual performance to further improve and better tailor their programs. As training institutes strengthen their programs, employers gain higher-performing employees, which helps them increase revenue and also makes these training programs more cost-effective.
¹The client for our work will remain confidential.
²Name has been changed for anonymity.
³Pre-defined performance categories will not exist for all training programs. In that case, evaluators will have to develop these categories in close collaboration with the training provider to accurately reflect training objectives.
⁴Sharing personal, identifiable information linked to performance with employers would raise several questions around consent. In our case, we received consent from the employers to conduct mystery shopper surveys in their stores. As we were recording observations only within employer premises and obtained permission, we did not need to obtain consent from employees (analogous to a performance appraisal or the company hiring their own mystery shoppers). We also did not share any personally identifiable information back with employers (or the client) — only aggregated scores for all their employees — so that supervisors would not be able to discern an employee’s individual score.
1 March 2019