What is the rubric used?
In the book The Effortless Experience, there is a strong case made that it is less about "delighting" or "dazzling" a customer and more about making the experience low effort, which is captured in the following statement:
Loyalty is driven by how well a company delivers on its basic promises and solves day-to-day problems, not on how spectacular its service experience might be. Most customers don’t want to be “wowed”; they want an effortless experience. And they are far more likely to punish you for bad service than to reward you for good service.
So, to measure how effortless the experience was, my team's approach was to weight transactional and interactional items equally. It was not just what was done but how it was done.
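As a rough sketch of what equal weighting could look like in practice (the item names and the 0-5 scale below are illustrative assumptions, not the actual rubric):

```python
# Minimal sketch of an equal-weight rubric, assuming each item is scored 0-5.
# The item names and scale are illustrative, not the team's actual rubric.

TRANSACTIONAL_ITEMS = ["verified_account", "resolved_issue", "followed_process"]
INTERACTIONAL_ITEMS = ["active_listening", "clear_language", "empathy"]

def score_call(item_scores: dict[str, float]) -> float:
    """Combine transactional and interactional averages with equal (50/50) weight."""
    transactional = sum(item_scores[i] for i in TRANSACTIONAL_ITEMS) / len(TRANSACTIONAL_ITEMS)
    interactional = sum(item_scores[i] for i in INTERACTIONAL_ITEMS) / len(INTERACTIONAL_ITEMS)
    return 0.5 * transactional + 0.5 * interactional

# Example: strong on the "what" but weak on the "how" yields a middling score.
example = {
    "verified_account": 5, "resolved_issue": 5, "followed_process": 5,
    "active_listening": 2, "clear_language": 3, "empathy": 2,
}
print(round(score_call(example), 2))  # 3.67: transactional 5.0, interactional 2.33
```

Because the two halves carry equal weight, an agent cannot make up for a poor interaction purely by ticking every procedural box.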
Quality vs Quantity?
Are 5 calls a month enough? 10 calls? 20 calls? Should they be a random sampling, or does there need to be a quota per activity? Do newly hired people need more calls evaluated than veterans? Should a different scale be used?
These are definitely questions I have wrestled with. My heuristic is to have a large enough sample to give actionable feedback on. If the scores are tied to a reward system, then the sampling has to be fair across everyone in that reward system. For example, if performance is tied to bonus pay, it is not fair to assess 15 calls for Person A and only 5 calls for Person B.
Generally, the base number I have used is 10 calls, with more added for a cohort based on new skills or targeted needs. A sketch of that sampling approach follows.
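Here is a minimal sketch of that idea, assuming a base of 10 calls per person plus cohort add-ons; the cohort labels and extra counts are my own illustration, not a fixed policy:

```python
import random

# Illustrative sketch: draw an equal-size random sample of calls per agent,
# with extra calls for targeted cohorts (e.g. new hires). Numbers and cohort
# labels are assumptions for the example only.

BASE_SAMPLE = 10
EXTRA_BY_COHORT = {"new_hire": 5, "new_skill": 3}

def sample_calls(calls_by_agent: dict[str, list[str]],
                 cohort_by_agent: dict[str, str]) -> dict[str, list[str]]:
    sampled = {}
    for agent, calls in calls_by_agent.items():
        n = BASE_SAMPLE + EXTRA_BY_COHORT.get(cohort_by_agent.get(agent, ""), 0)
        # Random sampling keeps the review fair; cap at the calls actually available.
        sampled[agent] = random.sample(calls, min(n, len(calls)))
    return sampled
```

Keeping the base count identical for everyone in the same reward pool is what keeps the process defensible when pay is on the line.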
Pass/Fail vs A-F vs 0-100?
In my experience, people rarely look at the feedback on calls marked as "pass" and instead concentrate on the ones marked as "fail." Someone can be marked as a pass when they only barely passed. It might be better to assign a letter grade. Once again, this could be tricky if tied to an extrinsic reward system. A numeric score might work better if all elements are consistent across participants and per call. Whichever method is selected, the key is the next step: how the results are used for coaching.
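One way to surface "barely passed" instead of collapsing it into a flat pass is to band a numeric score into letter grades. The cut-offs below are illustrative assumptions, not a recommended standard:

```python
# Sketch: translate a 0-100 numeric score into a letter grade so that a narrow
# pass is still visible for coaching. Cut-offs are illustrative only.

GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def letter_grade(score: float) -> str:
    for cutoff, grade in GRADE_BANDS:
        if score >= cutoff:
            return grade
    return "F"

print(letter_grade(71))  # "C": a pass, but one that still signals room for coaching
```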
Automation on sentiment or trained professionals?
I have only worked with systems that involved people evaluating recordings directly. I have seen demos of systems that listen to EVERY call and analyze it for sentiment, which allows a holistic view of interactional style. I have worked with web analytics tools enough to know that there are ways to define "correct paths" to match transactional items. This pathing is very system dependent and can be a huge challenge if people are interacting with a multitude of systems. For some time, I think there will still be a need for individuals to analyze recordings to enhance or validate automated findings.
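To make the idea of scoring every call concrete, here is a toy sketch of a sentiment pass over transcripts. Real speech-analytics products use trained models; the word lists and scoring below are purely illustrative of the concept, not how any particular system works:

```python
# Toy illustration of a sentiment pass over call transcripts. The keyword
# lists and scoring are assumptions for demonstration; production systems
# rely on trained speech-analytics models rather than word counts.

NEGATIVE = {"frustrated", "angry", "again", "cancel", "unacceptable"}
POSITIVE = {"thanks", "great", "perfect", "appreciate", "helpful"}

def sentiment_score(transcript: str) -> float:
    """Return a rough score in [-1, 1] based on keyword counts."""
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("I appreciate the help, thanks!"))          #  1.0
print(sentiment_score("I am calling again and I am frustrated"))  # -1.0
```

Even with a real model, a score like this only flags calls; a trained reviewer still has to listen to confirm what the number is actually picking up.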