UX Considerations for Machine Learning

This article is an abridgement of a presentation originally given at Cisco’s 2020 Data Science Summit in Prague.

[Stefano Meschiari, Data Scientist]
For this presentation, we’re going to tell you about our experience running a private beta of a threat detection system in front of customers. We hope you can take away a few points from this presentation.

The first is that we’re now uncovering pain points for our users that are a direct consequence of building this system around ML models. There’s tension between the properties of a statistical system and users’ expectations, and that tension needs to be taken into account. The second is that these pain points were uncovered during user research led by Design. During this phase, we observed our users interacting with a fully realized system and their own data for the first time, and asked them about their thought process while working through workflows. The last is that addressing these pain points is as much a design problem as it is a data science problem. Adhering to a set of design principles is invaluable for narrowing the types of solutions (whether new models, heuristics, rules, etc.).

Machine Learning Meets Real World Threat Detection Specialists

The system we’re going to talk about today is called Duo Trust Monitor. Data Science has been building this system in collaboration with multiple teams at Duo. The audience of this system is security analysts. The internal persona we use is called “Gary”, and we’ll be using this name as a shorthand in the following slides. Jillian will dig more into what we know about Gary, his needs, and his concerns in a sec.

The goal of this system is to surface events that Gary thinks should be investigated and to offer him tools to quickly understand and act on these detections. We analyze properties of incoming authentications and use ML models to flag a small number of them for followup by Gary (we call those flagged auths “security events”). The idea is to save Gary time by analyzing data on his behalf and only surfacing events that are important to him.

Last year we had two presentations at this venue, where we went over some of the details of how we were thinking about infrastructure, modeling, and evaluation. We also had the opportunity to work with a small number of customers and ask them to evaluate preliminary detections. This process was literally us generating spreadsheets from the pipeline, handing them over, and asking for an opinion. The difference is that we hadn’t really put a fully functioning, closed-loop workflow in front of our intended audience of security analysts until now. So Jillian will now walk you through what our intended audience expects out of this system, and what this system looks like now.

[Jillian Haller, Lead Designer]

Understanding Gary the Security Analyst

Gary is the primary persona for Duo Trust Monitor. In order to understand his pain points, we need to first understand his role, and the challenges he faces in order to be successful in his role. Gary bears the weight of securing the entire organization. If there is a security event, the buck stops with him. He is strapped for time - often deploying new software, investigating security issues, pulling data from multiple sources to solve problems, and staying on top of industry trends all in the same week.

Gary will see security events populated in the Threat Board. From here, he can view more attributes of an event and either triage it or mark it as uninteresting. The model will show base detections; we ask Gary to configure priority assets to further refine what appears here.

As we’ve interviewed customers for the Private Beta, we have uncovered several pain points that may hinder adoption, endangering the success of Duo Trust Monitor (DTM). We’ll review the emerging pain points we have learned about in the last few weeks of the Private Beta. These insights are a work in progress. We expect our understanding to evolve as we continue to collect feedback.

But first, how does Design at Duo plan and conduct user research?

We start with the planning phase. We determine the high-level questions that need answering. What are the goals of the customer sessions? From there, we decide the best way to answer those questions. Some methods of investigation include a user walk-through with a clickable prototype, diagramming with the user, or just interviewing without visuals. We document these details in the Research Plan and create a discussion guide, which is a script of open-ended questions. Once the customer sessions are completed, we move into the synthesis phase. The purpose of synthesis is to pull insights and action items from our customer discussions. We share these insights with the larger team for discussion and prioritization.

Emerging Pain Points

These pain points are not in any particular order or ranking:

Gary fears he will miss security events by unintentionally training the model poorly

Gary is frustrated when he feels there is a mismatch between his level of effort and the benefits of using Duo Trust Monitor

Gary is not confident that his selections made during configuration are being represented on the Threat Board

Gary dreads noise that could distract him from true threats, or from his many other responsibilities

Gary is blocked when Threat Detection does not work how he assumes it should

Gary and his team need proper visibility into each other's decisions to maintain accountability, share knowledge, and train newer team members

Let’s focus on the first pain point: Gary fears he will miss security events by unintentionally training the model poorly.

What does this mean? What is the challenge for Design? And how are we thinking of solving it?

Ultimately, Gary is afraid his actions will have unintentional consequences, because he cannot predict the long-term impact of his decisions. He’s not sure how his evaluations over time impact the model. He’s not sure how often or how much of his feedback gets incorporated. He wants Trust Monitor to be more precise, but he’s worried about missing a security event because he marked similar events in the past as “Uninteresting.”

Giving Gary the power to predict outcomes would improve his cognitive fluency.

Cognitive Fluency

What is cognitive fluency? It’s the learner’s ease of understanding. Easily understood, predictable processes feel more trustworthy to end users. The more confident a user feels about her actions, the more likely she is to perform them. Hesitation to perform a task in the product leads directly to shallow adoption, making products less sticky or making them feel less critical to our customers’ work.

The takeaway for Design: we need to empower Gary to feel confident during configuration and evaluation. And we need to empower him without overwhelming him with the detailed mechanics of the data science that powers Trust Monitor. We can leverage the Gary Design Principles to help us think of solutions.

Design Principles as a Guide

The Gary Design Principles are guidelines for making an easy-to-use, pleasurable user experience for Gary.

These principles are the result of over 5 years of qualitative research across multiple projects at Duo.

They also align with Duo’s brand - easy, effective, trustworthy, and enduring. Armed with the Design principles, what is a possible solution to this problem? We do have a few ideas we’re considering.

One idea is to allow Gary to decide for himself what impacts the model. This aligns nicely with the principle “Reinforce Trust.”

How would this look from a UI/UX perspective? We would explicitly ask Gary how he wants his evaluation to be considered instead of inferring it. We would use language that describes the result, for example “Would you like to see more or fewer events like this in the future?” or “Would you like to evaluate without impacting the scoring of similar events in the future?” We would also allow him to undo these decisions, so he isn’t paralyzed with fear of making an incorrect, unchangeable selection.
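
As a rough illustration of the idea above, an evaluation could carry Gary’s explicit choice rather than an inferred one, and stay reversible. This is only a sketch under our own assumptions: the names `Intent`, `Evaluation`, and `FeedbackLog` are hypothetical and are not part of Duo Trust Monitor.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Intent(Enum):
    """What Gary explicitly asked for when evaluating an event (hypothetical)."""
    MORE_LIKE_THIS = "more"
    FEWER_LIKE_THIS = "fewer"
    NO_IMPACT = "no_impact"  # evaluate without affecting future scoring


@dataclass
class Evaluation:
    event_id: str
    analyst: str
    intent: Intent


@dataclass
class FeedbackLog:
    """Keeps evaluations reversible so that no choice is permanent."""
    active: Dict[str, Evaluation] = field(default_factory=dict)

    def record(self, evaluation: Evaluation) -> None:
        # A newer evaluation of the same event replaces the older one.
        self.active[evaluation.event_id] = evaluation

    def undo(self, event_id: str) -> bool:
        """Remove an evaluation; returns True if there was one to undo."""
        return self.active.pop(event_id, None) is not None

    def training_signal(self) -> List[Evaluation]:
        """Only evaluations Gary opted in to would influence future scoring."""
        return [e for e in self.active.values() if e.intent is not Intent.NO_IMPACT]


log = FeedbackLog()
log.record(Evaluation("evt-1", "gary", Intent.FEWER_LIKE_THIS))
log.record(Evaluation("evt-2", "gary", Intent.NO_IMPACT))
print([e.event_id for e in log.training_signal()])
log.undo("evt-1")  # Gary changes his mind; evt-1 no longer counts
```

Keeping the intent on the record, instead of deriving it later, is what would let the interface tell Gary exactly which of his past decisions are shaping what he sees.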

[Stefano Meschiari]

As Jillian mentioned, we want to build a feedback loop in a way that aligns with the “Reinforce Trust” principle. What does this mean from a modeling perspective? At least at this stage, following this principle and taking into account what we hear from Gary means that we’ll want to favor solutions that are transparent and easy for Gary to configure: infer contextual rules from his feedback and ask Gary before applying them; let Gary undo his choices; let Gary manually configure those rules; and make it easy to set rules as a team. When we design a feedback loop that also feeds into the ML models, we’ll want to continue taking this pain point into consideration.
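
One way to read “infer rules from feedback, then ask Gary” is sketched below. It is a sketch under assumptions, not a description of how Duo Trust Monitor works: the attribute names, the `min_support` threshold, and the `Rule`, `propose_rules`, and `visible_events` names are all hypothetical. Rules are only proposed from feedback; nothing is hidden until Gary confirms a rule, and a confirmed rule can be switched off again.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Rule:
    attribute: str           # e.g. "country" (hypothetical attribute name)
    value: str
    confirmed: bool = False  # inferred rules start out as proposals only

    def matches(self, event: Dict[str, str]) -> bool:
        return event.get(self.attribute) == self.value


def propose_rules(uninteresting: Iterable[Dict[str, str]],
                  min_support: int = 3) -> List[Rule]:
    """Propose a rule for each attribute value shared by at least
    `min_support` events that were marked uninteresting."""
    counts: Counter = Counter()
    for event in uninteresting:
        counts.update(event.items())  # counts (attribute, value) pairs
    return [Rule(attr, val)
            for (attr, val), n in sorted(counts.items())
            if n >= min_support]


def visible_events(events: List[Dict[str, str]],
                   rules: List[Rule]) -> List[Dict[str, str]]:
    """Hide only the events matched by a rule that Gary has confirmed."""
    active = [r for r in rules if r.confirmed]
    return [e for e in events if not any(r.matches(e) for r in active)]


marked = [
    {"country": "CA", "device": "laptop"},
    {"country": "CA", "device": "phone"},
    {"country": "CA", "device": "tablet"},
]
rules = propose_rules(marked)   # one proposal: country == "CA"
rules[0].confirmed = True       # Gary (or his team) accepts the proposal
print(visible_events([{"country": "CA"}, {"country": "FR"}], rules))
```

Because each rule is a plain attribute match, Gary can read it, edit it by hand, or set `confirmed` back to `False` to undo it, which is the transparency the principle asks for.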

Author: Jillian Haller
Presenters: Jillian Haller and Stefano Meschiari