A new tool helps humans better understand and develop artificial intelligence models by searching for and highlighting representative scenarios.

Explaining, interpreting, and understanding the human mind presents a unique set of challenges. 

Doing the same for the behaviors of machines, meanwhile, is a whole other story. 

As artificial intelligence (AI) models are increasingly used in complex situations — approving or denying loans, helping doctors with medical diagnoses, assisting drivers on the road, or even taking complete control — humans still lack a holistic understanding of their capabilities and behaviors. 

Existing research focuses mainly on the basics: How accurate is this model? But focusing on accuracy alone can lead to dangerous oversights. What if the model makes mistakes with very high confidence? How would the model behave if it encountered something it had never seen before, such as a self-driving car confronting a new type of traffic sign?

In the quest for better human-AI interaction, a team of researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has created a new tool called Bayes-TrEx that gives developers and users transparency into their AI models. Specifically, it does so by finding concrete examples that lead to a particular behavior. The method makes use of “Bayesian posterior inference,” a widely used mathematical framework for reasoning about model uncertainty.
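The paper’s actual inference machinery is not reproduced here, but the core idea of "conditioning on a target behavior" can be illustrated with a minimal sketch: sample candidate inputs from some prior and keep only those that make the model behave in the way of interest. Everything below (the toy `sample_input` prior, the stand-in `classifier`, and the 0.95 confidence threshold) is an illustrative assumption, not part of Bayes-TrEx itself.

```python
# Minimal sketch, assuming a toy input prior and classifier: naive rejection
# sampling for inputs that elicit a target behavior (here, very high
# predicted confidence). This is NOT the Bayes-TrEx algorithm, just the
# general idea of treating "find examples of a behavior" as inference.
import numpy as np

def sample_input(rng):
    # Stand-in prior over inputs, e.g. a latent-variable image generator.
    return rng.normal(size=(28, 28))

def classifier(x):
    # Stand-in model that returns a vector of class confidences.
    logits = np.array([x.mean(), x.std(), x.max()])
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def find_high_confidence_examples(n_samples=10_000, threshold=0.95, seed=0):
    """Collect sampled inputs the model classifies with confidence >= threshold."""
    rng = np.random.default_rng(seed)
    hits = []
    for _ in range(n_samples):
        x = sample_input(rng)
        conf = classifier(x).max()
        if conf >= threshold:  # "condition" on the target behavior
            hits.append((x, conf))
    return hits

examples = find_high_confidence_examples()
print(f"found {len(examples)} high-confidence examples")
```

In practice, rejection sampling like this is hopeless for realistic image spaces, which is why posing the problem as Bayesian posterior inference, as the researchers do, matters: it allows far more efficient search for examples that trigger the behavior of interest.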

In experiments, the researchers applied Bayes-TrEx to several image-based datasets, and found new insights that were previously overlooked by standard evaluations focusing solely on prediction accuracy. 

“Such analyses are important to verify that the model is indeed functioning correctly in all cases,” says MIT CSAIL PhD student Yilun Zhou, co-lead researcher on Bayes-TrEx. “An especially alarming situation is when the model is making mistakes, but with very high confidence. Due to high user trust in the high reported confidence, these mistakes might fly under the radar for a long time and only get discovered after causing extensive damage.”

For example, after a medical diagnosis system is trained on a set of X-ray images, a doctor can use Bayes-TrEx to find images that the model misclassified with very high confidence, to ensure that the model doesn't miss any particular variant of a disease.
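As a rough illustration of that workflow (not the tool itself), the snippet below flags held-out predictions that are both wrong and highly confident, the kind of cases a reviewer would want surfaced first. The probability array, labels, and 0.9 threshold are made-up placeholders.

```python
# Hedged sketch: scan a labeled evaluation set for predictions that are
# both incorrect and very confident. The model outputs, labels, and the
# 0.9 threshold are illustrative assumptions.
import numpy as np

def confident_mistakes(model_probs, labels, threshold=0.9):
    """Return indices of examples misclassified with confidence >= threshold.

    model_probs: (N, C) array of predicted class probabilities
    labels:      (N,) array of ground-truth class indices
    """
    preds = model_probs.argmax(axis=1)
    confs = model_probs.max(axis=1)
    wrong = preds != labels
    return np.flatnonzero(wrong & (confs >= threshold))

# Toy usage with random probabilities standing in for real model outputs.
rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[0.5, 0.5, 0.5], size=100)
labels = rng.integers(0, 3, size=100)
idx = confident_mistakes(probs, labels)
print(f"{len(idx)} confident mistakes to review")
```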

Bayes-TrEx can also help with understanding model behaviors in novel situations. Take autonomous driving systems, which often rely on camera images to detect traffic lights, bike lanes, and obstacles. These common occurrences can be recognized with high accuracy, but more complicated situations can pose literal and metaphorical roadblocks. A zippy Segway, for instance, could be interpreted as something as big as a car or as small as a bump in the road, leading to a tricky maneuver or a costly collision. Bayes-TrEx could help surface these novel situations ahead of time, enabling developers to correct undesirable behaviors before potential tragedies occur.

In addition to images, the researchers are also tackling a less static domain: robots. Their tool, called “RoCUS,” is inspired by Bayes-TrEx and uses additional adaptations to analyze robot-specific behaviors.

While still in a testing phase, experiments with RoCUS point to new discoveries that could easily be missed if evaluation focused solely on task completion. For example, a 2D navigation robot that used a deep learning approach preferred to navigate tightly around obstacles, due to how its training data was collected. Such a preference, however, could be risky if the robot’s obstacle sensors are not fully accurate. For a robot arm reaching a target on a table, the asymmetry in the robot’s kinematic structure turned out to have a sizable effect on its ability to reach targets on the left versus the right.

“We want to make human-AI interaction safer by giving humans more insight into their AI collaborators,” says MIT CSAIL PhD student Serena Booth, co-lead author with Zhou. “Humans should be able to understand how these agents make decisions, to predict how they will act in the world, and — most critically — to anticipate and circumvent failures.”  

Booth and Zhou are coauthors on the Bayes-TrEx work alongside MIT CSAIL PhD student Ankit Shah and MIT Professor Julie Shah. They presented the paper virtually at the AAAI Conference on Artificial Intelligence. Along with Booth, Zhou, and Shah, MIT CSAIL postdoc Nadia Figueroa Fernandez contributed work on the RoCUS tool.
 
