SOInter: A Novel Deep Energy-Based Interpretation Method for Explaining Structured Output Models

Part of the International Conference on Learning Representations 2024 (ICLR 2024)

Authors

S. Fatemeh Seyyedsalehi, Mahdieh Baghshah, Hamid Rabiee

Abstract

This paper proposes a novel interpretation technique for explaining the behavior of structured output models, which learn mappings between an input vector and a set of output variables simultaneously. Because the computational paths of the output variables in a structured model are intertwined, a feature may affect one output value through other output variables. We focus on one output as the target and seek the most important features that the structured model uses to decide on that target in each locality of the input space. We treat the structured output model as an arbitrary black box and argue that accounting for correlations among output variables can improve explanation quality. The goal is to train a function that serves as an interpreter for the target output variable over the input space. We introduce an energy-based training process for the interpreter function that effectively incorporates the structural information built into the model being explained. Experiments on several simulated and real data sets confirm the effectiveness of the proposed method.
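To make the idea of energy-guided feature selection concrete, the following is a minimal toy sketch, not the paper's method. It assumes a hand-written two-output black box in which the second output is computed from the first, a stand-in energy that counts disagreements across all output variables (the paper instead trains a deep energy network), and a brute-force search over feature subsets in place of a trained interpreter. The names `black_box`, `mask`, `energy`, and `interpret` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def black_box(x):
    """Toy structured model (an assumption for illustration).

    y2 is computed from y1, so x[0] and x[1] influence y2 only through y1.
    x[3] is ignored by the model.
    """
    y1 = 1 if x[0] + x[1] > 0 else 0
    y2 = y1 ^ (1 if x[2] > 0 else 0)
    return (y1, y2)


def mask(x, selected):
    """Keep the selected feature indices and zero out all others."""
    return [v if i in selected else 0.0 for i, v in enumerate(x)]


def energy(x_masked, y):
    """Stand-in energy, not a learned network.

    Counts the output variables on which the black box, fed the masked
    input, disagrees with the reference output y. Lower means the masked
    input is more compatible with the full structured output.
    """
    return sum(a != b for a, b in zip(black_box(x_masked), y))


def interpret(x, k):
    """Brute-force stand-in for an interpreter.

    Returns the first k-feature subset (in lexicographic order) whose
    masked input has the lowest energy with respect to black_box(x).
    """
    y = black_box(x)
    return min(combinations(range(len(x)), k),
               key=lambda s: energy(mask(x, s), y))


x = [2.0, 1.0, 3.0, 5.0]
best = interpret(x, k=2)
print(best, energy(mask(x, best), black_box(x)))  # prints (0, 2) 0
```

In this toy setting the selected pair contains one feature that reaches the second output only through the first output, and it excludes the feature the black box never reads. Because the stand-in energy scores every output variable rather than the target alone, a subset that matches the target by coincidence while changing the other output is penalized.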