1 minute read

Last January I participated in the Fairness, Accountability, Confidentiality, and Transparency course. We were tasked with replicating a recent paper on one of the FACT topics. Me and my partner worked on the ‘Attention is not Explanation’ paper by Jain and Wallace. Because this was one of the papers for which the code had already been open-sourced, we were instructed to come at it with an original angle.

UPDATE: We later discovered that the results observed in this paper were caused by a bug in the implementation of the top-k rank correlation we used.

One of the experiments in ‘Attention is not Explanation’ looks at the amount of correlation between the values assigned by attention mechanisms and certain feature importance metrics (leave-one-out and gradient-based). They observe little to no correlation, but Jain and Wallace note that it is possible that the correlation metric they use (Kendall-$\tau$) is considering irrelevant features, thereby adding noise. They propose comparing only the top-$k$ inputs of attention and other explanations, but mention that choosing a $k$ is not trivial.

We propose using sparse attention, and letting $k$ equal the amount of non-zero values produced by the attention mechanism. When calculating the correlation (using a a top-$k$ generalisation of Kendall-$\tau$) in this way, we find equally strong correlation between attention and the feature importance metrics as they have with eachother for tanh attention. For scaled dot product attention mechanisms we do not observe this.

I do hope to get a chance to work on this more in the future though; because I think it is possible that we could results similar to those we got for tanh, if we let the model learn the amount of sparsity to use. We had originally planned to do this during the course, but did not end up having enough time. To make this possible, we had intended to use $\alpha$-entmax.

You can get the paper here, the code is available at https://github.com/michaeljneely/sparse-attention-explanation.