
ICCV Decoded: Explainability Through Model Representations

Explainability
Computer Vision
Events
Yotam Azriel

Understanding deep learning representations with applied explainability

As someone deeply involved in deploying deep learning systems to production, I keep encountering the same challenge: models can be highly accurate and still fundamentally difficult to trust. When they fail, the reasons are often opaque. Internal representations are hard to reason about, hidden failure modes emerge late, and many post-hoc explainability tools stop at surface-level signals without answering a more fundamental question: what has the model actually learned?

This question was very much on my mind at ICCV, where I attended a talk by Thomas Fel. His presentation immediately stood out. The way he covered the space, both technically and philosophically, was precise, rigorous, and deeply thoughtful. He framed interpretability not as a visualization problem or an auxiliary debugging step, but as a way to reason about models at the level where decisions are formed: their internal representations. That framing resonated strongly with me.

Interpreting models through internal representations

In his NeurIPS 2023 paper, Thomas and his co-authors introduce Inverse Recognition (INVERT), a method for automatically associating hidden units with human-interpretable concepts. Unlike many prior approaches, INVERT does not rely on segmentation masks or heavy supervision. Instead, it identifies which concepts individual neurons discriminate and quantifies that alignment with a statistically grounded metric. This makes it possible to audit representations systematically, surface spurious correlations, and understand how concepts are organized across layers, all without distorting the model or making causal claims the method cannot support.
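To make the core idea concrete, here is a minimal sketch of what an alignment score of this kind can look like in code. It is my own illustration, not the paper's implementation: it treats a single hidden unit as a detector for a concept and uses AUROC to measure how well the unit's activations separate images that contain the concept from images that do not. The function name, the PyTorch forward hook, and the use of scikit-learn's roc_auc_score are all assumptions made for the sake of the example.

```python
# Illustrative sketch only: an AUROC-based neuron/concept alignment score,
# inspired by the idea described above. Not the authors' implementation.
import torch
from sklearn.metrics import roc_auc_score


@torch.no_grad()
def neuron_concept_alignment(model, layer, images, concept_present, neuron_idx):
    """Score how well one hidden unit separates images that contain a concept
    from images that do not.

    images:          tensor of shape (N, C, H, W)
    concept_present: binary array of shape (N,), e.g. from dataset metadata
                     (no segmentation masks needed)
    Returns an AUROC in [0, 1]; 0.5 means the unit carries no information
    about the concept, values near 1.0 mean strong alignment.
    """
    activations = []

    def hook(_module, _inputs, output):
        # Spatially average feature maps so each image yields one scalar per unit.
        if output.dim() == 4:
            output = output.mean(dim=(2, 3))
        activations.append(output[:, neuron_idx].cpu())

    handle = layer.register_forward_hook(hook)
    try:
        model.eval()
        model(images)
    finally:
        handle.remove()

    scores = torch.cat(activations).numpy()
    # AUROC: the probability that a randomly chosen concept-positive image
    # activates this unit more strongly than a concept-negative one.
    return roc_auc_score(concept_present, scores)
```

Sweeping a score like this over (unit, concept) pairs is what turns individual observations into the kind of systematic audit described above: units whose best-aligned concept is something unexpected, a background texture rather than the object of interest, say, become natural candidates for closer inspection.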

From research insight to applied practice

This line of work aligns closely with how we think about applied explainability at Tensorleap. For practitioners, interpretability only matters if it leads to action: better debugging, more reliable validation, and clearer decision-making around models in real systems. Representation-level analysis addresses a critical gap: cases where a model appears to perform well, but for the wrong reasons.

Beyond the research itself, Thomas is also a remarkably clear and insightful speaker. That was a key reason I invited him to join us for a Tensorleap webinar. Our goal was not to simplify the research, but to explore how these ideas translate into practical insight for engineers working with complex models in production.

If you’re grappling with what your models have actually learned, or why they fail in unexpected ways, I strongly recommend watching the full webinar recording to dive deeper into representation analysis and applied interpretability.