Attacks Against XAI
Recent research has shown a close connection between explanations and adversarial examples.
It is thus not surprising that methods for explaining machine learning have been attacked successfully in a similar setting.
With such input-manipulation attacks, an adversary can effectively deceive explainable machine-learning methods:
an input sample is modified such that it yields a specific target explanation or an uninformative one.
These attacks are tailored towards individual input samples, limiting their reach.
If, however, it were possible to trigger an incorrect or uninformative explanation for any input, an adversary could disguise the reasons for a classifier’s decision and even point towards alternative facts as a red herring on a larger scale.
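To make the idea concrete, the following is a minimal sketch of such an input-manipulation attack against a gradient-based saliency explanation. It is a generic illustration only, not the method of any of the papers listed below: it assumes a differentiable PyTorch classifier, uses a plain input-gradient saliency map, and the helper names (saliency, manipulate_explanation) are hypothetical. The adversary optimizes a small perturbation so that the explanation of the perturbed input matches a chosen target map while the model's output stays close to the original prediction.

import torch
import torch.nn.functional as F

def saliency(model, x):
    # Gradient-based saliency: absolute gradient of the top class score
    # w.r.t. the input; create_graph=True keeps the map differentiable.
    score = model(x).max(dim=1).values.sum()
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs()

def manipulate_explanation(model, x, target_map, steps=200, lr=0.01, eps=0.03):
    # Search for a small perturbation delta such that the saliency map of
    # x + delta resembles target_map while the prediction barely changes.
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    original_logits = model(x).detach()

    for _ in range(steps):
        x_adv = x + delta
        expl_loss = F.mse_loss(saliency(model, x_adv), target_map)
        pred_loss = F.mse_loss(model(x_adv), original_logits)
        loss = expl_loss + pred_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Keep the perturbation within a small L-infinity budget.
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    return (x + delta).detach()

In this sketch, the prediction-consistency term is what turns the manipulation into a disguise rather than an ordinary adversarial example: the classifier's decision is preserved while only the explanation is steered.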
- "Model-Manipulation Attacks Against Black-Box Explanations," ACSAC 2024
(project page, paper, code)
@InProceedings{Hegde2024Model,
  author    = {Achyut Hegde and Maximilian Noppel and Christian Wressnegger},
  booktitle = {Proc. of the 40th Annual Computer Security Applications Conference ({ACSAC})},
  title     = {Model-Manipulation Attacks Against Black-Box Explanations},
  year      = {2024},
  month     = dec,
  day       = {9.-13.}
}
- "Disguising Attacks with Explanation-Aware Backdoors," IEEE S&P 2023
(project page, paper, video, code)
@InProceedings{Noppel2023Disguising,
  author    = {Maximilian Noppel and Lukas Peter and Christian Wressnegger},
  booktitle = {Proc. of the 44th {IEEE} Symposium on Security and Privacy ({S\&P})},
  title     = {Disguising Attacks with Explanation-Aware Backdoors},
  year      = {2023},
  month     = may,
  day       = {22.-25.}
}
- "Poster: Fooling XAI with Explanation-Aware Backdoors," CCS 2023
(poster)
@InProceedings{Noppel2023Poster,
  author    = {Maximilian Noppel and Christian Wressnegger},
  booktitle = {Proc. of the 30th {ACM} Conference on Computer and Communications Security ({CCS})},
  title     = {{Poster}: {F}ooling {XAI} with Explanation-Aware Backdoors},
  year      = {2023},
  month     = nov
}
- "Explanation-Aware Backdoors in a Nutshell," AI 2023
(paper)
@InProceedings{Noppel2023ExplanationAware,
  author    = {Maximilian Noppel and Christian Wressnegger},
  booktitle = {Proc. of the 46th German Conference on Artificial Intelligence},
  title     = {Explanation-Aware Backdoors in a Nutshell},
  year      = {2023},
  month     = sep
}