Explainable AI & Computer Security

https://intellisec.de/research

XAI in Adversarial Environments

Modern deep learning methods have long been considered black boxes due to the lack of insight into their decision-making process. Recent advances in explainable machine learning, however, have turned the tables: post-hoc explanation methods enable precise relevance attribution of input features even for otherwise opaque models such as deep neural networks. This progress has raised expectations that such techniques can uncover attacks against learning-based systems, for instance, adversarial examples or neural backdoors. Unfortunately, current explanation methods are not robust against manipulation themselves.

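As a concrete illustration of such relevance attribution, the following is a minimal sketch of a simple post-hoc explanation, a plain gradient saliency map, computed with PyTorch. The model, input, and dimensions are illustrative placeholders rather than a method from the publications below.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # stand-in "black box"
    model.eval()

    x = torch.randn(1, 20, requires_grad=True)   # a single input sample
    logits = model(x)
    cls = logits.argmax(dim=1).item()            # explain the predicted class

    logits[0, cls].backward()                    # gradient of the class score w.r.t. the input
    saliency = x.grad.abs().squeeze()
    print(saliency)                              # larger values = more relevant input features
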
  • "A Brief Systematization of Explanation-Aware Attacks," AI 2024 (paper)
    @InProceedings{Noppel2024Brief,
      author    = {Maximilian Noppel and Christian Wressnegger},
      booktitle = {Proc. of the 47th German Conference on Artificial Intelligence},
      title     = {A Brief Systematization of Explanation-Aware Attacks},
      year      = {2024},
      month     = sep
    }

Attacks Against XAI

Recent research has shown a close connection between explanations and adversarial examples. It is thus not surprising that methods for explaining machine learning have successfully been attacked in a similar setting. Such input-manipulation attacks allow an adversary to effectively deceive explainable machine-learning methods: an input sample is modified such that it yields a specific explanation or an uninformative one. These attacks are tailored to individual input samples, which limits their reach. If, however, it were possible to trigger an incorrect or uninformative explanation for any input, an adversary could disguise the reasons for a classifier’s decision and even point towards alternative facts as a red herring, on a much larger scale.

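The sketch below illustrates the general idea of such an input-manipulation attack under simplifying assumptions: a small perturbation is optimized so that a gradient-based explanation of the input drifts towards an attacker-chosen target map. The stand-in model, the smooth activations, and all hyperparameters are illustrative choices and do not reproduce the concrete attacks from the publications below.

    import torch
    import torch.nn as nn

    # Stand-in classifier with smooth activations; second-order gradients through
    # ReLU networks are mostly zero, so Softplus keeps this sketch well-behaved.
    model = nn.Sequential(nn.Linear(20, 64), nn.Softplus(), nn.Linear(64, 2))
    model.eval()

    x = torch.randn(1, 20)                     # the input to manipulate
    target_expl = torch.zeros(1, 20)
    target_expl[0, :5] = 1.0                   # red-herring map pointing at the first features

    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=0.05)

    for _ in range(200):
        opt.zero_grad()
        x_adv = x + delta
        logits = model(x_adv)
        cls = logits.argmax(dim=1).item()
        # gradient explanation of the perturbed input, kept differentiable in delta
        expl = torch.autograd.grad(logits[0, cls], x_adv, create_graph=True)[0]
        loss = ((expl - target_expl) ** 2).mean() + 0.1 * delta.pow(2).mean()
        loss.backward()
        opt.step()

    print("manipulated explanation:", expl.detach().squeeze()[:5])
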
  • "Model-Manipulation Attacks Against Black-Box Explanations," ACSAC 2024 (project page, paper, code)
    @InProceedings{Hegde2024Model,
      author    = {Achyut Hegde and Maximilian Noppel and Christian Wressnegger},
      booktitle = {Proc. of the 40th Annual Computer Security Applications Conference ({ACSAC})},
      title     = {Model-Manipulation Attacks Against Black-Box Explanations},
      year      = {2024},
      month     = dec,
      day       = {9.-13.}
    }
  • "Disguising Attacks with Explanation-Aware Backdoors," IEEE S&P 2023 (project page, paper, video, code)
    @InProceedings{Noppel2023Disguising,
      author    = {Maximilian Noppel and Lukas Peter and Christian Wressnegger},
      booktitle = {Proc. of the 44th {IEEE} Symposium on Security and Privacy ({S\&P})},
      title     = {Disguising Attacks with Explanation-Aware Backdoors},
      year      = {2023},
      month     = may,
      day       = {22.-25.}
    }
  • "Poster: Fooling XAI with Explanation-Aware Backdoors," CCS 2023 (poster)
    @InProceedings{Noppel2023Poster,
      author    = {Maximilian Noppel and Christian Wressnegger},
      booktitle = {Proc. of the 30th ACM Conference on Computer and Communications Security ({CCS})},
      title     = {{Poster}: {F}ooling {XAI} with Explanation-Aware Backdoors},
      year      = {2023},
      month     = nov
    }
  • "Explanation-Aware Backdoors in a Nutshell," AI 2023 (paper)
    @InProceedings{Noppel2023ExplanationAware,
      author    = {Maximilian Noppel and Christian Wressnegger},
      booktitle = {Proc. of the 46th German Conference on Artificial Intelligence},
      title     = {Explanation-Aware Backdoors in a Nutshell},
      year      = {2023},
      month     = sep
    }

XAI for Computer Security

Machine learning models have become ubiquitous in the computer security domain for tasks like malware detection, binary analysis, or vulnerability detection. One drawback of these methods, however, is that their decisions are opaque and leave the practitioner with the question "What has my model actually learned?". This is especially true for neural networks, which perform remarkably well on many tasks but rely on millions of parameters and complex decision functions.

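As a rough illustration of how explanations can help answer this question, the following sketch attributes the decision of a toy "malware detector" back to named input features via gradient-times-input. The detector, feature names, and input are hypothetical placeholders, not the security systems studied in the publications below.

    import torch
    import torch.nn as nn

    # Hypothetical feature names of a toy malware detector.
    features = ["api:CreateRemoteThread", "api:RegSetValue", "perm:INTERNET",
                "str:http://", "entropy:high_section", "imports:count"]

    detector = nn.Sequential(nn.Linear(len(features), 32), nn.ReLU(), nn.Linear(32, 2))
    detector.eval()

    x = torch.rand(1, len(features), requires_grad=True)   # one (fake) sample
    logits = detector(x)
    cls = logits.argmax(dim=1).item()
    logits[0, cls].backward()

    # Gradient-times-input relevance, mapped back to the named features.
    relevance = (x.grad * x).detach().squeeze()
    for name, score in sorted(zip(features, relevance.tolist()), key=lambda t: -abs(t[1])):
        print(f"{name:24s} {score:+.4f}")
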
  • "Evaluating Explanation Methods for Deep Learning in Security," EuroS&P 2020 (project page, paper, demo, code)
    @InProceedings{Warnecke2020Evaluating,
      author    = {Alexander Warnecke and Daniel Arp and Christian Wressnegger and Konrad Rieck},
      booktitle = {Proc. of the 5th {IEEE} European Symposium on Security and Privacy ({EuroS\&P})},
      title     = {Evaluating Explanation Methods for Deep Learning in Security},
      year      = {2020},
      month     = sep
    }
  • "TagVet: Vetting Malware Tags using Explainable Machine Learning," EuroSec 2021 (paper, video, code)
    @InProceedings{Pirch2021TagVet,
      author    = {Lukas Pirch and Alexander Warnecke and Christian Wressnegger and Konrad Rieck},
      booktitle = {Proc. of the 14th European Workshop on System Security ({EUROSEC})},
      title     = {{TagVet}: Vetting Malware Tags using Explainable Machine Learning},
      year      = {2021},
      month     = apr,
      day       = {25.}
    }