EU AI Act Article 15: What Adversarial Testing Actually Means

Louis Sanchez May 7, 2026 10 min read
EU AI Act Article 15 - adversarial attack on a neural network with 12 weeks until the August 2 2026 deadline

August 2, 2026. Roughly twelve weeks from today. A done-right Article 15 testing program — including remediation and retesting — needs at least six weeks of focused work. Anyone telling you they can deliver it in less is cutting corners that auditors and notified bodies will catch.

The EU AI Act (Regulation 2024/1689) entered into force on August 1, 2024. It is the first comprehensive horizontal AI regulation anywhere in the world, and the cybersecurity obligations buried inside it are stricter than most security teams realize. Article 15 is where the teeth are.

This guide walks through what Article 15 actually requires for high-risk AI systems, who's in scope (including non-EU companies), what adversarial testing looks like in practice, and how to prepare in the time you have left before the August 2, 2026 deadline. No regulatory hand-waving — just the specific obligations and what they mean operationally.

The Bottom Line Up Front

Article 15 of the EU AI Act requires high-risk AI systems to be resilient against adversarial attacks, data poisoning, model poisoning, prompt injection, and other AI-specific threats. The August 2, 2026 deadline applies to providers and deployers, including non-EU companies whose AI output is used in the EU. Penalties run up to €15 million or 3% of global annual turnover. Adversarial security testing is, in operational terms, mandatory.

What Article 15 Actually Says

Article 15 is the EU AI Act's accuracy, robustness, and cybersecurity requirements clause for high-risk AI systems. The cybersecurity-specific obligations are concentrated in three sub-paragraphs:

Article 15(1) — Design Requirement

High-risk AI systems must be "designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and that they perform consistently in those respects throughout their lifecycle." This is the high-level mandate; the specifics come later.

Article 15(3) — Resilience to Unauthorised Alteration

The system "shall be resilient against errors, faults or inconsistencies that may occur within the system or the environment in which the system operates." Read narrowly, this is fault tolerance. Read in context with Article 15(5), it's the start of the cybersecurity story.

Article 15(5) — Resilience Against Unauthorised Third Parties

Here's the operative cybersecurity clause: high-risk AI systems "shall be resilient as regards attempts by unauthorised third parties to alter their use, outputs or performance by exploiting system vulnerabilities." This is where security testing becomes mandatory in practice.

Recital 77 — The Technical Specifics

If Article 15 is the legal mandate, Recital 77 is the technical brief. It calls out the specific attack categories that the technical solutions must address:

This list is roughly the AI-security adaptation of what penetration testers already do for traditional applications, with three categories that don't have direct analogues outside AI: data poisoning, model poisoning, and adversarial examples. We cover each below.

What "Adversarial Testing" Actually Requires

The term "adversarial testing" doesn't appear in the regulation in those exact words — it shows up in the official guidance and in industry shorthand. What Article 15 and Recital 77 effectively mandate is a security testing program that exercises the AI system against the threats they enumerate. The closest existing standards are the OWASP LLM Top 10 (2025 edition) and the NIST AI Risk Management Framework. ENISA's Multilayer Framework for Good Cybersecurity Practices for AI is the EU's own reference document.

In operational terms, an Article 15 testing engagement covers six layers:

1. The AI Application Layer

The application that surrounds and exposes the AI — APIs, web interfaces, mobile apps, prompts. This is where traditional web application and API penetration testing applies. Authentication, authorization, input validation, rate limiting, and abuse-case logic. The AI doesn't get tested in isolation.

2. Adversarial Examples and Evasion

Inputs crafted specifically to cause the model to produce a wrong output. For computer-vision systems, this is the canonical "stop sign with stickers" attack. For text models, it's prompt manipulation that bypasses content filters or alters classification. Testing here exercises whether the model degrades safely under crafted inputs.

3. Data Poisoning Resilience

Reviewing training-data integrity controls and supply-chain provenance. If your model retrains on user inputs or scraped data, can an attacker inject poisoned samples that bias the model's future outputs? This is partly an architecture review, partly a controls test on the data ingestion pipeline.

4. Model Poisoning and Backdoor Detection

If you fine-tune on top of a foundation model, who controls that base model and the fine-tuning data? Pre-trained components can carry backdoors — triggers that activate malicious behavior on specific inputs. Testing involves reviewing model-supply-chain controls, checkpoint integrity, and detection of trigger patterns.

5. Confidentiality Attacks

Model extraction (stealing the model via repeated queries), model inversion (reconstructing training data from outputs), and membership inference (determining whether a specific record was in the training set). These are particularly relevant for systems trained on regulated data — health records, financial records, biometric templates.

6. Prompt Injection (Generative AI)

For systems built on large language models, prompt injection is the dominant threat surface. The OWASP LLM Top 10 puts it at LLM01 for a reason. Testing covers direct injection (user manipulates the model's instructions), indirect injection (injection delivered via retrieved content or tool output), and the cascading effects when the model has agentic capabilities or tool access.

How This Differs From a Regular Pentest

A traditional penetration test exercises the system around the AI — authentication, infrastructure, APIs. Article 15 testing also exercises the model itself: input space, training pipeline, model-supply chain, and confidentiality boundaries. Both are required. Skipping either leaves a gap auditors will find.

Who Article 15 Applies To

Article 15 obligations attach to providers and deployers of high-risk AI systems. The categories are listed in Annex III:

  1. Biometrics — including remote biometric identification and emotion recognition
  2. Critical infrastructure — AI used as a safety component in road, rail, water, gas, electricity, etc.
  3. Education and vocational training — admissions, evaluation, monitoring
  4. Employment and worker management — recruitment, performance evaluation, task allocation
  5. Access to essential private and public services — credit scoring, public benefits, emergency dispatch, insurance underwriting
  6. Law enforcement — risk assessment, polygraph-like systems, evidence evaluation
  7. Migration, asylum, and border control — visa eligibility, polygraphs, document verification
  8. Administration of justice and democratic processes — judicial decision support, election influence systems

Plus: AI systems used as a safety component of products covered by the Union harmonisation legislation in Annex I (medical devices, machinery, toys, lifts, radio equipment, etc.).

Providers vs. Deployers vs. Distributors

The obligations differ slightly by role:

Extraterritorial Reach

Article 2 extends the regulation to providers and deployers established outside the EU when the output produced by the AI system is used in the EU. This is the clause most US-based companies underestimate. A US fintech using an AI model to underwrite loans for EU customers is in scope. A US SaaS embedding an AI feature that processes EU user data is in scope. A US healthcare-tech vendor whose AI is used by an EU clinic is in scope.

The Compliance Timeline

The EU AI Act phases in over several years. Here's what's already in force and what's coming:

What "Substantial Modification" Means

If you're relying on the 2027 grace period, watch for substantial modifications that pull you back to the 2026 deadline. A substantial modification is, roughly, a change that alters the intended purpose or that affects compliance with the regulation. Adding new training data is generally considered a substantial modification. Retraining the model is generally considered substantial. So is deploying the model in a new high-risk use case.

Penalties

Article 99 sets the administrative fines:

For SMEs and startups the fine is the lower of the two amounts rather than the higher — small consolation if you're a Series B fintech.

What the Testing Looks Like in Practice

An Article 15 conformity-supporting engagement runs longer than a standard pentest because the AI-specific layers add work that traditional testing doesn't cover. A compressed but still-defensible scope for a high-risk AI system used in financial underwriting looks like this:

Three weeks is the floor for a real Article 15 engagement on a non-trivial system. The exact scope varies — systems using a frozen foundation model with no fine-tuning collapse the data-integrity review, while continuous-learning pipelines require an entirely separate review track for the training pipeline itself.

Red Flag: "We Can Do It in a Week"

Several boutique AI consultancies are now advertising Article 15 testing in 5-7 days at compressed rates. That's not enough time to credibly cover six attack categories and the surrounding application surface and document Annex IV evidence. A one-week engagement is a vulnerability scan with AI-themed branding. Auditors and notified bodies are increasingly aware of the difference.

Common Misconceptions

"We're not in the EU, so this doesn't apply."

Article 2 disagrees. If your AI's output is used in the EU, you're a provider or deployer in scope. The regulation explicitly anticipated this and pulled non-EU companies in.

"We use a foundation model from OpenAI/Anthropic/Google. They handle compliance."

They handle their compliance for the foundation model. You handle yours for the system you build on top. The foundation-model GPAI obligations took effect August 2025 — and those obligations don't transfer their downstream compliance to your high-risk system. You still have to test how your system uses the model, the prompts and tool integrations, and whether your application introduces new attack surface.

"A regular pentest will cover this."

A good pentest covers the application around the AI. It doesn't cover adversarial input testing against the model, training-data poisoning resilience, model-supply-chain integrity, or confidentiality attacks against the model itself. The OWASP LLM Top 10 categories are not in a typical web pentest scope.

"Our AI isn't 'high-risk.'"

Read Annex III carefully before concluding this. Credit scoring, hiring algorithms, performance evaluation, insurance underwriting, and biometric verification are all in scope. Many SaaS companies have at least one feature that lands in a high-risk category they didn't realize was regulated.

"We have a NIST AI RMF assessment, that should cover it."

The NIST AI RMF is a good starting point and many of the controls map across, but it isn't a conformity assessment for the EU AI Act. Annex IV requires specific technical documentation that goes beyond NIST's voluntary framework. Use NIST AI RMF as scaffolding; use the Annex IV checklist as the actual deliverable.

How to Prepare in 6 Weeks

Six weeks is the minimum sprint for a high-risk AI system that hasn't started yet. Anything shorter compromises one of the three things auditors look for: testing coverage, remediation evidence, or technical-file documentation. Here's how the time breaks down:

Week 1: Scoping and Gap Analysis

Week 2: Architecture and Documentation Review

Weeks 3-4: Adversarial Testing

Week 5: Remediation

Week 6: Retest and Conformity Documentation

Six weeks is the floor, not the comfortable timeline. If you're reading this and the deadline is closer than six weeks out, prioritize the testing first — gap analysis can run in parallel and Annex IV documentation can be assembled after the testing produces evidence to document. But understand: any provider promising a faster turnaround is selling you a checkbox, not compliance.

Frequently Asked Questions

Does the EU AI Act actually require penetration testing?

Yes, in operational terms, for high-risk AI systems. Article 15(3) and (5) require resilience against attempts to alter the system's use, outputs, or performance. Recital 77 specifies the technical solutions must address data poisoning, model poisoning, adversarial examples, confidentiality attacks, and model flaws. There's no other practical way to demonstrate resilience to those threats than to test against them.

How is Article 15 testing different from a regular penetration test?

A traditional pentest covers the application, infrastructure, and authentication around the AI. Article 15 testing also covers the model itself: adversarial inputs, training-data integrity, model-supply-chain controls, and confidentiality boundaries. Both are required for full compliance.

Does Article 15 apply if I'm not based in the EU?

Yes, if your AI's output is used in the EU. Article 2 explicitly extends the regulation to non-EU providers and deployers when the output is used inside the EU. This pulls in most US-based companies serving EU customers.

What's the penalty for non-compliance?

Up to €15 million or 3% of total worldwide annual turnover (whichever is higher) for non-compliance with operator and provider obligations. Higher fines (up to €35 million or 7%) for prohibited AI practices.

Can our internal team do the adversarial testing?

Technically yes, but most regulators and notified bodies expect testing performed by an independent function or third party. Internal testing also tends to suffer from the "we built it, we know how it works" bias that adversarial testing specifically tries to defeat. An independent LLM/AI penetration test produces evidence that's stronger for conformity purposes.

What about general-purpose AI (GPAI) models?

GPAI obligations are separate (Articles 51-55) and took effect August 2, 2025. If you provide a GPAI model with systemic risk, you have your own testing, documentation, and incident reporting obligations on top of Article 15 if you also deploy it as a high-risk system.

Need Article 15-Ready Testing Before August?

We deliver AI red-teaming and adversarial testing aligned to Article 15, Recital 77, and the OWASP LLM Top 10 — with conformity-ready reporting designed to slot into your Annex IV technical file.

EU AI Act Testing Services Request a Quote

Related Reading