Methodology

Risk Assessment Framework

Five core dimensions that measure the intrinsic risk of AI agents, with a regulatory and industry overlay that adds workflow-specific compliance and governance assessment. Quantitative, regulatory-native, and built for both initial assessment and continuous monitoring.

Core Dimensions

Overlay — Applied across all dimensions

01

Output Integrity

Is what the agent produces accurate, appropriate, and trustworthy?

A hallucinated interest rate in a disclosure. A fabricated compliance citation. An unexplainable credit decision. In financial services, every output the agent produces carries potential regulatory, financial, and reputational liability — and that liability scales with every interaction. We measure not just whether the agent gets things right, but how it behaves when it doesn’t know, when it’s pushed off-topic, and when it needs to explain itself.

What we assess

1.1 Factual Accuracy & Hallucination

Measured hallucination rates under realistic conditions. Severity classification — cosmetic, material, or dangerous. Behavior when information is insufficient: fabrication, hedging, or escalation. Consistency across repeated queries.

1.2 Scope Adherence

Domain boundary robustness. Resistance to being led off-topic under multi-turn conversational pressure. Behavior when asked to operate outside its intended scope.

1.3 Explainability & Reasoning

Faithfulness of explanations to actual reasoning — not post-hoc rationalizations. Detail sufficient for regulatory inquiry, adverse action notices, and customer complaints.

1.4 Content Appropriateness

Prevention of harmful, misleading, or inappropriate outputs. Tone suitability for the customer segment. IP and licensing considerations for generated content.

Continuous monitoring metrics

Hallucination rate, scope violations, escalation rate, explanation consistency
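As a rough illustration of how these four monitoring metrics could be derived from reviewed interaction logs, here is a minimal Python sketch. The `Interaction` record, its field names, and the definition of explanation consistency (the share of repeated queries whose explanations match exactly) are assumptions made for this example, not a description of Arc's actual tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """One reviewed agent interaction. All field names are hypothetical."""
    query_id: str        # identifies the underlying question across repeated runs
    explanation: str     # normalized explanation the agent gave
    hallucinated: bool   # reviewer judged the output factually unsupported
    out_of_scope: bool   # agent answered outside its intended domain
    escalated: bool      # agent handed the case to a human


def monitoring_metrics(log: list[Interaction]) -> dict[str, Optional[float]]:
    """Compute the four monitoring metrics as simple proportions."""
    if not log:
        raise ValueError("log is empty")
    n = len(log)

    # Group explanations by query so repeated queries can be compared.
    by_query: dict[str, list[str]] = defaultdict(list)
    for item in log:
        by_query[item.query_id].append(item.explanation)
    repeated = [runs for runs in by_query.values() if len(runs) > 1]
    consistent = sum(1 for runs in repeated if len(set(runs)) == 1)

    return {
        "hallucination_rate": sum(i.hallucinated for i in log) / n,
        "scope_violation_rate": sum(i.out_of_scope for i in log) / n,
        "escalation_rate": sum(i.escalated for i in log) / n,
        # Undefined (None) when no query was asked more than once.
        "explanation_consistency": consistent / len(repeated) if repeated else None,
    }
```

Exact string matching is a deliberately crude stand-in for consistency. A real deployment would more likely compare explanations semantically.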

Four Assessment Channels

Every dimension is assessed through four independent channels. Where they converge, confidence is high. Where they diverge, we investigate further.

Governance Review

Documentation, policies, oversight mechanisms, vendor contracts, incident response plans, and organizational readiness — the controls that wrap around the technology.

Adversarial Testing

Automated and manual red-teaming with attack scenarios designed for the specific deployment context. Prompt injection, boundary probing, data extraction, social engineering — tested the way real adversaries operate.

Production Analysis

Analysis of actual agent behavior in production or staging — error rates, near-miss patterns, behavioral drift, anomalous outputs, escalation frequency. What the agent actually does, not what documentation says it should do.

Regulatory Mapping

Explicit mapping of every finding to applicable regulations. Gap identification against SR 11-7, EU AI Act, TPRM guidance, state laws, and sector-specific requirements. Output formatted for regulatory examination.
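The convergence check across the four channels can be sketched as follows. This assumes each channel reduces a dimension to a risk score between 0 and 1 and that a fixed spread threshold separates agreement from divergence. Both the scoring scale and the `max_spread` default are illustrative assumptions, not part of the published methodology.

```python
from statistics import median

# Channel names follow the four channels described above.
CHANNELS = (
    "governance_review",
    "adversarial_testing",
    "production_analysis",
    "regulatory_mapping",
)


def channel_agreement(scores: dict[str, float], max_spread: float = 0.2) -> dict:
    """Compare per-channel risk scores (assumed 0..1) for one dimension.

    Returns the spread, whether the channels converge, and, when they
    diverge, the channel furthest from the median as a starting point
    for further investigation.
    """
    missing = [c for c in CHANNELS if c not in scores]
    if missing:
        raise ValueError(f"missing channel scores: {missing}")

    values = [scores[c] for c in CHANNELS]
    spread = max(values) - min(values)
    converged = spread <= max_spread

    investigate = None
    if not converged:
        mid = median(values)
        investigate = max(CHANNELS, key=lambda c: abs(scores[c] - mid))

    return {"spread": spread, "converged": converged, "investigate": investigate}
```

For example, if adversarial testing scores a dimension far above the other three channels, the result names `adversarial_testing` as the channel to examine first.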

Regulatory Crosswalks

Every finding maps to the regulations your compliance team reports against. The assessment output is directly usable in regulatory examinations — no translation required.

SR 11-7

Model Risk Management

EU AI Act

High-Risk AI Systems

TPRM Guidance

Third-Party Risk

NIST AI RMF

Risk Management Framework

EU DORA

Operational Resilience

State AI Laws

CO, IL, CA, NYC
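One way to picture a crosswalk is as a lookup from findings to the frameworks listed above. The sketch below is a hypothetical data structure: the `Finding` class and its fields are invented for illustration, and which framework a given finding maps to is assumed to be decided by a reviewer, not by this code.

```python
from dataclasses import dataclass, field

# Framework labels taken from the crosswalk list above.
FRAMEWORKS = {
    "SR 11-7",
    "EU AI Act",
    "TPRM Guidance",
    "NIST AI RMF",
    "EU DORA",
    "State AI Laws",
}


@dataclass
class Finding:
    """A single assessment finding. Field names are hypothetical."""
    finding_id: str
    summary: str
    frameworks: set[str] = field(default_factory=set)  # reviewer-assigned tags


def crosswalk(findings: list[Finding]) -> tuple[dict[str, list[str]], list[str]]:
    """Group finding ids by framework and list findings with no mapping."""
    table: dict[str, list[str]] = {name: [] for name in sorted(FRAMEWORKS)}
    unmapped: list[str] = []
    for f in findings:
        unknown = f.frameworks - FRAMEWORKS
        if unknown:
            raise ValueError(f"{f.finding_id}: unknown frameworks {sorted(unknown)}")
        if not f.frameworks:
            unmapped.append(f.finding_id)  # every finding is expected to map somewhere
        for name in f.frameworks:
            table[name].append(f.finding_id)
    return table, unmapped
```

Returning the unmapped findings separately makes it easy to check that every finding has been mapped before a report is produced.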

How an Assessment Works

A bank asks Arc to assess a KYC/AML agent before deployment.

01

Scope

Identify workflow type, agent capabilities, regulatory jurisdictions, and deployment context. The right regulatory modules activate automatically based on the use case.

02

Assess

Run all five core dimensions through four independent assessment channels — governance review, adversarial testing, production analysis, and regulatory mapping.

03

Measure

Produce quantified risk metrics for every dimension and sub-dimension. Hallucination rates, action error rates, leakage scores, compliance gaps — numbers, not checkboxes.

04

Report

The CRO receives an executive risk profile. The model risk team receives detailed, regulatory-mapped findings. The agent provider receives a prioritized remediation roadmap.

Built differently

Quantitative, not binary

Generic frameworks record pass or fail. We measure hallucination rates, action error frequencies, leakage resistance scores, and compliance gap counts. Every metric that can produce a number produces a number — because quantified risk is actionable risk.

Regulatory-native, not regulatory-mapped

We don’t build a framework and crosswalk it to regulations as an afterthought. We start from what SR 11-7, the EU AI Act, and state laws actually require, and organize our checks to satisfy those requirements directly. The assessment output is usable in regulatory examinations without translation.

Workflow-specific, not agent-generic

A KYC agent and a customer service chatbot have completely different risk profiles — different regulations, different failure modes, different liability exposure. Our regulatory modules assess the agent against the specific requirements of the specific workflow it operates in.

Outputs separated from actions

A hallucinated response and an unauthorized transaction are both failures — but they have different testing methodologies, different controls, different loss mechanics, and different remediation paths. We assess them as distinct risk surfaces because that’s what they are.

Built for continuous monitoring

Every dimension works in both assessment mode and monitoring mode. The initial assessment establishes a baseline. Continuous monitoring tracks drift from that baseline — so risk estimates update before incidents occur, not after.
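A minimal sketch of baseline-versus-monitoring drift detection, assuming the tracked metric is a simple event rate (for example, hallucinations per interaction) and that a one-sided two-proportion z-test is an acceptable drift signal. The function names and the threshold of 3.0 are illustrative assumptions. The source does not specify which statistical test is used.

```python
import math


def drift_z_score(base_events: int, base_total: int,
                  cur_events: int, cur_total: int) -> float:
    """Two-proportion z statistic for the current window against the baseline.

    Positive values mean the current event rate is higher than the baseline.
    """
    if base_total <= 0 or cur_total <= 0:
        raise ValueError("totals must be positive")
    p_base = base_events / base_total
    p_cur = cur_events / cur_total
    pooled = (base_events + cur_events) / (base_total + cur_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / cur_total))
    if se == 0:
        return 0.0  # no events (or only events) in both windows: no measurable drift
    return (p_cur - p_base) / se


def has_drifted(base_events: int, base_total: int,
                cur_events: int, cur_total: int,
                threshold: float = 3.0) -> bool:
    """Flag upward drift only; the threshold is an assumed, tunable value."""
    return drift_z_score(base_events, base_total, cur_events, cur_total) > threshold
```

With a baseline of 20 hallucinations in 1,000 interactions, a monitoring window showing 30 in 500 would be flagged, while 11 in 500 would not.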

Governance embedded, not bolted on

Governance isn’t a standalone dimension disconnected from the risk surfaces it’s supposed to manage. It’s assessed as a cross-cutting overlay applied to every core dimension — because the quality of change management, oversight, and accountability directly affects every risk surface.

Assess your AI agent deployments

Talk to our team about running an Arc assessment for your workflows.
