Risk Assessment Framework
Five core dimensions that measure the intrinsic risk of AI agents, with a regulatory and industry overlay that adds workflow-specific compliance and governance assessment. Quantitative, regulatory-native, and built for both initial assessment and continuous monitoring.
Core Dimensions
Overlay — Applied across all dimensions
Output Integrity
Is what the agent produces accurate, appropriate, and trustworthy?
A hallucinated interest rate in a disclosure. A fabricated compliance citation. An unexplainable credit decision. In financial services, every output the agent produces carries potential regulatory, financial, and reputational liability — and that liability scales with every interaction. We measure not just whether the agent gets things right, but how it behaves when it doesn’t know, when it’s pushed off-topic, and when it needs to explain itself.
What we assess
Measured hallucination rates under realistic conditions. Severity classification — cosmetic, material, or dangerous. Behavior when information is insufficient: fabrication, hedging, or escalation. Consistency across repeated queries.
Domain boundary robustness. Resistance to being led off-topic under multi-turn conversational pressure. Behavior when asked to operate outside its intended scope.
Faithfulness of explanations to actual reasoning — not post-hoc rationalizations. Detail sufficient for regulatory inquiry, adverse action notices, and customer complaints.
Prevention of harmful, misleading, or inappropriate outputs. Tone suitability for the customer segment. IP and licensing considerations for generated content.
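As an illustration of the hallucination measurements listed above, the following is a minimal sketch of how a reviewed sample could be turned into a rate with an uncertainty band and a severity breakdown. The function names, the labeling scheme (None for a clean response, otherwise one of the three severity classes), and the choice of a Wilson 95% interval are assumptions made for this example, not a description of Arc's actual tooling.

```python
import math
from collections import Counter

# Severity classes named in the text; the labeling scheme itself is assumed.
SEVERITIES = ("cosmetic", "material", "dangerous")


def wilson_interval(failures, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("need at least one reviewed response")
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize_hallucinations(labels):
    """labels: one entry per reviewed response.

    None means no hallucination was found; otherwise the entry is the
    severity assigned by the reviewer.
    """
    counts = Counter(label for label in labels if label is not None)
    unknown = set(counts) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity labels: {sorted(unknown)}")
    n = len(labels)
    failures = sum(counts.values())
    return {
        "n": n,
        "rate": failures / n,
        "ci95": wilson_interval(failures, n),
        "by_severity": {s: counts.get(s, 0) for s in SEVERITIES},
    }


if __name__ == "__main__":
    sample = [None] * 95 + ["cosmetic"] * 3 + ["material", "dangerous"]
    print(summarize_hallucinations(sample))
```

Reporting an interval alongside the point estimate makes it visible when a low observed rate comes from a sample that is too small to support a conclusion.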
Continuous monitoring metrics
Four Assessment Channels
Every dimension is assessed through four independent channels. Where they converge, confidence is high. Where they diverge, we investigate further.
Governance Review
Documentation, policies, oversight mechanisms, vendor contracts, incident response plans, and organizational readiness — the controls that wrap around the technology.
Adversarial Testing
Automated and manual red-teaming with attack scenarios designed for the specific deployment context. Prompt injection, boundary probing, data extraction, social engineering — tested the way real adversaries operate.
Production Analysis
Analysis of actual agent behavior in production or staging — error rates, near-miss patterns, behavioral drift, anomalous outputs, escalation frequency. What the agent actually does, not what documentation says it should do.
Regulatory Mapping
Explicit mapping of every finding to applicable regulations. Gap identification against SR 11-7, EU AI Act, TPRM guidance, state laws, and sector-specific requirements. Output formatted for regulatory examination.
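The idea that agreement between channels raises confidence and disagreement triggers investigation could be sketched as below. This assumes each channel yields a risk score between 0 and 1 for a given dimension; the score scale, the 0.2 spread threshold, and the function name are hypothetical choices for the example.

```python
from statistics import median

# The four channels described above, as hypothetical dictionary keys.
CHANNELS = (
    "governance_review",
    "adversarial_testing",
    "production_analysis",
    "regulatory_mapping",
)


def check_convergence(scores, max_spread=0.2):
    """Compare per-channel risk scores (assumed to lie in [0, 1]).

    Returns whether the channels agree within max_spread and which
    channels sit far from the median and deserve a closer look.
    """
    missing = [c for c in CHANNELS if c not in scores]
    if missing:
        raise ValueError(f"missing channel scores: {missing}")
    values = [scores[c] for c in CHANNELS]
    spread = max(values) - min(values)
    mid = median(values)
    outliers = [c for c in CHANNELS if abs(scores[c] - mid) > max_spread]
    return {
        "spread": spread,
        "converged": spread <= max_spread,
        "outliers": outliers,
    }


if __name__ == "__main__":
    print(check_convergence({
        "governance_review": 0.30,
        "adversarial_testing": 0.35,
        "production_analysis": 0.70,
        "regulatory_mapping": 0.32,
    }))
```

In this example the production score sits well above the other three, so the result points the follow-up investigation at production behavior instead of averaging the disagreement away.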
Regulatory Crosswalks
Every finding maps to the regulations your compliance team reports against. The assessment output is directly usable in regulatory examinations — no translation required.
SR 11-7
Model Risk Management
EU AI Act
High-Risk AI Systems
TPRM Guidance
Third-Party Risk
NIST AI RMF
Risk Management Framework
EU DORA
Operational Resilience
State AI Laws
CO, IL, CA, NYC
How an Assessment Works
Example: a bank asks Arc to assess a KYC/AML agent before deployment.
Scope
Identify workflow type, agent capabilities, regulatory jurisdictions, and deployment context. The right regulatory modules activate automatically based on the use case.
Assess
Run all five core dimensions through four independent assessment channels — governance review, adversarial testing, production analysis, and regulatory mapping.
Measure
Produce quantified risk metrics for every dimension and sub-dimension. Hallucination rates, action error rates, leakage scores, compliance gaps — numbers, not checkboxes.
Report
The chief risk officer (CRO) receives an executive risk profile. The model risk team receives detailed, regulatory-mapped findings. The agent provider receives a prioritized remediation roadmap.
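The automatic module activation mentioned in the Scope step could look roughly like the sketch below. It keys only on jurisdiction and leaves out workflow-specific rules. The grouping of crosswalks by jurisdiction is an illustrative assumption based on where each framework originates, not Arc's actual activation logic, and the names are hypothetical.

```python
# Illustrative grouping of the crosswalks listed on this page by
# jurisdiction. This mapping is an assumption made for the example.
MODULES_BY_JURISDICTION = {
    "US": ["SR 11-7", "TPRM Guidance", "NIST AI RMF"],
    "EU": ["EU AI Act", "EU DORA"],
}

# Jurisdictions listed under "State AI Laws" on this page.
STATE_LAW_JURISDICTIONS = {"CO", "IL", "CA", "NYC"}


def active_modules(jurisdictions):
    """Return the regulatory modules to activate, in first-seen order."""
    modules = []
    for jurisdiction in jurisdictions:
        if jurisdiction in STATE_LAW_JURISDICTIONS:
            candidates = ["State AI Laws"]
        else:
            # Unknown jurisdictions activate nothing in this sketch.
            candidates = MODULES_BY_JURISDICTION.get(jurisdiction, [])
        for module in candidates:
            if module not in modules:
                modules.append(module)
    return modules


if __name__ == "__main__":
    print(active_modules(["US", "NYC", "EU"]))
```

A real scoping step would presumably also take the workflow type and agent capabilities into account; they are omitted here to keep the example small.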
Built differently
Quantitative, not binary
Generic frameworks record pass or fail. We measure hallucination rates, action error frequencies, leakage resistance scores, and compliance gap counts. Every metric that can produce a number produces a number — because quantified risk is actionable risk.
Regulatory-native, not regulatory-mapped
We don’t build a framework and crosswalk it to regulations as an afterthought. We start from what SR 11-7, the EU AI Act, and state laws actually require, and organize our checks to satisfy those requirements directly. The assessment output is usable in regulatory examinations without translation.
Workflow-specific, not agent-generic
A KYC agent and a customer service chatbot have completely different risk profiles — different regulations, different failure modes, different liability exposure. Our regulatory modules assess the agent against the specific requirements of the specific workflow it operates in.
Outputs separated from actions
A hallucinated response and an unauthorized transaction are both failures — but they have different testing methodologies, different controls, different loss mechanics, and different remediation paths. We assess them as distinct risk surfaces because that’s what they are.
Built for continuous monitoring
Every dimension works in both assessment mode and monitoring mode. The initial assessment establishes a baseline. Continuous monitoring tracks drift from that baseline — so risk estimates update before incidents occur, not after.
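One simple way to track drift from a baseline is to compare the error rate in a recent monitoring window against the rate measured during the initial assessment. The sketch below uses a one-sided z-score under a binomial model. The function name, the window sizes, and the alert threshold of 3.0 are assumptions chosen for illustration, and other drift tests would work equally well.

```python
import math


def drift_from_baseline(baseline_rate, window_errors, window_size, z_alert=3.0):
    """Flag an upward shift in error rate relative to the assessed baseline.

    baseline_rate: error rate established by the initial assessment.
    window_errors / window_size: errors observed in the monitoring window.
    z_alert: hypothetical alert threshold in standard errors.
    """
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must be strictly between 0 and 1")
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    window_rate = window_errors / window_size
    # Standard error of the window rate if the baseline still held.
    std_err = math.sqrt(baseline_rate * (1.0 - baseline_rate) / window_size)
    z = (window_rate - baseline_rate) / std_err
    return {"window_rate": window_rate, "z": z, "drifted": z > z_alert}


if __name__ == "__main__":
    # Baseline of 2% errors; 25 errors in the latest 500 interactions.
    print(drift_from_baseline(0.02, window_errors=25, window_size=500))
```

With a 2% baseline, 25 errors in 500 interactions is flagged as drift, while 11 errors in 500 falls within normal variation and is not.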
Governance embedded, not bolted on
Governance isn’t a standalone dimension disconnected from the risk surfaces it’s supposed to manage. It’s assessed as a cross-cutting overlay applied to every core dimension — because the quality of change management, oversight, and accountability directly affects every risk surface.
Assess your AI agent deployments
Talk to our team about running an Arc assessment for your workflows.
Request an assessment