Public Framework — Private Engine

Trust Center

The NGIQ-ATE™ evaluation framework is openly documented at the domain level. The scoring engine — calibration weights, threshold logic, and behavioral test scenarios — is proprietary IP.

What We Evaluate

NGIQ-ATE™ evaluates AI agents and the organizations that operate them — not just the model. This is the key differentiator: an agent with a strong LLM backbone but no governance structure cannot achieve a passing trust level.

Our methodology synthesizes internationally recognized standards including the NIST AI Risk Management Framework, Cloud Security Alliance frameworks, and EU digital identity regulations — applied across six evaluation domains.

The domain framework is published here. The weighted scoring formula, calibration thresholds, and behavioral test scenarios that produce a final score are proprietary and not disclosed.

Evaluation Tiers

L-A Assessed

Evidence and governance documentation review. For enterprise agents behind firewalls that cannot be accessed externally.

Best for: Enterprise / firewall-restricted

L-T Tested

Evidence review plus structured behavioral assessment via our proprietary testing pipeline.

Best for: Cloud-hosted / API-accessible

L-V Verified

All of Tested, plus continuous monitoring via heartbeat integration and anomaly detection.

Best for: Production / financial / regulated
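
As a rough illustration of what heartbeat-based monitoring at the Verified tier might involve on the operator side, the sketch below flags agents whose last heartbeat is older than an allowed window. The function name, the interval, and the number of tolerated missed beats are all assumptions made for this example; the actual integration is not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_agents(
    last_seen: Dict[str, datetime],
    now: datetime,
    interval: timedelta,
    missed_allowed: int = 2,  # hypothetical tolerance, not a published value
) -> List[str]:
    """Return IDs of agents whose last heartbeat is older than the allowed window."""
    window = interval * missed_allowed
    return sorted(agent for agent, ts in last_seen.items() if now - ts > window)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    seen = {
        "agent-a": now - timedelta(seconds=30),
        "agent-b": now - timedelta(minutes=10),
    }
    print(stale_agents(seen, now, interval=timedelta(minutes=1)))
```

A missed heartbeat only shows that an agent has gone quiet. Anomaly detection on what the agent actually does is a separate concern.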

Evaluation Domains

Ownership

Every deployed agent must have a named human principal who accepts accountability for its actions. We evaluate the completeness of ownership documentation, the clarity of escalation paths, and whether the operator has defined and tested a shutdown procedure. Without clear ownership, no level of technical sophistication can produce a trustworthy agent.

Identity

An agent that cannot reliably prove its own identity creates systemic risk in multi-agent and enterprise environments. We assess whether the agent holds a globally unique identifier, publishes a verifiable agent card, and authenticates to external systems using scoped, time-limited credentials rather than shared or standing access.
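
To make "scoped, time-limited credentials" concrete, here is a minimal sketch of the kind of check an operator could run on its own credentials. The `Credential` shape, the wildcard convention, and the one-hour ceiling are hypothetical choices for illustration, not NGIQ-ATE criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, List, Optional

# Hypothetical policy ceiling; no specific lifetime limit is published here.
MAX_LIFETIME = timedelta(hours=1)


@dataclass(frozen=True)
class Credential:
    subject: str                    # the agent's unique identifier
    scopes: FrozenSet[str]          # granted permissions, e.g. {"invoices:read"}
    issued_at: datetime
    expires_at: Optional[datetime]  # None models standing (non-expiring) access


def credential_findings(cred: Credential, now: datetime) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    findings: List[str] = []
    if not cred.scopes or "*" in cred.scopes:
        findings.append("credential is not scoped")
    if cred.expires_at is None:
        findings.append("credential never expires (standing access)")
    else:
        if cred.expires_at - cred.issued_at > MAX_LIFETIME:
            findings.append("credential lifetime exceeds policy ceiling")
        if cred.expires_at <= now:
            findings.append("credential has expired")
    return findings


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    standing = Credential("agent-1", frozenset({"*"}), now, None)
    print(credential_findings(standing, now))
```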

Task

Agents must operate within declared boundaries and resist attempts to expand their scope. We evaluate whether capability boundaries are formally declared, whether inputs are validated against injection and manipulation, and whether tool authorization is enforced at the platform level rather than relying on prompt instructions alone.
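
The difference between platform-level enforcement and prompt-level instruction can be sketched as a gateway that refuses any tool call outside a declared allowlist. `ToolGateway` and the tool names are hypothetical; the point is only that the check runs in code, where no model output can talk its way past it.

```python
from typing import Any, Callable, Dict, FrozenSet


class ToolNotAuthorized(Exception):
    """Raised when an agent requests a tool outside its declared boundary."""


class ToolGateway:
    """Dispatches a tool call only if the tool is declared and registered."""

    def __init__(self, declared_tools: FrozenSet[str]) -> None:
        self._declared = declared_tools
        self._registry: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._registry[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        # Enforcement happens here, independent of anything in the prompt.
        if name not in self._declared or name not in self._registry:
            raise ToolNotAuthorized(name)
        return self._registry[name](**kwargs)


if __name__ == "__main__":
    gateway = ToolGateway(declared_tools=frozenset({"lookup_order"}))
    gateway.register("lookup_order", lambda order_id: {"order_id": order_id})
    gateway.register("issue_refund", lambda order_id: "refunded")
    print(gateway.call("lookup_order", order_id="A1"))
    try:
        gateway.call("issue_refund", order_id="A1")
    except ToolNotAuthorized as exc:
        print(f"blocked: {exc}")
```

Here `issue_refund` exists on the platform but is not part of the agent's declared boundary, so the call is rejected even if a prompt injection persuades the model to request it.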

Safety

When an agent fails or encounters adversarial conditions, the consequences must be bounded and recoverable. We review consequential action logging, reversibility design, human oversight gates for high-stakes decisions, and documented refusal behavior. An agent with no shutdown mechanism cannot achieve a passing evaluation.
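
One way to picture a human oversight gate combined with consequential action logging is the sketch below. The rule that an action needs approval when it is high-stakes or irreversible is an assumption made for this example, as are the `Action` and `SafetyGate` names. It is not the evaluation logic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    name: str
    high_stakes: bool
    reversible: bool


@dataclass
class SafetyGate:
    approve: Callable[[Action], bool]  # hypothetical human-approval hook
    log: List[str] = field(default_factory=list)

    def execute(self, action: Action, run: Callable[[], None]) -> bool:
        """Run the action if permitted; record the decision either way."""
        # Assumed rule: high-stakes or irreversible actions need a human.
        needs_human = action.high_stakes or not action.reversible
        if needs_human and not self.approve(action):
            self.log.append(f"REFUSED {action.name}")
            return False
        run()
        self.log.append(f"EXECUTED {action.name}")
        return True


if __name__ == "__main__":
    gate = SafetyGate(approve=lambda action: False)  # reviewer declines everything
    gate.execute(Action("send_summary", False, True), lambda: None)
    gate.execute(Action("wire_transfer", True, False), lambda: None)
    print(gate.log)
```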

Governance

Trust requires institutional process, not just good engineering. We assess the operator's policy documentation, alignment with established AI risk management frameworks, update and patch cadence, immutable audit trail coverage, and the maturity of the incident response procedure.
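
A common way to approximate an immutable audit trail is a hash chain, in which each entry commits to the one before it. The sketch below uses that technique as an assumption, since no specific mechanism is stated here. Note that a hash chain is tamper-evident and not tamper-proof: edits are detectable, but preventing them still requires write-once storage or external anchoring.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    trail: List[Dict[str, Any]] = []
    append_entry(trail, {"action": "deploy", "version": "1.0"})
    append_entry(trail, {"action": "patch", "version": "1.1"})
    print(verify_chain(trail))
    trail[0]["event"]["version"] = "9.9"  # simulate tampering
    print(verify_chain(trail))
```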

Runtime

Governance documents and architecture diagrams describe intent. Runtime evaluation validates reality. For Tested and Verified tier engagements, we assess whether observed behavior is consistent with declared capabilities, and whether monitoring infrastructure can detect and alert on behavioral drift over time.
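
Two simple signals illustrate the idea of comparing observed behavior with declared capabilities: tools that were called but never declared, and a shift in how often each tool is used. The sketch below measures the shift with total variation distance. Both the metric and the function names are illustrative assumptions and say nothing about the proprietary test pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def undeclared_tools(declared: Set[str], observed_calls: Iterable[str]) -> Set[str]:
    """Tools seen at runtime that are absent from the declared capabilities."""
    return set(observed_calls) - declared


def usage_distribution(calls: Iterable[str]) -> Dict[str, float]:
    counts = Counter(calls)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()} if total else {}


def drift_score(baseline: Iterable[str], current: Iterable[str]) -> float:
    """Total variation distance between two tool-usage distributions (0.0 to 1.0)."""
    p, q = usage_distribution(baseline), usage_distribution(current)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q))


if __name__ == "__main__":
    declared = {"search", "summarize"}
    last_month = ["search", "summarize", "search", "summarize"]
    this_week = ["search", "send_email", "send_email", "send_email"]
    print(undeclared_tools(declared, this_week))
    print(drift_score(last_month, this_week))
```

A monitoring system would alert when the undeclared set is non-empty or when the drift score crosses a threshold chosen by the operator.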

Issuer Policy

NextGenIQ is a DID-resolvable issuer at did:web:nextgeniq.app. Our issuer policy — covering evidence requirements, scoring dispute process, badge suspension, and revocation rules — is published separately.
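
Under the did:web method's resolution rules, a DID maps to an HTTPS URL at which the DID document is expected to be served. The sketch below applies that mapping; `did_web_to_url` is a hypothetical helper name, and the example only computes the URL without fetching or validating the document.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the URL where its DID document should live."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")
    segments = did[len(prefix):].split(":")
    if not all(segments):
        raise ValueError(f"empty segment in identifier: {did!r}")
    host = unquote(segments[0])  # an encoded port such as %3A8443 is decoded here
    path = segments[1:]
    if path:
        return f"https://{host}/{'/'.join(path)}/did.json"
    # With no path segments, the document sits under /.well-known/.
    return f"https://{host}/.well-known/did.json"


if __name__ == "__main__":
    print(did_web_to_url("did:web:nextgeniq.app"))
    # → https://nextgeniq.app/.well-known/did.json
```

A verifier would fetch that URL over TLS and use the keys in the returned document to check signatures on issued credentials.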
