
Production Readiness Audit

Score production readiness for an AI workflow in under 25 minutes

Overview

Teams ship pilots quickly, then stall at production because reliability, security, and operating model work starts too late.

This self-assessment identifies the gaps that block a production launch and helps you prioritize the next three actions that reduce risk and rework.


Scoring Rubric

0: Not in place

1: Partially in place. Ad hoc, inconsistent, or undocumented

2: Mostly in place. Measured and repeatable for the core workflow

3: Fully in place. Standardized, measured, and audit-defensible
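The rubric above implies simple arithmetic: each section has six criteria rated 0 to 3, for a section maximum of 18 points. A minimal sketch of the scoring math (function names are illustrative, not part of the assessment tool):

```python
# Score one audit section: six criteria, each rated 0-3 (max 18 points).
def score_section(ratings):
    if len(ratings) != 6 or any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("expected six ratings, each 0-3")
    return sum(ratings)

def readiness_percent(section_scores, max_per_section=18):
    # Overall readiness as a percentage of total possible points.
    return round(100 * sum(section_scores) / (max_per_section * len(section_scores)))
```

With six sections of 18 points each, a total of 108 points corresponds to 100 percent readiness.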

Business Outcome and Scope

Max Score: 18 points
The workflow is defined with one primary business KPI and two supporting KPIs.
Acceptance criteria include correctness, latency, and cost targets.
The solution boundary is explicit: systems, users, and data domains are declared in or out of scope.
A baseline exists for current performance. The target improvement is quantified.
Failure modes are documented. Unsafe or low-confidence outputs trigger a safe fallback.
A named executive sponsor owns the outcome and approves the production go-live.
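The acceptance criteria above (correctness, latency, and cost targets) can be made machine-checkable. A minimal sketch, assuming hypothetical target values; the thresholds are illustrative, not recommendations:

```python
# Acceptance targets for a workflow run (values are illustrative only).
TARGETS = {"correctness": 0.95, "p95_latency_s": 2.0, "cost_per_request_usd": 0.03}

def meets_acceptance(metrics):
    # Returns (passed, failed_criteria) for one metrics snapshot.
    failures = []
    if metrics["correctness"] < TARGETS["correctness"]:
        failures.append("correctness")
    if metrics["p95_latency_s"] > TARGETS["p95_latency_s"]:
        failures.append("p95_latency_s")
    if metrics["cost_per_request_usd"] > TARGETS["cost_per_request_usd"]:
        failures.append("cost_per_request_usd")
    return (not failures, failures)
```

A check like this can run in CI or as a release gate, so a regression on any one target blocks the go-live decision with a named reason.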

Data and Integration Readiness

Max Score: 18 points
Source data owners are known. Access, retention, and sharing constraints are documented.
Data pipelines have clear freshness SLAs and known upstream dependencies.
PII and sensitive fields are identified and handled with least privilege access.
Inputs are validated and schema drift is detected before downstream impact.
Integration points use stable contracts. API and schema versioning exists.
A backfill plan exists for data corrections, reprocessing, and recovery.
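The input-validation and schema-drift criteria above amount to a contract check that runs before records reach downstream consumers. A minimal sketch; the `EXPECTED_SCHEMA` fields are hypothetical examples, not a prescribed contract:

```python
# Detect schema drift on incoming records before downstream impact.
# EXPECTED_SCHEMA is a hypothetical example contract.
EXPECTED_SCHEMA = {"order_id": str, "amount": float, "currency": str}

def check_record(record):
    """Return a list of drift findings for one input record."""
    findings = []
    for field, ftype in EXPECTED_SCHEMA.items():
        if field not in record:
            findings.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            findings.append(f"type drift on {field}: got {type(record[field]).__name__}")
    for field in record:
        if field not in EXPECTED_SCHEMA:
            findings.append(f"unexpected field: {field}")
    return findings
```

In practice a schema registry or validation library plays this role; the point is that drift is surfaced as named findings, not silent downstream failures.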

Model and Prompt Reliability

Max Score: 18 points
A test set exists that represents real user requests and edge cases.
Evaluation includes groundedness and citation quality for knowledge-based answers.
Prompt and retrieval changes run through regression tests before release.
Hallucination risk is reduced with tool use, constraints, and refusal behavior.
Human-in-the-loop review exists for high-risk actions or regulated outputs.
Model choice, temperature, tools, and context limits are documented and justified.
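The regression-test criterion above can be sketched as a release gate over a frozen test set. In this sketch, `generate` stands in for the real model call, the `must_contain` check is a deliberately simple evaluation, and the 0.9 threshold is an illustrative assumption:

```python
# Gate prompt/retrieval changes behind a regression run over a frozen
# test set. `generate` is a stand-in for the real model call.
def regression_pass_rate(test_set, generate):
    passed = 0
    for case in test_set:
        answer = generate(case["input"])
        if all(fact in answer for fact in case["must_contain"]):
            passed += 1
    return passed / len(test_set)

def gate_release(test_set, generate, threshold=0.9):
    # Block the release if the pass rate drops below the agreed threshold.
    return regression_pass_rate(test_set, generate) >= threshold
```

Real evaluations usually replace the substring check with graded or model-assisted scoring, but the release-gating shape stays the same.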

Security, Privacy, and Compliance

Max Score: 18 points
Authentication and authorization are enforced end-to-end, including retrieval permissions.
Secrets are stored in a managed vault. No secrets exist in code or prompts.
Logging captures prompts, retrieved context IDs, tool calls, and outputs with trace IDs.
Data minimization is applied. Only the required context is retrieved and stored.
Threat modeling covers prompt injection, data exfiltration, and indirect tool abuse.
A compliance review path exists for regulated data, including evidence collection.
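The logging criterion above (prompts, retrieved context IDs, tool calls, and outputs linked by trace IDs) might look like one structured audit record per request. The field names here are assumptions, not a required schema:

```python
import json
import time
import uuid

# Emit one structured audit log line per request, linking prompt,
# retrieved context IDs, tool calls, and output under a shared trace ID.
def audit_log(prompt, context_ids, tool_calls, output, trace_id=None):
    entry = {
        "trace_id": trace_id or str(uuid.uuid4()),
        "ts": time.time(),
        "prompt": prompt,            # consider redacting PII before logging
        "context_ids": context_ids,  # log document IDs, not raw documents
        "tool_calls": tool_calls,
        "output": output,
    }
    return json.dumps(entry)
```

Logging context IDs rather than retrieved text supports both data minimization and later evidence collection: an auditor can reconstruct what the model saw without the log store duplicating sensitive content.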

Reliability, Monitoring, and Cost Controls

Max Score: 18 points
SLOs exist for latency, error rate, and answer quality. Owners receive alerts.
Observability includes request tracing across retrieval, model calls, and downstream tools.
Rate limits, retries, circuit breakers, and graceful degradation are implemented.
Cost budgets exist per workflow. Token usage and vendor costs are monitored.
A rollback plan exists for model, prompt, retrieval index, and feature flag changes.
Incident response includes runbooks, on-call routing, and post-incident reviews.
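Retries, a circuit breaker, and graceful degradation can be combined in a few lines. This sketch (class and parameter names are illustrative) opens the circuit after repeated failures and routes to a safe fallback instead of erroring:

```python
import time

# Retry with exponential backoff plus a simple circuit breaker; after
# repeated failures the workflow degrades to a safe fallback.
class CircuitBreaker:
    def __init__(self, failure_threshold=3):
        self.failures = 0
        self.threshold = failure_threshold

    def call(self, fn, fallback, retries=2, backoff_s=0.0):
        if self.failures >= self.threshold:
            return fallback()  # circuit open: degrade gracefully
        for attempt in range(retries + 1):
            try:
                result = fn()
                self.failures = 0  # success closes the circuit
                return result
            except Exception:
                time.sleep(backoff_s * (2 ** attempt))
                self.failures += 1
        return fallback()
```

Production implementations also add half-open probing and per-dependency breakers, but even this shape keeps a flaky vendor API from cascading into user-facing errors.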

Operating Model and Change Management

Max Score: 18 points
A RACI exists for product, data, security, and platform responsibilities.
Release management exists. Changes ship via CI/CD with approvals and audit trails.
User training and adoption measurement exist for the target roles.
Feedback loops exist. Misfires, low confidence outputs, and user corrections are captured.
Vendor risk management is completed for any model or tool providers.
A quarterly roadmap exists for evaluation improvements, cost reduction, and scale.

Get Your Detailed Results

Submit your information to receive a comprehensive Production Readiness Assessment.
