Know when your model will fail — before it happens

Language models in production break in ways you cannot predict by looking at inputs alone. ACT monitors the output — the only thing you can always observe — and tells you when something is changing.

The problem

Every organization deploying LLMs faces the same blind spot: you don't control what the model does. You can filter inputs. You can add guardrails. But the model's behavior can shift — due to adversarial manipulation, silent vendor updates, context accumulation, or simply because the model operates differently at scale than it did in testing.

Input-side defenses catch known patterns. They miss everything else. White-box monitoring requires access to weights most deployers don't have. Using another LLM to judge the first one is expensive, unreliable, and circular.

What you need is a way to detect that the model's behavior is changing — regardless of why. That's what ACT does.

Three components, one system

ACT is built on three independent research efforts that work together to give you full visibility into your model's behavior.

ACT — Passive Monitoring

Attractor Conflict Telemetry

ACT computes 24 deterministic metrics on every model response. No access to weights, no API to the model's internals — just the text it produces.

The principle is straightforward: when a model's internal behavioral balance shifts — whether due to adversarial pressure, degraded alignment, or any other cause — the output text changes in measurable ways. Less hedging. Different sentence structures. Shifts in vocabulary distribution. These changes are often invisible to a human reader but statistically detectable.

The metrics are organized across six levels of analysis, from basic token statistics to composite scoring. Each metric is compared against a deployment-specific baseline using z-scores, producing calibrated alerts — green, yellow, red — without the false-positive noise of fixed, one-size-fits-all thresholds.
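To make the baseline-and-z-score idea concrete, here is a minimal sketch. Everything below is illustrative: the metric `hedge_ratio`, the sample responses, and the green/yellow/red cutoffs are hypothetical stand-ins, not ACT's actual 24 metrics or calibration.

```python
import re
import statistics

def hedge_ratio(text: str) -> float:
    """Fraction of words drawn from a small hedging vocabulary (toy metric)."""
    hedges = {"may", "might", "could", "perhaps", "possibly", "likely"}
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in hedges for w in words) / max(len(words), 1)

def zscore(value: float, baseline: list[float]) -> float:
    """How many baseline standard deviations the new value sits from the mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline) or 1e-9  # guard against a flat baseline
    return (value - mu) / sigma

def alert_level(z: float) -> str:
    """Map a z-score to the green/yellow/red levels (illustrative cutoffs)."""
    az = abs(z)
    return "green" if az < 2 else "yellow" if az < 3 else "red"

# Baseline built from known-good responses for THIS deployment
known_good = [
    "It may help to restart, since this could resolve things.",
    "You might try clearing the cache, which may fix it.",
    "Reinstalling the driver could fix your audio issue.",
    "A reboot might clear the stuck state here.",
]
baseline = [hedge_ratio(r) for r in known_good]

# A new response with noticeably less hedging than the baseline
new_response = "Do exactly this. It will work every single time."
z = zscore(hedge_ratio(new_response), baseline)
print(f"z={z:.2f} -> {alert_level(z)}")  # → z=-3.75 -> red
```

The key design point survives the simplification: the alert is relative to a baseline measured on your own deployment, so the same response could be green for one deployment and red for another.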

What this means for you:

  • Continuous, real-time visibility into model behavior in production
  • Detect behavioral drift before it reaches your users
  • Works with any model, any provider — no integration required beyond reading the output
  • Deterministic: same input always produces same measurement, no stochastic variability

ACTIVE — Controlled Probing

Attractor Conflict Testing via Induced Elicitation

Passive monitoring tells you what the model is doing. Active probing tells you what the model would do under specific conditions.

ACTIVE sends controlled stimuli — carefully designed prompts — to the model and measures how it responds. This allows you to map the model's behavioral boundaries: where does it start refusing? Where does it become unreliable? How does it react to different framing of the same request?

The system classifies behavioral changes into distinct patterns: progressive drift (slow degradation over many turns), acute collapse (sudden failure), boundary oscillation (inconsistent behavior on similar inputs), and sub-threshold migration (changes too small to trigger alerts individually, but significant over time).
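The four patterns can be sketched as simple rules over a per-turn anomaly trajectory (z-scores per turn). The rules and thresholds below are invented for illustration; they are not ACTIVE's actual classifier.

```python
def classify_trajectory(z: list[float], alert: float = 3.0) -> str:
    """Toy classifier for the four behavioral-change patterns (illustrative)."""
    steps = [b - a for a, b in zip(z, z[1:])]
    if any(abs(s) >= alert for s in steps):
        return "acute collapse"        # a single sudden jump
    crossings = sum((a < alert) != (b < alert) for a, b in zip(z, z[1:]))
    if crossings >= 3:
        return "boundary oscillation"  # flip-flops around the alert line
    if all(abs(v) < alert for v in z):
        if abs(z[-1] - z[0]) >= alert / 2:
            return "sub-threshold migration"  # small steps, large total drift
        return "stable"
    if z[-1] - z[0] >= alert:
        return "progressive drift"     # gradual climb past the alert line
    return "stable"

print(classify_trajectory([0.1, 0.2, 4.1]))            # → acute collapse
print(classify_trajectory([0.0, 1.0, 2.0, 3.5]))       # → progressive drift
print(classify_trajectory([2.5, 3.2, 2.6, 3.4, 2.7]))  # → boundary oscillation
print(classify_trajectory([0.0, 0.5, 1.0, 1.6]))       # → sub-threshold migration
```

Note that sub-threshold migration is the case a fixed per-turn threshold misses entirely: no single point would fire an alert, but the cumulative drift is real.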

What this means for you:

  • Proactively test your model's reliability before problems reach production
  • Compare models objectively — same tests, same metrics, quantified differences
  • Detect silent vendor updates that change model behavior without notice
  • Understand where your model's behavioral limits are, not just that they exist

SIGTRACK — Behavioral Memory

Signature Tracking for Regime and Attractor Conflict Knowledge

ACT measures. ACTIVE tests. SIGTRACK remembers.

Every behavioral observation is recorded in an append-only, hash-chained forensic ledger — an immutable audit trail with cryptographic integrity. When an incident occurs, you can reconstruct exactly what the model was doing in the hours and days before it happened.
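The hash-chaining idea can be sketched in a few lines: each entry's hash covers both its record and the previous hash, so editing any past record invalidates every hash after it. The record fields and class below are hypothetical, not SIGTRACK's actual format.

```python
import hashlib
import json

class ForensicLedger:
    """Append-only, hash-chained ledger (illustrative sketch)."""
    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []
        self._head = self.GENESIS

    def append(self, record: dict) -> str:
        # Each hash covers the previous hash, chaining the entries together
        payload = json.dumps({"prev": self._head, "record": record},
                             sort_keys=True).encode()
        entry_hash = hashlib.sha256(payload).hexdigest()
        self._entries.append({"record": record, "hash": entry_hash})
        self._head = entry_hash
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks every later hash."""
        prev = self.GENESIS
        for entry in self._entries:
            payload = json.dumps({"prev": prev, "record": entry["record"]},
                                 sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

ledger = ForensicLedger()
ledger.append({"ts": 1, "metric": "hedge_ratio", "z": 0.4})
ledger.append({"ts": 2, "metric": "hedge_ratio", "z": 2.9})
print(ledger.verify())                     # → True
ledger._entries[0]["record"]["z"] = 0.1    # simulate after-the-fact tampering
print(ledger.verify())                     # → False
```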

Beyond storage, SIGTRACK extracts behavioral signatures — recurring patterns in metric trajectories that characterize specific types of behavioral change. Once the system has seen a pattern, it recognizes it instantly the next time. This means faster detection, fewer false positives, and a system that gets better with operational experience.
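One plausible sketch of signature recognition: compare a new metric trajectory against stored, operator-confirmed patterns by distance. The stored signatures, the length-normalized Euclidean distance, and the cutoff are all assumptions for illustration, not SIGTRACK's actual representation.

```python
import math

# Hypothetical stored signatures: z-score trajectories confirmed in past incidents
signatures = {
    "progressive drift": [0.0, 1.0, 2.0, 3.0],
    "acute collapse":    [0.0, 0.2, 0.1, 4.0],
}

def match_signature(trajectory: list[float], max_dist: float = 1.0) -> str:
    """Return the nearest stored signature, or 'unknown' if none is close."""
    best_name, best_d = "unknown", math.inf
    for name, sig in signatures.items():
        # Length-normalized Euclidean distance between trajectories
        d = math.dist(trajectory, sig) / math.sqrt(len(sig))
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d <= max_dist else "unknown"

print(match_signature([0.1, 0.9, 2.1, 2.8]))   # → progressive drift
print(match_signature([5.0, 5.0, 5.0, 5.0]))   # → unknown
```

The "unknown" path matters: a trajectory that matches nothing is recorded as a new candidate pattern rather than forced into an existing bucket.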

What this means for you:

  • Full audit trail for compliance, incident investigation, and regulatory evidence
  • Pattern recognition that improves over time — the system learns from confirmed events
  • Forensic reconstruction: trace any incident back to its earliest warning signs
  • Tamper-evident records — the cryptographic hash chain exposes any modification

Why output-only monitoring works

The core principle behind ACT is simple: when a model's behavior changes, the text it produces changes too. Not just in content — in statistical structure. Sentence length distributions shift. Vocabulary diversity changes. The balance between cautious and assertive language moves.

These changes happen because the model's internal state determines its output distribution. A model under adversarial pressure doesn't produce the same text as a model operating normally — it can't, because the probability distribution it's sampling from has been altered. The behavioral shift and the statistical signature are the same phenomenon, observed at different levels.
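As a concrete illustration of one such statistical signature, a shift in sentence-length distribution can be quantified with a two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. The sample texts are invented, and this is one statistic among many, not ACT's full procedure.

```python
import re

def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting on terminal punctuation."""
    return [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]

def ks_statistic(a: list[int], b: list[int]) -> float:
    """Two-sample KS statistic: max gap between empirical CDFs."""
    cdf = lambda xs, x: sum(v <= x for v in xs) / len(xs)
    return max(abs(cdf(a, x) - cdf(b, x)) for x in sorted(set(a) | set(b)))

baseline = ("It may take a few tries to get this right. "
            "You could start by checking the logs carefully. "
            "Perhaps the configuration file is missing a field.")
shifted = "Run it. Done. Next step. Go now."

d = ks_statistic(sentence_lengths(baseline), sentence_lengths(shifted))
print(d)  # → 1.0 (the two length distributions do not overlap at all)
```

A human skimming both outputs might only notice a tone change; the statistic makes the shift explicit and comparable across time.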

This means that any effective manipulation leaves a trace in the output. An attacker who eliminates all statistical signatures has, by definition, failed to change the model's behavior. There is a fundamental trade-off between attack effectiveness and evasion — and ACT sits exactly at that boundary.

Start monitoring your models

24 metrics. Real-time analysis. No model access required.