Anthropic shipped a quiet but useful API change alongside Opus 4.8: you can now inject a {"role": "system"} message anywhere in the messages array, not just at the top-level system field. The instruction applies from that point in the conversation forward, with system-level priority over user turns, and without invalidating the prompt cache for everything that came before.

The feature is called mid-conversation system messages. It is available on Claude API and AWS Claude Platform with Opus 4.8 — no beta header required.


The Problem It Solves

Prompt caching hashes the request in order: tools → system → messages. The top-level system field sits near the beginning of that hash. Editing it — even appending a single sentence — produces a different hash, and every cached turn after it misses the cache.

For long-running agents, this is expensive. A session with 50 cached exchanges where you need to update the system prompt mid-run means paying to reprocess those 50 turns on every subsequent request, instead of reading them from cache at the $0.50/MTok rate.

The three workarounds builders have used, and why each falls short:

Fake user turn. Add the new instruction as a user message. Works for cache preservation, but the model treats user content as data to interpret, not instructions with system-level authority. Results are inconsistent — Claude may follow the instruction or argue with it.

Reconstruct the system prompt. Edit the top-level system field and rebuild from scratch. Authoritative, but kills the cache for the entire conversation.

Keep all future instructions in the original system prompt. Anticipate everything upfront. Brittle — you cannot always predict what a long-running agent session will need.

Mid-conversation system messages close this gap by letting you append the instruction after the stable cached prefix, where it does not change the hash.


How It Works

Add a message with "role": "system" to the messages array. The instruction applies from that point onward. When instructions conflict, later system messages take precedence over earlier ones, and mid-conversation system messages take precedence over the top-level system field for turns that follow them.

Placement rules:

  • A mid-conversation system message must immediately follow a user turn (or an assistant turn that ends in a server tool use)
  • It must either be the last entry in messages or be followed by an assistant turn
  • It cannot be the first message in the array — use the top-level system field for that
  • Consecutive system messages are not allowed — merge instructions or wait for the next user turn

Violating placement returns a 400 error.

Python example:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    cache_control={"type": "ephemeral"},
    system="You are a code review assistant. Flag issues; be concise.",
    messages=[
        {
            "role": "user",
            "content": "Review the auth module for security issues."
        },
        {
            "role": "assistant",
            "content": "Found three issues: no rate limiting on /login, "
                       "session tokens lack HttpOnly flag, bcrypt rounds at 8 "
                       "(recommend 12+)."
        },
        {
            "role": "user",
            "content": "Now review the payment module."
        },
        # Tool result discovered the user is on an enterprise plan
        # with PCI-DSS requirements. Inject this as system-level
        # authority without rewriting the cached history above.
        {
            "role": "system",
            "content": "This session is now operating under PCI-DSS audit mode. "
                       "Flag any finding that could fail a PCI audit. "
                       "Do not suggest fixes that require SDK upgrades without "
                       "noting the change-freeze risk."
        }
    ],
)

The previous turns stay byte-identical. The cache built on those turns still hits. Only the new system message and the user’s latest request are processed as fresh input.


Practical Use Cases

Permission grants mid-session. A session-level mode switch (like Opus 4.8’s Dynamic Workflows) can use a mid-conversation system message to grant the agent standing permission to spawn parallel subagents. A short refresher every several turns keeps the permission active; an exit notice turns it off. No need to rebuild the full prompt.

Token budget adjustments. A monitoring layer detects that a session is approaching its limit. Append a system message instructing Claude to compress responses and skip exhaustive examples. This does not require interrupting the user flow or reconstructing history.

Tool-driven policy switches. A tool call returns that a customer is on a premium enterprise plan. Inject a system message that updates behavioral expectations — “do not suggest downgrade options,” “always include SLA guarantees in responses.” The instruction has system authority, not user authority.

Mid-session constraint additions. A code review session discovers that the target codebase has a strict no-external-dependencies policy. Append a system message with that constraint. All subsequent suggestions respect it automatically.

Persona changes. An agent shifts from discovery mode to execution mode. A system message marks the transition, overriding tone, verbosity, and output format instructions from the original system prompt.


Combining With Prompt Caching

Caching must be explicitly enabled — a mid-conversation system message does not activate caching on its own. Use either the top-level cache_control: {"type": "ephemeral"} field for automatic caching, or place an explicit cache_control breakpoint on a specific content block.

The pattern:

  1. Place the cache breakpoint at the end of your stable prefix (typically after your tool definitions or a stable system prompt block)
  2. Append new system messages after the breakpoint — they do not change the prefix hash
  3. On the next turn, the new system message becomes part of the stable history and is itself cacheable

Avoid editing or removing a system message you have already sent. Like any other change to earlier turns, rewriting a mid-conversation system message invalidates the cache from that point forward. If the instruction needs to evolve, append a new system message rather than rewriting the old one.


What It Is Not

Not a security boundary. A mid-conversation system message gives an instruction system-level priority over user turns in the conversation structure. It does not make untrusted content trustworthy. If you are building instructions from tool results or third-party data, follow the standard guidance on prompt injection mitigation — the role of the message does not sanitize its content.

Not a substitute for agent orchestration. Mid-conversation system messages are good for updating a single agent’s behavioral context mid-run. They are not a mechanism for spawning subagents or coordinating multi-agent workflows — that is Dynamic Workflows’ job.


Platform Availability

Platform Available
Anthropic API (claude.ai/api) Yes
AWS Claude Platform Yes
Amazon Bedrock No
Google Vertex AI No
Microsoft Azure Foundry No

Requires Opus 4.8. The feature is not available on any prior Claude model.


Builder Checklist

  • Identify agent sessions where you currently edit the top-level system field mid-run or fake a user turn for instruction updates
  • Switch those to {"role": "system"} entries appended after the latest user turn
  • Enable cache_control: {"type": "ephemeral"} if not already set — the cache savings only apply when caching is active
  • If on Bedrock, Vertex, or Foundry: this feature is not available; continue with your existing approach or migrate to Anthropic API
  • Do not edit previously sent system messages — append new ones instead
  • If injecting system messages from tool results or third-party data, apply the same prompt injection mitigations you use for user content — the system role does not sanitize the content