Anthropic shipped a quiet but useful API change alongside Opus 4.8: you can now inject a {"role": "system"} message anywhere in the messages array, not just at the top-level system field. The instruction applies from that point in the conversation forward, with system-level priority over user turns, and without invalidating the prompt cache for everything that came before.
The feature is called mid-conversation system messages. It is available on Claude API and AWS Claude Platform with Opus 4.8 — no beta header required.
The Problem It Solves
Prompt caching hashes the request in order: tools → system → messages. The top-level system field sits near the beginning of that hash. Editing it — even appending a single sentence — produces a different hash, and every cached turn after it misses the cache.
For long-running agents, this is expensive. A session with 50 cached exchanges where you need to update the system prompt mid-run means paying to reprocess those 50 turns on every subsequent request, instead of reading them from cache at the $0.50/MTok rate.
The three workarounds builders have used, and why each falls short:
Fake user turn. Add the new instruction as a user message. Works for cache preservation, but the model treats user content as data to interpret, not instructions with system-level authority. Results are inconsistent — Claude may follow the instruction or argue with it.
Reconstruct the system prompt. Edit the top-level system field and rebuild from scratch. Authoritative, but kills the cache for the entire conversation.
Keep all future instructions in the original system prompt. Anticipate everything upfront. Brittle — you cannot always predict what a long-running agent session will need.
Mid-conversation system messages close this gap by letting you append the instruction after the stable cached prefix, where it does not change the hash.
How It Works
Add a message with "role": "system" to the messages array. The instruction applies from that point onward. When instructions conflict, later system messages take precedence over earlier ones, and mid-conversation system messages take precedence over the top-level system field for turns that follow them.
Placement rules:
- A mid-conversation system message must immediately follow a
userturn (or anassistantturn that ends in a server tool use) - It must either be the last entry in
messagesor be followed by anassistantturn - It cannot be the first message in the array — use the top-level
systemfield for that - Consecutive system messages are not allowed — merge instructions or wait for the next user turn
Violating placement returns a 400 error.
Python example:
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-8",
max_tokens=4096,
cache_control={"type": "ephemeral"},
system="You are a code review assistant. Flag issues; be concise.",
messages=[
{
"role": "user",
"content": "Review the auth module for security issues."
},
{
"role": "assistant",
"content": "Found three issues: no rate limiting on /login, "
"session tokens lack HttpOnly flag, bcrypt rounds at 8 "
"(recommend 12+)."
},
{
"role": "user",
"content": "Now review the payment module."
},
# Tool result discovered the user is on an enterprise plan
# with PCI-DSS requirements. Inject this as system-level
# authority without rewriting the cached history above.
{
"role": "system",
"content": "This session is now operating under PCI-DSS audit mode. "
"Flag any finding that could fail a PCI audit. "
"Do not suggest fixes that require SDK upgrades without "
"noting the change-freeze risk."
}
],
)
The previous turns stay byte-identical. The cache built on those turns still hits. Only the new system message and the user’s latest request are processed as fresh input.
Practical Use Cases
Permission grants mid-session. A session-level mode switch (like Opus 4.8’s Dynamic Workflows) can use a mid-conversation system message to grant the agent standing permission to spawn parallel subagents. A short refresher every several turns keeps the permission active; an exit notice turns it off. No need to rebuild the full prompt.
Token budget adjustments. A monitoring layer detects that a session is approaching its limit. Append a system message instructing Claude to compress responses and skip exhaustive examples. This does not require interrupting the user flow or reconstructing history.
Tool-driven policy switches. A tool call returns that a customer is on a premium enterprise plan. Inject a system message that updates behavioral expectations — “do not suggest downgrade options,” “always include SLA guarantees in responses.” The instruction has system authority, not user authority.
Mid-session constraint additions. A code review session discovers that the target codebase has a strict no-external-dependencies policy. Append a system message with that constraint. All subsequent suggestions respect it automatically.
Persona changes. An agent shifts from discovery mode to execution mode. A system message marks the transition, overriding tone, verbosity, and output format instructions from the original system prompt.
Combining With Prompt Caching
Caching must be explicitly enabled — a mid-conversation system message does not activate caching on its own. Use either the top-level cache_control: {"type": "ephemeral"} field for automatic caching, or place an explicit cache_control breakpoint on a specific content block.
The pattern:
- Place the cache breakpoint at the end of your stable prefix (typically after your tool definitions or a stable system prompt block)
- Append new system messages after the breakpoint — they do not change the prefix hash
- On the next turn, the new system message becomes part of the stable history and is itself cacheable
Avoid editing or removing a system message you have already sent. Like any other change to earlier turns, rewriting a mid-conversation system message invalidates the cache from that point forward. If the instruction needs to evolve, append a new system message rather than rewriting the old one.
What It Is Not
Not a security boundary. A mid-conversation system message gives an instruction system-level priority over user turns in the conversation structure. It does not make untrusted content trustworthy. If you are building instructions from tool results or third-party data, follow the standard guidance on prompt injection mitigation — the role of the message does not sanitize its content.
Not a substitute for agent orchestration. Mid-conversation system messages are good for updating a single agent’s behavioral context mid-run. They are not a mechanism for spawning subagents or coordinating multi-agent workflows — that is Dynamic Workflows’ job.
Platform Availability
| Platform | Available |
|---|---|
| Anthropic API (claude.ai/api) | Yes |
| AWS Claude Platform | Yes |
| Amazon Bedrock | No |
| Google Vertex AI | No |
| Microsoft Azure Foundry | No |
Requires Opus 4.8. The feature is not available on any prior Claude model.
Builder Checklist
- Identify agent sessions where you currently edit the top-level
systemfield mid-run or fake a user turn for instruction updates - Switch those to
{"role": "system"}entries appended after the latest user turn - Enable
cache_control: {"type": "ephemeral"}if not already set — the cache savings only apply when caching is active - If on Bedrock, Vertex, or Foundry: this feature is not available; continue with your existing approach or migrate to Anthropic API
- Do not edit previously sent system messages — append new ones instead
- If injecting system messages from tool results or third-party data, apply the same prompt injection mitigations you use for user content — the
systemrole does not sanitize the content