From CVE Advisory to Sigma Rule in 30 Minutes: A Repeatable Workflow

Most published CVEs sit in a strange middle ground: the advisory tells you the bug exists, the PoC tells you how to fire it, and the vendor patch tells you what is now blocked. But almost nothing tells you the one thing a defender actually cares about: what does this look like in my logs the morning a real attacker hits us? Writing that detection rule is its own discipline, and over the last year I have hardened a single workflow that turns a public N-day into a shippable Sigma rule and a Suricata signature in roughly half an hour.

This post documents that workflow as I run it today, with one running example (Apache Shiro RememberMe deserialization, CVE-2016-4437) so the steps stay concrete.

Why a four-step pipeline at all

A surprising amount of detection content in the wild is written by people who have never reproduced the bug they are writing about. The cost of that shortcut is two predictable failure modes:

The rule fires on the wrong artifact. Someone writes a Sigma rule that hunts for the string gadget or commons-beanutils in the request body — but the real attack payload is base64-encoded inside a cookie, so the rule never matches anything in production.
The rule misses every real-world variant. The author saw exactly one PoC, hard-coded the exact payload offsets, and now any attacker who pads the payload or swaps the gadget chain trivially evades the rule.

The four-step workflow I will describe exists to make both failure modes structurally hard to reach.

Step 1 — Reproduce, then breakpoint at the parsing boundary

The first move is always the same: stand up the vulhub environment for the target CVE, drop a breakpoint or strace filter at the boundary where attacker-controlled data first touches application code, and fire the PoC. For Shiro, the boundary is org.apache.shiro.mgt.AbstractRememberMeManager#getRememberedPrincipals(), which calls getRememberedSerializedIdentity() to base64-decode the rememberMe cookie, then AES-decrypts the bytes with the hard-coded key and hands them to the Java deserializer. I do not care about the deserialization gadget chain at this stage; I care about what crosses the wire.

The artifact I am hunting for at this step is the smallest piece of information that is both attacker-controlled and structurally invariant across every payload. For Shiro, that turns out to be a four-byte property:

A non-empty rememberMe cookie whose decoded length is >= 16 and which, after AES-CBC decryption with the default key kPH+bIxk5D2deZiIxcaaaA==, produces bytes that start with the Java serialization magic \xac\xed\x00\x05.

That sentence is the detection target. Notice it does not name a single gadget chain. It does not mention CommonsBeanutils1 or TemplatesImpl or any specific gadget. It describes the vulnerable code path, not any specific exploit of it.
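
The detection target above can be sketched as a small check in Python. This is a minimal sketch under stated assumptions, not the author's tooling: the names `split_cookie`, `is_serialized_java`, and the `Decryptor` callable are hypothetical, and the sketch assumes the decoded cookie is laid out as a 16-byte IV followed by the ciphertext, so it requires at least 32 decoded bytes rather than 16. The AES-CBC step is injected as a callable because the Python standard library has no AES implementation.

```python
import base64
import binascii
from typing import Callable, Optional, Tuple

JAVA_MAGIC = b"\xac\xed\x00\x05"  # Java serialization stream header
DEFAULT_KEY = base64.b64decode("kPH+bIxk5D2deZiIxcaaaA==")  # default key named in the text
AES_BLOCK = 16

# Hypothetical signature for a pluggable AES-CBC decryptor:
# (key, iv, ciphertext) -> plaintext
Decryptor = Callable[[bytes, bytes, bytes], bytes]


def split_cookie(value: str) -> Optional[Tuple[bytes, bytes]]:
    """Base64-decode a rememberMe value and split it into (iv, ciphertext).

    Returns None when the value cannot be a CBC payload at all.
    """
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Assumption: 16-byte IV plus at least one whole ciphertext block.
    if len(raw) < 2 * AES_BLOCK or len(raw) % AES_BLOCK != 0:
        return None
    return raw[:AES_BLOCK], raw[AES_BLOCK:]


def is_serialized_java(value: str, decrypt: Decryptor,
                       key: bytes = DEFAULT_KEY) -> bool:
    """True when the decrypted cookie starts with the Java magic bytes."""
    parts = split_cookie(value)
    if parts is None:
        return False
    iv, ciphertext = parts
    return decrypt(key, iv, ciphertext).startswith(JAVA_MAGIC)
```

In a lab, `decrypt` would wrap an AES-CBC routine from a third-party crypto package. Note that nothing in the check refers to a gadget chain; it only asks whether the bytes that reach the deserializer look like a serialized Java object.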

Step 2 — Convert the parsing-boundary fact into a Sigma rule

Sigma rules are usually written against either application logs (Apache access logs, Spring Servlet request traces) or instrumentation (Suricata HTTP keyword extraction). For a Java cookie deserialization bug, the practical signal lives in the access log.

The Sigma rule I ship for CVE-2016-4437 looks like this (abridged):

title: Apache Shiro RememberMe Deserialization Attempt (CVE-2016-4437)
id: 7c5b3c1a-...
status: experimental
description: >
  Detects HTTP requests carrying a rememberMe cookie whose base64 value is
  longer than 16 characters, an indicator of a Shiro RememberMe
  deserialization attempt against an unpatched 1.x server with the default
  AES key. Pair with the network rule and an offline decrypt that checks for
  the Java serialization magic 0xACED.
references:
  - https://nvd.nist.gov/vuln/detail/CVE-2016-4437
logsource:
  category: webserver
detection:
  selection_cookie:
    cs-cookie|contains: "rememberMe="
  filter_short_cookie:
    # Anchor on the cookie delimiter; \b can also match before '+' or '/' inside a long value
    cs-cookie|re: 'rememberMe=[A-Za-z0-9+/=]{1,16}(;|$)'
  condition: selection_cookie and not filter_short_cookie
fields:
  - c-ip
  - cs-uri-stem
  - cs-cookie
falsepositives:
  - Legitimate users with very long rememberMe tokens on patched Shiro 2.x
  - Custom applications reusing the cookie name "rememberMe"
level: medium

Two structural decisions matter here, and they came directly from Step 1:

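The detection key is “long base64 in rememberMe cookie,” not any gadget string. Because the parsing boundary in Step 1 is “AES-decrypt and feed to ObjectInputStream,” any payload that successfully exploits the bug must clear the length threshold: a CBC payload carries a 16-byte IV plus at least one 16-byte ciphertext block, so even the smallest one is 32 bytes, or 44 base64 characters. Hunting for a specific gadget chain misses every variant built on a different chain.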
The false-positive section names the realistic FP cases up front. Legitimate Shiro 2.x deployments with long tokens will trip this rule; the analyst running it needs to know that on day one, not at 2am during an incident.
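
Before shipping, the rule's condition can be exercised offline against sample cookie headers. The following is a rough Python approximation of the intent of `selection_cookie and not filter_short_cookie`, not a Sigma engine; the helper names `b64_len` and `rule_fires` are made up for illustration, and the 16-character cutoff is taken from the rule above.

```python
import math
import re

# Capture the run of base64 characters that follows "rememberMe=".
REMEMBER_ME = re.compile(r"rememberMe=([A-Za-z0-9+/=]*)")
MAX_BENIGN_CHARS = 16  # cutoff used by filter_short_cookie


def b64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes raw bytes."""
    return 4 * math.ceil(n_bytes / 3)


def rule_fires(cookie_header: str) -> bool:
    """Approximate the rule: a rememberMe value longer than the cutoff."""
    match = REMEMBER_ME.search(cookie_header)
    if match is None:
        return False  # selection_cookie would not match
    return len(match.group(1)) > MAX_BENIGN_CHARS


samples = [
    "JSESSIONID=abc; rememberMe=" + "A" * b64_len(32),  # smallest CBC payload
    "rememberMe=deleteMe",                               # Shiro's reset value
    "JSESSIONID=abc",                                    # no rememberMe at all
]
for header in samples:
    print(rule_fires(header), header[:40])
```

A handful of such samples, including one with `+` or `/` early in the value, makes a cheap regression test whenever the regex in the rule changes.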

Step 3 — Pair with a Suricata signature that recovers the Java magic

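The Sigma rule above catches the attempt in the access log. The Suricata signature flags the same request on the wire, so the cookie can be captured and decrypted offline to confirm the Java serialization magic header. A plain signature cannot perform the AES decryption itself, so the rule below matches the long base64 value rather than the post-decryption bytes. In practice I ship the signature as a paired rule with a reserved SID range: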

alert http $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS ( \
  msg:"INDICATOR-OBFUSCATION Apache Shiro 1.x RememberMe deserialization payload"; \
  flow:to_server,established; \
  http.cookie; content:"rememberMe="; nocase; \
  http.cookie; content:!"deleteMe"; nocase; \
  pcre:"/rememberMe=[A-Za-z0-9+\/=]{32,}/i"; \
  reference:cve,2016-4437; \
  classtype:web-application-attack; \
  sid:9000016; rev:1; \
)

I keep a SIDS.md file in every lab directory recording which SID range is allocated for that CVE; this avoids the slow-motion train wreck of two rules using sid:1 in the same deployment.
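
The SID bookkeeping can also be checked mechanically. As a sketch only (the function name `find_sid_collisions` and the file names in the example are invented, and the regex is a simplification that would also match the text `sid:1;` inside a `msg` string), a few lines of Python can report any SID that appears more than once across a set of rule files:

```python
import re
from collections import defaultdict
from typing import Dict, List

# Simplified matcher for the sid keyword inside a rule body.
SID_RE = re.compile(r"\bsid\s*:\s*(\d+)\s*;")


def find_sid_collisions(rule_files: Dict[str, str]) -> Dict[int, List[str]]:
    """Map each duplicated SID to the files (in order seen) that use it.

    rule_files maps a file name to the full text of that rule file.
    """
    seen: Dict[int, List[str]] = defaultdict(list)
    for name, text in rule_files.items():
        for sid in SID_RE.findall(text):
            seen[int(sid)].append(name)
    return {sid: names for sid, names in seen.items() if len(names) > 1}


rules = {
    "shiro.rules": 'alert http any any -> any any (msg:"a"; sid:9000016; rev:1;)',
    "log4shell.rules": (
        'alert http any any -> any any (msg:"b"; sid:9000016; rev:2;)\n'
        'alert http any any -> any any (msg:"c"; sid:9000017; rev:1;)'
    ),
}
print(find_sid_collisions(rules))
```

Run as a pre-commit step, a check like this catches a collision before the sensor refuses to load the second rule.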

Step 4 — Write a hunting query in three SIEM dialects

The rule catches the next attack. The hunting query catches the previous attack you did not know about. I write the hunting query in three dialects so a defender on any major stack can use it without translation:

Splunk SPL:

index=webserver sourcetype=access_combined
  cookie="*rememberMe=*"
| rex field=cookie "rememberMe=(?<rm>[A-Za-z0-9+/=]+)"
| eval rm_len = len(rm)
| where rm_len > 32
| stats count by clientip, uri_path, rm_len
| sort -count

Microsoft Sentinel KQL:

W3CIISLog
| where csCookie has "rememberMe="
| extend rm = extract(@"rememberMe=([A-Za-z0-9+/=]+)", 1, csCookie)
| where strlen(rm) > 32
| summarize hits = count() by cIP, csUriStem, strlen(rm)
| order by hits desc

Elastic ES|QL:

FROM apache-access-*
| WHERE cookie LIKE "*rememberMe=*"
| EVAL rm = REPLACE(cookie, "^.*rememberMe=([A-Za-z0-9+/=]+).*$", "$1")
| WHERE LENGTH(rm) > 32
| STATS hits = COUNT(*) BY client.ip, url.path
| SORT hits DESC

The point of the three-dialect parallel block is not to be exhaustive. It is to remove the most common excuse for not running the hunt: “we are on Sentinel/Elastic, the rule is for Splunk, we will get to it later.”
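
For a defender who has only raw log files and no SIEM, the same hunt can be approximated in plain Python. This is a sketch under assumptions: the records are taken to be already parsed into dictionaries with the hypothetical keys `client_ip`, `path`, and `cookie`, and the 32-character threshold mirrors the queries above.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

REMEMBER_ME = re.compile(r"rememberMe=([A-Za-z0-9+/=]+)")
MIN_LEN = 32  # same threshold as the SIEM queries


def hunt(records: Iterable[Dict[str, str]]) -> List[Tuple[Tuple[str, str], int]]:
    """Count long rememberMe cookies per (client_ip, path), busiest first."""
    hits: Counter = Counter()
    for rec in records:
        match = REMEMBER_ME.search(rec.get("cookie", ""))
        if match and len(match.group(1)) > MIN_LEN:
            hits[(rec["client_ip"], rec["path"])] += 1
    return hits.most_common()


records = [
    {"client_ip": "203.0.113.5", "path": "/login", "cookie": "rememberMe=" + "A" * 64},
    {"client_ip": "203.0.113.5", "path": "/login", "cookie": "rememberMe=" + "B" * 64},
    {"client_ip": "198.51.100.7", "path": "/", "cookie": "rememberMe=deleteMe"},
]
print(hunt(records))
```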

Why this scales — and the bug class catalogue it produced

When I started this workflow the goal was a single repeatable recipe. The unintended consequence was that the workflow forced me to classify CVEs by their parsing boundary, which turned out to be a far more useful taxonomy than CVSS score or CWE category. The current bucket layout in my labs/ chapter:

Boundary class	Example CVE	Detection key
Cookie-based Java deserialization	CVE-2016-4437 (Shiro)	base64 length + Java magic
Request header / parameter evaluation (OGNL, data binding)	CVE-2017-5638 (Struts2), CVE-2022-22965 (Spring4Shell)	suspicious OGNL tokens in `Content-Type` (Struts2), `class.module.classLoader` parameter pattern (Spring4Shell)
Logging chain JNDI lookup	CVE-2021-44228 (Log4Shell)	`${jndi:` substring (and obfuscated variants)
Non-HTTP protocol deserialization	CVE-2023-46604 (ActiveMQ OpenWire)	OpenWire frame anomalies, outbound HTTP from broker process
Args4j `@filename` expansion	CVE-2024-23897 (Jenkins CLI)	CLI subcommand seen with `@` prefix in argv
URL parameter command injection	CVE-2019-12725 (ZeroShell)	`kerbynet` path + shell metacharacter in a query parameter such as `x509type`
Path-parameter authentication bypass	CVE-2024-27198 (TeamCity)	`;.jsp` suffix in the `jsp` query parameter of a request for a nonexistent path

Looking at this table it becomes obvious why generic “any web request with a semicolon” rules are useless — the parsing-boundary semantics matter, and the same character (;) is benign in one bucket and a one-shot auth bypass in another.
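
One way to make that point concrete is to key each check to its boundary instead of to a character. The sketch below is illustrative only: the request shape (`target` plus a `headers` dictionary), the detector names, and the two simplified predicates are assumptions, and the `${jndi:` check does not attempt the obfuscated variants.

```python
from typing import Callable, Dict, List


def semicolon_in_target(request: dict) -> bool:
    """Flag ';' in the request target, where it can alter routing."""
    return ";" in request["target"]


def jndi_lookup(request: dict) -> bool:
    """Flag a plain '${jndi:' substring in any header value."""
    return any("${jndi:" in value.lower() for value in request["headers"].values())


# Each detector inspects only the field that its parsing boundary consumes.
DETECTORS: Dict[str, Callable[[dict], bool]] = {
    "path-parameter authentication bypass": semicolon_in_target,
    "logging chain JNDI lookup": jndi_lookup,
}


def matching_buckets(request: dict) -> List[str]:
    return [name for name, check in DETECTORS.items() if check(request)]


benign = {"target": "/index", "headers": {"Cookie": "a=1; b=2"}}
suspicious = {"target": "/app;x/admin/page", "headers": {"Cookie": "a=1; b=2"}}
print(matching_buckets(benign))      # ';' as a cookie separator is ignored
print(matching_buckets(suspicious))  # ';' in the request target is flagged
```

Both requests contain a semicolon; only the one whose semicolon reaches the routing code is reported.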

What I would change if I were starting today

Two things.

First, I would front-load Suricata HTTP keyword extraction in Step 1. The first time I tried to write a detection from the Shiro PoC, I instrumented the JVM with a Java agent so I could observe ObjectInputStream.readObject() directly. That worked, but a single Suricata sensor in promiscuous mode in front of the lab would have given me the same parsing-boundary intuition in 10 minutes instead of 2 hours.

Second, I would commit the failed Sigma rules as well as the passing ones. The current ctf-notes/labs/ chapter only ships the version that survived. The five or six earlier drafts, including the one that hard-coded the CommonsBeanutils1 gadget name and missed every TemplatesImpl variant, would be more educational than the final clean rule. The next refactor of the chapter will add an attempts/ subdirectory for those.

If you want the source rules, IOC tables, and SIEM-dialect parallel hunting queries for any of the CVEs in the table above, they live in the matching subdirectory under github.com/1392081456/ctf-notes/labs. Every writeup follows the same four-step structure documented above.