Local drafting loop — what we built

We built and smoke-tested a privileged-adjacent drafting loop as a runnable lab: local model, public-sources-only prompt, human-reviewed draft file, no cloud LLM API on the demo path.

The same pilot shape appears in our scoping reference (first item in the list) and comes up in live conversations. This post records what landed in code and what each audience should take from it.

For lawyers — scan or inspect

What this pilot is

Privileged-adjacent drafting loop: one practice group, one class of low-privilege drafting (here: an internal-style memo from public facts only), with the model running on hardware you control and no automatic send to clients, courts, or opposing counsel.

It is not “we solved gen-AI for the firm.” It is a minimum credible demo that a drafting path can exist without sending the same text through a consumer chat tool and its default cloud retention terms.

What we can honestly claim after the lab run

Claim	Supported?	Basis
The demo script calls localhost only; no OpenAI/Anthropic URL in the golden path	Yes	Script review + egress check
Output is draft-only; nothing auto-emails or files	Yes	Charter + script design
Input was public-source facts (user-pasted prompt)	Yes	By charter for lab runs
Air-gapped or IT-approved egress for production	No	Model pull uses the network once; the egress check is a starting point, not an audit
Competence / disclosure obligations satisfied	No	Firm policy, supervision, and jurisdiction-specific analysis still required
Privilege preserved for real client work	Not tested	Lab excludes client names, matter numbers, mail, discovery

Data classes (lab charter)

Allowed: public statutes/opinions in plain language; generic firm-style notes if pasted manually.
Excluded: client identifiers, matter numbers, privileged mail, discovery, DMS/automated retrieval.

Human gate: a reviewer reads the draft file before anything leaves the firm. No Outlook, Clio, or e-filing integration in this slice.

Why this matters (one level down)

Vendor ToS is not the only risk surface. Even when inference is local, what you paste in and what you send out still implicate competence, confidentiality, and (increasingly) disclosure when AI assists client work. The lab separates inference locality from workflow governance.
Verification beats marketing. The sample prompt targets gen-AI disclosure trends, an area where hallucinated citations have already led to sanctions elsewhere. The lab rubric: stay on topic, flag assumptions, do not treat output as verified law.
Scope expands only by charter. “Just this one client fact” is how pilots become incidents. This pilot stays narrow so ethics, IT, and practice leadership can sign off on a one-sentence success criterion before DMS, email, or matter-specific retrieval enters scope.

What a real firm pilot still needs

Named sponsor (practice group + IT + risk)
Frozen data classes and prohibited uses
Shadow period, then live slice
Written rule on whether cloud tools may run on the same workstation during sessions
Optional protocol boundary in front of the model — not in this first build (Principal Agent Protocol direction)

For systems people — what we built

Timeline

The full pilot program (sponsor, shadow, readout, IT sign-off, optional protocol boundary) remains weeks of calendar and politics. What shipped is the minimum vertical slice:

Runnable Docker + local inference stack
One prompt file → generate → markdown draft path
Egress sanity script (loopback binding, no cloud URLs in golden path)
Charter colocated with executable code
Repeatable agent runbook for future sessions

That slice is what you can demo in one sitting: “here is inference locality on the drafting path we actually execute.”
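The "no cloud URLs in golden path" check from the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lab's actual script: the function name `find_cloud_urls` and the denylist are hypothetical, and the two hostnames simply mirror the providers this post names.

```python
from pathlib import Path

# Hypothetical denylist: the post names OpenAI and Anthropic; extend as needed.
CLOUD_LLM_HOSTS = ("api.openai.com", "api.anthropic.com")


def find_cloud_urls(script_dir, hosts=CLOUD_LLM_HOSTS):
    """Return (path, line_number, host) for each denylisted host found under script_dir."""
    hits = []
    for path in sorted(Path(script_dir).rglob("*")):
        if not path.is_file():
            continue
        # Ignore undecodable bytes so binary files do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for host in hosts:
                if host in line.lower():
                    hits.append((str(path), line_number, host))
    return hits
```

An empty result only shows that no denylisted hostname is hardcoded in the scanned files. It says nothing about what the container or host can reach at runtime, and the scanner will flag itself if it lives in the directory it scans.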

Golden path

Bring up — container on loopback, pull default model (one-time large download).
Generate — read prompt file, POST to local API, write timestamped draft under outputs/.
Check egress — confirm loopback, grep scripts for cloud LLM URLs, informational probe of common cloud endpoints from host.
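The Generate step above can be sketched as follows. This is a Python illustration, not the lab's script (which depends on curl and jq). It assumes an Ollama-style `/api/generate` endpoint, inferred only from the 11434 loopback port used in this post; the `draft-<timestamp>.md` file name and the draft header line are likewise hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

# Assumption: an Ollama-style endpoint on the loopback port from this post.
LOCAL_API = "http://127.0.0.1:11434/api/generate"


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)


def generate_draft(prompt_path, model, out_dir="outputs", post=post_json):
    """Read a prompt file, call the local model, write a timestamped markdown draft."""
    prompt = Path(prompt_path).read_text(encoding="utf-8")
    # stream=False requests a single JSON object rather than a token stream.
    reply = post(LOCAL_API, {"model": model, "prompt": prompt, "stream": False})
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    draft_path = Path(out_dir) / f"draft-{stamp}.md"
    draft_path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical header marking the file as unreviewed; nothing here sends it anywhere.
    draft_path.write_text(
        "DRAFT: not reviewed, not verified law\n\n" + reply["response"],
        encoding="utf-8",
    )
    return draft_path
```

The model name is passed in rather than hardcoded, so a caller can read it from whatever environment variable the lab uses for the override. The `post` parameter exists so the function can be exercised without a running model; the only side effect is a file under `outputs/` for a human reviewer to open.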

Default model: small local instruct model (override via env). Deps: Docker, curl, jq.

Network boundary

ports:
  - "127.0.0.1:11434:11434"

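This binding publishes the API on the host's loopback interface only, so it is not LAN-exposed by default. It restricts inbound connections, not outbound ones: the container may still reach the internet for model pulls, so production pilots need firewall/allowlist design, not just this lab.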
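A loopback-binding check on a port mapping like the one above might look like this. It is a minimal sketch assuming Compose short syntax (`HOST_IP:HOST_PORT:CONTAINER_PORT`); the function name `is_loopback_mapping` is hypothetical and this is not the lab's egress script.

```python
import ipaddress


def is_loopback_mapping(mapping):
    """Return True only if a short-syntax port mapping names a loopback host IP.

    A mapping without a host IP (for example "11434:11434") is treated as
    exposed, because Docker then publishes on all interfaces by default.
    """
    spec = mapping.split("/", 1)[0]   # drop an optional "/tcp" or "/udp" suffix
    parts = spec.rsplit(":", 2)       # split from the right so IPv6 hosts survive
    if len(parts) != 3:
        return False
    host_ip = parts[0].strip("[]")    # "[::1]" becomes "::1"
    try:
        return ipaddress.ip_address(host_ip).is_loopback
    except ValueError:
        return False
```

The check covers inbound exposure as declared in configuration only. It does not inspect the running container and says nothing about outbound traffic, which is why the allowlist work remains a separate item.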

Validation (2026-05-26)

Run	Result	Wall time
First end-to-end (incl. image + model pull)	Draft on-topic gen-AI disclosure memo	~38 min
Post-repo-lift verification (model cached)	Egress pass; second draft	~13 min

In scope vs later

Capability	This build
Local inference, scripts, written charter	In scope
Mandate chains, co-signed receipts (PAP)	Later — when bindings exist
High-stakes research with zero query leakage	Separate pilot — not built here
Clio / Outlook / DMS	Out of scope

Next increments

Charter with named sponsor + frozen classes for first external pilot
Host egress allowlist during draft sessions (IT ticket)
Optional: small local retrieval on a public corpus only — still no DMS until charter v2
Zero query-leakage research path — separate pilot, separate charter
Protocol boundary in front of local inference when the stack is ready

Questions on pilot scoping: book a call. Protocol background: Reads · Principal Agent Protocol.

— Firm Leverage engineering