Denser than any language. Measured in tokens. The measurement said no.

Tokenese was an attempt at a token-native interlingua: a language LLMs would use to talk to each other, in plain text that crosses any wire and any vendor. Its own tokenizer audit falsified the compression premise. The full reasoning is in the post-mortem.

By Sam Rogers, PAICE.work Published June 12, 2026 Archived July 25, 2026 final: grammar v0.3 / tools v0.3.9 CC BY 4.0

The founding metaphor: LLMs conforming to human language is like watching film in black and white. The measured reality: BPE tokenizers are trained on natural text, so English is already near one token per word, and the designed syntax tokenized worse than the prose it replaced. This page is preserved as the historical record of the attempt.

English, 36 tokens (o200k_base, measured)

Could you check whether the deploy of the
edge function to the Supabase project
succeeded, and if it failed, look at the
logs and tell me the first error with a
timestamp?

Tokenese v0.3, 47 tokens (o200k_base, measured)

^grammar:v0.3
^declare:level=L2
@svc := supabase/edge-fn
@svc.deploy >>> @svc.status
!@svc.ok? *>> get @svc.logs.first-error +ts

This was the flagship example, originally claimed as about 55 vs about 22 tokens. Those figures were eyeballed, never measured. Counted on the certified tokenizers, the designed form is 1.3x larger than the prose it replaced. The post-mortem records the full measurement.

Read the specification View on GitHub

Why a designed language (the original bet)

The bet behind Tokenese was that accuracy and compression are not a tradeoff here: that natural language sits so far from the efficient frontier that a designed language can win on both axes at once. Measurement showed the premise is false for short messages: there is no large efficiency gap for a designed language to capture, and the designed syntax tokenized worse than terse English. The post-mortem explains why, and why the same bet has failed for 70 years. The design pillars below are preserved as the historical record.

Token-space only

Plain text crosses the wire; each party tokenizes independently. No embeddings, no KV-cache sharing, no latent channels. Security and cross-vendor portability both require it.

Audited lexicon

A function-vocabulary symbol enters only if it costs one token, worst case, in every certified tokenizer. Content words are admitted on tokens-per-meaning advantage. Claims are reproducible.

Self-repairing

A dedicated misparse signal and a plain-English escape hatch are mandatory. Misparse-retry rate is the metric that validates or kills the whole scheme.

Compression was to come from structure, not glyphs. An empirical finding from the audit: common English words are already optimal tokens, and exotic Unicode usually is not. In the end that finding generalized further than intended: English itself was the near-optimal encoding, and the designed structure cost more than it saved.

Every word earns its place

No element ships unaudited. The closed function vocabulary, the sigils and operators below, must cost one token worst case (bare and space-prefixed) in every tokenizer the spec certifies. The current audit covers seven tokenizer columns: OpenAI o200k_base, the Anthropic count-tokens API (claude-haiku-4-5), Gemini (gemini-2.5-flash, API-gated), Qwen2.5-7B, DeepSeek-V3, Llama 3 8B, and Gemma 4 E4B (mlx-community/gemma-4-e4b-it-4bit, the on-device PAICE production generator). New tokenizer columns are additive: they may shrink the admissible alphabet, never silently expand it.

@ # $ % & * + - / | ~ ! ? ^ _ = < > : ; . ,

22 ASCII sigils, plus 12 audited digraphs, 8 Unicode survivors, and a 30-word core verb set. Items that cost 2+ tokens on either side (!=, >=, ]], most Greek, all CJK, all emoji) are excluded by the audit, not by taste.

Reproduce it

The audit is the spec's first testable claim class. Six of the seven columns are offline-reproducible; only Gemini is API-gated. The Gemma 4 column reads from a local MLX cache (the on-device production runtime).

python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
.venv/bin/python audit_symbols.py            # OpenAI o200k_base + cl100k_base
.venv/bin/python audit_qwen.py               # Qwen2.5-7B
.venv/bin/python audit_deepseek.py           # DeepSeek-V3
.venv/bin/python audit_llama.py              # Llama 3 8B
.venv/bin/python audit_gemma4.py             # Gemma 4 E4B (local MLX cache)
ANTHROPIC_API_KEY=... .venv/bin/python audit_anthropic.py
.venv/bin/python audit_check_intersection.py # verify nothing has silently expanded

The grammar in one screen

One statement per line, op first, keyed slots. No articles, no copulas, no hedges except the explicit operator below. A handle binds a referent once with :=; every later use references it. Grammar v0.3 is additive over v0.2 and activates when an artifact opens with ^grammar:v0.3. The complete grammar lives in GRAMMAR-v0.3.md; the foundational wire grammar is in spec.md.

Sigil	Meaning (v0.3)
@h := x	bind handle `@h` to referent `x`; later uses reference it
>>>	temporal sequence (then)
*>>	stipulated causation; admissible only with source corroboration
?>>	hypothesized causation
!@h	negation prefix (not h)
@h?	hedge suffix (possibly h)
^declare:level=Ln	declared conformance level (L0-L3), first statement only
^plain<<<	open a closed plain region, closed by `>>>^plain`
"""..."""	raw source quote, passed through verbatim
??	repair: `??` statement, `??@h` handle, `??: reason`
# ...	line comment (leading `#` only); ignored by the protocol

Handshake

A: tokenese? v:0.3
B: tokenese ok v:0.3

Confirm capability before dropping natural language. Exit any time with plain, re-enter with dense.

Binding and repair

^grammar:v0.3
@spec := tokenese/spec.md
fix @spec add:handshake
??@spec   # that handle misparsed, resend in English

Three misparses on one topic and both parties stay in plain English. The failure is logged as lexicon-design feedback.

Report-only framesets

The v0.3.2 toolchain added an experimental controlled-vocabulary registry at framesets.json; it ships unchanged through v0.3.9. Registered ops carry typed slot signatures such as deploy :: who what to:env when:date -> status. The validator reports missing required slots, extra positional args, noncanonical slot order, and unregistered slots, but this is telemetry only: it does not change parser acceptance, conformance level, or checker outcome.

Status

Archived 2026-07-25. Final release: grammar v0.3, tools v0.3.9 (2026-06-23). The seven-column tokenizer audit is complete and remains reproducible. The reference toolchain is preserved in the repo: a Tokenese-to-English translator, a deterministic per-pair conformance checker with a CLI, an MCP server, compression/hypothesis evals, an N2 static package report, and a report-only frameset registry (163 tests passing at archive). The validating experiment, a live A/B between model families measuring tokens, task success, and misparse-retry rate, was designed but never run: the flagship compression claim was falsified by measurement first, and demand evidence answered the remaining question at a level below the experiment's resolution. The full verdict, the measurement, and the 70-year lineage of the idea are recorded in the post-mortem.

Read the v0.3 grammar Tools and checker