# tok_max: Maximize Token Efficiency
Compress text to maximize information density and machine-parseable structure. The inverse of humanizer: strip personality, filler, and ambiguity. Replace with precise terminology, flat statements, and high-density templates.
## Overview
This skill transforms verbose or human-style text into token-efficient, machine-legible output. It eliminates 24 categories of wasteful patterns — from filler phrases and hedging to promotional language and conversational artifacts — and replaces them with structured, unambiguous formats.
Assumptions:
- The user wants maximum information density, not readability for humans
- Facts, names, dates, and numbers must be preserved exactly
- The output may feel "sterile" or "robotic" — this is the intended outcome
- Source citations should be specific or removed, never vague
## Workflow
1. Analyze input — scan for compression targets
2. Apply patterns — compress using the 24 patterns
3. Restructure — choose high-density output template
4. Verify — run tok_compress.py for quantitative check
5. Deliver — output compressed text + tag block
## Step 1: Analyze Input
Scan the input text and classify each sentence:
| Type | Action |
|---|---|
| Fact (name, date, number, claim) | Preserve exactly |
| Filler phrase | Delete or compress to single word |
| Vague attribution | Replace with specific source or delete |
| Evaluative adjective (unattributed) | Delete |
| Meta-commentary about importance | Delete, keep the underlying fact |
| Conversational artifact | Delete |
| Redundant restatement | Keep strongest version, delete others |
Load references/compression-patterns.md for the full 24-pattern catalog. Load references/efficiency-glossary.md for the phrase-to-term lookup table.
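The classification pass above can be sketched as a small heuristic function. This is an illustrative sketch, not shipped logic: the cue lists and the `classify_sentence` helper are hypothetical stand-ins for the full catalog in references/compression-patterns.md.

```python
import re

# Illustrative (not exhaustive) cue lists; the authoritative catalog
# lives in references/compression-patterns.md.
FILLER = ("in order to", "it is important to note", "needless to say")
VAGUE_ATTRIBUTION = ("experts believe", "some say", "it is widely thought")
META = ("plays a crucial role", "serve as a testament", "highlighting broader trends")

def classify_sentence(sentence: str) -> str:
    """Return the action from the Step 1 table for one sentence."""
    s = sentence.lower()
    if any(p in s for p in VAGUE_ATTRIBUTION):
        return "replace-or-delete"   # vague attribution
    if any(p in s for p in FILLER):
        return "compress"            # filler phrase
    if any(p in s for p in META):
        return "delete-keep-fact"    # meta-commentary about importance
    if re.search(r"\b\d", s):
        return "preserve"            # contains a number: treat as fact
    return "review"                  # needs the full pattern catalog
```

A real pass would check all 24 pattern categories per sentence; this sketch only shows the table-to-code mapping.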
## Quick-Start Example
Input:
In order to understand the evolving landscape of artificial intelligence,
it is important to note that experts believe this field plays a crucial role
in shaping our future. The stunning advancements we've witnessed serve as a
testament to human ingenuity, highlighting broader trends in innovation.
After Step 2 (pattern compression):
AI enables technological innovation. [attribution: unknown]
After Step 3 (template: flat statement):
AI enables technological innovation. [attribution: unknown]
[tok_max: applied | patterns: 8/24 | compression: 80% | facts: 2/2 preserved]
## Step 2: Apply Compression Patterns
Process text through the 24 patterns in order. For each pattern found:
- Identify the pattern match in the original text
- Determine the precise replacement (use efficiency-glossary.md)
- Verify the replacement preserves all facts from the original
- Record the change for the tag block
Critical rules during compression:
- Never fabricate. If the original says "experts believe" and no expert is named, delete the claim or label it [attribution: unknown]. Do not invent a source.
- Never add evaluation. Remove "crucial", "pivotal", "stunning", "vibrant" unless attributed to a specific person making that evaluation.
- Never hedge. Replace "it could potentially be argued that" with the direct claim or an explicit uncertainty label: [confidence: low].
- Always preserve numbers. Exact counts, dates, percentages, dollar amounts — never round or approximate.
- Always preserve named entities. People, organizations, products, locations — never generalize.
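One Step 2 pass can be sketched as a substitution table plus a fact-preservation check. The three substitutions here are illustrative; the authoritative phrase-to-term mappings live in references/efficiency-glossary.md. The assertion at the end mirrors the "always preserve numbers" rule.

```python
import re

# Illustrative substitutions; see references/efficiency-glossary.md
# for the real lookup table.
SUBSTITUTIONS = [
    (r"\bin order to\b", "to"),
    (r"\bit is important to note that\b", ""),
    (r"\bexperts believe that\b", "[attribution: unknown]"),
]

def compress(text: str) -> str:
    out = text
    for pattern, replacement in SUBSTITUTIONS:
        out = re.sub(pattern, replacement, out, flags=re.IGNORECASE)
    out = re.sub(r"\s{2,}", " ", out).strip()
    # Rule check: every number in the original must survive compression.
    for number in re.findall(r"\d[\d.,%]*", text):
        assert number in out, f"fact lost: {number}"
    return out
```

A failed assertion is the signal to restore the deleted fact and re-compress more conservatively, as the troubleshooting table prescribes.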
## Step 3: Restructure Using Templates
After prose compression, choose the output format. Two tiers:
Tier 1 (default): High-density templates for human or mixed consumption. Load references/token-templates.md for templates (spec tables, decision trees, hierarchical lists, tag blocks).
Tier 2 (machine-only): Advanced token-level compression when the consumer is a parser, not a human. Load references/advanced-compression.md for control-character delimiters, scientific notation, hex dates, abbreviated keys, and punctuation elimination.
Select Tier 2 only when: (1) the consumer is a machine parser, (2) a schema can be established before data transmission, (3) human readability is not required.
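Under those three conditions, a Tier 2 record might look like the following sketch. The single-letter field keys and field order are assumptions agreed out of band (the Tier 2 schema precondition); the delimiter, hex-date, and scientific-notation choices follow the techniques named for references/advanced-compression.md.

```python
US = "\x1f"  # ASCII unit separator between fields
RS = "\x1e"  # ASCII record separator between records

# Hypothetical schema, agreed before transmission (Tier 2 precondition):
# f = founded (hex year), e = employees, r = revenue (scientific notation)
def encode_record(founded: int, employees: int, revenue: float) -> str:
    return US.join((f"f{founded:x}", f"e{employees}", f"r{revenue:.1e}"))

def encode(records) -> str:
    return RS.join(encode_record(*r) for r in records)
```

The result is unreadable to humans by design, which is why Tier 2 requires a machine consumer.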
Template selection:
| Information Type | Template | Example |
|---|---|---|
| One entity, many attributes | Specification table | \| Founded \| 1994 \| Source: reg docs \| |
| Many entities, same attributes | Parameter sheet | \| Entity \| A \| B \| C \| |
| Branching logic | Decision tree | **Condition?** → Yes/No |
| Nested information | Hierarchical list | 1. → 1.1 → 1.2 |
| Metadata/annotations | Tag block | [confidence: high \| source: primary] |
| Single fact | Flat statement | Subject verb object. Source: citation. |
| Before/after comparison | Comparison table | \| Metric \| Before \| After \| Change \| |
Rules for all templates:
- Use sentence case for all headings
- No em dashes — use commas or periods
- No bold for emphasis — use headers or tables for structure
- No emojis
- No curly quotes — ASCII straight quotes only
- One term per concept — no synonym cycling
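As a sketch, the specification-table template can be rendered mechanically. The `spec_table` helper is hypothetical; it only demonstrates how the template rules above (ASCII straight quotes, no emphasis, one row per attribute) translate to code.

```python
def spec_table(entity: str, rows: list[tuple[str, str, str]]) -> str:
    """Render a specification table: one entity, many attributes.

    Each row is (attribute, value, source), matching the Step 3
    template-selection table.
    """
    lines = [f"| {entity} | Value | Source |", "|---|---|---|"]
    for attribute, value, source in rows:
        lines.append(f"| {attribute} | {value} | {source} |")
    return "\n".join(lines)
```

The same shape generalizes to the parameter sheet and comparison table; only the header row changes.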
## Step 4: Verify with tok_compress.py
Prerequisite check:
python3 --version # Verify 3.10+; if not available, skip to manual verification
Run the quantitative verification (choose one of three methods):
Method A — file-based:
python3 scripts/tok_compress.py original.txt compressed.txt
Method B — stdin (when files are temporary):
echo "$original_text" | python3 scripts/tok_compress.py --stdin
Method C — manual (when Python is unavailable):
Count words before/after, scan for remaining patterns, estimate fact preservation. Tag output: [verification: manual].
The script outputs:
- Word count before/after with compression ratio
- Pattern detection: how many of the 24 patterns remain
- Fact preservation ratio
- Verdict: PASS, NEEDS WORK, or REVIEW
Target metrics:
- Compression ratio: 30-60% reduction
- Remaining patterns: 0-2
- Fact preservation: >= 80%
If NEEDS WORK or REVIEW: return to Step 2 and compress further.
If scripts/ directory is missing: Fall back to manual word counting and pattern scanning. Use [verification: manual] tag.
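When falling back to manual verification, the script's metrics can be approximated as follows. The thresholds come from the target metrics above; the exact mapping of failing checks to NEEDS WORK versus REVIEW is an assumption, since the script's internal rules are not documented here. Word count is the token proxy, as the limitations section notes.

```python
def verify(original: str, compressed: str,
           patterns_remaining: int, facts_kept: int, facts_total: int) -> dict:
    """Approximate tok_compress.py output for manual verification."""
    before, after = len(original.split()), len(compressed.split())
    ratio = 1 - after / before if before else 0.0
    preservation = facts_kept / facts_total if facts_total else 1.0
    # Target metrics: 30-60% reduction, <= 2 patterns, >= 80% facts kept.
    ok = 0.30 <= ratio <= 0.60 and patterns_remaining <= 2 and preservation >= 0.80
    # Assumed mapping: lost facts -> REVIEW, other failures -> NEEDS WORK.
    verdict = "PASS" if ok else ("REVIEW" if preservation < 0.80 else "NEEDS WORK")
    return {"compression": round(ratio, 2), "verdict": verdict}
```

Any manually verified delivery should carry the [verification: manual] tag.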
## Step 5: Deliver Output
Provide:
- Compressed text in the selected template format
- Tag block with metadata about the compression:
[tok_max: applied | patterns: 8/24 found | compression: 52% | facts: 12/12 preserved]
[confidence: high | method: pattern-based compression | template: specification_table]
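Assembling the first tag block can be sketched as a formatter; the `tag_block` helper is hypothetical, and its field names and order follow the example above.

```python
def tag_block(patterns_found: int, compression_pct: int,
              facts_kept: int, facts_total: int) -> str:
    """Build the Step 5 metadata tag block."""
    return (f"[tok_max: applied | patterns: {patterns_found}/24 found"
            f" | compression: {compression_pct}%"
            f" | facts: {facts_kept}/{facts_total} preserved]")
```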
## Reflection
After delivering compressed text, reflect on the quality gate results:
- Did tok_compress.py report PASS? If NEEDS WORK, which patterns remain?
- Are all facts preserved at >= 80%? If not, which facts were lost?
- Is the output unambiguous — could a machine parse every sentence without inference?
- Did the user want compression (preserve all) or summarization (allow loss)? Was this respected?
Record findings and return to Step 2 if any check fails.
## When NOT to Use tok_max
Decline compression requests when:
- Creative writing, fiction, poetry — voice and personality are the product
- Legal contracts — precise wording carries liability; never alter
- Medical or safety-critical text — use [tok_max: declined — safety critical]
- User wants summarization — clarify: "Do you want compression (preserve all facts) or summarization (allow selective omission)?"
- Source material under 50 words — too short for meaningful compression; risk of data loss exceeds benefit
## Confidence Calibration
- High confidence: Overview — 24 patterns are comprehensive and well-tested
- High confidence: Workflow — 5-step linear process, no branching
- High confidence: Pattern catalog — derived from humanizer inverse, validated on AI-generated text
- Medium confidence: Template selection — requires judgment on information type classification
- High confidence: Fact preservation — rules are explicit and conservative
## Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| tok_compress.py not found | Script not in PATH or wrong directory | Use full path: python3 /path/to/scripts/tok_compress.py |
| Python not available | Environment lacks Python 3.10+ | Skip Step 4; estimate compression manually and flag [verification: manual] |
| Compression ratio < 30% | Input is already dense or a short snippet | Deliver with tag [tok_max: already optimal] if patterns <= 2 |
| Fact preservation < 80% | Over-compression in Step 2 | Return to Step 2; restore deleted facts and re-compress more conservatively |
| User requests restoration | They want some personality/voice back | Identify which specific phrases to restore; apply selective de-compression |
| Non-English input | Patterns are English-centric | Compress only identifiable filler patterns; flag [tok_max: partial — non-English input] |
## Edge Cases & Limitations
Edge Case: Input is already compressed. If the input text has <= 2 of the 24 patterns and uses flat statements, verify with tok_compress.py. If ratio < 10%, deliver the original with tag [tok_max: already optimal].
Edge Case: Input is creative writing/fiction. tok_max is not appropriate for creative text. The user wants soul and voice there. Decline and explain: "This text appears to be creative writing. tok_max strips voice and personality, which would destroy the intended effect."
Edge Case: Lossy compression requested. If the user explicitly asks to summarize (not compress), some fact loss is acceptable. Use [facts: summarized] in the tag block and note what was omitted.
Edge Case: Multi-modal input. If the input includes images, code blocks, or tables — compress only the prose around them. Leave structured data intact.
Limitation: No semantic understanding. tok_max operates on pattern matching, not meaning. It will not restructure arguments for logical flow — only compress the prose around them.
Limitation: No context awareness. tok_max does not know the audience. A scientific paper and a marketing email get the same treatment. The user must specify if a different compression level is needed.
Limitation: Token count is approximate. tok_compress.py uses word count as a proxy for tokens. Actual token counts depend on the tokenizer. The ratio is directionally correct but not exact.
## Assumptions

- Python 3.10+ is available for running tok_compress.py (verify: python3 --version)
- If Python is unavailable: skip quantitative verification, use manual estimation
- The user has write access to the working directory for temporary files
- Input text is UTF-8 encoded
- The user understands that output will feel mechanical — this is intentional
- Reference files in references/ and scripts in scripts/ are co-located with SKILL.md
- If bundled files are missing: fall back to built-in pattern knowledge (24 patterns, 7 templates)
## Resources
### scripts/
See scripts/guide_scripts.md for executable utilities.
Graceful degradation: If scripts/ is unavailable, all critical logic is documented in SKILL.md. Run word counts manually. Use [verification: manual] tag.
### references/
See references/guide_references.md for detailed reference documentation. Load references only when working on the specific task they cover:
- Pattern compression (Step 2) → references/compression-patterns.md
- Template selection (Step 3) → references/token-templates.md
- Phrase lookup (Step 2) → references/efficiency-glossary.md
- Advanced machine compression (Step 3, Tier 2) → references/advanced-compression.md
Graceful degradation: If references/ files are missing, use the inline tables in Step 1 (sentence classification) and Step 3 (template selection table) as primary references.
## Iteration Guidance
After delivering compressed text:
- User reviews and may request further compression or partial restoration
- If further compression: return to Step 2, focus on patterns not yet eliminated
- If partial restoration: identify which facts the user wants emphasized, restructure the template accordingly
- Re-run tok_compress.py after each iteration to verify metrics
- Update the tag block with new compression stats