SMART EGG Engine

Technical analysis of the text adventure engine: compression, memory layout, parser, and Part 2 overlay architecture.

Contents

  1. Engine Overview
  2. Memory Layout
  3. Text Compression Pipeline
  4. Compression Tables
  5. Parser Architecture
  6. Part 2 Overlay Architecture
  7. Key Code Addresses

Engine Overview

The SMART EGG engine is a reusable, data-driven text adventure system for the Commodore 64. It was built by Said Hassan of Smart Egg Software and supports a complete adventure game infrastructure: a command parser, inventory system, room navigation, scoring, save/load, colored text output, and a sophisticated three-tier text compression scheme.

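Evidence that this was a reusable product rather than a one-off comes from Part 2, which runs the same engine code over a new data overlay with regenerated compression tables (see Part 2 Overlay Architecture). Headline figures:

~7KB     Engine code
~10KB    Compressed text
36.3%    Compression savings
1.57x    Expansion ratio (decompressed size / compressed size)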
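The two ratios describe the same measurement from opposite ends. If text grows by a factor r when decompressed, the compressed form is 1/r of the decompressed size, which is a saving of 1 - 1/r. The quick Python check below tests the figures above. The ~15.7KB decompressed size is derived from the headline numbers, not measured.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of the decompressed size saved, given decompressed/compressed."""
    return 1.0 - 1.0 / ratio

def ratio_from_savings(savings: float) -> float:
    """Inverse: expansion ratio implied by a fractional saving."""
    return 1.0 / (1.0 - savings)

print(f"{savings_from_ratio(1.57):.1%}")   # 36.3%
print(f"{ratio_from_savings(0.363):.2f}")  # 1.57
print(round(10 * 1.57, 1))                 # 15.7 (KB of plain text from ~10KB compressed)
```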

Memory Layout (C64, 64KB)

Address range   Size     Purpose
$0000-$00FF     256B     Zero page (CPU registers, pointers)
$0100-$01FF     256B     Stack
$0200-$033F     320B     System variables
$0334-$03FF     204B     Decompressor workspace
$0400-$07FF     1KB      Game state / variables
$0800-$0BFF     1KB      Game data tables (header, pointers)
$0C00-$0FFF     1KB      Screen RAM (40x25 characters)
$1000-$1FFF     4KB      Game logic / verb handlers
$2000-$27FF     2KB      Custom character set (VIC-II)
$2800-$4E70     ~10KB    Room data / object data / action handlers
$4E71-$75D8     ~10KB    Compressed text data (actions/descriptions)
$75F1-$77EE     510B     Action text pointer table (255 entries x 2 bytes)
$77EF-$7982     ~400B    Compressed text data (rooms/system messages)
$7983-$79C8     70B      Room text pointer table (35 entries x 2 bytes)
$8000-$8603     ~1.5KB   Game engine code (parser, interpreter)
$8604-$86E5     226B     Text compression tables
$880E-$8891     132B     Text decompression handler
$8B60-$8E3C     ~700B    Game entry point + VIC-II setup
$8E3D-$9781     ~2.4KB   Game engine (input, parser, game loop)
$9782-$97A3     34B      Text print loop
$9928-$9977     80B      System messages
$9AE1-$9BFC     ~280B    Screen output + control codes
$9B2E-$9B37     10B      "SMART EGG!" watermark
$CC00-$CFFF     1KB      Text buffer
$D000-$DFFF     4KB      I/O registers (VIC-II, SID, CIA)
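Most sizes in the table follow directly from the inclusive address ranges (size = end - start + 1). The two pointer tables come out at exactly entries x 2 bytes. The small Python check below uses hypothetical helper names. It also shows that, as listed, the System variables and Decompressor workspace rows overlap by 12 bytes ($0334-$033F), so one of those two boundaries is probably approximate.

```python
from __future__ import annotations

def size(start: int, end: int) -> int:
    """Bytes in an inclusive address range."""
    return end - start + 1

def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Bytes shared by two inclusive ranges (0 if they are disjoint)."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo + 1)

# Pointer tables: entries x 2 bytes each
assert size(0x75F1, 0x77EE) == 255 * 2   # action text pointers, 510 bytes
assert size(0x7983, 0x79C8) == 35 * 2    # room text pointers, 70 bytes

print(size(0x4E71, 0x75D8))                           # 10088 bytes of action text
print(overlap((0x0200, 0x033F), (0x0334, 0x03FF)))    # 12
```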

Text Compression Pipeline

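Every compressed byte is processed by the handler at $880E. The handler inverts the byte, range-checks it, and dispatches it to one of three tiers. One inverted value ($00) is reserved as the end-of-text terminator: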

Raw text bytes
      |
      v
  EOR #$FF (XOR with $FF, invert all bits)
      |
      v
  Range check inverted value:
      |
      +-- $01-$5F: SINGLE CHARACTER
      |      Index into 96-entry character table at $8604
      |      Maps to PETSCII codes (A-Z, a-z, digits, symbols, colors)
      |
      +-- $60-$EF: DIGRAM (character pair)
      |      Subtract $60, lookup in 12x12 grid
      |      Grid uses 12 most frequent chars: SPACE,E,A,I,S,R,T,N,O,L,D,C
      |      Two characters emitted per byte (2:1 compression)
      |
      +-- $F0-$FF: DICTIONARY WORD
      |      One of 16 common words (WERE,WAS,WITH,AND,AS,THE,THAT,TO,
      |      FOR,OF,IN,HARPER,NORTH,SOUTH,EAST,WEST)
      |      Multi-character expansion per byte
      |
      +-- $00 (raw byte $FF): TERMINATOR
             End of text block
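The dispatch rules can be modeled in a few lines of Python. This sketch covers only the range checks and is not a port of the handler. The tuple return values are hypothetical. Because the terminator test is applied to the inverted value, raw $FF ends the text. Raw $00 inverts to $FF and selects dictionary entry 15 (WEST).

```python
def classify(raw: int) -> tuple:
    """Classify one compressed byte by the range checks described above."""
    v = raw ^ 0xFF                      # EOR #$FF
    if v == 0x00:
        return ("end",)                 # raw $FF terminates the text block
    if v < 0x60:
        return ("char", v)              # index into the character table
    if v < 0xF0:
        return ("digram", v - 0x60)     # cell 0..143 of the 12x12 grid
    return ("word", v - 0xF0)           # dictionary entry 0..15

for raw in (0xFF, 0xBE, 0x9F, 0x0F):
    print(f"${raw:02X} -> {classify(raw)}")
```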

How It Works in 6502

$880E: LDA ($40),Y     ; load compressed byte via the pointer at $40/$41
$8810: INC $40          ; advance the pointer: low byte first
$8812: BNE $8816        ; low byte did not wrap: skip the high-byte increment
$8814: INC $41          ; low byte wrapped to $00: carry into the high byte
       ...              ; ($8816-$881E not shown)
$881F: EOR #$FF         ; XOR with $FF (invert all bits)
$8821: BEQ done         ; result zero (raw byte was $FF): text ends
$8823: CMP #$60         ; compare with $60
$8825: BCC single_char  ; below $60? single character
$8827: CMP #$F0         ; compare with $F0
$8829: BCS dictionary   ; $F0 or above? dictionary word
       ; else: digram ($60-$EF)
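The first four instructions are the standard 6502 idiom for advancing a 16-bit zero-page pointer. The code increments the low byte and carries into the high byte only when the low byte wraps to $00. The Python model below assumes Y = 0 and the pointer at $40/$41, as the LDA operand states. The function name and test values are hypothetical.

```python
def fetch(mem: bytearray, y: int = 0) -> int:
    """Model LDA ($40),Y followed by the 16-bit pointer increment."""
    ptr = mem[0x40] | (mem[0x41] << 8)        # little-endian pointer in zero page
    value = mem[(ptr + y) & 0xFFFF]           # LDA ($40),Y
    mem[0x40] = (mem[0x40] + 1) & 0xFF        # INC low byte
    if mem[0x40] == 0:                        # BNE falls through only on wrap
        mem[0x41] = (mem[0x41] + 1) & 0xFF    # INC high byte (the carry)
    return value

mem = bytearray(0x10000)                      # full 64KB address space
mem[0x40], mem[0x41] = 0xFF, 0x4E             # pointer = $4EFF
mem[0x4EFF], mem[0x4F00] = 0xAB, 0xCD
print(hex(fetch(mem)), hex(fetch(mem)))       # 0xab 0xcd
```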

Compression Tables

Character Table (96 entries at $8604)

Maps compression indices ($01-$5F) to PETSCII codes. The table includes uppercase and lowercase letters, digits, punctuation, spaces, and PETSCII control codes for color changes (white, cyan, yellow, green, purple, pink). The ordering is optimized by frequency for the digram encoding.

Digram Characters (12 entries at $86CC)

The 12 most frequent characters, used to build the 12x12 digram grid (144 two-character pairs):

Index  Char   Why it's here
0      SPACE  Most frequent character in any text
1      E      Most frequent letter in English
2      A      Second most frequent vowel
3      I      Common vowel; also the pronoun I
4      S      Plural marker, common in descriptions
5      R      "HARPER" contains R twice
6      T      Common in function words (THE, THAT, TO)
7      N      Common in NORTH, AND, IN
8      O      Common vowel
9      L      Common in descriptions
10     D      Past-tense ending -ED; DOOR, DARK
11     C      COULD, CLOSED
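Because 12 x 12 = 144 and $EF - $60 + 1 = 144, the 144 pairs fill the digram range exactly. The sketch below assumes row-major order, with the first character selecting the row and the second the column. The source does not say which character indexes which axis, so treat the mapping direction as an assumption.

```python
DIGRAM_CHARS = [" ", "E", "A", "I", "S", "R", "T", "N", "O", "L", "D", "C"]

def decode_digram(v: int) -> str:
    """Expand an inverted byte in $60-$EF into two characters (row-major assumed)."""
    if not 0x60 <= v <= 0xEF:
        raise ValueError(f"${v:02X} is not a digram code")
    row, col = divmod(v - 0x60, 12)
    return DIGRAM_CHARS[row] + DIGRAM_CHARS[col]

def encode_digram(pair: str) -> int:
    """Inverse mapping; list.index raises ValueError for chars outside the grid."""
    row, col = DIGRAM_CHARS.index(pair[0]), DIGRAM_CHARS.index(pair[1])
    return 0x60 + row * 12 + col

print(repr(decode_digram(0x60)))   # '  '
print(hex(encode_digram("RE")))    # 0x9d
print(decode_digram(0x9D))         # RE
```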

Dictionary Words (16 entries at $8684)

Index  Byte (inverted)  Word     Category
0      $F0              WERE     Function word
1      $F1              WAS      Function word
2      $F2              WITH     Function word
3      $F3              AND      Function word
4      $F4              AS       Function word
5      $F5              THE      Function word
6      $F6              THAT     Function word
7      $F7              TO       Function word
8      $F8              FOR      Function word
9      $F9              OF       Function word
10     $FA              IN       Function word
11     $FB              HARPER   Game-specific (protagonist)
12     $FC              NORTH    Game-specific (navigation)
13     $FD              SOUTH    Game-specific (navigation)
14     $FE              EAST     Game-specific (navigation)
15     $FF              WEST     Game-specific (navigation)

The presence of HARPER and the four compass directions alongside THE and AND strongly suggests that the frequency analysis was run against this specific game's text, not a generic English corpus.
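Dictionary decoding is a direct table lookup. The sketch below also checks the table's size. The 16 words contain 56 letters. If each word also carries one delimiter or length byte (an assumption, since the stored format is not documented here), the table needs 72 bytes. That is exactly the span from $8684 to $86CB, and it matches the 72-byte count cited in the Part 2 comparison.

```python
DICTIONARY = ["WERE", "WAS", "WITH", "AND", "AS", "THE", "THAT", "TO",
              "FOR", "OF", "IN", "HARPER", "NORTH", "SOUTH", "EAST", "WEST"]

def decode_word(v: int) -> str:
    """Expand an inverted byte in $F0-$FF to its dictionary word."""
    if not 0xF0 <= v <= 0xFF:
        raise ValueError(f"${v:02X} is not a dictionary code")
    return DICTIONARY[v - 0xF0]

letters = sum(len(w) for w in DICTIONARY)
print(letters)                      # 56
print(letters + len(DICTIONARY))    # 72, assuming one delimiter byte per word
print(0x86CB - 0x8684 + 1)          # 72
print(decode_word(0xFB))            # HARPER
```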

Parser Architecture

Input Buffer ($0200)
      |
      v
  Tokenizer
  - Strips leading spaces
  - Identifies verb (first word)
  - Identifies noun/object (remaining words)
      |
      v
  Verb Lookup
  - Searches verb table for match
  - Returns verb handler address
      |
      v
  Noun Resolution
  - Searches object table for match
  - Checks visibility (in room or inventory)
  - Returns object ID
      |
      v
  Action Dispatch
  - Calls verb handler with object
  - Handler checks preconditions
  - Modifies game state
  - Triggers text output
      |
      v
  Room Update
  - Checks for state changes
  - Runs room-specific event handlers
  - Updates score if applicable
  - Displays room description if moved
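The Python sketch below covers the first three stages: tokenizer, verb lookup, and noun resolution with a visibility check. The verb and object tables, their IDs, and the return tuples are hypothetical. The real engine matches against its compressed vocabulary table. Only the 4-character truncation comes from the engine description.

```python
# Hypothetical vocabulary: 4-character keys -> verb names / object IDs.
VERBS = {"TAKE": "take", "GET": "take", "DROP": "drop", "EXAM": "examine"}
OBJECTS = {"LAMP": 1, "KEY": 2}

def parse(line, room_objects, inventory):
    """Tokenize, look up the verb, then resolve the first known noun."""
    words = line.strip().upper().split()       # strip spaces, split into words
    if not words:
        return ("error", "no input")
    verb = VERBS.get(words[0][:4])             # verb = first word, 4-char match
    if verb is None:
        return ("error", "unknown verb")
    for word in words[1:]:                     # remaining words: find an object
        obj = OBJECTS.get(word[:4])
        if obj is None:
            continue                           # skip THE, A, etc.
        if obj in room_objects or obj in inventory:
            return ("dispatch", verb, obj)     # visible: hand off to the handler
        return ("error", "not visible")
    return ("dispatch", verb, None)            # verb used without an object

print(parse("  examine the lamp", room_objects={1}, inventory=set()))
print(parse("take key", room_objects={1}, inventory=set()))
```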

Vocabulary Structure

The vocabulary table at $7E04 uses 5-byte records. The first four bytes are compressed SMART EGG text bytes, and the fifth is a token/category ID. Words are truncated to 4 characters for matching, following the common text adventure convention (e.g., "EXAM" matches both "EXAMINE" and "EXAMINATION").

The table contains roughly 200 vocabulary entries covering movement verbs, object interaction verbs, examination verbs, physical action verbs, system commands, and special keywords.
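The record layout maps naturally onto a fixed-size struct. In this Python sketch the four key bytes are plain, space-padded ASCII rather than the engine's compressed text bytes. The entries and token IDs are invented for illustration. Only the 4 + 1 byte shape and the 4-character matching rule come from the description above.

```python
import struct

RECORD = struct.Struct("<4sB")   # 4 key bytes + 1 token/category ID = 5 bytes

def unpack_vocab(blob: bytes):
    """Split a vocabulary blob into (key, token) pairs (plain-ASCII keys assumed)."""
    return [(key.decode("ascii").rstrip(), token)
            for key, token in RECORD.iter_unpack(blob)]

def lookup(word, table):
    """Match on the first four characters of the typed word."""
    key = word.upper()[:4]
    return [token for k, token in table if k == key]

blob = b"EXAM\x01LOOK\x01GET \x02TAKE\x02"   # hypothetical entries and tokens
table = unpack_vocab(blob)
print(table)                          # [('EXAM', 1), ('LOOK', 1), ('GET', 2), ('TAKE', 2)]
print(lookup("examination", table))   # [1]
```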

Part 2 Overlay Architecture

Part 1 loads completely:
  Engine code ($8604-$9BFC)  +  Game data ($0800-$7982)

Part 2 is a DATA OVERLAY:
  - Same engine code preserved in memory (except the compression tables)
  - New game data overlaid onto $0800-$7982
  - New text blocks in $2D00-$5000
  - New object table at $2D00-$2FA0
  - New pointer tables
  - New compression tables at $8604-$86E5 (see the table below)
  - Title screen: "LOAD YOUR DATA FROM PART I"
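The overlay can be modeled as a second load that overwrites some regions and leaves the rest alone. In this Python sketch the fill bytes are dummies standing in for real contents, and the names are hypothetical. The regions come from the load map above and from the $8604-$86E5 compression-table range in the memory layout, which Part 2 replaces.

```python
PART1_DATA = (0x0800, 0x7982)
PART1_ENGINE = (0x8604, 0x9BFC)     # includes the tables at $8604-$86E5
PART2_REGIONS = [(0x0800, 0x7982),  # game data overlay
                 (0x8604, 0x86E5)]  # compression tables, regenerated per part

def load(mem, start, end, fill):
    """Overwrite an inclusive range with a dummy fill byte."""
    mem[start:end + 1] = bytes([fill]) * (end - start + 1)

mem = bytearray(0x10000)
load(mem, *PART1_DATA, 0x11)        # Part 1 data (dummy)
load(mem, *PART1_ENGINE, 0xEA)      # Part 1 engine and tables (dummy)

for start, end in PART2_REGIONS:    # Part 2 overlay
    load(mem, start, end, 0x22)

print(hex(mem[0x9782]))   # print loop untouched: 0xea
print(hex(mem[0x8604]))   # character table replaced: 0x22
```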

What the Overlay Preserves

Component        Address        Status
Character table  $8604          Overwritten (Part 2 has different tables)
Digram tables    $86CC-$86E3    Overwritten (Part 2 has different tables)
Dictionary       $8684-$86CB    Overwritten (Part 2 has different tables)
Text handler     $880E          Algorithm preserved, tables differ
Print loop       $9782          Preserved
Screen output    $9AE1-$9BFC    Preserved

Key finding: Part 1 and Part 2 use completely different compression tables. All 96 character table entries differ, all 12 digram characters differ, and 71 of the 72 dictionary bytes differ. This indicates that the SMART EGG compiler regenerated frequency-tuned tables for each part from that part's own text.
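How the compiler chose its tables is not documented. One plausible approach, sketched hypothetically in Python below, ranks characters by raw count. It ranks dictionary candidates by estimated bytes saved, count x (length - 1), since one code byte replaces the whole word. The real tables include short words such as AS, TO and OF that save only one byte per use, so the actual selection rule may differ.

```python
import re
from collections import Counter

def pick_tables(text, n_chars=12, n_words=16):
    """Hypothetical table builder driven by frequency in the game's own text."""
    text = text.upper()
    chars = [c for c, _ in Counter(text).most_common(n_chars)]
    counts = Counter(re.findall(r"[A-Z]+", text))
    # Estimated saving per word: every use replaces len(w) chars with 1 byte.
    ranked = sorted(counts, key=lambda w: counts[w] * (len(w) - 1), reverse=True)
    return chars, ranked[:n_words]

sample = "HARPER WALKS NORTH. THE DOOR TO THE NORTH IS CLOSED. HARPER WAITS."
chars, words = pick_tables(sample)
print(repr(chars[0]))   # ' '
print(words[:3])        # ['HARPER', 'NORTH', 'CLOSED']
```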

Key Code Addresses

Address  Function
$8B60    Game entry point (SEI, VIC-II setup, NMI handler)
$8E3D    Game initialization
$9782    Text print loop (reads compressed bytes, calls $880E)
$880E    Per-byte text handler (EOR #$FF, dispatch to tier)
$886E    Character output (char table lookup, screen write)
$9AE1    Screen output (PETSCII to screen code conversion)
$9B07    Control code dispatcher (colors, formatting)
$9BBC    Printable character handler (writes to screen RAM at $0C00)
$9BE9    Color RAM calculation (adds $CC to the high byte, mapping $0C00 to $D800)
$9461    SAVE handler (JSR $FFD8, KERNAL SAVE)
$94D1    LOAD handler (JSR $FFD5, KERNAL LOAD)
$9B2E    "SMART EGG!" watermark (10 bytes, never executed)
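With screen RAM moved to $0C00 and color RAM fixed at $D800, the two regions differ by $CC00. The low byte of that offset is zero, so a routine only needs to add $CC to the high byte of a screen address. The Python model below shows the address arithmetic only; it is not a transcription of the routine at $9BE9.

```python
SCREEN_BASE = 0x0C00   # screen RAM used by this engine
COLOR_BASE = 0xD800    # fixed C64 color RAM

def screen_addr(row: int, col: int) -> int:
    """Address of a character cell on the 40x25 screen."""
    if not (0 <= row < 25 and 0 <= col < 40):
        raise ValueError("off screen")
    return SCREEN_BASE + row * 40 + col

def color_addr(screen: int) -> int:
    """Color RAM address for a screen address: add $CC to the high byte."""
    hi, lo = screen >> 8, screen & 0xFF
    return ((hi + (COLOR_BASE - SCREEN_BASE) // 256) << 8) | lo

print(hex(screen_addr(24, 39)))               # 0xfe7
print(hex(color_addr(screen_addr(24, 39))))   # 0xdbe7
```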

Game Data Header ($0800)

Address  Pointer  Destination
$0811    $75F1    Action text pointer table
$0813    $7983    Room text pointer table
$0815    $7C06    Object/state table
$0817    $7E04    Vocabulary/parser table
$0819    $84AD    Game logic table
$081B    $8518    Extended data table
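The 6502 stores addresses low byte first, so the slot at $0811 holds the bytes $F1 $75. The Python sketch below reads the header from a memory image. The dummy image and slot names are illustrative; the pointer values are those in the table above.

```python
import struct

HEADER_SLOTS = {
    0x0811: "action text pointers",
    0x0813: "room text pointers",
    0x0815: "object/state table",
    0x0817: "vocabulary/parser table",
    0x0819: "game logic table",
    0x081B: "extended data table",
}

def read_header(mem) -> dict:
    """Read each 16-bit little-endian pointer from the $0800 header."""
    return {name: struct.unpack_from("<H", mem, slot)[0]
            for slot, name in HEADER_SLOTS.items()}

# Dummy memory image holding the Part 1 pointer values.
mem = bytearray(0x10000)
for slot, ptr in zip(HEADER_SLOTS, (0x75F1, 0x7983, 0x7C06, 0x7E04, 0x84AD, 0x8518)):
    struct.pack_into("<H", mem, slot, ptr)

print(bytes(mem[0x0811:0x0813]).hex())   # f175
for name, ptr in read_header(mem).items():
    print(f"{name:24s} ${ptr:04X}")
```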