MiniMiggy

A lightweight, efficient language model built with TinyGrad

MiniMiggy Implementation
$ python tiny_llm.py

Lightning Fast

Optimized architecture with Flash Attention for efficient training and inference (attention path sketched below)

Lightweight

Compact 4-layer model with 128-dimensional embeddings and a 64-token context window for a minimal footprint

Easy to Use

Simple API with straightforward training and inference pipelines
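
The fused attention path mentioned in the Lightning Fast card can be sketched in a few lines of TinyGrad. This is a minimal illustration, not MiniMiggy's actual source: the CausalSelfAttention name is hypothetical, n_embd=128 and n_head=4 follow the documented config, and the fused kernel is assumed to be TinyGrad's Tensor.scaled_dot_product_attention with a causal mask.

from tinygrad import Tensor, nn

class CausalSelfAttention:
    # Hypothetical module; shapes follow GPTConfig (n_embd=128, n_head=4).
    def __init__(self, n_embd: int = 128, n_head: int = 4):
        self.n_head, self.head_dim = n_head, n_embd // n_head
        self.q = nn.Linear(n_embd, n_embd)
        self.k = nn.Linear(n_embd, n_embd)
        self.v = nn.Linear(n_embd, n_embd)
        self.proj = nn.Linear(n_embd, n_embd)  # output projection

    def __call__(self, x: Tensor) -> Tensor:
        B, T, C = x.shape
        # Reshape each projection to (B, n_head, T, head_dim) so heads attend independently.
        q = self.q(x).reshape(B, T, self.n_head, self.head_dim).transpose(1, 2)
        k = self.k(x).reshape(B, T, self.n_head, self.head_dim).transpose(1, 2)
        v = self.v(x).reshape(B, T, self.n_head, self.head_dim).transpose(1, 2)
        # Fused causal attention: softmax(q @ k^T / sqrt(head_dim)) @ v with a lower-triangular mask.
        y = q.scaled_dot_product_attention(k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(B, T, C))

Called on a batch of embeddings shaped (8, 64, 128), for example Tensor.randn(8, 64, 128), the module returns a tensor of the same shape.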

Documentation

v1.0.0

Architecture

  • 4-layer transformer
  • 4-head attention
  • 128-dim embeddings
  • 50K GPT2 vocabulary

Performance

  • 64-token context window
  • Batch size of 8
  • TinyGrad backend
  • Adam optimizer (batch sampling sketched below)
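
The context window and batch size above fix the shape of every training batch. Here is a sketch of how such batches could be drawn, assuming train_data is a flat list of token ids; the get_batch helper is illustrative, not part of the documented API.

import random
from tinygrad import Tensor

def get_batch(train_data: list[int], block_size: int = 64, batch_size: int = 8):
    # Pick batch_size random windows of block_size tokens; the targets are the
    # same windows shifted one token to the right (next-token prediction).
    starts = [random.randint(0, len(train_data) - block_size - 1) for _ in range(batch_size)]
    x = Tensor([train_data[s:s + block_size] for s in starts])
    y = Tensor([train_data[s + 1:s + block_size + 1] for s in starts])
    return x, y  # two (8, 64) tensors of token ids

The Adam optimizer comes from tinygrad.nn.optim; the training-loop sketch under Advanced Usage shows both pieces together.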

Key Features

Modern Architecture

Multi-head attention with layer normalization and residual connections
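
To illustrate how these pieces compose, here is a sketch of one transformer block assuming GPT-2-style pre-norm residuals. The Block name and the 4x MLP expansion are assumptions, and CausalSelfAttention refers to the attention sketch earlier on this page.

from tinygrad import Tensor, nn

class Block:
    # Illustrative pre-norm block: x + attn(ln1(x)), then x + mlp(ln2(x)).
    def __init__(self, n_embd: int = 128, n_head: int = 4):
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)  # see the attention sketch above
        self.ln2 = nn.LayerNorm(n_embd)
        self.fc = nn.Linear(n_embd, 4 * n_embd)
        self.proj = nn.Linear(4 * n_embd, n_embd)

    def __call__(self, x: Tensor) -> Tensor:
        x = x + self.attn(self.ln1(x))                   # residual connection around attention
        x = x + self.proj(self.fc(self.ln2(x)).gelu())   # residual connection around the MLP
        return x

Stacking four such blocks between a token embedding and the output head gives the 4-layer model described above.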

Tokenization

GPT2 BPE tokenizer using TikToken for efficient text processing
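
Because the encoding is the standard GPT-2 one shipped with TikToken, it can be exercised directly; BPETokenizer is presumably a thin wrapper around it.

import tiktoken

enc = tiktoken.get_encoding("gpt2")        # GPT-2 byte-pair encoding
ids = enc.encode("ROMEO: But soft!")
print(ids)                                 # a short list of integer token ids
print(enc.decode(ids))                     # round-trips back to the original text
print(enc.n_vocab)                         # 50257, matching GPTConfig.vocab_size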

Training

Configurable training with warmup and evaluation intervals
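
The TrainConfig shown under Advanced Usage exposes warmup_steps and eval_interval. One common warmup scheme, given here as an assumption rather than MiniMiggy's exact schedule, ramps the learning rate linearly before holding it at the configured value.

def lr_at(step: int, base_lr: float = 1e-3, warmup_steps: int = 100) -> float:
    # Linear warmup: ramp from ~0 to base_lr over warmup_steps, then hold steady.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# lr_at(0) == 1e-5, lr_at(49) == 5e-4, lr_at(100) == 1e-3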

Quick Start

quickstart.py
from minimiggy import GPT, GPTConfig, BPETokenizer

# Initialize model and tokenizer with the default config
model = GPT(GPTConfig())
tokenizer = BPETokenizer()

# Generate text
context = "ROMEO:"
tokens = tokenizer.encode(context)
output = model.generate(tokens, max_tokens=100)
print(tokenizer.decode(output))
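
A freshly constructed GPT starts from random weights, so the sample above will be noise until the model has been trained; the Advanced Usage section below covers the training setup.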

Advanced Usage

train.py
from minimiggy import GPT, GPTConfig, TrainConfig, train

model = GPT(GPTConfig())

train_config = TrainConfig(
    batch_size=8,
    learning_rate=1e-3,
    warmup_steps=100,
    max_iters=1000,
    eval_interval=100
)

# train_data: pre-tokenized training corpus (e.g. token ids from BPETokenizer.encode)
train(model, train_config, train_data)
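
For orientation, a loop of roughly this shape is what a trainer driven by TrainConfig typically runs. This is a sketch under stated assumptions, not MiniMiggy's actual train() internals: it reuses the illustrative get_batch and lr_at helpers from the sketches above and assumes the model returns per-token logits of shape (batch, block, vocab).

from tinygrad import Tensor
from tinygrad.nn.state import get_parameters
from tinygrad.nn.optim import Adam

def train_sketch(model, cfg: TrainConfig, train_data: list[int]):
    opt = Adam(get_parameters(model), lr=cfg.learning_rate)  # cfg.warmup_steps would scale this via lr_at
    with Tensor.train():                                     # enable training mode
        for step in range(cfg.max_iters):
            x, y = get_batch(train_data, batch_size=cfg.batch_size)  # see the batching sketch above
            opt.zero_grad()
            logits = model(x)                                # assumed (batch, block, vocab) logits
            loss = logits.reshape(-1, logits.shape[-1]).sparse_categorical_crossentropy(y.reshape(-1))
            loss.backward()
            opt.step()
            if step % cfg.eval_interval == 0:
                print(f"step {step}: train loss {loss.item():.4f}")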

config.py
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 64
    vocab_size: int = 50257
    n_layer: int = 4
    n_head: int = 4
    n_embd: int = 128