1. Background and Goals
Build a joy license plate character recognition system as a vehicle for learning how the Softmax classifier works. The system covers the pipeline from model training to real-plate inference using PyTorch.
2. System Architecture
Real license plate photo
│
▼
[Cell 1] Plate region detection ← EasyOCR Group-by-Y strategy
│
▼
[Cell 2] Skew correction + segmentation ← Hough lines + vertical projection
│
▼
[Cell 3] Synthetic training set ← DIN/FE-Schrift fonts + augmentation
│
▼
[Cell 4] Train printed-font model ← PrintedCharNet (MLP + BN + Dropout)
│
▼
[Cell 5] Domain adaptation ← Few-shot fine-tuning on real patches
│
▼
[Cell 6] TTA inference + evaluation ← Average 30 perturbed Softmax outputs
3. Algorithm Analysis
3.1 Softmax Function
Definition:
Converts raw logits into a probability distribution — all outputs are positive and sum to 1. Used to express the model's confidence over 36 character classes.
Temperature scaling:
| Temperature | Effect |
|---|---|
| Collapses to argmax — maximum confidence | |
| Standard softmax | |
| Approaches uniform distribution — maximum uncertainty |
3.2 PyTorch Model — PrintedCharNet
Architecture:
Input (784) → Linear(784→512) → BN → ReLU → Dropout(0.3)
→ Linear(512→512) → BN → ReLU → Dropout(0.3)
→ Linear(512→36)
→ [Softmax applied by loss / at inference]
class PrintedCharNet(nn.Module):
def __init__(self, hid=512, out=36):
super().__init__()
self.net = nn.Sequential(
nn.Linear(784, hid), nn.BatchNorm1d(hid), nn.ReLU(), nn.Dropout(0.3),
nn.Linear(hid, hid), nn.BatchNorm1d(hid), nn.ReLU(), nn.Dropout(0.3),
nn.Linear(hid, out)
# NO nn.Softmax here
)
Why no nn.Softmax in forward():
nn.CrossEntropyLoss fuses log_softmax + NLLLoss in a single numerically stable operation. Computing softmax separately and then taking its log loses floating-point precision. Softmax is only applied explicitly at inference time:
probs = torch.softmax(model(x), dim=1) # inference only
pred = probs.argmax(dim=1)
He initialization (applied to all nn.Linear layers):
Keeps activation variance stable through ReLU layers, preventing vanishing/exploding gradients.
Training loop:
optimizer.zero_grad()
logits = model(X) # raw scores — no softmax
loss = criterion(logits, y) # CrossEntropyLoss applies softmax internally
loss.backward() # autograd computes all gradients
optimizer.step()
3.4 Plate Detection — Group-by-Y Strategy
Motivation: Color/edge-based detection fires on tail lights, logos, and shadows. A character-first approach finds the text line directly.
Algorithm:
Step 1 — EasyOCR full-image scan. Collect all (bbox, text, confidence) results.
Step 2 — Group boxes into horizontal bands. Two boxes belong to the same band if their vertical centres are within max(char_height × 0.8, 30px) of each other.
Step 3 — Score each band:
score = 5.0 × has_washington_text # "Washington" / "WASH*", conf > 0.15
+ 4.0 × has_plate_alnum # 4–8 alphanumeric chars, conf > 0.25
+ 2.0 × count_score # peaks at 7 chars, decays away from it
Step 4 — Dynamic Plate Expansion. Take the union bounding box of only the key tokens (Washington + plate text), not the full band. This excludes unrelated text (e.g., dealer stickers) that happens to sit at the same height.
Step 5 — Proportional padding:
vert_pad = max(14, char_height × 0.4)
horiz_pad = max(14, char_height × 0.6)
Why it works: A Washington state plate is the only location in the image where "Washington" co-occurs with a 7-character alphanumeric string.
Observed results:
| Image | Truth | Selected score | Detected bbox |
|---|---|---|---|
| IMG_1474.jpg | CGL2439 | 9.0 | y=325–379 x=95–209 ✓ |
| IMG_6478.jpg | AWC6290 | 4.0 | Correct region ✓ |
3.5 Skew Correction — Hough Line Detection
Problem with the original approach: Sweeping -8° to +8° and maximising projection variance selected -8° (the boundary of the search range) — clearly wrong, since the optimiser hit the wall rather than finding a true peak.
Hough-based method:
# 1. Threshold plate crop to binary
binary = (plate < local_mean - 15) * 255
# 2. Detect line segments
lines = cv2.HoughLinesP(binary, rho=1, theta=π/180,
threshold=20,
minLineLength=plate_width // 6,
maxLineGap=plate_width // 8)
# 3. Keep only near-horizontal lines (|angle| < 15°)
near_horiz = [angle for line in lines if abs(angle) < 15]
# 4. Correction = negative median (robust to outlier lines)
correction = -median(near_horiz)
Decision logic:
|correction| ≥ 0.5° → rotate (meaningful skew)
|correction| < 0.5° → skip (plate is already straight)
Hough finds no lines → fall back to two-pass projection sweep
(coarse 1° steps, then fine 0.25° around best)
Observed on IMG_1474.jpg: Hough detected 52 near-horizontal lines, median angle 2.72°. The old projection method would have applied -8° instead.
3.6 Character Segmentation — Vertical Projection Histogram
col_proj = binary.sum(axis=0) # ink pixel count per column
ink_thresh = col_proj.max() * 0.10 # 10% of peak = character boundary
Contiguous columns above threshold form a run (one character segment). Wide runs (width > 1.5 × expected character width) are split at local minima:
valleys, _ = find_peaks(-run_proj, distance=expected_w * 0.5)
# if not enough valleys, fall back to equal-width splits
Character bounding boxes are then tightened vertically to the actual ink rows within each column slice.
3.7 Synthetic Dataset Generation
Training data for Section 10 is rendered programmatically to match the target font (DIN 1451 / FE-Schrift):
Font pool (in priority order):
FE-Schrift.ttf— closest match to real German/WA plate font (manual download)DIN Alternate Bold,DIN Condensed Bold— closest macOS built-insImpact,Arial Black,Helvetica— fallback bold fonts
Augmentation pipeline per character image:
- Perspective warp (±8% corner jitter)
- Elastic distortion (simulate worn plate)
- Motion blur (simulate camera shake)
- Gaussian noise (σ = 0.02–0.05)
- Brightness scaling (×0.75–1.0)
36 classes (0–9, A–Z), multiple fonts, ~500 augmented samples per class.
3.8 Domain Adaptation (Few-shot Fine-tuning)
Problem: Synthetic font ≠ real plate appearance. A model trained on clean rendered characters fails on photographs of weathered, reflective metal plates.
Solution — fine-tune on the actual plate patches:
- Extract the 7 real character patches from the detected plate
- Augment each patch 200× (translate ±2px, rotate ±8°, brightness ×[0.8, 1.0], noise σ=0.02)
- Fine-tune with a very small learning rate to preserve existing knowledge:
optimizer = Adam(model.parameters(), lr=5e-5, weight_decay=1e-4)
# 20 epochs on 7 × 200 = 1,400 samples
Test-Time Augmentation (TTA):
At inference, apply 30 random small perturbations to each character patch and average the Softmax outputs:
probs = mean([softmax(model(augment(patch))) for _ in range(30)])
This reduces sensitivity to a single unlucky crop or lighting condition.
3.9 Preprocessing — Training/Inference Consistency
Both the synthetic dataset builder and the inference path call the exact same function:
def preprocess_patch(arr):
arr = 255.0 - arr # invert: dark chars → bright
arr = clip(arr - arr.max() * 0.20, 0, None) # suppress background
cy, cx = center_of_mass(arr)
arr = shift(arr, (H/2 - cy, W/2 - cx)) # centre-of-mass alignment
arr = resize(arr, (28, 28)) # EMNIST standard size
arr = clip(arr - arr.max() * 0.10, 0, None) # second-pass suppression
return arr / arr.max() # normalise to [0, 1]
Centre-of-mass alignment matches the preprocessing convention used in EMNIST. Using different preprocessing at training vs inference time is a common source of accuracy degradation that this design explicitly avoids.
4. Training Configuration
| Stage | Optimizer | LR | Batch | Epochs | Regularisation |
|---|---|---|---|---|---|
| PyTorch (EMNIST) | Adam | 0.001 | 256 | 15 | ReduceLROnPlateau |
| PrintedCharNet | Adam | 0.001 | 64 | 30 | Dropout(0.3) + BatchNorm |
| Domain adaptation | Adam | 5e-5 | full | 20 | weight_decay=1e-4 |
5. Dataset Summary
| Dataset | Purpose | Scale |
|---|---|---|
| EMNIST Balanced | Baseline PyTorch model (Sections 1–9) | 47 classes, 112,800 train / 18,800 test |
| Synthetic printed font | Section 10 plate model | 36 classes, 12 fonts, heavy augmentation |
| Real plate patches | Domain adaptation fine-tuning | 7 chars × 200 augmentations = 1,400 samples |
6. Key Design Decisions
| Decision | Rationale |
|---|---|
No nn.Softmax in forward() | nn.CrossEntropyLoss fuses log-softmax and NLLLoss in one numerically stable op. Separating them loses floating-point precision. |
BatchNorm + Dropout in PrintedCharNet | Training set is synthetic; BN stabilises training across font styles, Dropout prevents overfitting to rendered artefacts. |
Adam with ReduceLROnPlateau | Adam handles sparse gradients from varied fonts better than SGD. LR halves if validation loss stalls for 3 epochs. |
| Hough lines for skew correction | Projection variance sweep hit the ±8° boundary on 82px-tall crops (noise dominated). Hough measures actual pixel line angles from 52 detected lines. |
| EasyOCR instead of YOLO or MSER for plate detection | YOLO boxes the vehicle not the plate. MSER blob grouping is unstable at small scales. OCR reads text directly — more semantically reliable. |
| Expand from key tokens, not full band | Prevents dealer stickers and other same-row text from inflating the plate bounding box. |
| Fine-tuning LR = 5e-5 | 1,400 samples is tiny; a large LR overwrites pretrained weights instead of adapting them. |
| TTA with 30 samples | Single-pass predictions are noisy near character edges. Averaging 30 perturbed inputs collapses variance without retraining. |
Shared preprocess_patch() | Eliminates train/inference preprocessing mismatch — a common silent accuracy killer. |
7. File Structure
license_plate_softmax/
├── pytorch_softmax.py # PyTorch standalone script
├── license_plate_softmax.ipynb # Main notebook — 10 sections
├── RFC.md # This document
├── fonts/
│ └── FE-Schrift.ttf # Download manually from dafont.com
├── data/
│ └── emnist/ # Auto-downloaded by torchvision
└── yolov8n.pt # Auto-downloaded (kept as fallback)
8. Known Limitations
Font availability. FE-Schrift must be downloaded manually. The gap between built-in fonts and the real DIN 1451 typeface is the primary driver of digit misclassification.
OCR misreads. EasyOCR reads
CGL2439asCELZ439(C/G and L/E confusion). Detection still succeeds because the Washington text anchor compensates, but a misread reduces the band's score.Non-Washington plates. Without "Washington" text, detection falls back to alphanumeric matching alone. Other states with different layouts may produce lower detection confidence.
Segmentation threshold is heuristic.
ink_thresh = col_proj.max() × 0.10works well on clean crops but may over- or under-segment on low-contrast or heavily shadowed plates.TTA cost at inference. 30 forward passes per character × 7 characters = 210 forward passes per plate. Acceptable for a learning demo; would need batching or a single-pass uncertainty estimate for production use.