Sunday, March 22, 2026

[Joy] License Plate Pattern Recognition

1. Background and Goals

Build a toy license plate character recognition system as a vehicle for learning how the Softmax classifier works. The system covers the full pipeline, from model training to real-plate inference, using PyTorch.


2. System Architecture

Real license plate photo
        │
        ▼
 [Cell 1] Plate region detection     ← EasyOCR Group-by-Y strategy
        │
        ▼
 [Cell 2] Skew correction + segmentation  ← Hough lines + vertical projection
        │
        ▼
 [Cell 3] Synthetic training set     ← DIN/FE-Schrift fonts + augmentation
        │
        ▼
 [Cell 4] Train printed-font model   ← PrintedCharNet (MLP + BN + Dropout)
        │
        ▼
 [Cell 5] Domain adaptation          ← Few-shot fine-tuning on real patches
        │
        ▼
 [Cell 6] TTA inference + evaluation ← Average 30 perturbed Softmax outputs

3. Algorithm Analysis

3.1 Softmax Function

Definition:

softmax(z_i) = exp(z_i) / Σ_j exp(z_j)

Converts raw logits into a probability distribution — all outputs are positive and sum to 1. Used to express the model's confidence over 36 character classes.

Temperature scaling:

softmax(z/T)

Temperature   Effect
T → 0         Collapses to argmax — maximum confidence
T = 1         Standard softmax
T → ∞         Approaches uniform distribution — maximum uncertainty
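The temperature behaviour described above can be verified with a small NumPy sketch (the `softmax` helper and the example logits are illustrative, not from the notebook):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax: softmax(z / T)."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
cold = softmax(logits, T=0.1)    # near one-hot at the argmax
warm = softmax(logits, T=1.0)    # standard softmax
hot = softmax(logits, T=100.0)   # near uniform
```

At T = 0.1 nearly all probability mass lands on the largest logit; at T = 100 the three probabilities are almost equal.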

3.2 PyTorch Model — PrintedCharNet

Architecture:

Input (784) → Linear(784→512) → BN → ReLU → Dropout(0.3)
            → Linear(512→512) → BN → ReLU → Dropout(0.3)
            → Linear(512→36)
            → [Softmax applied by loss / at inference]
import torch.nn as nn

class PrintedCharNet(nn.Module):
    def __init__(self, hid=512, out=36):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, hid), nn.BatchNorm1d(hid), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hid, hid), nn.BatchNorm1d(hid), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hid, out)
            # NO nn.Softmax here
        )

    def forward(self, x):
        return self.net(x)   # raw logits; softmax is applied by the loss / at inference

Why no nn.Softmax in forward():

nn.CrossEntropyLoss fuses log_softmax + NLLLoss in a single numerically stable operation. Computing softmax separately and then taking its log loses floating-point precision. Softmax is only applied explicitly at inference time:

probs = torch.softmax(model(x), dim=1)   # inference only
pred  = probs.argmax(dim=1)

He initialization (applied to all nn.Linear layers):

W ~ N(0, 2 / n_in)

Keeps activation variance stable through ReLU layers, preventing vanishing/exploding gradients.
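PyTorch ships a built-in Kaiming (He) initializer that matches this formula. A sketch of how it could be applied to every nn.Linear layer (the document does not show its exact init code, so the helper name is an assumption):

```python
import torch.nn as nn

def init_he(module):
    """He (Kaiming) normal init: W ~ N(0, 2 / n_in), biases zeroed."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        nn.init.zeros_(module.bias)

layer = nn.Linear(784, 512)
init_he(layer)
# For a whole model: model.apply(init_he)
```

With n_in = 784, the resulting weight standard deviation is sqrt(2/784) ≈ 0.0505.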

Training loop:

optimizer.zero_grad()
logits = model(X)              # raw scores — no softmax
loss   = criterion(logits, y)  # CrossEntropyLoss applies softmax internally
loss.backward()                # autograd computes all gradients
optimizer.step()

3.4 Plate Detection — Group-by-Y Strategy

Motivation: Color/edge-based detection fires on tail lights, logos, and shadows. A character-first approach finds the text line directly.

Algorithm:

Step 1 — EasyOCR full-image scan. Collect all (bbox, text, confidence) results.

Step 2 — Group boxes into horizontal bands. Two boxes belong to the same band if their vertical centres are within max(char_height × 0.8, 30px) of each other.

Step 3 — Score each band:

score = 5.0 × has_washington_text      # "Washington" / "WASH*", conf > 0.15
      + 4.0 × has_plate_alnum          # 4–8 alphanumeric chars, conf > 0.25
      + 2.0 × count_score              # peaks at 7 chars, decays away from it

Step 4 — Dynamic Plate Expansion. Take the union bounding box of only the key tokens (Washington + plate text), not the full band. This excludes unrelated text (e.g., dealer stickers) that happens to sit at the same height.

Step 5 — Proportional padding:

vert_pad  = max(14, char_height × 0.4)
horiz_pad = max(14, char_height × 0.6)

Why it works: A Washington state plate is the only location in the image where "Washington" co-occurs with a 7-character alphanumeric string.
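The grouping and scoring steps (Steps 2–3) can be sketched as follows. This is a simplified model: boxes are reduced to (y_center, text, conf) tuples, band membership is tested against the band's first box, and the linear count-score decay is an assumption about "peaks at 7 chars, decays away from it":

```python
def group_by_y(boxes, char_h=40):
    """Group OCR results into horizontal bands by vertical-centre proximity."""
    tol = max(char_h * 0.8, 30)
    bands = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for band in bands:
            if abs(band[0][0] - box[0]) <= tol:   # same band if centres are close
                band.append(box)
                break
        else:
            bands.append([box])
    return bands

def score_band(band):
    """Score one band: state-name anchor + plate-like token + character count."""
    texts = [(t.upper(), c) for _, t, c in band]
    score = 0.0
    if any(t.startswith("WASH") and c > 0.15 for t, c in texts):
        score += 5.0                                      # has_washington_text
    if any(4 <= len(t) <= 8 and t.isalnum() and c > 0.25 for t, c in texts):
        score += 4.0                                      # has_plate_alnum
    n_chars = sum(len(t) for t, _ in texts)
    score += 2.0 * max(0.0, 1.0 - abs(n_chars - 7) / 7)   # count_score, peak at 7
    return score

boxes = [(100, "Washington", 0.6), (105, "CGL2439", 0.8), (400, "DEALER", 0.9)]
best = max(group_by_y(boxes), key=score_band)
```

With these toy boxes, the Washington + plate-text band wins over the same-image "DEALER" text.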

Observed results:

Image           Truth     Selected score   Detected bbox
IMG_1474.jpg    CGL2439   9.0              y=325–379, x=95–209 ✓
IMG_6478.jpg    AWC6290   4.0              Correct region ✓

3.5 Skew Correction — Hough Line Detection

Problem with the original approach: Sweeping -8° to +8° and maximising projection variance selected -8° (the boundary of the search range) — clearly wrong, since the optimiser hit the wall rather than finding a true peak.

Hough-based method:

import cv2
import numpy as np

# 1. Threshold plate crop to binary (local_mean here: a box-filtered mean, one plausible choice)
local_mean = cv2.blur(plate, (15, 15))
binary = ((plate < local_mean.astype(int) - 15) * 255).astype(np.uint8)

# 2. Detect line segments
lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180,
                        threshold=20,
                        minLineLength=plate_width // 6,
                        maxLineGap=plate_width // 8)

# 3. Keep only near-horizontal lines (|angle| < 15°)
angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
          for x1, y1, x2, y2 in lines[:, 0]]
near_horiz = [a for a in angles if abs(a) < 15]

# 4. Correction = negative median (robust to outlier lines)
correction = -np.median(near_horiz)

Decision logic:

|correction| ≥ 0.5°  →  rotate  (meaningful skew)
|correction| < 0.5°  →  skip   (plate is already straight)
Hough finds no lines →  fall back to two-pass projection sweep
                         (coarse 1° steps, then fine 0.25° around best)
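The two-pass projection fallback might look like the sketch below. The row-sum variance criterion follows the description of the original approach; the function names and the exact rotation call are assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def projection_variance(binary, angle):
    """Variance of the row-sum ink profile after rotating by `angle` degrees."""
    rot = rotate(binary.astype(float), angle, reshape=False, order=1)
    return rot.sum(axis=1).var()

def two_pass_sweep(binary, limit=8.0):
    # Coarse pass: 1° steps across the full ±limit range
    coarse = np.arange(-limit, limit + 1.0, 1.0)
    best = max(coarse, key=lambda a: projection_variance(binary, a))
    # Fine pass: 0.25° steps around the coarse winner
    fine = np.arange(best - 1.0, best + 1.25, 0.25)
    return float(max(fine, key=lambda a: projection_variance(binary, a)))
```

On an already-straight crop the sweep should return an angle near 0°, since any rotation smears the row profile and lowers its variance.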

Observed on IMG_1474.jpg: Hough detected 52 near-horizontal lines, median angle 2.72°. The old projection method would have applied -8° instead.


3.6 Character Segmentation — Vertical Projection Histogram

col_proj   = binary.sum(axis=0)            # ink pixel count per column
ink_thresh = col_proj.max() * 0.10         # 10% of peak = character boundary

Contiguous columns above threshold form a run (one character segment). Wide runs (width > 1.5 × expected character width) are split at local minima:

valleys, _ = find_peaks(-run_proj, distance=expected_w * 0.5)
# if not enough valleys, fall back to equal-width splits

Character bounding boxes are then tightened vertically to the actual ink rows within each column slice.
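Putting the run extraction and valley splitting together, a minimal sketch (the run-scanning loop and `segment_runs` name are assumptions based on the snippets above):

```python
import numpy as np
from scipy.signal import find_peaks

def segment_runs(binary, expected_w=30):
    """Column-projection segmentation: contiguous ink runs, wide runs split at valleys."""
    col_proj = binary.sum(axis=0)                  # ink pixels per column
    mask = col_proj > col_proj.max() * 0.10        # 10% of peak = boundary threshold
    runs, start = [], None
    for x, on in enumerate(mask):                  # collect contiguous True spans
        if on and start is None:
            start = x
        elif not on and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    out = []
    for s, e in runs:
        if e - s > 1.5 * expected_w:               # too wide: likely merged characters
            valleys, _ = find_peaks(-col_proj[s:e], distance=expected_w * 0.5)
            cuts = [s] + [s + int(v) for v in valleys] + [e]
            out.extend(zip(cuts[:-1], cuts[1:]))
        else:
            out.append((s, e))
    return out
```

Each returned (start, end) pair is one character's column span, before the vertical tightening step.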


3.7 Synthetic Dataset Generation

Training data for Section 10 is rendered programmatically to match the target font (DIN 1451 / FE-Schrift):

Font pool (in priority order):

  1. FE-Schrift.ttf — closest match to real German/WA plate font (manual download)
  2. DIN Alternate Bold / DIN Condensed Bold — closest macOS built-ins
  3. Impact / Arial Black / Helvetica — fallback bold fonts

Augmentation pipeline per character image:

  • Perspective warp (±8% corner jitter)
  • Elastic distortion (simulate worn plate)
  • Motion blur (simulate camera shake)
  • Gaussian noise (σ = 0.02–0.05)
  • Brightness scaling (×0.75–1.0)

36 classes (0–9, A–Z), multiple fonts, ~500 augmented samples per class.
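Two of the augmentation stages (Gaussian noise and brightness scaling) can be sketched without font assets; the perspective, elastic, and motion-blur stages are omitted here, and the `augment` helper is illustrative:

```python
import numpy as np

def augment(img, rng):
    """Noise + brightness stages of the pipeline. img: float array in [0, 1]."""
    sigma = rng.uniform(0.02, 0.05)                      # Gaussian noise level
    scale = rng.uniform(0.75, 1.0)                       # brightness scaling
    out = img * scale + rng.normal(0.0, sigma, img.shape)
    return np.clip(out, 0.0, 1.0)                        # keep valid pixel range
```

Calling this ~500 times per rendered character, with a fresh RNG draw each time, yields the per-class sample counts quoted above.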


3.8 Domain Adaptation (Few-shot Fine-tuning)

Problem: Synthetic font ≠ real plate appearance. A model trained on clean rendered characters fails on photographs of weathered, reflective metal plates.

Solution — fine-tune on the actual plate patches:

  1. Extract the 7 real character patches from the detected plate
  2. Augment each patch 200× (translate ±2px, rotate ±8°, brightness ×[0.8, 1.0], noise σ=0.02)
  3. Fine-tune with a very small learning rate to preserve existing knowledge:
optimizer = Adam(model.parameters(), lr=5e-5, weight_decay=1e-4)
# 20 epochs on 7 × 200 = 1,400 samples
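The fine-tuning loop, sketched end to end with stand-in tensors (the random X/y data and the small Sequential stand-in for the pretrained PrintedCharNet are placeholders, not the notebook's real patches or model):

```python
import torch
from torch import nn, optim

# Stand-ins for the 1,400 augmented real patches (7 chars × 200 augmentations)
X = torch.rand(1400, 784)
y = torch.randint(0, 36, (1400,))

# Stand-in for the pretrained PrintedCharNet
model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 36))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=5e-5, weight_decay=1e-4)

model.train()
for epoch in range(20):                      # 20 epochs, batch size 64
    for i in range(0, len(X), 64):
        optimizer.zero_grad()
        loss = criterion(model(X[i:i + 64]), y[i:i + 64])
        loss.backward()
        optimizer.step()
```

The tiny learning rate and weight decay are the two levers that keep the pretrained weights from being overwritten.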

Test-Time Augmentation (TTA):

At inference, apply 30 random small perturbations to each character patch and average the Softmax outputs:

probs = mean([softmax(model(augment(patch))) for _ in range(30)])

This reduces sensitivity to a single unlucky crop or lighting condition.
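A runnable sketch of the TTA step, using integer pixel shifts (via torch.roll) as a stand-in for the notebook's "random small perturbations"; the toy single-layer model and 28×28 patch are placeholders:

```python
import torch
from torch import nn

@torch.no_grad()
def tta_predict(model, patch, n=30, shift_px=2):
    """Average softmax outputs over n randomly shifted copies of a 28x28 patch."""
    model.eval()
    outs = []
    for _ in range(n):
        dy = int(torch.randint(-shift_px, shift_px + 1, (1,)))
        dx = int(torch.randint(-shift_px, shift_px + 1, (1,)))
        aug = torch.roll(patch, shifts=(dy, dx), dims=(0, 1))   # small translation
        outs.append(torch.softmax(model(aug.reshape(1, -1)), dim=1))
    return torch.cat(outs).mean(dim=0)

# Toy stand-ins for the trained model and one character patch
model = nn.Sequential(nn.Linear(784, 36))
patch = torch.rand(28, 28)
probs = tta_predict(model, patch)
pred = probs.argmax()
```

The averaged output is still a valid distribution over the 36 classes, so argmax and confidence reporting work unchanged.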


3.9 Preprocessing — Training/Inference Consistency

Both the synthetic dataset builder and the inference path call the exact same function:

import numpy as np
import cv2
from scipy.ndimage import center_of_mass, shift

def preprocess_patch(arr):
    arr = 255.0 - arr                                  # invert: dark chars → bright
    arr = np.clip(arr - arr.max() * 0.20, 0, None)     # suppress background
    H, W = arr.shape
    cy, cx = center_of_mass(arr)
    arr = shift(arr, (H / 2 - cy, W / 2 - cx))         # centre-of-mass alignment
    arr = cv2.resize(arr, (28, 28))                    # EMNIST standard size
    arr = np.clip(arr - arr.max() * 0.10, 0, None)     # second-pass suppression
    return arr / arr.max()                             # normalise to [0, 1]

Centre-of-mass alignment matches the preprocessing convention used in EMNIST. Using different preprocessing at training vs inference time is a common source of accuracy degradation that this design explicitly avoids.


4. Training Configuration

Stage               Optimizer   LR      Batch   Epochs   Regularisation
PyTorch (EMNIST)    Adam        0.001   256     15       ReduceLROnPlateau
PrintedCharNet      Adam        0.001   64      30       Dropout(0.3) + BatchNorm
Domain adaptation   Adam        5e-5    full    20       weight_decay=1e-4

5. Dataset Summary

Dataset                  Purpose                                 Scale
EMNIST Balanced          Baseline PyTorch model (Sections 1–9)   47 classes, 112,800 train / 18,800 test
Synthetic printed font   Section 10 plate model                  36 classes, 12 fonts, heavy augmentation
Real plate patches       Domain adaptation fine-tuning           7 chars × 200 augmentations = 1,400 samples

6. Key Design Decisions

  • No nn.Softmax in forward() — nn.CrossEntropyLoss fuses log-softmax and NLLLoss in one numerically stable op; separating them loses floating-point precision.
  • BatchNorm + Dropout in PrintedCharNet — the training set is synthetic; BN stabilises training across font styles, Dropout prevents overfitting to rendered artefacts.
  • Adam with ReduceLROnPlateau — Adam handles sparse gradients from varied fonts better than SGD; the LR halves if validation loss stalls for 3 epochs.
  • Hough lines for skew correction — the projection variance sweep hit the ±8° boundary on 82px-tall crops (noise dominated); Hough measures actual pixel line angles from 52 detected lines.
  • EasyOCR instead of YOLO or MSER for plate detection — YOLO boxes the vehicle, not the plate; MSER blob grouping is unstable at small scales; OCR reads text directly, which is more semantically reliable.
  • Expand from key tokens, not the full band — prevents dealer stickers and other same-row text from inflating the plate bounding box.
  • Fine-tuning LR = 5e-5 — 1,400 samples is tiny; a large LR overwrites pretrained weights instead of adapting them.
  • TTA with 30 samples — single-pass predictions are noisy near character edges; averaging 30 perturbed inputs collapses variance without retraining.
  • Shared preprocess_patch() — eliminates train/inference preprocessing mismatch, a common silent accuracy killer.

7. File Structure

license_plate_softmax/
├── pytorch_softmax.py            # PyTorch standalone script
├── license_plate_softmax.ipynb   # Main notebook — 10 sections
├── RFC.md                        # This document
├── fonts/
│   └── FE-Schrift.ttf            # Download manually from dafont.com
├── data/
│   └── emnist/                   # Auto-downloaded by torchvision
└── yolov8n.pt                    # Auto-downloaded (kept as fallback)

8. Known Limitations

  1. Font availability. FE-Schrift must be downloaded manually. The gap between built-in fonts and the real DIN 1451 typeface is the primary driver of digit misclassification.

  2. OCR misreads. EasyOCR reads CGL2439 as CELZ439 (C/G and L/E confusion). Detection still succeeds because the Washington text anchor compensates, but a misread reduces the band's score.

  3. Non-Washington plates. Without "Washington" text, detection falls back to alphanumeric matching alone. Other states with different layouts may produce lower detection confidence.

  4. Segmentation threshold is heuristic. ink_thresh = col_proj.max() × 0.10 works well on clean crops but may over- or under-segment on low-contrast or heavily shadowed plates.

  5. TTA cost at inference. 30 forward passes per character × 7 characters = 210 forward passes per plate. Acceptable for a learning demo; would need batching or a single-pass uncertainty estimate for production use.
