Sunday, March 22, 2026

[Joy] License Plate Pattern Recognition

1. Background and Goals

Build a toy license plate character recognition system as a vehicle for learning how the Softmax classifier works. The system covers the full pipeline, from model training to real-plate inference, using PyTorch.


2. System Architecture

Real license plate photo
        │
        ▼
 [Cell 1] Plate region detection     ← EasyOCR Group-by-Y strategy
        │
        ▼
 [Cell 2] Skew correction + segmentation  ← Hough lines + vertical projection
        │
        ▼
 [Cell 3] Synthetic training set     ← DIN/FE-Schrift fonts + augmentation
        │
        ▼
 [Cell 4] Train printed-font model   ← PrintedCharNet (MLP + BN + Dropout)
        │
        ▼
 [Cell 5] Domain adaptation          ← Few-shot fine-tuning on real patches
        │
        ▼
 [Cell 6] TTA inference + evaluation ← Average 30 perturbed Softmax outputs

3. Algorithm Analysis

3.1 Softmax Function

Definition:

softmax(z_i) = exp(z_i) / Σ_j exp(z_j)

Converts raw logits into a probability distribution — all outputs are positive and sum to 1. Used to express the model's confidence over 36 character classes.

Temperature scaling:

softmax(z/T)

Temperature   Effect
T → 0         Collapses to argmax — maximum confidence
T = 1         Standard softmax
T → ∞         Approaches uniform distribution — maximum uncertainty
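The temperature behaviour described above can be verified with a small NumPy sketch (the `softmax` helper and the example logits are illustrative, not from the notebook):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax: softmax(z / T)."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
cold = softmax(logits, T=0.1)    # near one-hot at the argmax
warm = softmax(logits, T=1.0)    # standard softmax
hot = softmax(logits, T=100.0)   # near uniform
```

At T = 0.1 nearly all probability mass lands on the largest logit; at T = 100 the three probabilities are almost equal.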

3.2 PyTorch Model — PrintedCharNet

Architecture:

Input (784) → Linear(784→512) → BN → ReLU → Dropout(0.3)
            → Linear(512→512) → BN → ReLU → Dropout(0.3)
            → Linear(512→36)
            → [Softmax applied by loss / at inference]
import torch.nn as nn

class PrintedCharNet(nn.Module):
    def __init__(self, hid=512, out=36):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, hid), nn.BatchNorm1d(hid), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hid, hid), nn.BatchNorm1d(hid), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hid, out)
            # NO nn.Softmax here
        )

    def forward(self, x):
        return self.net(x)   # raw logits; softmax is applied by the loss / at inference

Why no nn.Softmax in forward():

nn.CrossEntropyLoss fuses log_softmax + NLLLoss in a single numerically stable operation. Computing softmax separately and then taking its log loses floating-point precision. Softmax is only applied explicitly at inference time:

probs = torch.softmax(model(x), dim=1)   # inference only
pred  = probs.argmax(dim=1)

He initialization (applied to all nn.Linear layers):

W ~ N(0, 2 / n_in)

Keeps activation variance stable through ReLU layers, preventing vanishing/exploding gradients.
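PyTorch ships a built-in Kaiming (He) initializer that matches this formula. A sketch of how it could be applied to every nn.Linear layer (the document does not show its exact init code, so the helper name is an assumption):

```python
import torch.nn as nn

def init_he(module):
    """He (Kaiming) normal init: W ~ N(0, 2 / n_in), biases zeroed."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        nn.init.zeros_(module.bias)

layer = nn.Linear(784, 512)
init_he(layer)
# For a whole model: model.apply(init_he)
```

With n_in = 784, the resulting weight standard deviation is sqrt(2/784) ≈ 0.0505.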

Training loop:

optimizer.zero_grad()
logits = model(X)              # raw scores — no softmax
loss   = criterion(logits, y)  # CrossEntropyLoss applies softmax internally
loss.backward()                # autograd computes all gradients
optimizer.step()

3.4 Plate Detection — Group-by-Y Strategy

Motivation: Color/edge-based detection fires on tail lights, logos, and shadows. A character-first approach finds the text line directly.

Algorithm:

Step 1 — EasyOCR full-image scan. Collect all (bbox, text, confidence) results.

Step 2 — Group boxes into horizontal bands. Two boxes belong to the same band if their vertical centres are within max(char_height × 0.8, 30px) of each other.

Step 3 — Score each band:

score = 5.0 × has_washington_text      # "Washington" / "WASH*", conf > 0.15
      + 4.0 × has_plate_alnum          # 4–8 alphanumeric chars, conf > 0.25
      + 2.0 × count_score              # peaks at 7 chars, decays away from it

Step 4 — Dynamic Plate Expansion. Take the union bounding box of only the key tokens (Washington + plate text), not the full band. This excludes unrelated text (e.g., dealer stickers) that happens to sit at the same height.

Step 5 — Proportional padding:

vert_pad  = max(14, char_height × 0.4)
horiz_pad = max(14, char_height × 0.6)

Why it works: A Washington state plate is the only location in the image where "Washington" co-occurs with a 7-character alphanumeric string.
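The grouping and scoring steps (Steps 2–3) can be sketched as follows. This is a simplified model: boxes are reduced to (y_center, text, conf) tuples, band membership is tested against the band's first box, and the linear count-score decay is an assumption about "peaks at 7 chars, decays away from it":

```python
def group_by_y(boxes, char_h=40):
    """Group OCR results into horizontal bands by vertical-centre proximity."""
    tol = max(char_h * 0.8, 30)
    bands = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for band in bands:
            if abs(band[0][0] - box[0]) <= tol:   # same band if centres are close
                band.append(box)
                break
        else:
            bands.append([box])
    return bands

def score_band(band):
    """Score one band: state-name anchor + plate-like token + character count."""
    texts = [(t.upper(), c) for _, t, c in band]
    score = 0.0
    if any(t.startswith("WASH") and c > 0.15 for t, c in texts):
        score += 5.0                                      # has_washington_text
    if any(4 <= len(t) <= 8 and t.isalnum() and c > 0.25 for t, c in texts):
        score += 4.0                                      # has_plate_alnum
    n_chars = sum(len(t) for t, _ in texts)
    score += 2.0 * max(0.0, 1.0 - abs(n_chars - 7) / 7)   # count_score, peak at 7
    return score

boxes = [(100, "Washington", 0.6), (105, "CGL2439", 0.8), (400, "DEALER", 0.9)]
best = max(group_by_y(boxes), key=score_band)
```

With these toy boxes, the Washington + plate-text band wins over the same-image "DEALER" text.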

Observed results:

Image           Truth     Selected score   Detected bbox
IMG_1474.jpg    CGL2439   9.0              y=325–379, x=95–209 ✓
IMG_6478.jpg    AWC6290   4.0              Correct region ✓

3.5 Skew Correction — Hough Line Detection

Problem with the original approach: Sweeping -8° to +8° and maximising projection variance selected -8° (the boundary of the search range) — clearly wrong, since the optimiser hit the wall rather than finding a true peak.

Hough-based method:

import cv2
import numpy as np

# 1. Threshold plate crop to binary (local_mean here: a box-filtered mean, one plausible choice)
local_mean = cv2.blur(plate, (15, 15))
binary = ((plate < local_mean.astype(int) - 15) * 255).astype(np.uint8)

# 2. Detect line segments
lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180,
                        threshold=20,
                        minLineLength=plate_width // 6,
                        maxLineGap=plate_width // 8)

# 3. Keep only near-horizontal lines (|angle| < 15°)
angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1))
          for x1, y1, x2, y2 in lines[:, 0]]
near_horiz = [a for a in angles if abs(a) < 15]

# 4. Correction = negative median (robust to outlier lines)
correction = -np.median(near_horiz)

Decision logic:

|correction| ≥ 0.5°  →  rotate  (meaningful skew)
|correction| < 0.5°  →  skip   (plate is already straight)
Hough finds no lines →  fall back to two-pass projection sweep
                         (coarse 1° steps, then fine 0.25° around best)
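The two-pass projection fallback might look like the sketch below. The row-sum variance criterion follows the description of the original approach; the function names and the exact rotation call are assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def projection_variance(binary, angle):
    """Variance of the row-sum ink profile after rotating by `angle` degrees."""
    rot = rotate(binary.astype(float), angle, reshape=False, order=1)
    return rot.sum(axis=1).var()

def two_pass_sweep(binary, limit=8.0):
    # Coarse pass: 1° steps across the full ±limit range
    coarse = np.arange(-limit, limit + 1.0, 1.0)
    best = max(coarse, key=lambda a: projection_variance(binary, a))
    # Fine pass: 0.25° steps around the coarse winner
    fine = np.arange(best - 1.0, best + 1.25, 0.25)
    return float(max(fine, key=lambda a: projection_variance(binary, a)))
```

On an already-straight crop the sweep should return an angle near 0°, since any rotation smears the row profile and lowers its variance.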

Observed on IMG_1474.jpg: Hough detected 52 near-horizontal lines, median angle 2.72°. The old projection method would have applied -8° instead.


3.6 Character Segmentation — Vertical Projection Histogram

col_proj   = binary.sum(axis=0)            # ink pixel count per column
ink_thresh = col_proj.max() * 0.10         # 10% of peak = character boundary

Contiguous columns above threshold form a run (one character segment). Wide runs (width > 1.5 × expected character width) are split at local minima:

valleys, _ = find_peaks(-run_proj, distance=expected_w * 0.5)
# if not enough valleys, fall back to equal-width splits

Character bounding boxes are then tightened vertically to the actual ink rows within each column slice.
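Putting the run extraction and valley splitting together, a minimal sketch (the run-scanning loop and `segment_runs` name are assumptions based on the snippets above):

```python
import numpy as np
from scipy.signal import find_peaks

def segment_runs(binary, expected_w=30):
    """Column-projection segmentation: contiguous ink runs, wide runs split at valleys."""
    col_proj = binary.sum(axis=0)                  # ink pixels per column
    mask = col_proj > col_proj.max() * 0.10        # 10% of peak = boundary threshold
    runs, start = [], None
    for x, on in enumerate(mask):                  # collect contiguous True spans
        if on and start is None:
            start = x
        elif not on and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    out = []
    for s, e in runs:
        if e - s > 1.5 * expected_w:               # too wide: likely merged characters
            valleys, _ = find_peaks(-col_proj[s:e], distance=expected_w * 0.5)
            cuts = [s] + [s + int(v) for v in valleys] + [e]
            out.extend(zip(cuts[:-1], cuts[1:]))
        else:
            out.append((s, e))
    return out
```

Each returned (start, end) pair is one character's column span, before the vertical tightening step.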


3.7 Synthetic Dataset Generation

Training data for Section 10 is rendered programmatically to match the target font (DIN 1451 / FE-Schrift):

Font pool (in priority order):

  1. FE-Schrift.ttf — closest match to real German/WA plate font (manual download)
  2. DIN Alternate Bold / DIN Condensed Bold — closest macOS built-ins
  3. Impact / Arial Black / Helvetica — fallback bold fonts

Augmentation pipeline per character image:

  • Perspective warp (±8% corner jitter)
  • Elastic distortion (simulate worn plate)
  • Motion blur (simulate camera shake)
  • Gaussian noise (σ = 0.02–0.05)
  • Brightness scaling (×0.75–1.0)

36 classes (0–9, A–Z), multiple fonts, ~500 augmented samples per class.
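Two of the augmentation stages (Gaussian noise and brightness scaling) can be sketched without font assets; the perspective, elastic, and motion-blur stages are omitted here, and the `augment` helper is illustrative:

```python
import numpy as np

def augment(img, rng):
    """Noise + brightness stages of the pipeline. img: float array in [0, 1]."""
    sigma = rng.uniform(0.02, 0.05)                      # Gaussian noise level
    scale = rng.uniform(0.75, 1.0)                       # brightness scaling
    out = img * scale + rng.normal(0.0, sigma, img.shape)
    return np.clip(out, 0.0, 1.0)                        # keep valid pixel range
```

Calling this ~500 times per rendered character, with a fresh RNG draw each time, yields the per-class sample counts quoted above.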


3.8 Domain Adaptation (Few-shot Fine-tuning)

Problem: Synthetic font ≠ real plate appearance. A model trained on clean rendered characters fails on photographs of weathered, reflective metal plates.

Solution — fine-tune on the actual plate patches:

  1. Extract the 7 real character patches from the detected plate
  2. Augment each patch 200× (translate ±2px, rotate ±8°, brightness ×[0.8, 1.0], noise σ=0.02)
  3. Fine-tune with a very small learning rate to preserve existing knowledge:
optimizer = Adam(model.parameters(), lr=5e-5, weight_decay=1e-4)
# 20 epochs on 7 × 200 = 1,400 samples
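The fine-tuning loop, sketched end to end with stand-in tensors (the random X/y data and the small Sequential stand-in for the pretrained PrintedCharNet are placeholders, not the notebook's real patches or model):

```python
import torch
from torch import nn, optim

# Stand-ins for the 1,400 augmented real patches (7 chars × 200 augmentations)
X = torch.rand(1400, 784)
y = torch.randint(0, 36, (1400,))

# Stand-in for the pretrained PrintedCharNet
model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 36))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=5e-5, weight_decay=1e-4)

model.train()
for epoch in range(20):                      # 20 epochs, batch size 64
    for i in range(0, len(X), 64):
        optimizer.zero_grad()
        loss = criterion(model(X[i:i + 64]), y[i:i + 64])
        loss.backward()
        optimizer.step()
```

The tiny learning rate and weight decay are the two levers that keep the pretrained weights from being overwritten.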

Test-Time Augmentation (TTA):

At inference, apply 30 random small perturbations to each character patch and average the Softmax outputs:

probs = mean([softmax(model(augment(patch))) for _ in range(30)])

This reduces sensitivity to a single unlucky crop or lighting condition.
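A runnable sketch of the TTA step, using integer pixel shifts (via torch.roll) as a stand-in for the notebook's "random small perturbations"; the toy single-layer model and 28×28 patch are placeholders:

```python
import torch
from torch import nn

@torch.no_grad()
def tta_predict(model, patch, n=30, shift_px=2):
    """Average softmax outputs over n randomly shifted copies of a 28x28 patch."""
    model.eval()
    outs = []
    for _ in range(n):
        dy = int(torch.randint(-shift_px, shift_px + 1, (1,)))
        dx = int(torch.randint(-shift_px, shift_px + 1, (1,)))
        aug = torch.roll(patch, shifts=(dy, dx), dims=(0, 1))   # small translation
        outs.append(torch.softmax(model(aug.reshape(1, -1)), dim=1))
    return torch.cat(outs).mean(dim=0)

# Toy stand-ins for the trained model and one character patch
model = nn.Sequential(nn.Linear(784, 36))
patch = torch.rand(28, 28)
probs = tta_predict(model, patch)
pred = probs.argmax()
```

The averaged output is still a valid distribution over the 36 classes, so argmax and confidence reporting work unchanged.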


3.9 Preprocessing — Training/Inference Consistency

Both the synthetic dataset builder and the inference path call the exact same function:

import numpy as np
import cv2
from scipy.ndimage import center_of_mass, shift

def preprocess_patch(arr):
    arr = 255.0 - arr                                  # invert: dark chars → bright
    arr = np.clip(arr - arr.max() * 0.20, 0, None)     # suppress background
    H, W = arr.shape
    cy, cx = center_of_mass(arr)
    arr = shift(arr, (H / 2 - cy, W / 2 - cx))         # centre-of-mass alignment
    arr = cv2.resize(arr, (28, 28))                    # EMNIST standard size
    arr = np.clip(arr - arr.max() * 0.10, 0, None)     # second-pass suppression
    return arr / arr.max()                             # normalise to [0, 1]

Centre-of-mass alignment matches the preprocessing convention used in EMNIST. Using different preprocessing at training vs inference time is a common source of accuracy degradation that this design explicitly avoids.


4. Training Configuration

Stage               Optimizer   LR      Batch   Epochs   Regularisation
PyTorch (EMNIST)    Adam        0.001   256     15       ReduceLROnPlateau
PrintedCharNet      Adam        0.001   64      30       Dropout(0.3) + BatchNorm
Domain adaptation   Adam        5e-5    full    20       weight_decay=1e-4

5. Dataset Summary

Dataset                  Purpose                                 Scale
EMNIST Balanced          Baseline PyTorch model (Sections 1–9)   47 classes, 112,800 train / 18,800 test
Synthetic printed font   Section 10 plate model                  36 classes, 12 fonts, heavy augmentation
Real plate patches       Domain adaptation fine-tuning           7 chars × 200 augmentations = 1,400 samples

6. Key Design Decisions

  • No nn.Softmax in forward() — nn.CrossEntropyLoss fuses log-softmax and NLLLoss in one numerically stable op; separating them loses floating-point precision.
  • BatchNorm + Dropout in PrintedCharNet — the training set is synthetic; BN stabilises training across font styles, Dropout prevents overfitting to rendered artefacts.
  • Adam with ReduceLROnPlateau — Adam handles sparse gradients from varied fonts better than SGD; the LR halves if validation loss stalls for 3 epochs.
  • Hough lines for skew correction — the projection variance sweep hit the ±8° boundary on 82px-tall crops (noise dominated); Hough measures actual pixel line angles from 52 detected lines.
  • EasyOCR instead of YOLO or MSER for plate detection — YOLO boxes the vehicle, not the plate; MSER blob grouping is unstable at small scales; OCR reads text directly, which is more semantically reliable.
  • Expand from key tokens, not the full band — prevents dealer stickers and other same-row text from inflating the plate bounding box.
  • Fine-tuning LR = 5e-5 — 1,400 samples is tiny; a large LR overwrites pretrained weights instead of adapting them.
  • TTA with 30 samples — single-pass predictions are noisy near character edges; averaging 30 perturbed inputs collapses variance without retraining.
  • Shared preprocess_patch() — eliminates train/inference preprocessing mismatch, a common silent accuracy killer.

7. File Structure

license_plate_softmax/
├── pytorch_softmax.py            # PyTorch standalone script
├── license_plate_softmax.ipynb   # Main notebook — 10 sections
├── RFC.md                        # This document
├── fonts/
│   └── FE-Schrift.ttf            # Download manually from dafont.com
├── data/
│   └── emnist/                   # Auto-downloaded by torchvision
└── yolov8n.pt                    # Auto-downloaded (kept as fallback)

8. Known Limitations

  1. Font availability. FE-Schrift must be downloaded manually. The gap between built-in fonts and the real DIN 1451 typeface is the primary driver of digit misclassification.

  2. OCR misreads. EasyOCR reads CGL2439 as CELZ439 (C/G and L/E confusion). Detection still succeeds because the Washington text anchor compensates, but a misread reduces the band's score.

  3. Non-Washington plates. Without "Washington" text, detection falls back to alphanumeric matching alone. Other states with different layouts may produce lower detection confidence.

  4. Segmentation threshold is heuristic. ink_thresh = col_proj.max() × 0.10 works well on clean crops but may over- or under-segment on low-contrast or heavily shadowed plates.

  5. TTA cost at inference. 30 forward passes per character × 7 characters = 210 forward passes per plate. Acceptable for a learning demo; would need batching or a single-pass uncertainty estimate for production use.
