Monday, October 13, 2025

EasyDB Architecture Explained: Building Blocks of a Tiny Database

EasyDB Architecture Explained: Building Blocks of a Toy Database

This blog post explores the internal architecture of EasyDB, a lightweight educational database written in Java. We'll start with a high-level overview of modern database architecture and key algorithmic design choices, then walk through EasyDB's components across four areas: storage, MVCC & ACID, query execution, and indexing/query planning. Each section covers design trade-offs and includes source code examples.

Database Architecture Overview

A modern relational database typically consists of the following subsystems:

This blog post explores the internal architecture of EasyDB, a lightweight educational database written in Java. We'll start with a high-level overview of modern database architecture and key algorithmic design choices, then walk through EasyDB's components across four areas: storage, MVCC & ACID, query execution, and indexing/query planning. Each section covers design trade-offs and includes source code examples.

A modern relational database typically consists of the following subsystems:

  1. Storage Engine: Manages how data is physically laid out and persisted. The core challenge is balancing cost and throughput. For example, PostgreSQL uses 4KB pages because that size aligns well with physical disk block sizes and minimizes I/O overhead. The storage engine includes page formatting (e.g., slotted pages) to support transaction management, index and vacuum etc.

  2. Transaction Management: Ensures ACID (Atomicity, Consistency, Isolation, Durability) properties through concurrency control and logging.

  3. Query Processor: Parses SQL, plans execution, and runs logical operators using execution models like Volcano.

  4. Indexing and Optimizer: Supports fast data retrieval using index structures and selects efficient query plans.

Algorithm Comparison Across Core Areas

ComponentAlgorithmDescriptionProsCons
StorageTuple-based Layout and Vacuum supportClassic tuple based storageFast in-place CRUDNeeds Auto Vacuum to cleanup the zombie records
Null Bitmaps Compression to save spaceCompact format to save spacesExtra calculation and state managment
TransactionsMVCC with XMIN/XMAXSnapshot-based read isolationReaders don’t block writersOld versions need cleanup


Query ExecutionVolcano Iterator ModelOperators implement next() to fetch rowsMemory efficient, modularNo vectorization or parallelism
Rule-based PlanningUses indexes if availableSimple to implementNot cost-optimal
IndexingClustered B+ TreeSorted, balanced tree for equality/range queriesFast range scans, efficient searchSlower writes, not version-aware
Heuristic Plan SelectionChooses plans based on field/index availabilityEasy to understandNo cost-based decisions

With this context in mind, let's explore how EasyDB implements each of these components.

1. Database Storage

Overview: EasyDB persists data using slotted page storage and write-ahead logging, organized with in-memory buffer management.

Core Structures:

  • Tuple.java: Tuple header layout and version management.

  • PageImpl.java: Page structure and raw byte array management.

  • Stroage.java: Handles file reads/writes and page allocation.

Algorithms & Concepts:

  • Header Format: Tracks validity and size metadata for each record.

  • Null bitmaps: Bitmasks to track the null columns.

Pros:

  • Clear and efficient control over memory and I/O.

  • Easy to understand format.

Cons:

  • Manual memory management with potential for leaks.

  • No automatic eviction or LRU.

2. MVCC and ACID Guarantees

Overview: EasyDB uses Multi-Version Concurrency Control (MVCC) and a transaction table to enforce ACID semantics.

Core Structures:

  • TupleHeader.java: Transaction table and state tracking.

  • VersionManagerImpl.java: MVCC logic and isolation.

  • LockTable.java: Two-phase locking and deadlock detection.

Algorithms & Concepts:

  • XMIN/XMAX per record: Identifies creating and deleting transactions.

  • Visibility Rules: Based on commit status and transaction start time.

  • Undo Logging: Stores oldRaw in each DataItem for rollback.

Pros:

  • Snapshot isolation with non-blocking reads.

  • Rollback and recovery support with WAL and oldRaw.

Cons:

  • Manual version chain cleanup required.

  • Writers can still block due to locks.

Code Highlights:

public static boolean isVersionSkip(Transaction t, long xmin, long xmax) {
    return t.getLevel() == 3 && t.getXid() != xmin && xmax != 0;
}

Ensures repeatable read isolation by detecting version skips.

3. Query Engine

Overview: EasyDB implements a unified query processing pipeline that spans three key stages: parsing, planning, and execution. This modular flow is built on top of JSQLParser and operates using a Volcano-style iterator model to deliver results.

Parsing

EasyDB uses the external JSQLParser library to convert SQL strings into structured syntax trees. This parsing stage breaks down a query into its components—select fields, tables, conditions, and expressions—making it easier to reason about the query in subsequent steps.

Code Path:

  • TableManagerImpl.java – Handles SQL statement type dispatch.

  • Table.java – Extracts parsed clauses (e.g., WHERE conditions) and converts them into internal representations.

Example:

Statement stmt = CCJSqlParserUtil.parse(sql);

Planning

Once parsed, EasyDB builds a logical execution plan. This is a rule-based planner that chooses between index scans and full table scans based on available indexes and WHERE conditions.

Plan Strategies:

  • No index available: Full table scan.

  • Single index available: Index scan with optional post-filter.

  • Two indexes: Intersect or union results for AND/OR logic.

Code Path:

  • Table.parseWhere() – Translates conditions into index-friendly filters.

Example:

if (f1 != null && f2 != null) {
    uids.retainAll(uids2); // AND logic with index intersection
}

Execution

Execution is carried out through a simple iterator pipeline that mimics the Volcano model. Each query returns a list of candidate UIDs (record IDs), which are then read using MVCC filters and transformed into result rows.

Execution Flow:

  1. Iterate over candidate UIDs.

  2. Use VersionManager.read(xid, uid) to fetch visible records.

  3. Apply WHERE clause filters.

  4. Project selected fields.

Code Path:

  • Table.select() – Performs MVCC-based filtering and row materialization.

  • VersionManagerImpl.java – Controls record visibility.

Example:

for (Long uid : uids) {
    byte[] raw = versionManager.read(xid, uid);
    if (raw == null) continue;
    Map<String, Object> entry = parseEntry(raw);
    if (matchesWhere(entry)) result.add(entry);
}

Pros:

  • Modular and easy to extend.

  • Clear separation between parse, plan, and execute.

Cons:

  • No cost-based optimization.

  • No pipelining between operators or intermediate plan tree.

4. Indexing, Stats & Query Planning

Overview: EasyDB supports clustered B+ tree indexes and uses basic heuristics for query planning.

Core Structures:

  • BPlusTree.java: Tree operations and page structure.

  • Node.java: Leaf and internal node definitions.

  • Field.java: Index lookup and condition handling.

Algorithms & Concepts:

  • Clustered B+ Tree: Leaf nodes store data UIDs; inner nodes guide search.

  • Range Scan Support: Sibling pointers for fast traversal.

  • Rule-Based Plan Selection: Favor indexes over full scans.

Pros:

  • Fast equality and range lookups.

  • Efficient key ordering and storage.

Cons:

  • Indexes not versioned — may point to stale records.

  • No histogram-based cost estimates.

Code Highlights:

List<Long> uids = f1.search(res1.left, res1.right);

Uses B+ tree index to retrieve UIDs matching a range.


Final Thoughts

EasyDB (Java version) demonstrates many core components of database systems — from slotted pages and WAL to MVCC and B+ tree indexing. While simplified, it offers a clear and accessible path to understand database internals.

For more details, browse the EasyDB GitHub repository and try hacking on it yourself!

No comments:

Post a Comment

[Joy]License Plate Pattern Recognition

1. Background and Goals Build a joy license plate character recognition system as a vehicle for  learning how the Softmax classifier works ....