Sunday, May 25, 2025

Building a toy database from scratch with Cursor

As an experienced backend engineer, I have been using Cursor to explore the potential of coding agents. The experiment (building a toy database from scratch) ran from Dec 2024 to March 2025 and produced 32 commits and thousands of lines of code. Here are the three lessons I learned from it.

TLDR: A coding agent is a capable peer engineer you can leverage in every part of the SDLC, as long as you do it intentionally.

#1 Context is key

A database can be approached in many ways, and the relevant context differs at each level, from the high-level architecture down to the detailed implementation. Feeding the coding agent the right amount of context is critical. The agent is capable at every part of the system, from architecture to coding, but you need to treat it as a buddy to brainstorm, communicate, and think together with.

At a high level, choose your overall direction first. A database could be relational, key-value, inverted-index search, document-based, vector-based, etc. It is easy for the coding agent to generate framework code for you, but you need to decide which architecture style to go with. In my case, I went with relational and asked Cursor to stick to the Postgres implementation by defining a Cursor rule.

At the component level, try to set up a minimal structure to get started and iterate incrementally. A database is composed of storage, index, query, parser, execution, transaction, maintenance, admission control, etc. It is easy for the coding agent to copy other implementations wholesale and overwhelm both you and your project. Try to start small: I started with memory-based storage, then focused on the query engine, and left the index for later. I had tried to go faster and let the coding agent do everything, but then I had to revert and take a step back, simply because the project had gotten out of control.
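To make "start small" concrete, here is a minimal sketch of what a memory-based storage layer could look like. It is not the code from my project: the class name `MemoryTable` and its methods are hypothetical, chosen only to show how little is needed before a query engine can be built on top.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

Row = Dict[str, Any]


class MemoryTable:
    """Hypothetical heap table that keeps every row in a Python dict."""

    def __init__(self, name: str, columns: Tuple[str, ...]) -> None:
        self.name = name
        self.columns = columns
        self._rows: Dict[int, Row] = {}  # row id -> row data
        self._next_id = 1                # monotonically increasing row id

    def insert(self, row: Row) -> int:
        """Store a row and return its row id; missing columns become None."""
        unknown = set(row) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        row_id = self._next_id
        self._next_id += 1
        self._rows[row_id] = {c: row.get(c) for c in self.columns}
        return row_id

    def scan(
        self, predicate: Optional[Callable[[Row], bool]] = None
    ) -> Iterator[Tuple[int, Row]]:
        """Full table scan, optionally filtered; yields copies of the rows."""
        for row_id, row in self._rows.items():
            if predicate is None or predicate(row):
                yield row_id, dict(row)

    def delete(self, row_id: int) -> bool:
        """Remove a row; return True if it existed."""
        return self._rows.pop(row_id, None) is not None


users = MemoryTable("users", ("id", "name"))
first = users.insert({"id": 1, "name": "ada"})
second = users.insert({"id": 2, "name": "bob"})
matches = [row for _, row in users.scan(lambda r: r["name"] == "bob")]
```

With only insert, scan, and delete in place, the query engine has something real to run against, and an index or on-disk format can replace the dict later without changing the callers.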

At a detailed level, a specific algorithm can be implemented in different styles. For example, MVCC (multi-version concurrency control) can be implemented in several ways. Brainstorm with the coding agent, think together about which way is best, and then commit to that path. In my case, I brainstormed with the agent on the version chain: it could live on the tuple or in a separate data structure, but my existing storage layer was better suited to the tuple level, which made the implementation easier.
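As an illustration of the tuple-level option, here is a minimal sketch of a version chain where each tuple version points at the one it replaced. The names (`TupleVersion`, `VersionedStore`, `created_by`) are hypothetical, and the sketch makes two loud simplifications: transaction ids are assumed to increase monotonically, and every transaction with an id at or below the reader's snapshot is assumed to be committed. It is not Postgres's actual tuple layout and not the code from my project.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TupleVersion:
    """One version of a tuple; `prev` links to the older version it replaced."""
    value: Any
    created_by: int                        # id of the transaction that wrote it
    prev: Optional["TupleVersion"] = None  # next-older version, if any


class VersionedStore:
    """Hypothetical store that keeps the newest version per key as chain head."""

    def __init__(self) -> None:
        self._heads: Dict[str, TupleVersion] = {}

    def write(self, key: str, value: Any, txid: int) -> None:
        # Push a new version on the front of the chain; old versions stay
        # reachable so that older snapshots can still read them.
        self._heads[key] = TupleVersion(value, txid, self._heads.get(key))

    def read(self, key: str, snapshot_txid: int) -> Optional[Any]:
        # Walk from newest to oldest and return the first version that was
        # written at or before the reader's snapshot.
        version = self._heads.get(key)
        while version is not None:
            if version.created_by <= snapshot_txid:
                return version.value
            version = version.prev
        return None


store = VersionedStore()
store.write("a", "v1", txid=1)
store.write("a", "v2", txid=5)
old_view = store.read("a", snapshot_txid=3)  # sees "v1"
new_view = store.read("a", snapshot_txid=5)  # sees "v2"
```

Keeping the chain on the tuple means a read needs nothing beyond the storage layer's existing lookup, which is the kind of trade-off worth talking through with the agent before any code is generated.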

#2 Commit frequently

Working on a large-scale, thinking-intensive system is challenging, and that is true for the agent as well. The agent finds it easy to copy and understand existing systems, but building a new system always has some innovative aspect, and the agent is NOT you, so it will not follow your intent 100%. When things go right, the agent can 10x your productivity, but it can also easily ruin your hard work. So back up your work by committing frequently.

When I first got started, I did not do this, and the hard lesson was that I had to redo a big part of the project.

#3 Set boundaries clearly

The toy database is built with Bazel 7, which has limited information available on the internet, so whenever I asked the agent to fix a build, it kept copying instructions written for previous Bazel versions. I had to specify in a Cursor rule that the coding agent must always target Bazel 7, which reduces the chance of it giving wrong answers.

On the coding and testing side, specify the coding style and testing framework you want to use and follow. There are many different implementations to copy from, so code with intention.

Next steps: this actually made coding more fun for me, and I will try to build more toy systems with it.


