Sunday, February 25, 2024

Platform engineering - How does it work?

 

Introduction

Gartner named platform engineering one of its top IT trends for 2023. The industry offers several different concepts and products under that label, but how do they actually help an enterprise?

This article will discuss platform engineering from an enterprise perspective.

What does a tech organization need?

Digital transformation has changed enterprises, and every enterprise now needs IT investment. As the enterprise grows, so does its Tech department. Just as the business relies on ERP and CRM systems to run efficiently, the Tech team needs specialized tooling to operate efficiently. This tooling is now called DevOps, but there is often a disconnect between the concept and day-to-day practice.

Tech is built by people, for people. People are not machines and cannot be available 100% of the time. Achieving the desired quality, efficiency, and cost therefore takes a systematic approach that covers every aspect of the work and fits PM, Dev, Test, and Ops engineers alike.

Amazon, for example, has developers cover the Dev, Test, and Ops roles, and sometimes the PM role as well, and learn from that experience. It then feeds what they learn back into the product to close the loop. Internally, Amazon has a strong infrastructure and tools platform that supports its engineers, so they can focus on their actual work rather than on environment setup, permission management, host replacement, and similar chores.

Google built Borg to manage clusters consistently and to improve cluster utilization, which reduces cost. Google also originated the concept of Site Reliability Engineering (SRE) and built SRE teams that focus on continuous delivery, monitoring, and related practices. Drawing on its internal tools and infrastructure, Google built and open-sourced Kubernetes, which has gained tremendous popularity across the industry and changed Tech.

Amazon and Google are the whales, but Netflix, Airbnb, Uber, and others offer similar examples.

Why does a tech organization need platform engineering?

The public cloud market was expected to reach roughly $500 billion in 2022 and $600 billion in 2023 (according to a Gartner report). The public cloud has gained tremendous popularity and redefined the IT industry, but it has also created a major challenge for DevOps: the infrastructure and tools that teams used before have given way to multi-cloud setups. The added complexity lowers developer efficiency, raises cost, and makes quality inconsistent.

Platform engineering simplifies the experience of building new products. When a company is young, it keeps searching for product-market fit. Once it is growing, it serves multiple use cases, and customers adopt each product with similar expectations of latency, performance, scalability, and experience. One way to meet those expectations is to reuse the same engineers, or engineers who have done this before. Another is to bake their engineering experience into tools and platforms, so that new products inherit the lessons easily.

Platform engineering simplifies internal governance. Software organizations are built around product lines; they innovate constantly and keep building new ones, and that dynamism is encouraged from a product perspective. Governance requirements, by contrast, typically land in a single organization, which has been called platform, compliance, or engineering DevOps and is now called platform engineering. The new name emphasizes being platform-driven, but the focus remains on engineering metrics such as requirement delivery time, deployment failure rate, and software bug count.
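
As a rough illustration of how two of these metrics could be computed, here is a minimal Python sketch. The Deployment record and its fields are hypothetical, and treating "requirement delivery time" as the time from accepting a requirement to deploying it is an assumption; a real platform would pull this data from its ticketing and deployment systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    requested_at: datetime  # when the requirement was accepted
    deployed_at: datetime   # when the change reached production
    failed: bool            # True if it was rolled back or caused an incident


def deployment_failure_rate(deployments: list[Deployment]) -> float:
    """Share of deployments that failed, between 0.0 and 1.0."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def median_delivery_time(deployments: list[Deployment]) -> timedelta:
    """Median time from requirement accepted to running in production."""
    if not deployments:
        return timedelta(0)
    return median(d.deployed_at - d.requested_at for d in deployments)


# Made-up sample data, for illustration only.
start = datetime(2024, 1, 1)
history = [
    Deployment(start, start + timedelta(days=2), failed=False),
    Deployment(start, start + timedelta(days=4), failed=True),
    Deployment(start, start + timedelta(days=10), failed=False),
    Deployment(start, start + timedelta(days=6), failed=False),
]

print(deployment_failure_rate(history))  # 0.25
print(median_delivery_time(history))     # 5 days, 0:00:00
```

The median is used instead of the mean so that a single very slow delivery does not dominate the number; that is a design choice, not a requirement.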

Platform engineering concepts

The software industry is good at inventing new concepts, but each one has a purpose and a history.

The Internal Developer Portal (IDP) provides a standardized portal experience. The emergence of micro-frontend architecture has made it easier to build templatized, decoupled frontends. Netflix shared its experience with the Paved Road, and Spotify open-sourced Backstage, which met the enterprise demand for internal websites.

Infrastructure focuses on operations, but the practices vary by company type. For cloud companies, it means the hardware supply chain, standardization, virtualization, cluster management, and so on. For companies that rely on the cloud, it means providing simplified, standardized, and secure access to it. Examples include Kubernetes and its variations, infrastructure-as-code, etc.
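
To make "simplified, standardized, and secure access" a little more concrete, here is a minimal Python sketch of a platform-side check on a service request. Everything in it is an assumption for illustration: the ServiceRequest fields, the approved regions, and the replica limit are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass

# Hypothetical platform policy; real values would come from the platform team.
APPROVED_REGIONS = {"us-east-1", "eu-west-1"}
MAX_REPLICAS = 20


@dataclass(frozen=True)
class ServiceRequest:
    """What a product team fills in; defaults keep the common case simple."""
    name: str
    owner_team: str
    region: str
    replicas: int = 2
    public: bool = False


def validate(request: ServiceRequest) -> list[str]:
    """Return policy violations; an empty list means the request is acceptable."""
    problems = []
    if not request.owner_team:
        problems.append("owner_team is required")
    if request.region not in APPROVED_REGIONS:
        problems.append(f"region {request.region!r} is not approved")
    if not 1 <= request.replicas <= MAX_REPLICAS:
        problems.append(f"replicas must be between 1 and {MAX_REPLICAS}")
    if request.public:
        problems.append("public exposure needs a security review")
    return problems


print(validate(ServiceRequest("checkout", "payments", "us-east-1")))  # []
print(validate(ServiceRequest("demo", "", "mars-1", replicas=0, public=True)))
```

The small set of fields and the defaults stand for the simplified part, the shared policy constants for the standardized part, and the rejection of risky requests for the secure part.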

Productivity tools focus on engineering output, often measured by lines of code written or code reviews completed. Sometimes they also include build and release tooling, for example Bazel, compiler tools, and software deployment tools. These tools embed software best practices and standards, which accelerates the development process.

How does it work for an enterprise?

Enterprises focus on the quality, speed, and cost aspects of engineering.

On the quality side, productivity tools embed the software standards. For example, Google used Code Review Certifier to review code changes, with the tooling enforcing the review checklist. The industry has multiple frameworks, including DORA and SPACE, that focus on different metrics; the goal is the same, while the method varies.
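
The idea of tooling that enforces a review checklist can be sketched in a few lines of Python. This is not how any Google tool works; the ChangeRequest fields, the three checks, and the 400-line threshold are hypothetical and only show the general shape of a pre-merge gate.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed code change (hypothetical record shape)."""
    author: str
    approvers: set[str] = field(default_factory=set)
    tests_passed: bool = False
    lines_changed: int = 0


# Each entry pairs a human-readable rule with a function that checks it.
CHECKLIST = [
    ("approved by someone other than the author",
     lambda c: bool(c.approvers - {c.author})),
    ("automated tests passed",
     lambda c: c.tests_passed),
    ("small enough to review (at most 400 lines)",
     lambda c: c.lines_changed <= 400),
]


def unmet_checks(change: ChangeRequest) -> list[str]:
    """Return the rules the change does not satisfy yet."""
    return [rule for rule, check in CHECKLIST if not check(change)]


def can_merge(change: ChangeRequest) -> bool:
    """A change may merge only when every checklist rule is satisfied."""
    return not unmet_checks(change)


change = ChangeRequest(author="ana", approvers={"ana"}, tests_passed=True, lines_changed=120)
print(unmet_checks(change))  # ['approved by someone other than the author']
```

Keeping the rules in one list means the platform team can add or tighten a standard in one place, and every product team picks it up on the next change.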

On the cost side, the spend is either people or infrastructure. A centralized infrastructure organization is able to control infrastructure cost, but to avoid hurting the businesses there is typically a negotiation process, with a top-down OKR to drive it.

On the speed of innovation, Google's productivity tools group built Bazel to speed up builds, Netflix built Spinnaker to simplify deployment, and today there are AI tools that help generate code and tests.

In the end, this space is not new, but the mindset has evolved over time. With new AI innovation in the space, we are likely to see more.
