Situation Overview: The Power of Positive Feedback Loops
Over the last decade, we have witnessed the growing prominence of DevOps within both technical and enterprise communities. Recognizing that development (“Dev”) and IT Operations (“Ops”) should not be siloed, engineers and specialists now work together across the entire application lifecycle. At its core, DevOps combines cultural philosophies with tools and practices to increase an organization’s ability to deliver applications and services at high velocity, evolving and improving them faster than traditional software development or infrastructure management processes allow. This improved speed enables organizations to better serve their customers and compete more effectively in the market.
While agile development has historically been effective in reducing development times by increasing communication between developers and key stakeholders, DevOps applies the same cultural norms to post-development tasks across communities and collaborators. Firms practicing DevOps tend to see higher deployment frequency (on-demand, with multiple deploys per day), shorter lead times (under an hour vs. up to several weeks), and faster mean-time-to-recover [1] (under an hour vs. up to several days). Post-code processes now receive the same responsiveness and agility as staging and pre-production environments.
Today nearly 75% of enterprises are adopting this model. The old design → build → test → ship methodology does not lend itself to quickly releasing bug fixes, patches, or updates. Instead, we witness the rise of continuous everything: continuous integration (CI) and continuous delivery (CD) in turn allow for rapid deployment, testing, and promotion of new features into production (see Figure 1). Moreover, such a model supports scale by managing complex or changing systems efficiently with monitored risk; collaborating via shared responsibilities, combined workflows, and reduced handover; and securing systems with retained control, real-time monitoring and logging, and preserved compliance.
Concurrently, technical advancements across network infrastructure led to the rise of microservices and containers. Microservices have democratized language and technology choices across independent service teams, which develop new features quickly, iterate, and continuously deliver software, further reinforcing the DevOps model. Move over, monoliths: microservices running in containers shrink the unit of deployment and make processes far more manageable. Tools like Docker and Kubernetes dramatically reduce the incremental operational burden of deployment; deploying 10 services is no longer 10x the work of deploying a single app. This results in dramatic cost reductions for microservice adoption and has spurred its growth, as patterns for packaging and deployment are standardized across the entire organization.
With scalability and clustering growing naturally along the demand curve, the ability to add new services is far more granular than was previously possible with bundled deployment. Containers have become the go-to choice for deploying microservices, an unsurprising outcome given their immediate benefits over traditional virtual and physical machines, including faster startup times, smaller footprints (megabytes vs. gigabytes), more direct access to hardware (VM abstraction layers slow down processing), and self-documenting configuration. To be sure, the global cloud microservices market is growing at a ~20% CAGR and is expected to reach ~$3bn in spend by 2026 [3].
Other technological trends have also played a role. Automated processes and practices such as application release automation (ARA), which support continuous integration and sit within a market expected to reach $65bn by 2022 [4], have become standardized. Meanwhile, open source has enabled faster time to market and faster innovation. As software continues to transform the world and become more integral to every part of every business, microservices and continuous delivery will enable teams to take ownership of services and version releases. In turn, teams will innovate for customers faster, adapt to changing markets better, and grow more efficient at driving business results.
The emergence of new cultural norms alongside the rise of technological innovation is the product of powerful, tightly coupled feedback loops: communities and enterprises are able to cultivate a DevOps environment using technological advancements such as microservices and containers, while those same technologies have enjoyed outsized adoption bolstered by the rise of the new movement.
The microservices and containers movement has undoubtedly resulted in highly scalable, independently delivered services built in a cloud-native approach. However, alongside these improvements to software development come new complexities and technological challenges. Service mesh, a disruptive software infrastructure layer for controlling and monitoring service-to-service traffic in microservices, aims to standardize runtime operations the same way microservices standardized deployment-time operations.
History of Service Mesh
Service mesh as a technology initially emerged around 2010, alongside the rising adoption of web apps and out of the need to standardize runtime operations. The three-tiered model of application architecture [5] became the de facto standard for handling traffic and communications between the web, application, and database layers. However, under heavy ingress, monolithic designs begin to break down at the application layer. While Docker and Kubernetes provided standardized packaging and reduced the operational burden of app deployment, efficiencies surrounding ongoing communications and runtime were left largely untouched.
Subsequently, engineers began to experiment with ways to control and measure request traffic between applications and services by decoupling monoliths into independent service modules (via service-oriented architecture, or SOA). These early microservices leveraged network-based communications protocols to provision and offer services to other components of the system. Such applications are structured as a collection of loosely-coupled, fine-grained services. When connected together (via lightweight protocols), they form a comprehensive application, while decoupling point-of-failure risk from any single instance of north-south or east-west traffic. App updates or patches can also be targeted by developers at individual microservices versus the broader monolith.
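As a concrete (hypothetical) illustration of such loosely coupled services, the Go sketch below shows one small service that exposes an HTTP endpoint and calls a peer service over plain HTTP; the service names, ports, and paths are made up for illustration, and a failure in the peer degrades only this one call path rather than the whole application.

```go
// Minimal sketch of a fine-grained service talking to a peer over a
// lightweight protocol (plain HTTP/JSON). Names, ports, and paths are
// hypothetical; a failure in the peer is isolated to this single call path.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// priceServiceURL points at an independently deployed peer service (east-west traffic).
const priceServiceURL = "http://price-service:8081/price?sku=%s"

var client = &http.Client{Timeout: 2 * time.Second} // bound the blast radius of a slow peer

func handleQuote(w http.ResponseWriter, r *http.Request) {
	sku := r.URL.Query().Get("sku")

	resp, err := client.Get(fmt.Sprintf(priceServiceURL, sku))
	if err != nil {
		// The peer being down degrades one feature, not the whole application.
		http.Error(w, "price service unavailable", http.StatusServiceUnavailable)
		return
	}
	defer resp.Body.Close()

	var price struct {
		Amount   float64 `json:"amount"`
		Currency string  `json:"currency"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&price); err != nil {
		http.Error(w, "bad upstream response", http.StatusBadGateway)
		return
	}
	json.NewEncoder(w).Encode(map[string]interface{}{"sku": sku, "price": price})
}

func main() {
	http.HandleFunc("/quote", handleQuote)
	http.ListenAndServe(":8080", nil) // this service can be updated and redeployed on its own
}
```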
The microservice approach, while improving certain facets of deployment and run-time, in turn introduced new complexities. Any missteps in the communication network within or across such microservices resulted in site failures, as connecting, monitoring, and securing numerous microservices together proved to be no simple infrastructure or networking feat. In order to adopt containerized microservice runtime environments, new technology became essential as:
- Traditional API gateways proved too cumbersome to manage the volume of ingress/egress communication and lacked TLS encryption between clients at the edge
- Traditional IP networking (client-server communication through a load balancer, with IP addresses hardcoded) lacked the framework to manage all facets of microservices’ interservice communication
Early Enterprise Design
Early adopters demanded new mediation and more efficient security for service-to-service communication, authentication / authorization, encryption, traffic management, and observability (such as automated tracing, monitoring, and service logging). One of the earliest attempts to tackle this issue took place within Netflix, which in 2015 launched open-source runtime services and libraries that incorporated load balancing (‘Ribbon’), circuit breaking (‘Hystrix’), a service registry (‘Eureka’), and intelligent routing (‘Zuul’). These purpose-built, language-specific application libraries [6] were able to meet the challenges of early cloud-native microservices deployments, all while providing uniform runtime operations and managing heavy request traffic across service apps.
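The library approach can be illustrated with the circuit-breaker pattern that Hystrix popularized. The sketch below is not Netflix’s API; it is a minimal Go illustration of the pattern, with thresholds, cooldowns, and the failing call all chosen for illustration: after repeated failures the breaker “opens” and subsequent calls fail fast instead of piling onto a struggling dependency.

```go
// Minimal sketch of the circuit-breaker pattern popularized by libraries such
// as Hystrix. Thresholds and timings here are illustrative, not Netflix's.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is open and calls are failing fast.
var ErrOpen = errors.New("circuit open: failing fast")

// CircuitBreaker trips after maxFailures consecutive errors and stays open
// for the cooldown period before letting a trial call through again.
type CircuitBreaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func NewCircuitBreaker(maxFailures int, cooldown time.Duration) *CircuitBreaker {
	return &CircuitBreaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open, tracking failures and successes.
func (cb *CircuitBreaker) Call(fn func() error) error {
	cb.mu.Lock()
	if cb.failures >= cb.maxFailures && time.Since(cb.openedAt) < cb.cooldown {
		cb.mu.Unlock()
		return ErrOpen // fail fast instead of hammering a struggling dependency
	}
	cb.mu.Unlock()

	err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.failures++
		if cb.failures >= cb.maxFailures {
			cb.openedAt = time.Now() // (re)open the breaker
		}
		return err
	}
	cb.failures = 0 // any success closes the breaker again
	return nil
}

func main() {
	cb := NewCircuitBreaker(3, 5*time.Second)
	for i := 0; i < 5; i++ {
		err := cb.Call(func() error {
			return errors.New("downstream timeout") // stand-in for a failing remote call
		})
		fmt.Println(i, err)
	}
}
```

The key drawback, as the next paragraph describes, is that each language and framework needs its own implementation of logic like this, embedded in every service.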
Other firms developed their own ‘fat client’ libraries, such as Google’s Stubby and Twitter’s Finagle, to standardize runtime operations and handle request traffic. This early approach of building bespoke libraries, each tailored to specific languages and frameworks for certain services, quickly broke down across teams. As organizations began to deploy heterogeneous, hybrid computing environments amidst the rise of the cloud, and to run polyglot systems under DevOps norms, ‘fat’ service-tailored libraries grew operationally complex and prohibitively expensive.
Out-of-process proxies then emerged as a strong alternative to libraries, accommodating polyglot systems and allowing network-wide upgrades without requiring recompilation. Moreover, meshes of proxy instances reinforce DevOps norms by placing runtime operations in the hands of operations engineers, who are empowered with new tools for managing communication, rather than developers, who are farther removed from its end functionality. Today, service mesh functions as a low-latency software infrastructure layer for controlling and monitoring internal, service-to-service traffic across microservice applications. Out-of-process network proxy instances, or sidecars, are deployed alongside application code and form the data plane, while the control plane configures and coordinates those proxies (more implementation details are discussed in Service Mesh Architecture below).
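As a rough illustration of the data-plane idea, the sketch below (in Go, with illustrative ports and no real control plane) shows a toy sidecar: a reverse proxy deployed next to the application, accepting the service’s inbound traffic, forwarding it to the app on localhost, and recording basic telemetry. Production meshes use purpose-built proxies such as Envoy, configured dynamically by the control plane.

```go
// Toy data-plane sidecar: a reverse proxy deployed next to the application,
// accepting service-to-service traffic on :15001 and forwarding it to the
// local app on :8080 while recording basic telemetry. Ports are illustrative.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	// The application this sidecar fronts; in a pod, both share localhost.
	app, _ := url.Parse("http://127.0.0.1:8080")
	proxy := httputil.NewSingleHostReverseProxy(app)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// A real control plane would push routing rules, mTLS material, and
		// retry/timeout policy to this hop; the sketch only observes and forwards.
		proxy.ServeHTTP(w, r)
		log.Printf("inbound %s %s took %v", r.Method, r.URL.Path, time.Since(start))
	})

	// All inbound traffic for the service is routed through the sidecar.
	log.Fatal(http.ListenAndServe(":15001", handler))
}
```

Because the proxy sits outside the application process, the same runtime behavior (telemetry, encryption, traffic policy) can be applied uniformly to services written in any language, without recompiling or redeploying the applications themselves.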
A number of enterprises, such as Twitter, Airbnb, and Lyft, built internal service mesh technology for their microservices architectures, while Microsoft was one of the first vendors to offer it via the Azure Service Fabric framework. In 2016, Buoyant released Linkerd, building on Twitter’s open-source work; it has since been running in production at a number of enterprises globally. In 2017, the open-source Istio project (backed by Google, IBM, and Lyft) launched with the main objective of providing a service mesh framework for microservices running on Kubernetes. Istio v1.0 [7] was released in 2018, with Linkerd 2.0 following later that year; meanwhile, Kubernetes vendors have developed Istio-based offerings as client interest in service mesh grows.
[1] Measured as the time from when impairment occurs until the time it is resolved.
[3] Link: Cloud Microservices — Global Market Outlook
[4] Link: Infrastructure Automation Market
[5] Link: The History of Service Mesh
[6] Reference: IDC, Vendors…Service Mesh Landscape, 2
[7] Reference: Gartner, Innovation Insight for SM, 6.