Cloud Blog: Etsy’s Service Platform on Cloud Run cuts deployment time from days to under an hour

Source URL: https://cloud.google.com/blog/products/infrastructure/etsys-service-platform-on-cloud-run-cuts-deployment-time-from-days-to-minutes/
Source: Cloud Blog
Title: Etsy’s Service Platform on Cloud Run cuts deployment time from days to under an hour

Feedly Summary: Introduction
Etsy, a leading ecommerce marketplace for handmade, vintage, and unique items, has a passion for delivering innovative and seamless experiences for customers. Like many fast-growing companies, Etsy needed to scale their teams, technologies, and tools to keep pace with their business growth. Indeed, between 2012 and 2021, their gross merchandise sales increased over 1400% to $13.5 billion.
As part of Etsy’s efforts to keep pace with this growth, the company migrated all their infrastructure from traditional data centers to Google Cloud. This shift not only marked a significant technological milestone, but also prompted Etsy to rethink its service development approach. The journey led to the creation of “ESP” (“Etsy’s Service Platform”), an Etsy-tailored platform built on Google Cloud Run that streamlines the development, deployment, and management of microservices.
This blog post delves into Etsy’s experience building the service platform and how Cloud Run helped them accomplish their vision, highlights lessons learned, and shares how the platform continues to evolve.


The need for change and architectural vision
As Etsy grew, so did the demand for our engineering organization to support richer functionality and higher traffic volume in our marketplace. Our migration to GCP in 2018 enabled Etsy engineers to explore and leverage Google Cloud-based service platforms. However, this explosion of technical creativity also gave rise to new challenges, including duplicated scaffolding and code, and unsupported infrastructure with uncertain ownership.
To address these challenges, Etsy assembled a squad of architects to craft a vision detailing what future service development at Etsy would look like. The goal was clear: create a platform that decouples service writing from infrastructure, liberating developers from the burden of backend complexities and allowing them to quickly and safely deploy new services.  
Transforming vision into reality
The resulting architectural vision became the blueprint for ESP, Etsy’s Service Platform, and a newly formed squad took on the exciting challenge of transforming the vision into reality. The first step was assembling a dynamic team capable of bridging the gap between infrastructure and application development. Comprising seasoned engineers with diverse expertise, the team brought a rich blend of skills to the table.
Recognizing the importance of aligning with our future platform customers, the team collaborated closely with Etsy architecture and engineering. The Ads Platform Team, already engaged in service development, played a pivotal role by agreeing to embed one of their senior engineers in the service platform team. Together, they delivered a Minimum Viable Platform (MVP) to support the deployment of a new Ads Platform service as the ESP pilot.
Choosing Cloud Run for accelerated development 
A successful service platform, according to our architectural vision, would streamline the developer experience by decoupling infrastructure and automating its provisioning. The team recognized that our potential customers from the larger engineering organization also needed a platform that integrated into their workflow with as little friction as possible. To achieve this, the service platform team chose to focus on Etsy-specific aspects: developer experience and language support, CI/CD, integration with existing services, observability, service catalog, security, and compliance.
The decision to leverage Google Cloud services, especially Cloud Run, was strategic. While alternatives like GKE were enticing, the team wanted to deliver value quickly. Cloud Run’s robust and intuitive design allowed the team to focus on core platform functionality, letting Cloud Run handle the more complex and time-consuming aspects of running containerized services.
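To make "letting Cloud Run handle the rest" concrete, here is a minimal sketch of the little a containerized service has to do itself: listen on the port passed in the `PORT` environment variable (8080 when unset). The handler, the helper names, and the plain-HTTP transport are illustrative assumptions; the post does not show Etsy's service code, and ESP services use gRPC.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a small text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging in this sketch.
        pass


def resolve_port(environ=os.environ):
    # Cloud Run injects PORT into the container; fall back to 8080 locally.
    return int(environ.get("PORT", "8080"))


def make_server(port=None):
    # Bind on all interfaces so traffic routed to the container reaches us.
    if port is None:
        port = resolve_port()
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` as the container entrypoint is all that is needed to start serving; scaling, TLS termination, and routing stay on the platform side.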
The Toolbox: A Closer Look

To provide a consistent and efficient developer and operational experience, ESP relies on a carefully selected toolbox:

Developer Interface: A custom CLI tool for streamlined developer interactions.

Protocols: gRPC and protobuf for standardized communication.

Language Support: Go, Python, Node, PHP, Java, Scala.

CI/CD: GitHub Actions for a smooth integration and deployment pipeline.

Observability: OpenTelemetry (OTEL) on Google Cloud services and Google Monitoring and Logging, along with Prometheus and Alertmanager.

Client Library: ESP-generated clients are registered in Artifactory.

Service Catalog: Utilizing Backstage for centralized service visibility.

Runtime: Cloud Run, chosen for its simplicity and compatibility.
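One way to picture how such a toolbox decouples service writing from infrastructure is a small developer-facing spec that the platform validates and then completes with its own choices. The sketch below is purely hypothetical: `ServiceSpec`, `validate`, `platform_defaults`, the naming rule, and the example service name are assumptions made for illustration, not the interface of ESP's CLI. Only the language list and the tool names come from the list above.

```python
import re
from dataclasses import dataclass

# Languages named in the toolbox list above.
SUPPORTED_LANGUAGES = frozenset({"go", "python", "node", "php", "java", "scala"})

# Illustrative naming rule (an assumption, not an ESP or Cloud Run rule):
# at least two characters, lowercase letters, digits, and hyphens,
# starting with a letter and not ending with a hyphen.
_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]*[a-z0-9]$")


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical description of a service, as a developer might write it."""
    name: str
    language: str
    owner_team: str


def validate(spec):
    """Return a list of problems; an empty list means the spec is acceptable."""
    problems = []
    if not _NAME_PATTERN.match(spec.name):
        problems.append(f"invalid service name: {spec.name!r}")
    if spec.language.lower() not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {spec.language!r}")
    if not spec.owner_team.strip():
        problems.append("owner_team must not be empty")
    return problems


def platform_defaults(spec):
    """Complete a valid spec with the platform-owned choices from the toolbox."""
    problems = validate(spec)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "service": spec.name,
        "language": spec.language.lower(),
        "owner_team": spec.owner_team,
        "protocol": "grpc",
        "ci_cd": "github-actions",
        "service_catalog": "backstage",
        "runtime": "cloud-run",
    }
```

The design point is that the developer supplies only the three fields in `ServiceSpec`, while everything in the returned dictionary beyond those fields is decided once, by the platform team.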

Navigating Challenges
The path to creating the service platform encountered obstacles. The VPC connector experienced overloading, and some services required fine-tuning to optimize resource allocation. However, these challenges led to platform-level improvements that benefit future adopters.
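The post does not say how individual services were tuned. As a rough illustration of the arithmetic that kind of tuning usually starts from, the sketch below applies Little's law (requests in flight = arrival rate × average latency) and divides by the per-instance concurrency limit, which is a configurable Cloud Run setting. The function name and the numbers in the usage note are assumptions, not Etsy's figures.

```python
import math


def estimate_instances(requests_per_second, avg_latency_seconds, concurrency_per_instance):
    """Estimate how many instances are needed to hold the in-flight requests."""
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")
    if requests_per_second < 0 or avg_latency_seconds < 0:
        raise ValueError("rate and latency must be non-negative")
    # Little's law: average number of requests being served at any moment.
    in_flight = requests_per_second * avg_latency_seconds
    # Each instance serves up to `concurrency_per_instance` requests at once.
    return max(1, math.ceil(in_flight / concurrency_per_instance))
```

For example, 400 requests per second at 0.25 s average latency means about 100 requests in flight, so a limit of 20 concurrent requests per instance calls for `estimate_instances(400, 0.25, 20)`, which is 5 instances. Lowering the limit for CPU-heavy services raises that count, which is the trade-off such tuning balances.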
ESP’s design prioritized flexibility to accommodate our diverse technology landscape. While the team possessed expertise in various technologies, creating a one-size-fits-all platform supporting multiple service and client languages across diverse use cases was challenging. We decided to initially focus on a core feature set and add incremental capabilities and workarounds based on user feedback.
As ESP matured, valuable lessons shaped both day-to-day operations and its future evolution.

Sandbox Feature: A “sandbox” environment accelerated iteration, enabling developers to launch development versions of new services on Cloud Run in under five minutes, complete with CI/CD and observability.

Familiar Observability Tools: ESP integrated with our existing tools like PromQL and Grafana, streamlining workflows for engineers.

Security Considerations: While ESP favored TLS and layer 7 authentication using Google IAM, collaboration with the Google Serverless Networking team ensured secure connectivity with our legacy applications.

Supporting AI/ML Innovation: During a company-wide hackathon, ESP’s adaptability shone as a service interfacing with Google’s Vertex AI was rapidly deployed.

Real-World Success: The Ads Platform service expanded to three additional systems as client support in more languages rolled out. Cloud Run’s auto-scaling easily handled the increased load.
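The layer 7 authentication with Google IAM mentioned under Security Considerations generally works by having the calling service present a Google-signed ID token whose audience is the receiving service's URL. The sketch below shows that general Cloud Run pattern using only the standard library: it builds the request to the metadata server's identity endpoint and the resulting `Authorization` header. It is not Etsy's client code; the helper names and the service URL in the usage note are hypothetical, and `fetch_identity_token` only succeeds when run on Google Cloud, where the metadata server is reachable.

```python
import urllib.parse
import urllib.request

# Identity endpoint of the metadata server available to workloads on Google Cloud.
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_identity_token_request(audience):
    """Build (but do not send) the request for an ID token for `audience`."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        # The metadata server rejects requests that lack this header.
        headers={"Metadata-Flavor": "Google"},
    )


def fetch_identity_token(audience, timeout=5.0):
    """Fetch the token; works only where the metadata server is reachable."""
    request = build_identity_token_request(audience)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def authorized_headers(token):
    """Headers the caller attaches so IAM can authenticate the request."""
    return {"Authorization": f"Bearer {token}"}
```

A caller would use `authorized_headers(fetch_identity_token("https://ads-service.example.run.app"))` on each outbound request, where the URL is a placeholder for the receiving service. With gRPC, the same bearer token travels as `authorization` call metadata.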

Conclusion and Future Outlook
ESP enables our engineers to be bold, fast, and safe, and is experiencing steady and continued adoption throughout the organization. Customer requests for workloads beyond the serverless model have spurred collaboration with Google and our internal GKE team. The goal is to extend ESP’s tooling to support an expanding class of services while maintaining a consistently high level of operational and developer experience.
The journey to pilot, challenges overcome, and future outlook highlight the dynamic and iterative nature of our service platform journey. ESP stands as a testament to our ability to adapt, innovate, and empower Etsy’s engineering community to meet the ever-growing needs of our marketplace and business.

AI Summary and Description: Yes

**Summary:**
The text details Etsy’s migration from traditional data centers to Google Cloud, leading to the development of Etsy’s Service Platform (ESP). This platform is designed to enhance service development efficiency and agility, addressing the challenges associated with scaling operations. Insights include the strategic choice of Google Cloud Run as the runtime for containerized services and close collaboration across engineering teams to ensure smooth integration and observability.

**Detailed Description:**
Etsy’s evolution highlights key advancements in cloud computing infrastructure and service platform development. Below are the significant points raised in the text:

– **Growth and Migration:**
– Etsy’s transition to Google Cloud was driven by exponential business growth, reflecting a shift from traditional data centers to a more scalable cloud solution.
– The company’s gross merchandise sales saw a staggering increase of over 1400% from 2012 to 2021, necessitating a robust infrastructure to support higher traffic.

– **Service Platform Development:**
– The ESP was created to decouple service writing from infrastructure management, allowing developers to focus on service deployment without backend complexities.
– A squad of architects crafted the vision, and a newly formed platform team turned it into reality, collaborating closely with existing engineering teams such as the Ads Platform Team.

– **Adoption of Google Cloud Run:**
– Cloud Run was selected because it let the team deliver value quickly: its simplicity in running containerized services freed the team to focus on core platform functionality and developer experience.
– The decision to utilize Cloud Run over alternatives like Google Kubernetes Engine (GKE) underscores the platform’s immediate value proposition.

– **Toolbox Features:**
– A comprehensive toolkit was put into place, including:
  – Custom CLI tools for developers.
  – Standardized communication protocols (gRPC and protobuf).
  – Support for multiple programming languages (Go, Python, Node, PHP, Java, Scala).
  – CI/CD integration with GitHub Actions.
  – Observability tools like Prometheus and Google Monitoring.

– **Challenges and Adaptation:**
– Early challenges involved an overloaded VPC connector and the need to fine-tune resource allocation for some services, leading to platform-level improvements that benefit future adopters.
– The “sandbox” feature allowed for rapid iteration, enhancing the developer experience and speeding up service launch times.

– **Security Considerations:**
– Emphasis was placed on incorporating secure practices, such as TLS and Google IAM for access control, showcasing the importance of security in cloud environments.
– Collaborations with specialized teams (e.g., Google Serverless Networking) ensured secure legacy system connectivity.

– **Future Directions:**
– The platform shows promise for further integration of AI/ML services: during a company-wide hackathon, a service interfacing with Google’s Vertex AI was rapidly deployed on ESP.
– Continued adaptation of ESP is anticipated to meet evolving customer needs beyond its current serverless model.

Overall, Etsy’s experience provides valuable insights into cloud migration strategies, service platform development, and security considerations essential for scaling modern infrastructure. These elements are critical for professionals focused on enhancing operational efficiency and maintaining security in growing organizations.