Tag: autoscaling
-
Cloud Blog: PyTorch/XLA 2.5: vLLM support and an improved developer experience
Source URL: https://cloud.google.com/blog/products/ai-machine-learning/whats-new-with-pytorchxla-2-5/ Source: Cloud Blog Title: PyTorch/XLA 2.5: vLLM support and an improved developer experience Feedly Summary: Machine learning engineers are bullish on PyTorch/XLA, a Python package that uses the XLA deep learning compiler to connect the PyTorch deep learning framework and Cloud TPUs. And now, PyTorch/XLA 2.5 is here, along with a set…
-
Cloud Blog: Save on GPUs: Smarter autoscaling for your GKE inferencing workloads
Source URL: https://cloud.google.com/blog/products/containers-kubernetes/tuning-the-gke-hpa-to-run-inference-on-gpus/ Source: Cloud Blog Title: Save on GPUs: Smarter autoscaling for your GKE inferencing workloads Feedly Summary: While LLM models deliver immense value for an increasing number of use cases, running LLM inference workloads can be costly. If you’re taking advantage of the latest open models and infrastructure, autoscaling can help you optimize…
-
Cloud Blog: How to benchmark application performance from the user’s perspective
Source URL: https://cloud.google.com/blog/products/containers-kubernetes/benchmarking-how-end-users-perceive-an-applications-performance/ Source: Cloud Blog Title: How to benchmark application performance from the user’s perspective Feedly Summary: What kind of performance does your application have, and how do you know? More to the point, what kind of performance do your end users think your application has? In this era of rapid growth and unpredictable…
-
Cloud Blog: Cut costs and boost efficiency with Dataflow’s new custom source reads
Source URL: https://cloud.google.com/blog/products/data-analytics/cut-costs-and-boost-efficiency-with-dataflows-new-source-reads/ Source: Cloud Blog Title: Cut costs and boost efficiency with Dataflow’s new custom source reads Feedly Summary: Scaling workloads often comes with a hefty price tag, especially in streaming environments, where latency is heavily scrutinized. So it makes sense we want our pipelines to run without bottlenecks — because costs and latency…
-
Cloud Blog: Your infrastructure resources, your way, with new GKE custom compute class API
Source URL: https://cloud.google.com/blog/products/containers-kubernetes/introducing-new-gke-custom-compute-class-api/ Source: Cloud Blog Title: Your infrastructure resources, your way, with new GKE custom compute class API Feedly Summary: Picture this: You’re in the middle of your peak sales period and your e-commerce platform is humming along. Despite surging customer demand, your Kubernetes infrastructure is seamlessly adapting to handle the increased traffic. That’s…