Source URL: https://cloud.google.com/blog/products/storage-data-transfer/parallelstore-high-performance-file-service-for-hpc-and-ai-is-ga/
Source: Cloud Blog
Title: Parallelstore is now GA, fueling the next generation of AI and HPC workloads
Feedly Summary: Organizations use artificial intelligence (AI) and high-performance computing (HPC) applications to process massive datasets, run complex simulations, and train generative models with billions of parameters for diverse use cases such as LLMs, genomic analysis, quantitative analysis, or real-time sports analytics. These workloads place big performance demands on their storage systems, requiring high throughput and I/O performance that scales and that maintains sub-millisecond latencies, even when thousands of clients are concurrently reading and writing the same shared files.
To power these next-generation AI and HPC workloads, we announced Parallelstore at Google Cloud Next 2024, and today, we are excited to announce that it is now generally available. Built on the Distributed Asynchronous Object Storage (DAOS) architecture, Parallelstore combines a fully distributed metadata and key-value architecture to deliver high-performance throughput and IOPS.
Read on to learn how Parallelstore serves the needs of complex AI and HPC workloads, allowing you to maximize goodput and GPU/TPU utilization, programmatically move data in and out of Parallelstore, and provision Google Kubernetes Engine and Compute Engine resources.
Maximize goodput and GPU/TPU utilization
To overcome the performance limitations of traditional parallel file systems, Parallelstore uses a distributed metadata management system and a key-value store architecture. Parallelstore’s high-throughput parallel data access minimizes latency and I/O bottlenecks, and allows it to saturate the network bandwidth of individual compute clients. This efficient data delivery maximizes goodput to GPUs and TPUs, a critical factor for optimizing AI workload costs. Parallelstore can also provide continuous read/write access to thousands of VMs, GPUs and TPUs, satisfying modest-to-massive AI and HPC workload requirements.
For a 100 TiB deployment, the maximum Parallelstore deployment size, throughput scales to ~115 GiB/s, with ~3 million read IOPS, ~1 million write IOPS, and low latency of ~0.3 ms. This makes Parallelstore a good platform for small files and for random, distributed access across a large number of clients. For AI use cases, Parallelstore’s performance with small files and metadata operations enables up to 3.9x faster training times and up to 3.7x higher training throughput compared to native ML framework data loaders, as measured by Google Cloud benchmarking.
Programmatically move data in and out of Parallelstore
Many AI and HPC workloads store data in Cloud Storage for data preparation or archiving. You can use Parallelstore’s integrated import/export API to automate movement of the data you’d like to import to Parallelstore for processing. With the API, you can ingest massive datasets from Cloud Storage into Parallelstore at ~20GB/s for files larger than 32MB, and at ~5,000 files per second for files under 32MB.
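As a sketch of what that ingestion looks like, the import call mirrors the export command shown later in this post; the flag names here are assumed to follow the same pattern, and `$INSTANCE_ID`, `$LOCATION`, and `$BUCKET_NAME` are placeholders you would set for your environment:

```
# Ingest a Cloud Storage bucket into a Parallelstore instance
# (sketch; verify flag names against the current gcloud reference)
gcloud alpha parallelstore instances import-data $INSTANCE_ID \
  --location=$LOCATION \
  --source-gcs-bucket-uri=gs://$BUCKET_NAME \
  [--destination-parallelstore-path="/"]
```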
When an AI training job or HPC workload is complete, you can export results programmatically to Cloud Storage for further assessment or longer-term storage. You can also automate data transfers via the API, minimizing manual intervention and streamlining data pipelines.
```
gcloud alpha parallelstore instances export-data $INSTANCE_ID \
  --location=$LOCATION \
  --destination-gcs-bucket-uri=gs://$BUCKET_NAME \
  [--source-parallelstore-path="/"]
```
Programmatically provision GKE resources through the CSI driver
It’s easy to efficiently manage high-performance storage for containerized workloads through Parallelstore’s GKE CSI driver. You can dynamically provision and manage Parallelstore file systems as persistent volumes, or access existing Parallelstore instances in Kubernetes workloads, directly within your GKE clusters using familiar Kubernetes APIs. This reduces the need to learn and manage a separate storage system, so you can focus on optimizing resources and lowering TCO.
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: parallelstore-class
provisioner: parallelstore.csi.storage.gke.io
volumeBindingMode: Immediate
reclaimPolicy: Delete
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
```
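To illustrate dynamic provisioning against a storage class like this, here is a minimal PersistentVolumeClaim sketch; the claim name and requested size are placeholder assumptions, not values from this post:

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: parallelstore-pvc   # placeholder name
spec:
  accessModes:
    - ReadWriteMany          # shared access across many pods
  storageClassName: parallelstore-class
  resources:
    requests:
      storage: 12000Gi       # placeholder capacity
```

Pods then mount the claim as an ordinary volume, so application manifests need no Parallelstore-specific configuration.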
In the coming months, you’ll be able to preload data from Cloud Storage via the fully managed GKE Volume Populator, which automates the preloading of data from Cloud Storage directly into Parallelstore during the PersistentVolumeClaim provisioning process. This helps ensure your training data is readily available, so you can minimize idle compute-resource time and maximize GPU and TPU utilization.
Programmatically provision Compute Engine resources with the Cluster Toolkit
It’s easy to deploy Parallelstore instances for Compute Engine with the support of the Cluster Toolkit. Formerly known as Cloud HPC Toolkit, Cluster Toolkit is open-source software for deploying HPC and AI workloads. Cluster Toolkit provisions compute, network, and storage resources for your cluster and workload following best practices. You can get started today by incorporating the Parallelstore module into your blueprint with only a four-line change; we also provide starter blueprints for your convenience. In addition to the Cluster Toolkit, there are also Terraform templates for deploying Parallelstore, supporting operations and provisioning through code and minimizing manual operational overhead.
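For reference, a hedged sketch of what that blueprint addition might look like, based on the open-source Cluster Toolkit module layout; the module source path and IDs here are assumptions, not taken from this post:

```
  # Added to an existing deployment group in a Cluster Toolkit blueprint
  - id: parallelstore
    source: modules/file-system/parallelstore
    use: [network]   # assumed ID of a VPC network module defined elsewhere
```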
```
resource "google_parallelstore_instance" "instance" {
  instance_id            = "instance"
  location               = "us-central1-a"
  description            = "test instance"
  capacity_gib           = 12000
  network                = google_compute_network.network.name
  file_stripe_level      = "FILE_STRIPE_LEVEL_MIN"
  directory_stripe_level = "DIRECTORY_STRIPE_LEVEL_MIN"
  labels = {
    test = "value"
  }
  provider   = google-beta
  depends_on = [google_service_networking_connection.default]
}

resource "google_compute_network" "network" {
  name                    = "network"
  auto_create_subnetworks = true
  mtu                     = 8896
  provider                = google-beta
}

# Create an IP address
resource "google_compute_global_address" "private_ip_alloc" {
  name          = "address"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 24
  network       = google_compute_network.network.id
  provider      = google-beta
}

# Create a private connection
resource "google_service_networking_connection" "default" {
  network                 = google_compute_network.network.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.private_ip_alloc.name]
  provider                = google-beta
}
```
Real-world impact: Respo.vision sees more with Parallelstore
Respo.Vision, a leader in sports video analytics, is leveraging Parallelstore to accelerate an upgrade from 4K to 8K videos for their real-time system. By using Parallelstore as the transport layer, Respo.vision helps capture and label granular data markers, delivering actionable insights to coaches, scouts, and fans. With Parallelstore, Respo.vision avoided pricey infrastructure investments to manage surges of high-performance video processing, all while maintaining low compute latency.
“Our goal was to process 8K video streams at 25 frames per second to deliver richer quality sports analytical data to our customers, and Parallelstore exceeded expectations by effortlessly handling the required volume and delivering an impressive read latency of 0.3 ms. The integration into our system was remarkably smooth and thanks to its distributed nature, Parallelstore has significantly enhanced our system’s scalability and resilience.” – Wojtek Rosinski, CTO, Respo.vision
HPC and AI usage is growing rapidly. With its combination of innovative architecture, performance, and integration with Cloud Storage, GKE, and Compute Engine, Parallelstore is the storage solution you need to keep demanding GPUs, TPUs, and workloads satisfied. To learn more about Parallelstore, check out the documentation, and reach out to your sales team for more information.
AI Summary and Description: Yes
Summary: The text discusses Google Cloud’s new Parallelstore solution, designed to enhance the performance and efficiency of artificial intelligence (AI) and high-performance computing (HPC) workloads. This innovative architecture offers significant improvements in throughput, IOPS, and data management capabilities, specifically addressing the needs of AI initiatives that require fast access to large datasets.
Detailed Description:
The introduction of Parallelstore marks a notable advancement in storage solutions tailored for AI and HPC workloads. Key elements of the offering include:
– **Architecture and Performance**:
– Built on the Distributed Asynchronous Object Storage (DAOS) architecture.
– Delivers high-performance throughput and IOPS, with the capability to handle large datasets efficiently.
– Scales throughput to ~115 GiB/s for 100 TiB deployments, and can achieve ~3 million read IOPS and ~1 million write IOPS at sub-millisecond latencies.
– **Impact on AI Workloads**:
– Enhances GPU/TPU utilization, optimizing costs and maximizing efficiency in training processes.
– Demonstrates up to 3.9x faster training times and 3.7x higher training throughput compared to native machine learning data loaders.
– **Data Management and Integration**:
– Features integrated APIs for programmatic data ingestion and export between Parallelstore and Cloud Storage, facilitating efficient data pipeline automation.
– Supports Kubernetes environments, allowing users to manage storage within familiar workflows through the GKE CSI driver.
– **Real-World Applications**:
– Respo.Vision is leveraging Parallelstore for scaling video processing capabilities, successfully transitioning from 4K to 8K video analytics with enhanced performance and low latency.
– **Future Developments**:
– In the coming months, a GKE Volume Populator will be introduced to automate data preloading from Cloud Storage into Parallelstore, optimizing readiness for AI workloads.
With HPC and AI demand surging, Parallelstore aims to provide a robust solution that not only meets current performance needs but also paves the way for future innovations in data processing and availability. This solution is particularly valuable for professionals in AI, cloud computing, and infrastructure security looking to enhance their data handling capabilities.