Source URL: https://simonwillison.net/2024/Nov/26/s3-conditional-writes/#atom-everything
Source: Simon Willison’s Weblog
Title: Amazon S3 adds new functionality for conditional writes
Feedly Summary: Amazon S3 adds new functionality for conditional writes
Amazon S3 can now perform conditional writes that evaluate if an object is unmodified before updating it. This helps you coordinate simultaneous writes to the same object and prevents multiple concurrent writers from unintentionally overwriting the object without knowing the state of its content. You can use this capability by providing the ETag of an object […]
This new conditional header can help improve the efficiency of your large-scale analytics, distributed machine learning, and other highly parallelized workloads by reliably offloading compare and swap operations to S3.
(Both Azure Blob Storage and Google Cloud have this feature already.)
When AWS added conditional write support back in August, limited to checking whether an object with a given key already exists, I wrote about Gunnar Morling’s trick for Leader Election With S3 Conditional Writes. This new capability opens up a whole set of new patterns for implementing distributed locking systems along those lines.
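As a rough illustration of the create-if-absent primitive from August, here is a minimal sketch that uses an in-memory stand-in rather than a real S3 client. `InMemoryBucket`, `put_if_absent`, `try_acquire` and the `locks/leader` key are hypothetical names invented for this example. It shows only the "first writer wins" behaviour of a PUT sent with `If-None-Match: *`, not the full leader election algorithm from Gunnar Morling’s post.

```python
import threading


class InMemoryBucket:
    """Hypothetical stand-in for a bucket; not a real S3 client."""

    def __init__(self):
        self._objects = {}
        self._mutex = threading.Lock()  # makes the check-and-write atomic

    def put_if_absent(self, key, body):
        """Mimic a PUT with `If-None-Match: *`: write only if the key is new.

        Returns True if the object was created, and False if it already
        existed (where S3 would answer 412 Precondition Failed).
        """
        with self._mutex:
            if key in self._objects:
                return False
            self._objects[key] = body
            return True

    def get(self, key):
        return self._objects[key]


def try_acquire(bucket, lock_key, owner):
    """Try to become the lock holder by creating the lock object."""
    return bucket.put_if_absent(lock_key, owner.encode())


bucket = InMemoryBucket()
first = try_acquire(bucket, "locks/leader", "node-a")
second = try_acquire(bucket, "locks/leader", "node-b")
print(first, second, bucket.get("locks/leader"))
```

Only the first contender creates the object, so the script prints `True False b'node-a'`. A real lock would also need expiry and release handling, which this sketch leaves out.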
Here’s a useful illustrative example by lxgr on Hacker News:
As a (horribly inefficient, in case of non-trivial write contention) toy example, you could use S3 as a lock-free concurrent SQLite storage backend: Reads work as expected by fetching the entire database and satisfying the operation locally; writes work like this:
1. Download the current database copy
2. Perform your write locally
3. Upload it back using “Put-If-Match” and the pre-edit copy as the matched object.
4. If you get success, consider the transaction successful.
5. If you get failure, go back to step 1 and try again.
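The retry loop described above can be sketched as follows. This is a toy simulation, not code from the quoted comment: `InMemoryBucket`, `put_if_match`, `PreconditionFailed` and `update_with_retry` are hypothetical names, the "database" is just a byte string, and the stand-in ETag is a SHA-256 hash of the content (an assumption made for the demo, since real S3 ETags are computed differently). Against real S3, step 3 would be a PutObject request carrying an `If-Match` header.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised where S3 would answer 412 Precondition Failed."""


class InMemoryBucket:
    """Hypothetical stand-in for a bucket; not a real S3 client."""

    def __init__(self):
        self._objects = {}

    @staticmethod
    def _etag(body):
        # Assumption: any value that changes with the content works here.
        return hashlib.sha256(body).hexdigest()

    def put(self, key, body):
        """Unconditional write, used to seed data and simulate a rival."""
        self._objects[key] = body
        return self._etag(body)

    def get(self, key):
        body = self._objects[key]
        return body, self._etag(body)

    def put_if_match(self, key, body, expected_etag):
        """Write only if the stored object still has `expected_etag`."""
        current = self._objects.get(key)
        if current is None or self._etag(current) != expected_etag:
            raise PreconditionFailed(key)
        self._objects[key] = body
        return self._etag(body)


def update_with_retry(bucket, key, modify, max_attempts=5):
    """Apply `modify` to the object; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        body, etag = bucket.get(key)                   # step 1: download
        new_body = modify(body)                        # step 2: local write
        try:
            bucket.put_if_match(key, new_body, etag)   # step 3: conditional upload
            return attempt                             # step 4: success
        except PreconditionFailed:
            continue                                   # step 5: start over
    raise RuntimeError(f"gave up on {key!r} after {max_attempts} attempts")


bucket = InMemoryBucket()
bucket.put("db.sqlite", b"v1")
interfered = []


def append_row(body):
    if not interfered:
        # Simulate another writer sneaking in during our first attempt.
        bucket.put("db.sqlite", body + b"+other")
        interfered.append(True)
    return body + b"+mine"


attempts = update_with_retry(bucket, "db.sqlite", append_row)
print(attempts, bucket.get("db.sqlite")[0])
```

The first attempt fails because the simulated rival changed the object, and the second succeeds on top of the rival's write, so the script prints `2 b'v1+other+mine'`. The retry cap is a design choice of this sketch; the quoted steps simply loop until success.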
AWS also just added the ability to enforce conditional writes in bucket policies:
To enforce conditional write operations, you can now use s3:if-none-match or s3:if-match condition keys to write a bucket policy that mandates the use of HTTP if-none-match or HTTP if-match conditional headers in S3 PutObject and CompleteMultipartUpload API requests. With this bucket policy in place, any attempt to write an object to your bucket without the required conditional header will be rejected.
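As a hedged sketch of what such a policy might look like, the snippet below builds one statement as a Python dictionary and prints it as JSON. The bucket name, account ID, principal and `Sid` are hypothetical placeholders. Using the IAM `Null` condition operator to test whether the `s3:if-none-match` key is present is an assumption about how to express the rule, not something the quoted announcement spells out. A production policy will likely need additional statements (for example to handle multipart upload parts), so treat this as a starting point to check against the AWS documentation.

```python
import json

# Hypothetical placeholders; substitute your own bucket and principal.
BUCKET_ARN = "arn:aws:s3:::example-bucket"
PRINCIPAL_ARN = "arn:aws:iam::123456789012:user/example-writer"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPutOnlyWithIfNoneMatch",
            "Effect": "Allow",
            "Principal": {"AWS": PRINCIPAL_ARN},
            "Action": "s3:PutObject",
            "Resource": f"{BUCKET_ARN}/*",
            # "Null": {key: "false"} matches only when the key is present,
            # i.e. when the request carried an If-None-Match header.
            "Condition": {"Null": {"s3:if-none-match": "false"}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Because this is an Allow statement, it only helps if nothing else grants the same principal unconditional write access to the bucket.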
Via Hacker News
Tags: s3, scaling, aws, architecture
AI Summary and Description: Yes
Summary: Amazon S3 now supports conditional writes that check whether an object is unmodified, by matching its ETag, before updating it. This helps coordinate simultaneous writers and prevents unintentional overwrites. It brings S3 in line with Azure Blob Storage and Google Cloud, which already have this feature, and opens up new patterns for distributed locking and highly parallelized analytics workloads.
Detailed Description: The new functionality of conditional writes in Amazon S3 allows users to perform updates on an object only if certain conditions are met, specifically evaluating whether an object has been modified. This feature is crucial for environments where multiple writers may attempt to update the same object simultaneously, thereby reducing the risk of overwrites without knowledge of the current state of the content. Here are the key takeaways:
– **Concurrent Writes Handling**:
– Prevents unintentional overwriting of objects in S3 when multiple users or processes attempt to make updates.
– Helps maintain data integrity by ensuring that updates only occur if no modifications have been made since the last read.
– **Enhanced Operational Efficiency**:
– Supports large-scale analytics and parallelized workloads by offloading operations such as compare and swap to S3.
– AWS names distributed machine learning as one of the highly parallelized workloads that can benefit.
– **Distributed Locking Systems**:
– The conditional write capability encourages new design patterns in distributed computing, particularly for implementing locking mechanisms.
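– The example discussed uses S3 as a lock-free concurrent SQLite storage backend: download the database, apply the write locally, upload it with an If-Match condition, and retry on failure. Its author describes it as a toy that is horribly inefficient under non-trivial write contention.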
– **Enforcement through Bucket Policies**:
– AWS added the s3:if-none-match and s3:if-match condition keys, which let administrators write bucket policies that require the HTTP if-none-match or if-match headers on S3 PutObject and CompleteMultipartUpload requests.
– With such a policy in place, any write that omits the required conditional header is rejected, which guards against unintended overwrites.
– **Comparative Context**:
– Both Azure Blob Storage and Google Cloud already have this feature, so the change brings S3 to parity with them.
This feature is most relevant to teams running highly parallelized cloud and AI workloads, who can now coordinate concurrent writers by offloading compare-and-swap operations to S3.