Write Once

Riak 2.1.0 introduces the concept of write-once buckets, buckets whose entries are intended to be written exactly once and never updated or overwritten. Buckets of this type circumvent the normal “coordinated PUT” path, which would otherwise result in a read on the coordinating vnode before the write. Avoiding coordinated PUTs results in higher throughput and lower PUT latency, though at the cost of different semantics in the degenerate case of sibling resolution.

Write-once buckets do not support Riak commit hooks. Because Riak objects are inserted into the realtime queue using a postcommit hook, realtime replication is unavailable for write-once buckets. Fullsync replication will, however, replicate the data.

Configuration

When the new write_once bucket type parameter is set to true, buckets of that type will treat all key/value entries as semantically “write once”: once written, entries should not be modified or overwritten by the user.

The write_once property is a boolean property applied to a bucket type and may only be set when the bucket type is created. Once a bucket type has been set with this property and activated, the write_once property may not be modified.

The write_once property is incompatible with Riak data types and strong consistency. This means that if you create a bucket type with the write_once property set to true, any attempt to also set the datatype parameter, or to set the consistent parameter to true, will fail.

The write_once property may not be set on the default bucket type, and may not be set on individual buckets. If you set the lww or allow_mult parameters on a write-once bucket type, those settings will be ignored, as sibling values are disallowed by default.
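The compatibility rules above can be summarized in a small Python sketch. This is only an illustration of the rules as stated in this section, not Riak's own validation code; the function names validate_write_once_props and ignored_props are hypothetical.

```python
def validate_write_once_props(bucket_type, props):
    """Return a list of problems with a proposed bucket-type property set.

    Hypothetical helper: it mirrors the rules in the text, not Riak's
    internal validation logic.
    """
    problems = []
    if not props.get("write_once", False):
        return problems  # the rules below only apply to write-once types
    if bucket_type == "default":
        problems.append("write_once may not be set on the default bucket type")
    if "datatype" in props:
        problems.append("write_once is incompatible with Riak data types")
    if props.get("consistent", False):
        problems.append("write_once is incompatible with strong consistency")
    return problems


def ignored_props(props):
    """Names of properties the text says are ignored on a write-once type."""
    if not props.get("write_once", False):
        return []
    return [name for name in ("lww", "allow_mult") if name in props]
```

For instance, a property set of {"write_once": True, "datatype": "map"} would yield one problem, while {"write_once": True, "allow_mult": True} would be accepted with allow_mult reported as ignored.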

The following example shows how to configure a bucket type with the write_once property:

riak-admin bucket-type create my-bucket-type '{"props": {"write_once": true}}'
# my-bucket-type created

riak-admin bucket-type activate my-bucket-type
# my-bucket-type has been activated

riak-admin bucket-type status my-bucket-type
# my-bucket-type is active
...
write_once: true
...

Quorum

The write path used by write-once buckets supports the w, pw, and dw configuration values. However, if dw is specified, then the value of w is taken to be the maximum of the w and dw values. For example, for an n_val of 3, if dw is set to all, then w will be 3.
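The relationship between w and dw can be worked through with a short Python sketch. It assumes the symbolic quorum values one, quorum, and all, with quorum taken as a simple majority of n_val; the helper names are hypothetical and this is not Riak's implementation.

```python
def resolve_quorum_value(value, n_val):
    """Turn a quorum setting (integer or symbolic name) into a replica count."""
    # Assumption: "quorum" means a simple majority of n_val.
    symbolic = {"one": 1, "quorum": n_val // 2 + 1, "all": n_val}
    if isinstance(value, str):
        return symbolic[value]
    return int(value)


def effective_w(n_val, w="quorum", dw=None):
    """Effective w on the write-once path: max(w, dw) when dw is given."""
    w_count = resolve_quorum_value(w, n_val)
    if dw is None:
        return w_count
    return max(w_count, resolve_quorum_value(dw, n_val))


print(effective_w(3, w="quorum", dw="all"))  # prints 3
```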

This write path additionally supports the sloppy_quorum property. If it is set to false, only primary nodes will be selected when calculating the write quorum nodes.

Runtime

The write-once path circumvents the normal coordinated PUT code path, and instead sends write requests directly to all vnodes (or vnode proxies) in the effective preference list for the write operation.

In place of the put_fsm used in the normal path, we introduce a collection of new intermediate worker processes (implementing gen_server behavior). The role of these intermediate processes is to dispatch put requests to the vnodes or vnode proxies in the preflist and to aggregate replies. Unlike the put_fsm, the write-once workers are long-lived for the lifecycle of the riak_kv application. They are therefore stateful and store request state in a state-local dictionary.
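To make the dispatch-and-aggregate role concrete, here is a minimal Python sketch of a long-lived worker that keeps per-request state in a dictionary. The real workers are Erlang gen_server processes; the class name, method names, and the send callback below are all hypothetical, and details such as timeouts and error replies are left out.

```python
class WriteOnceWorker:
    """Illustrative stand-in for a write-once worker (not Riak's code)."""

    def __init__(self, send):
        self.send = send      # callable(vnode, request_id, key, value)
        self.pending = {}     # request_id -> per-request state
        self.next_id = 0

    def put(self, preflist, key, value, w, on_done):
        """Dispatch the write to every vnode in the preflist."""
        request_id = self.next_id
        self.next_id += 1
        self.pending[request_id] = {"needed": w, "acks": 0, "on_done": on_done}
        for vnode in preflist:
            self.send(vnode, request_id, key, value)
        return request_id

    def handle_reply(self, request_id):
        """Count one vnode acknowledgement; answer the client at quorum."""
        state = self.pending.get(request_id)
        if state is None:
            return  # late reply for a request that was already answered
        state["acks"] += 1
        if state["acks"] >= state["needed"]:
            del self.pending[request_id]
            state["on_done"]("ok")
```

Because the worker outlives any single request, the pending dictionary is what ties an incoming reply back to the request that caused it.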

The relationship between the riak_client, write-once workers, and vnode proxies is illustrated in the following diagram:


[Diagram: Write Once]

Client Impacts

Since the write-once code path is optimized for writes of data that will not be updated, and may therefore issue asynchronous writes, some client features might not work as expected. For example, a PUT request against a write-once bucket that asks for the object to be returned will behave like a request that does not ask for it.

Siblings

As mentioned, entries in write-once buckets are intended to be written only once; users who are not abusing the semantics of the bucket type should not be updating or overwriting entries in buckets of this type. However, it is possible for users to misuse the API, accidentally or otherwise, which might result in incomparable entries for the same key.

In the case of siblings, write-once buckets will resolve the conflict by choosing the “least” entry, where sibling ordering is based on a deterministic SHA-1 hash of the objects. While this algorithm is repeatable and deterministic at the database level, it will have the appearance to the user of “random write wins.”

As mentioned in Configuration, this resolution strategy is the reason write-once buckets and Riak Data Types are incompatible.
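The "least entry wins" rule described in this section can be sketched in Python as follows. This section does not say exactly which bytes Riak feeds to SHA-1, so the sketch assumes each sibling is already available as a byte string; resolve_siblings is a hypothetical name.

```python
import hashlib


def resolve_siblings(siblings):
    """Pick the sibling whose SHA-1 digest sorts lowest.

    Assumption: each sibling is represented as bytes. The exact
    serialization Riak hashes is not specified in this section.
    """
    return min(siblings, key=lambda s: hashlib.sha1(s).digest())
```

The result does not depend on arrival order, which is why the outcome is deterministic for the database yet looks arbitrary to a user who expected the last write to win.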

Handoff

The write-once path supports handoff scenarios, such that if a handoff occurs during PUTs in a write-once bucket, the values that have been written will be handed off to the newly added Riak node.

Asynchronous Writes

For backends that support asynchronous writes, the write-once path will dispatch a write request to the backend and handle the response asynchronously. This behavior allows the vnode to free itself for other work instead of waiting on the write response from the backend.

At the time of writing, the only backend that supports asynchronous writes is LevelDB. Riak will automatically fall back to synchronous writes with all other backends.

Note on the multi backend

The Multi backend does not support asynchronous writes. Therefore, if LevelDB is used with the Multi backend, it will be used in synchronous mode.
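The fallback behavior can be summarized in a few lines of Python. This is a sketch of the rule as stated here, assuming backends are identified by the names leveldb, bitcask, memory, and multi; write_mode is a hypothetical helper, not a Riak API.

```python
# Backends that, per the text, support asynchronous writes.
ASYNC_CAPABLE_BACKENDS = {"leveldb"}


def write_mode(storage_backend):
    """Return "async" or "sync" for the configured top-level backend.

    The multi backend is treated as synchronous even when it wraps LevelDB.
    """
    return "async" if storage_backend in ASYNC_CAPABLE_BACKENDS else "sync"
```

Here write_mode("leveldb") gives "async", while write_mode("multi") and write_mode("bitcask") both give "sync".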