Object Deletion Reference

In single-server, non-clustered data storage systems, object deletion is a trivial process. In an eventually consistent, clustered system like Riak, however, deletion is more involved: each object is replicated across multiple nodes, so a deletion policy must be chosen to determine when an object can actually be removed from the storage backend.

Object Deletion Example Scenario

The problem of object deletion in Riak can be illustrated more concretely using the following example:

An object is stored on nodes A, B, and C
Node C suddenly goes offline
A Riak client sends a delete request to node A, which forwards that request to node B
On nodes A and B, the object is marked as deleted with a tombstone
Node C comes back online
The object has been marked as deleted on nodes A and B, but it still lives on node C
A client attempts to read the object. Riak senses that there are divergent replicas and initiates a repair process (either read repair or active anti-entropy, depending on configuration)

At this point, Riak needs to make a decision about what to do. Should node C be instructed to delete the object as well? Should nodes A and B be instructed to reinstate the object so that it lives on all three nodes again?

What happens in this scenario depends on how you have configured Riak to handle deletion. More on configuration can be found in the section below.
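
The two possible outcomes of this scenario can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Replica` class and `read_and_repair` function are hypothetical, and a plain integer `version` stands in for the causal metadata that Riak actually uses to compare replicas.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Replica:
    """Hypothetical stand-in for one node's copy of a key."""
    value: Optional[bytes] = None  # None means the node holds nothing for the key
    deleted: bool = False          # True when the copy is a tombstone
    version: int = 0               # simplified stand-in for causal metadata


def read_and_repair(replicas: Dict[str, Replica]) -> Replica:
    """Pick the copy with the highest version and copy it to stale replicas."""
    winner = max(replicas.values(), key=lambda r: r.version)
    for name in replicas:
        if replicas[name].version < winner.version:
            replicas[name] = Replica(winner.value, winner.deleted, winner.version)
    return winner


# The object is stored on nodes A, B, and C.
nodes = {name: Replica(b"hello", False, 1) for name in "ABC"}

# Node C is offline; the delete reaches only A and B, which write tombstones.
for name in "AB":
    nodes[name] = Replica(b"", True, 2)

# Case 1: the tombstones are still present when C returns, so the delete wins.
with_tombstones = dict(nodes)
result_kept = read_and_repair(with_tombstones)

# Case 2: the tombstones were already removed from A and B, so C's copy is
# the only one left and the repair reinstates the object on A and B.
reaped = {"A": Replica(), "B": Replica(), "C": nodes["C"]}
result_reaped = read_and_repair(reaped)
```

In the first case node C ends up with a tombstone; in the second the object reappears on all three nodes, which is why the timing of tombstone removal matters.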

Tombstones

Riak addresses the problem of deletion in distributed systems by marking deleted objects with a so-called tombstone. This means that an X-Riak-Deleted metadata key is added to the object and given the value true, while the object's value is set to an empty Erlang binary, i.e. <<>>.
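
As a rough illustration of what a tombstone looks like, the sketch below models an object as a value plus a metadata dictionary. The `StoredObject` class and both helper functions are hypothetical names chosen for this example, Python's empty `bytes` stands in for the empty Erlang binary, and carrying over the object's other metadata is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StoredObject:
    """Hypothetical model of a stored object: a value plus metadata."""
    value: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


def make_tombstone(obj: StoredObject) -> StoredObject:
    """Return a tombstone for obj: empty value, X-Riak-Deleted set to true."""
    metadata = dict(obj.metadata)  # assumption: other metadata is carried over
    metadata["X-Riak-Deleted"] = "true"
    return StoredObject(value=b"", metadata=metadata)


def is_tombstone(obj: StoredObject) -> bool:
    """A copy counts as a tombstone when the deleted marker is present."""
    return obj.metadata.get("X-Riak-Deleted") == "true"


original = StoredObject(b"hello", {"content-type": "text/plain"})
tombstone = make_tombstone(original)
```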

When a delete request is sent to Riak, the following process is set in motion:

A tombstone object (<<>>) is written to N vnodes, with N defined by n_val
If all N vnodes store the tombstone, the object is removed
If fallback vnodes are in use, the object will not be immediately removed
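
The decision described in these steps can be sketched as a small predicate. This is a simplified model, not Riak's implementation: `VnodeAck` and `can_remove_now` are hypothetical names, and each vnode's response is reduced to two flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VnodeAck:
    """Hypothetical summary of one vnode's response to the tombstone write."""
    stored: bool    # the vnode stored the tombstone
    fallback: bool  # the vnode is a fallback rather than a primary owner


def can_remove_now(acks: List[VnodeAck], n_val: int) -> bool:
    """True only when all N vnodes stored the tombstone and none is a fallback."""
    if len(acks) != n_val:
        return False
    return all(ack.stored and not ack.fallback for ack in acks)


all_primaries = [VnodeAck(True, False)] * 3
one_fallback = [VnodeAck(True, False), VnodeAck(True, False), VnodeAck(True, True)]
```

With `n_val` set to 3, `can_remove_now(all_primaries, 3)` is true, while `can_remove_now(one_fallback, 3)` is false because a fallback vnode is involved.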

Configuring Object Deletion

If the third step in the process above is reached (that is, fallback vnodes are in use), the delete_mode setting in your configuration files will determine what happens next. This setting determines how long Riak will wait between identifying an object for deletion and actually removing the object from the storage backend.

There are three possible settings:

keep — Disables tombstone removal; protects against an edge case in which an object is deleted and recreated on the owning vnodes while a fallback is either down or awaiting handoff
immediate — The tombstone is removed as soon as the request is received
Custom time interval — How long to wait until the tombstone is removed, expressed in milliseconds. The default is 3000, i.e. to wait 3 seconds
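
One way to read the three settings is as a mapping from delete_mode to a removal delay. The sketch below is illustrative only: the function name `reap_delay_ms` is hypothetical, and the settings are represented as the Python strings "keep" and "immediate" or a non-negative integer, which may not match how they are written in your configuration files.

```python
from typing import Optional, Union


def reap_delay_ms(delete_mode: Union[str, int] = 3000) -> Optional[int]:
    """Translate a delete_mode value into a tombstone removal delay.

    Returns None when tombstones are never removed, otherwise the delay
    in milliseconds before removal.
    """
    if delete_mode == "keep":
        return None  # tombstone removal is disabled
    if delete_mode == "immediate":
        return 0     # remove without waiting
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(delete_mode, bool) or not isinstance(delete_mode, int):
        raise ValueError(f"unsupported delete_mode: {delete_mode!r}")
    if delete_mode < 0:
        raise ValueError("delete_mode interval must be non-negative")
    return delete_mode
```

Called with no argument, the function returns the default of 3000 milliseconds described above.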

In general, we recommend setting the delete_mode parameter to keep if you plan to delete and recreate objects under the same key frequently.

Setting delete_mode to immediate can be useful in situations in which an aggressive space reclamation process is necessary, such as when running MapReduce jobs, but we do not recommend this in general.

Setting delete_mode to a longer time duration than the default can be useful in certain edge cases involving Multi-Datacenter Replication, e.g. when network connectivity is an issue.

Please note that there is an edge case in which tombstones remain stored in the backend even when a time interval-based delete_mode is used. This occurs if the node is stopped after a tombstone has been written but before it has been removed from the backend. In this case, the tombstone will show up in key-listing and MapReduce operations and will not disappear until you read the key, which makes Riak aware of the tombstone. In practice, if delete_mode is set to 10000, all keys that were deleted during the 10 seconds before a node is stopped will remain in the backend.
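
The arithmetic behind this edge case can be made concrete with a short sketch. The function `stranded_keys` and the timestamps are hypothetical, and the model assumes a tombstone is removed exactly delete_mode milliseconds after the delete unless the node stops first.

```python
from typing import Dict, List


def stranded_keys(deleted_at_ms: Dict[str, int],
                  stopped_at_ms: int,
                  delete_mode_ms: int) -> List[str]:
    """Return keys whose tombstones were still pending when the node stopped."""
    return sorted(
        key
        for key, deleted_at in deleted_at_ms.items()
        # The tombstone was written before the stop, but its removal
        # time had not yet been reached.
        if deleted_at <= stopped_at_ms < deleted_at + delete_mode_ms
    )


# Deletes at 1 s, 9 s, and 14 s; the node stops at 15 s with delete_mode 10000.
deletes = {"a": 1_000, "b": 9_000, "c": 14_000}
left_behind = stranded_keys(deletes, stopped_at_ms=15_000, delete_mode_ms=10_000)
```

Here `left_behind` contains "b" and "c": both were deleted within the 10 seconds before the stop, while the tombstone for "a" had already been removed.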

Client Library Examples

Check out Deleting Objects in the Developing section for examples of deleting objects client-side.

Resources

Discussion on the Riak mailing list