Although latency is unavoidable in distributed systems like Riak, there are a number of actions that can be undertaken to reduce latency to the lowest levels possible within a cluster. In this guide, we’ll list potential sources of high latency and what you can do about it.
Riak always performs best with smaller objects. Large objects, which can be mistakenly inserted into Riak by your application or caused by siblings (see below), can often increase latency.
We recommend keeping all objects stored in Riak smaller than 1-2 MB, preferably below 100 KB. Large objects lead to increased I/O activity and can put strain on memory resources. In some cases, just a few large objects can impact latency in a cluster, even for requests that are unrelated to those objects.
If your use case requires large objects, we recommend checking out Riak CS, which is intended as a storage system for large objects.
The best way to find out if large objects are impacting latency is to
monitor each node’s object size stats. If you run
riak-admin status or make an HTTP
/stats endpoint, you will see the results for the following
metrics related to object size, all of which are calculated only for
GET operations (i.e. reads):
||The mean object size encountered by this node in the last minute|
||The median object size encountered by this node in the last minute|
||The 95th-percentile object size encountered by this node in the last minute|
||The 99th-percentile object size encountered by this node in the last minute|
||The 100th-percentile object size encountered by this node in the last minute|
median measurements may not be good indicators,
especially if you’re storing billions of keys. Instead, you should be on
the lookout for trends in the
- Is there an upward trend?
- Do the metrics indicate that there are outliers?
- Do these trends coincide with increased latency?
If you suspect that large object size is impacting latency, try making the following changes to each node’s configuration:
- If you are using the newer,
riak.conf-based configuration system, the commented-out value for
32MB. Uncomment this setting and re-start your node.
- If you are using the older,
vm.args-based configuration system, try increasing the
32768or higher (measured in kilobytes). This increases the size of the distributed Erlang buffer from its default of 1024 KB. Re-start your node when configuration changes have been made.
Large objects can also impact latency even if they’re only present on
some nodes. If increased latency occurs only on N nodes, where N is your
replication factor, also known as
n_val, this could indicate that a single large object and its replicas are slowing down all requests on those nodes.
If large objects are suspected, you should also audit the behavior of siblings in your cluster, as explained in the next section.
In Riak, object conflicts are handled by keeping multiple versions of the object in the cluster either until a client takes action to resolve the conflict or until active anti-entropy resolves the conflict without client intervention. While sibling production is normal, sibling explosion is a problem that can come about if many siblings of an object are produced. The negative effects are the same as those associated with large objects.
The best way to monitor siblings is through the same
riak-admin status interface used to monitor
object size (or via an HTTP
GET request to
/stats). In the output of
riak-admin status in each node, you’ll see the following
||The mean number of siblings encountered during all GET operations by this node within the last minute|
||The median number of siblings encountered during all GET operations by this node within the last minute|
||The 95th percentile of the number of siblings encountered during all GET operations by this node within the last minute|
||The 99th percentile of the number of siblings encountered during all GET operations by this node within the last minute|
||The 100th percentile of the number of siblings encountered during all GET operations by this node within the last minute|
Is there an upward trend in these statistics over time? Are there any large outliers? Do these trends correspond to your observed latency spikes?
If you believe that sibling creation problems could be responsible for latency issues in your cluster, you can start by checking the following:
allow_multis set to
truefor some or all of your buckets, be sure that your application is correctly resolving siblings. Be sure to read our documentation on conflict resolution for a fuller picture of how this can be done. Note: In Riak versions 2.0 and later,
allow_multis set to
trueby default for all bucket types that you create and activate. If you wish to set
falseon a bucket type, you will have to do so explicitly.
- Application errors are a common source of problems with
siblings. Updating the same key over and over without passing a
causal context to Riak can cause sibling explosion. If this seems to be the issue, modify your application’s conflict resolution
strategy. Another possibility worth exploring is using dotted version vectors (DVVs) in place of traditional vector clocks. DVVs can be enabled using bucket types by setting the
truefor buckets that seem to be experiencing sibling explosion.
Compaction and Merging
The Bitcask and LevelDB storage backends occasionally go through heavily I/O-intensive compaction phases during which they remove deleted data and reorganize data files on disk. During these phases, affected nodes may be slower to respond to requests than other nodes. If your cluster is using one or both of these backends, there are steps that can be taken to monitor and address latency issues.
To determine whether compaction and merging cycles align with increased
latency, keep an eye on on your
console.log files (and LevelDB
files if you’re using LevelDB). Do Bitcask merging and/or LevelDB
compaction events overlap with increased latencies?
If so, our first recommendation is to examine your replication properties to make sure that neither R nor W are set to N, i.e. that you’re not requiring that reads or writes go to all nodes in the cluster. The problem with setting
W=N is that any request will only respond as quickly as the slowest node amongst the N nodes involved in the request.
Beyond checking for
W=N for requests, the recommended
mitigation strategy depends on the backend:
With Bitcask, it’s recommended that you:
- Limit merging to off-peak hours to decrease the effect of merging cycles on node traffic
- Stagger merge windows between nodes so that no more than one node is undergoing a merge phase at any given time
Instructions on how to accomplish both can be found in our guide to tuning Bitcask.
It’s also important that you adjust your maximum file size and merge
threshold settings appropriately. This setting is labeled
bitcask.max_file_size in the newer,
riak.conf-based configuration files and
max_file_size in the older,
Setting the maximum file size lower will cause Bitcask to merge more often (with less I/O churn), while setting it higher will induce less frequent merges with more I/O churn. To find settings that are ideal for your use case, we recommend checking out our guide to configuring Bitcask.
The more files you keep in memory, the faster LevelDB will perform in general. To make sure that you are using your system resources appropriately with LevelDB, check out our guide to LevelDB parameter planning.
While a number of latency-related problems can manifest themselves in development and testing environments, some performance limits only become clear in production environments.
If you suspect that OS-level issues might be impacting latency, it might be worthwhile to revisit your OS-specific configurations. The following guides may be of help:
- Open files limit
- General System performance tuning
- AWS performance tuning if you’re running Riak on Amazon Web Services
I/O and Network Bottlenecks
Riak is a heavily I/O- and network resource-intensive system. Bottlenecks on either front can lead to undue latency in your cluster. We recommend an active monitoring strategy to detect problems immediately when they arise.
To diagnose potential overloads, Riak versions 1.3.2 and later come equipped with an overload protection feature designed to prevent cascading failures in overly busy nodes. This feature limits the number of GET and PUT finite state machines (FSMs) that can exist simultaneously on a single Riak node. Increased latency can result if a node is frequently running up against these maximums.
node_get_fsm_active_60sto get an idea of how many operations your nodes are coordinating. If you see non-zero values in
node_get_fsm_rejected_60s, that means that some of your requests are being discarded due to overload protection.
- The FSM limits can be increased, but disabling overload protection entirely is not recommended. More details on these settings are available in the release notes for Riak version 1.3.
In versions 2.0 and later, Riak enables you to configure a variety of settings regarding Riak objects, including allowable object sizes, how many siblings to allow, and so on. If you suspect that undue latency in your cluster stems from object size or related factors, you may consider adjusting these settings.
A concise listing of object-related settings can be found in the Riak configuration documentation. The sections below explain these settings in detail.
Note on configuration files in 2.0
The object settings listed below are only available using the new system for configuration files in Riak 2.0. If you are using the older,
app.config-based system, you will not have access to these settings.
As stated above, we recommend always keeping objects below 1-2 MB
and preferably below 100 KB if possible. If you want to ensure that
objects above a certain size do not get stored in Riak, you can do so by
object.size.maximum parameter lower than the default of
50MB, which is far above the ideal object size. If you set this
parameter to, say,
1MB and attempt to store a 2 MB object, the write
will fail and an error message will be returned to the client.
You can also set an object size threshold past which a write will
succeed but will register a warning in the logs, you can adjust the
object.size.warning_threshold parameter. The default is
Sibling Explosion Management
In order to prevent or cut down on sibling explosion, you can either prevent Riak from storing
additional siblings when a specified sibling count is reached or set a
warning threshold past which Riak logs an error (or both). This can be
done using the
object.siblings.warning_threshold settings. The default maximum is 100
and the default warning threshold is 25.
Object Storage Format
There are currently two possible binary representations for objects stored in Riak:
- Erlang’s native
term_to_binaryformat, which tends to have a higher space overhead
- A newer, Riak-specific format developed for more compact storage of smaller values
You can set the object storage format using the
0 selects Erlang’s
term_to_binary format while
default) selects the Riak-specific format.