Handoff Reference
Riak is a distributed system built with two essential goals in mind:
- fault tolerance, whereby a Riak cluster can withstand node failure, network partitions, and other events in a way that does not disrupt normal functioning, and
- scalability, whereby operators can gracefully add and remove nodes to/from a Riak cluster
Both of these goals demand that Riak is able to either temporarily or permanently re-assign responsibility for portions of the keyspace. That re-assigning is referred to as intra-cluster handoff (or simply handoff in our documentation).
Types of Handoff
Intra-cluster handoff typically takes one of two forms: hinted handoff and ownership transfer.
Hinted handoff occurs when a vnode temporarily takes over responsibility for some data and then returns that data to its original “owner.” Imagine a 3-node cluster with nodes A, B, and C. If node C goes offline, e.g. during a network partition, nodes A and B will pick up the slack, so to speak, assuming responsibility for node C’s operations. When node C comes back online, responsibility will be handed back to the original vnodes.
Ownership transfer is different because it is meant to be permanent. It occurs when a vnode no longer belongs to the node on which it’s running. This typically happens when the very makeup of a cluster changes, e.g. when nodes are added or removed from the cluster. In this case, responsibility for portions of the keyspace needs to be fundamentally re-assigned.
Both types of handoff are handled automatically by Riak. Operators do have the option, however, of enabling and disabling handoff on particular nodes or all nodes and of configuring key aspects of Riak’s handoff behavior. More information can be found below.
Configuring Handoff
A full listing of configurable parameters can be found in our configuration files document. The sections below provide a more narrative description of handoff configuration.
SSL
If you want to encrypt handoff behavior within a Riak cluster, you need
to provide each node with appropriate paths for an SSL certfile (and
potentially a keyfile). The configuration below would designate a
certfile at /ssl_dir/cert.pem
and a keyfile at /ssl_dir/key.pem
:
handoff.ssl.certfile = /ssl_dir/cert.pem
handoff.ssl.keyfile = /ssl_dir/key.pem
{riak_core, [
%% Other configs
{handoff_ssl_options, [
{certfile, "/ssl_dir/cert.pem"},
{keyfile, "/ssl_dir/key.pem"}
]},
%% Other configs
]}
Port
You can set the port used by Riak for handoff-related interactions using
the handoff.port
parameter. The default is 8099. This would change the
port to 9000:
handoff.port = 9000
{riak_core, [
%% Other configs
{handoff_port, 9000},
%% Other configs
]}
Background Manager
Riak has an optional background manager that limits handoff activity in the name of saving resources. The manager can help prevent system response degradation during times of heavy load, when multiple background tasks may contend for the same system resources. The background manager is disabled by default. The following will enable it:
handoff.use_background_manager = on
{riak_kv, [
%% Other configs
{handoff_use_background_manager, on},
%% Other configs
]}
Maximum Rejects
If you’re using Riak features such as Riak Search, those subsystems can block handoff of primary key/value data, i.e. data that you interact with via normal reads and writes.
The handoff.max_rejects
setting enables you to set the maximum
duration that a vnode can be blocked by multiplying the
handoff.max_rejects
setting by the value of
vnode_management_timer
.
Thus, if you set handoff.max_rejects
to 10 and
vnode_management_timer
to 5 seconds (i.e. 5s
), non-K/V subsystems
can block K/V handoff for a maximum of 50 seconds. The default for
handoff.max_rejects
is 6, while the default for
vnode_management_timer
is 10s
. This would set max_rejects
to 10:
handoff.max_rejects = 10
{riak_kv, [
%% Other configs
{handoff_rejected_max, 10},
%% Other configs
]}
Transfer Limit
You can adjust the number of node-to-node transfers (which includes
handoff) using the transfer_limit
parameter. The default is 2. Setting
this higher will increase node-to-node communication but at the expense
of higher resource intensity. This would set transfer_limit
to 5:
transfer_limit = 5
{riak_core, [
%% Other configs
{handoff_concurrency, 5},
%% Other configs
]}
Enabling and Disabling Handoff
Handoff can be enabled and disabled in two ways: via configuration or on the command line.
Enabling and Disabling via Configuration
You can enable and disable both outbound and inbound handoff on a node
using the handoff.outbound
and handoff.inbound
settings,
respectively. Both are enabled by default. The following would disable
both:
handoff.outbound = off
handoff.inbound = off
{riak_core, [
%% Other configs
{disable_outbound_handoff, true},
{disable_inbound_handoff, true},
%% Other configs
]}
Enabling and Disabling Through the Command Line
Check out the Cluster Operations: Handoff for steps on enabling and disabling handoff via the command line.