V2 Multi-Datacenter Replication: Quickstart

The Riak Multi-Datacenter Replication Quick Start will walk you through the process of configuring Riak’s version 2 Replication to perform replication between two sample Riak clusters in separate networks. This guide will also cover bidirectional replication, which is accomplished by setting up unidirectional replication in both directions between the clusters.

Prerequisites

This Guide assumes that you have completed the following steps:

Installing Riak Enterprise
[Performing system tuning|System Performance Tuning][perf index]
[Reviewing configuration][config v2 mdc]

Scenario

Configure Riak MDC to perform replication, given the following 3-node Riak Enterprise clusters:

Cluster 1

Name	IP	Node name
`node1`	`172.16.1.11`	`riak@172.16.1.11`
`node2`	`172.16.1.12`	`riak@172.16.1.12`
`node3`	`172.16.1.13`	`riak@172.16.1.13`

Cluster 2

Name	IP	Node name
`node4`	`192.168.1.21`	`riak@192.168.1.21`
`node5`	`192.168.1.22`	`riak@192.168.1.22`
`node6`	`192.168.1.23`	`riak@192.168.1.23`

Note: The addresses used in these example clusters are contrived, non-routable addresses. In real-world applications, however, these addresses would need to be routable over the public Internet.

Set Up Cluster1 → Cluster2 Replication

Set Up the Listeners on Cluster1 (Source cluster)

On a node in Cluster1, node1 for example, identify the nodes that will be listening to connections from replication clients with riak-repl add-listener <nodename> <listen_ip> <port> for each node that will be listening for replication clients.

riak-repl add-listener riak@172.16.1.11 172.16.1.11 9010
riak-repl add-listener riak@172.16.1.12 172.16.1.12 9010
riak-repl add-listener riak@172.16.1.13 172.16.1.13 9010

Set Up the Site on Cluster2 (Site cluster)

On a node in Cluster2, node4 for example, inform the replication clients where the Source Listeners are located with riak-repl add-site <ipaddr> <port> <sitename>. Use the IP address(es) and port(s) you configured in the earlier step. For sitename enter Cluster1.

riak-repl add-site 172.16.1.11 9010 Cluster1

Note: While a Listener needs to be added to each node, only a single Site needs to be added on the Site cluster. Once connected to the Source cluster, it will get the locations of the rest of the Listeners in the Source cluster.

Verify the Replication Configuration

Verify the replication configuration using riak-repl status on both a Cluster1 node and a Cluster2 node. A full description of the riak-repl status command’s output can be found in the documentation for riak-repl’s [status output][cluster ops v2 mdc#status].

On the Cluster1 node, verify that there are listener_<nodename>s for each listening node, and that leader and server_stats are populated. They should look similar to the following:

listener_riak@172.16.1.11: "172.16.1.11:9010"
listener_riak@172.16.1.12: "172.16.1.12:9010"
listener_riak@172.16.1.13: "172.16.1.13:9010"
leader: 'riak@172.16.1.11'
server_stats: [{<8051.3939.0>,
               {message_queue_len,0},
               {status,[{site,"Cluster2"},
                        {strategy,riak_repl_keylist_server},
                        {fullsync_worker,<8051.3940.0>},
                        {dropped_count,0},
                        {queue_length,0},
                        {queue_byte_size,0},
                        {state,wait_for_partition}]}}]

On the Cluster2 node, verify that Cluster1_ips, leader, and client_stats are populated. They should look similar to the following:

Cluster1_ips: "172.16.1.11:9010, 172.16.1.12:9010, 172.16.1.13:9010"
leader: 'riak@192.168.1.21'
client_stats: [{<8051.3902.0>,
               {message_queue_len,0},
               {status,[{site,"Cluster1"},
                        {strategy,riak_repl_keylist_client},
                        {fullsync_worker,<8051.3909.0>},
                        {put_pool_size,5},
                        {connected,"172.16.1.11",9010},
                        {state,wait_for_fullsync}]}}]

Testing Realtime Replication

That’s all there is to it! When PUT requests are coordinated by Cluster1, these operations will be replicated to Cluster2.

You can use the following example script to verify that PUT operations sent to Cluster1 are being replicated to Cluster2:

#!/bin/bash

VALUE=`date`
CLUSTER_1_IP=172.16.1.11
CLUSTER_2_IP=192.168.1.21
 
curl -s -X PUT -d "${VALUE}" http://${CLUSTER_1_IP}:8098/riak/replCheck/c1

CHECKPUT_C1=`curl -s http://${CLUSTER_1_IP}:8098/riak/replCheck/c1`

if [ "${VALUE}" = "${CHECKPUT_C1}" ]; then
  echo "C1 PUT Successful"
else
  echo "C1 PUT Failed"
  exit 1
fi

CHECKREPL_C1_TO_C2=`curl -s http://${CLUSTER_2_IP}:8098/riak/replCheck/c1`

if [ "${VALUE}" = "${CHECKREPL_C1_TO_C2}" ]; then
  echo "C1 to C2 consistent"
else
  echo "C1 to C2 inconsistent
        C1:${CHECKPUT_C1}
        C2:${CHECKREPL_C1_TO_C2}"
  exit 1
fi

exit 0

You will have to change some of the above variables for your own environment, such as IP addresses or ports.

If you run this script and things are working as expected, you will get the following output:

C1 PUT Successful
C1 to C2 consistent

Set Up Cluster2 → Cluster1 Replication

About Bidirectional Replication

Multi-Datacenter support can also be configured to replicate in both directions, ensuring eventual consistency between your two datacenters. Setting up bidirectional replication is as simple as repeating the steps above in the other direction, i.e. from Cluster2 to Cluster1.

Set Up the Listeners on Cluster2 (Source cluster)

On a node in Cluster2, node4 for example, identify the nodes that will be listening to connections from replication clients with riak-repl add-listener <nodename> <listen_ip> <port> for each node that will be listening for replication clients.

riak-repl add-listener riak@192.168.1.21 192.168.1.21 9010
riak-repl add-listener riak@192.168.1.22 192.168.1.22 9010
riak-repl add-listener riak@192.168.1.23 192.168.1.23 9010

Set Up the Site on Cluster1 (Site cluster)

On a node in Cluster1, node1 for example, inform the replication clients where the Source Listeners are with riak-repl add-site <ipaddr> <port> <sitename>. Use the IP address(es) and port(s) you configured in the earlier step. For sitename enter Cluster2.

riak-repl add-site 192.168.1.21 9010 Cluster2

Verify the Replication Configuration

Verify the replication configuration using riak-repl status on a Cluster1 node and a Cluster2 node. A full description of the riak-repl status command’s output can be found in the documentation for riak-repl’s [status output][cluster ops v2 mdc#status].

On the Cluster1 node, verify that Cluster2_ips, leader, and client_stats are populated. They should look similar to the following:

Cluster2_ips: "192.168.1.21:9010, 192.168.1.22:9010, 192.168.1.23:9010"
leader: 'riak@172.16.1.11'
client_stats: [{<8051.3902.0>,
               {message_queue_len,0},
               {status,[{site,"Cluster2"},
                        {strategy,riak_repl_keylist_client},
                        {fullsync_worker,<8051.3909.0>},
                        {put_pool_size,5},
                        {connected,"192.168.1.21",9010},
                        {state,wait_for_fullsync}]}}]

On the Cluster2 node, verify that there are listener entries for each listening node, and that leader and server_stats are populated. They should look similar to the following:

listener_riak@192.168.1.21: "192.168.1.21:9010"
listener_riak@192.168.1.22: "192.168.1.22:9010"
listener_riak@192.168.1.23: "192.168.1.23:9010"
leader: 'riak@192.168.1.21'
server_stats: [{<8051.3939.0>,
               {message_queue_len,0},
               {status,[{site,"Cluster1"},
                        {strategy,riak_repl_keylist_server},
                        {fullsync_worker,<8051.3940.0>},
                        {dropped_count,0},
                        {queue_length,0},
                        {queue_byte_size,0},
                        {state,wait_for_partition}]}}]

Testing Realtime Replication

You can use the following script to perform PUTs and GETs on both sides of the replication and verify that those changes are replicated to the other side.

#!/bin/bash

VALUE=`date`
CLUSTER_1_IP=172.16.1.11
CLUSTER_2_IP=192.168.1.21
 
curl -s -X PUT -d "${VALUE}" http://${CLUSTER_1_IP}:8098/riak/replCheck/c1

CHECKPUT_C1=`curl -s http://${CLUSTER_1_IP}:8098/riak/replCheck/c1`

if [ "${VALUE}" = "${CHECKPUT_C1}" ]; then
  echo "C1 PUT Successful"
else
  echo "C1 PUT Failed"
  exit 1
fi

curl -s -X PUT -d "${VALUE}" http://${CLUSTER_2_IP}:8098/riak/replCheck/c2
CHECKPUT_C2=`curl -s http://${CLUSTER_2_IP}:8098/riak/replCheck/c2`

if [ "${VALUE}" = "${CHECKPUT_C2}" ]; then
  echo "C2 PUT Successful"
else
  echo "C2 PUT Failed"
  exit 1
fi

CHECKREPL_C1_TO_C2=`curl -s http://${CLUSTER_2_IP}:8098/riak/replCheck/c1`
CHECKREPL_C2_TO_C1=`curl -s http://${CLUSTER_1_IP}:8098/riak/replCheck/c2`

if [ "${VALUE}" = "${CHECKREPL_C1_TO_C2}" ]; then
  echo "C1 to C2 consistent"
else
  echo "C1 to C2 inconsistent
        C1:${CHECKPUT_C1}
        C2:${CHECKREPL_C1_TO_C2}"
  exit 1
fi

if [ "${VALUE}" = "${CHECKREPL_C2_TO_C1}" ]; then
  echo "C2 to C1 consistent"
else
  echo "C2 to C1 inconsistent
      C2:${CHECKPUT_C2}
      C1:${CHECKREPL_C2_TO_C1}"
  exit 1
fi

exit 0

You will have to change some of the above variables for your own environment, such as IP addresses or ports.

If you run this script and things are working as expected, you will get the following output:

C1 PUT Successful
C2 PUT Successful
C1 to C2 consistent
C2 to C1 consistent

Fullsync

During realtime replication, operations coordinated by the Source cluster will be replicated to the Site cluster. Riak Objects are placed in a queue on the Source cluster and streamed to the Site cluster. When the queue is full due to high traffic or a bulk loading operation, some objects will be dropped from replication. These dropped objects can be sent to the Site cluster by running a fullsync operation. The settings for the realtime replication queue and their explanations are available in the [configuration][config v2 mdc] documentation.

Initiating a fullsync

To start a fullsync operation, issue the following command on your leader node:

riak-repl start-fullsync

A fullsync operation may also be cancelled. If a partition is in progress, synchronization will stop after that partition completes. During cancellation, riak-repl status will show ‘cancelled’ in the status.

riak-repl cancel-fullsync

Fullsync operations may also be paused, resumed, or scheduled for certain times using cron jobs. A complete list of fullsync commands is available in the [MDC Operations][cluster ops v2 mdc] documentation.