Active Anti-Entropy: Slight chance that AAE could stall itself or crash a Riak node.
Info | Value |
---|---|
Date issued | November 17, 2016 |
Product | Riak KV |
Affected versions | Riak KV 2.0.0+, Riak KV 2.1.0+ |
Overview
There exists a highly unlikely condition where Riak KV’s active anti-entropy feature can either stall or cause a segmentation fault crash.
If you are using Riak KV Enterprise Edition, please see the Product Advisory in the Riak Support Portal for your patches and instructions.
Description
The active anti-entropy feature (AAE) has a procedure known as a hash tree rebuild. This procedure uses a LevelDB feature called an iterator. The rebuild procedure has the potential to simultaneously instruct the iterator to retrieve the next data item and to close the iterator. The probability of both happening is very slight, but Riak was able to create the simultaneous instructions under extremely heavy loads across many hours.
AAE stores its metadata within a LevelDB database. This metadata is independent of the user’s selected storage backend for Riak (LevelDB, Bitcask, memory, or multi). Therefore all users of AAE are impacted regardless of their chosen storage backend.
Affected Users
You could be impacted if your riak.conf has the default setting anti_entropy = active and you are running one of the following versions of Riak KV:
- Riak KV 2.0.0+
- Riak KV 2.1.0+
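As a rough way to check this setting on a node, the shell sketch below reads a riak.conf file and reports whether AAE is on. It assumes the setting is spelled anti_entropy, as in the Riak KV configuration reference, and that a missing or commented-out line means the default (active) applies. The function name aae_is_active and the example path are illustrative and not part of Riak.

```shell
# Sketch only: exit status 0 if the given riak.conf leaves AAE active,
# 1 if it is set to something else, 2 if the file cannot be read.
# Example (path is an assumption; use your node's actual riak.conf):
#   aae_is_active /etc/riak/riak.conf && echo "AAE is active on this node"
aae_is_active() (
    conf="$1"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; exit 2; }

    # Extract the value of the last uncommented "anti_entropy = ..." line.
    # Sub-settings such as anti_entropy.tree.* do not match this pattern.
    value=$(sed -n 's/^[[:space:]]*anti_entropy[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" | tail -n 1)

    case "$value" in
        # Empty means the line is absent, so the default (active) applies.
        # "active*" also accepts active-debug (assumed to keep AAE on).
        ''|active*) exit 0 ;;
        *)          exit 1 ;;
    esac
)
```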
Impact
Active anti-entropy hash tree rebuilds could stall indefinitely on a Riak node, or a Riak node could experience a segmentation fault that requires restarting Riak.
Mitigation Strategy
There are two approaches you can take to mitigate this issue:
- Update to the newly released Riak KV 2.2, or
- Patch eLevelDB to 2.0.32.
Patching eLevelDB to 2.0.32
If an upgrade to Riak KV 2.2 is not possible in your environment, the LevelDB library can be patched. This patch contains natively-compiled code and is operating system specific.
Do not apply the eleveldb.so patch to Riak TS. Doing so will prevent Riak TS from functioning correctly.
OS specific package links:
- Debian 6
- Debian 7
- Fedora 19
- FreeBSD 9
- OS X 10
- RHEL 5
- RHEL 6
- RHEL 7
- SLES 11
- SmartOS 1.8
- SmartOS 13.1
- Solaris 10
- Ubuntu Lucid
- Ubuntu Precise
- Ubuntu Trusty
Install the Patch
To install this patch, on each node in the cluster you must:
Determine your eleveldb directory.
- Find the Riak lib directory:
  - RHEL/CentOS/Fedora/SLES: /usr/lib64/riak/lib/
  - Ubuntu/Debian: /usr/lib/riak/lib/
  - FreeBSD: /usr/local/lib/riak/lib/
  - SmartOS: /opt/local/lib/riak/lib/
  - Solaris: /opt/riak/lib/
- Consult the directory listing for a directory beginning with eleveldb, for example: eleveldb-2.0.17-0-g973fc92 on Riak KV 2.1.4 or eleveldb-2.0.22-0-g185e296 on Riak KV 2.0.7.
- Make a note of this directory name for the following steps.
Stop the node by running riak stop.
Change to the priv subdirectory of your eleveldb directory.
Rename the original eleveldb.so file to eleveldb.so.orig.
Copy the provided eleveldb.so to the directory and verify correct permissions.
If possible, verify that the md5sum of the eleveldb.so located in the eleveldb priv directory is correct.
Change into the ebin subdirectory of your eleveldb directory.
Rename the original eleveldb.beam file to eleveldb.beam.orig.
Copy the provided eleveldb.beam to the directory and verify correct permissions.
If possible, verify that the md5sum of the eleveldb.beam located in the eleveldb ebin directory is correct.
Start the node by running riak start.
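The file-swapping part of the installation can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported installer: the function name install_eleveldb_patch and the patch directory argument are hypothetical, the node must already be stopped, and the permission and md5sum checks described above are still left to you.

```shell
# Sketch only: swap in the patched eleveldb files on one stopped node.
#   $1  Riak lib directory for this OS (one of the paths listed above)
#   $2  directory holding the downloaded eleveldb.so and eleveldb.beam
#       (hypothetical location; use wherever you unpacked the patch)
# Example (paths are illustrative):
#   riak stop
#   install_eleveldb_patch /usr/lib64/riak/lib /tmp/eleveldb-patch
#   riak start
install_eleveldb_patch() (
    lib_dir="$1"
    patch_dir="$2"

    # Expect exactly one eleveldb-* directory under the lib directory.
    set -- "$lib_dir"/eleveldb-*
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "could not identify a single eleveldb directory in $lib_dir" >&2
        exit 1
    fi
    eleveldb_dir="$1"

    # Check everything first, so a failure leaves the node untouched.
    for pair in priv/eleveldb.so ebin/eleveldb.beam; do
        target="$eleveldb_dir/$pair"
        source_file="$patch_dir/${pair##*/}"
        [ -f "$source_file" ] || { echo "missing $source_file" >&2; exit 1; }
        [ -f "$target" ] || { echo "missing $target" >&2; exit 1; }
        # An existing backup suggests the patch was already applied.
        [ ! -e "$target.orig" ] || { echo "$target.orig already exists" >&2; exit 1; }
    done

    for pair in priv/eleveldb.so ebin/eleveldb.beam; do
        target="$eleveldb_dir/$pair"
        mv "$target" "$target.orig" || exit 1            # keep the original
        cp "$patch_dir/${pair##*/}" "$target" || exit 1  # install the patched file
    done
)
```

Refusing to run when a .orig file already exists is a deliberate choice in this sketch: it prevents a second run from overwriting the true original with the patched file.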
To back out of this patch, on each node in the cluster you must:
Determine your eleveldb directory.
- Change to the Riak lib directory:
  - RHEL/CentOS/Fedora/SLES: /usr/lib64/riak/lib/
  - Ubuntu/Debian: /usr/lib/riak/lib/
  - FreeBSD: /usr/local/lib/riak/lib/
  - SmartOS: /opt/local/lib/riak/lib/
  - Solaris: /opt/riak/lib/
- Consult the directory listing for a directory beginning with eleveldb, for example: eleveldb-2.0.17-0-g973fc92 on Riak KV 2.1.4 or eleveldb-2.0.22-0-g185e296 on Riak KV 2.0.7.
- Make a note of this directory name for the following steps.
Stop the node by running riak stop.
Change to the priv subdirectory of your eleveldb directory.
Verify that you have an eleveldb.so.orig file present in this directory.
Remove the patched eleveldb.so file.
Rename eleveldb.so.orig to eleveldb.so.
Change into the ebin subdirectory of your eleveldb directory.
Verify that you have an eleveldb.beam.orig file present in this directory.
Remove the patched eleveldb.beam file.
Rename eleveldb.beam.orig to eleveldb.beam.
Start the node by running riak start.
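The back-out steps can be sketched the same way. As before, this is only an illustration: the function name revert_eleveldb_patch is hypothetical, and the node must be stopped before it runs and started again afterwards.

```shell
# Sketch only: restore the original eleveldb files on one stopped node.
#   $1  Riak lib directory for this OS (one of the paths listed above)
# Example (path is illustrative):
#   riak stop
#   revert_eleveldb_patch /usr/lib64/riak/lib
#   riak start
revert_eleveldb_patch() (
    lib_dir="$1"

    # Expect exactly one eleveldb-* directory under the lib directory.
    set -- "$lib_dir"/eleveldb-*
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "could not identify a single eleveldb directory in $lib_dir" >&2
        exit 1
    fi
    eleveldb_dir="$1"

    # Verify that both backups exist before removing anything.
    for target in "$eleveldb_dir/priv/eleveldb.so" "$eleveldb_dir/ebin/eleveldb.beam"; do
        [ -f "$target.orig" ] || { echo "missing $target.orig" >&2; exit 1; }
    done

    for target in "$eleveldb_dir/priv/eleveldb.so" "$eleveldb_dir/ebin/eleveldb.beam"; do
        rm -f "$target" || exit 1             # remove the patched file
        mv "$target.orig" "$target" || exit 1 # put the original back
    done
)
```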
Verifying the Patch Installation
When the patch is installed, the LevelDB LOG files will report that version 2.0.31 is installed.
Note: this is eleveldb version 2.0.32 but the leveldb LOG file is expected to report 2.0.31.
The LOG files for each running vnode will have a log line similar to the following:
2016/03/17-18:42:50.544293 7ffaaf3b1700 Version: 2.0.31
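To check many vnode LOG files at once, a sketch like the following lists the files that do not contain the expected version line. The function name unpatched_logs is hypothetical, and the LOG file locations depend on your platform data directory, so the path in the example is an assumption to be replaced with your own.

```shell
# Sketch only: print each given LevelDB LOG file that does NOT report
# version 2.0.31. No output means every file shows the patched version.
# Example (path is an assumption; substitute your data directory):
#   unpatched_logs /var/lib/riak/leveldb/*/LOG
unpatched_logs() (
    for log in "$@"; do
        # Match "Version: 2.0.31" exactly, not e.g. 2.0.310 or 2.0.31.1.
        grep -Eq 'Version: 2\.0\.31([^0-9.]|$)' "$log" || printf '%s\n' "$log"
    done
)
```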