Introduction to Riak KV 2.0
Riak version 2.0 includes deep changes and many new features affecting all facets of Riak. This article gives an overview of the new features and where you can learn more about using them in your Riak installation.
For more in-depth implementation details check out the version 2.0 release notes.
If you’re upgrading to Riak 2.0 from an earlier version, please be aware that all of the new features listed below are optional:
- Riak Data Types — Riak’s new CRDT-based Data Types can simplify modeling data in Riak, but are only used in buckets explicitly configured to use them.
- Strong Consistency, Riak Security, and the New Riak Search — These are subsystems in Riak that must be explicitly turned on to work. If not turned on, they will have no impact on performance. Furthermore, the older Riak Search will continue to be included with Riak.
- Security — Authentication and authorization can be enabled or disabled at any time.
- Configuration management — Riak’s configuration files have been streamlined into a single file named riak.conf. If you are upgrading, however, your existing app.config and vm.args files will still be recognized in version 2.0.
- Bucket Types — While we strongly recommend using bucket types when creating new buckets, they are not required.
- Dotted Version Vectors (DVVs) — This alternative to traditional vector clocks is enabled by default in all bucket types, but DVVs can be disabled by setting the dvv_enabled property to false on any bucket type.
In a nutshell, upgrading to 2.0 will change how you use Riak only if you want it to. But even if you don’t plan on using the new features, there are a number of improvements that make upgrading a good choice, including the following:
- Cluster metadata — This is a subsystem of Riak added in 2.0 that reduces the amount of inter-node gossip in Riak clusters, which can reduce network congestion.
- Active Anti-Entropy — Riak has had an Active Anti-Entropy (AAE) feature, turned on by default, since version 1.3. AAE performance has been improved in version 2.0.
- Bug patches — A variety of bugs present in earlier versions have been identified and patched.
More on upgrading can be found in our Riak 2.0 upgrade guide.
Riak Data Types
In distributed systems there is an unavoidable trade-off between consistency and availability. This can complicate some aspects of application design if you’re using Riak as a key/value store because the application is responsible for resolving conflicts between replicas of objects stored in different Riak nodes.
Riak 2.0 offers a new approach to this problem for a wide range of use cases in the form of Riak Data Types. Instead of forcing the application to resolve conflicts, Riak offers five Data Types that can reduce some of the complexities of developing using Riak: flags, registers, counters, sets, and maps.
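To give a feel for why a CRDT-style counter needs no application-side conflict resolution, here is a minimal Python sketch of a grow-only counter. It is an illustration of the general technique only, not Riak's implementation or client API; the class name `GCounter` and the actor names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter (hypothetical illustration, not Riak's code).

    Each actor owns one slot; replicas merge by taking the per-actor maximum.
    """
    counts: dict = field(default_factory=dict)

    def increment(self, actor: str, amount: int = 1) -> None:
        self.counts[actor] = self.counts.get(actor, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        actors = self.counts.keys() | other.counts.keys()
        return GCounter(
            {a: max(self.counts.get(a, 0), other.counts.get(a, 0)) for a in actors}
        )

    @property
    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas accept increments independently, then exchange state.
replica_a = GCounter()
replica_b = GCounter()
replica_a.increment("node_a", 2)
replica_b.increment("node_b", 3)

# Merging in either order yields the same state, so no manual resolution is needed.
merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
```

Because the merge is commutative and idempotent, replicas converge on the same value regardless of the order in which they exchange state.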
- Using Data Types explains how to use Riak Data Types on the application side, with usage examples for all five Data Types in all of Riak’s officially supported clients (Java, Ruby, Python, .NET and Erlang) and for Riak’s HTTP interface.
- Data Types explains some of the theoretical concerns that drive Riak Data Types and shares details about how they are implemented in Riak.
Data Structures in Riak by Riak engineers Sean Cribbs and Russell Brown.
Riak Search 2.0 (codename: Yokozuna)
Riak Search 2.0 is a complete, top-to-bottom replacement for Riak Search, integrating Riak with Apache Solr’s full-text search capabilities and supporting Solr’s client query APIs.
- Using Search provides an overview of how to use the new Riak Search.
- Search Schema shows you how to create and manage custom search schemas.
- Search Details provides an in-depth look at the design considerations that went into the new Riak Search.
Riak Search 2.0 by Riak engineer and documentarian Eric Redmond.
Strong Consistency
Riak is typically known as an AP system, favoring high availability and partition tolerance while sacrificing data consistency. In version 2.0, you have the option of applying strong consistency guarantees and thus of using Riak as a CP—consistent plus partition-tolerant—system for some (or perhaps all) of your data.
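The AP/CP trade-off described above can be sketched with a deliberately simplified model. The Python below assumes that a strongly consistent write needs a majority of replicas to be reachable, while an availability-first write proceeds if any replica is reachable. The function names are hypothetical, and this does not model Riak's actual consensus subsystem.

```python
def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return n_replicas // 2 + 1


def accepts_write(reachable: int, n_replicas: int, strongly_consistent: bool) -> bool:
    """Simplified model (an assumption, not Riak's algorithm).

    CP mode: refuse the write unless a majority of replicas is reachable.
    AP mode: accept the write as long as at least one replica is reachable.
    """
    if strongly_consistent:
        return reachable >= majority(n_replicas)
    return reachable >= 1


# With 3 replicas and only 1 reachable during a partition:
ap_result = accepts_write(reachable=1, n_replicas=3, strongly_consistent=False)
cp_result = accepts_write(reachable=1, n_replicas=3, strongly_consistent=True)
```

In this model the AP write succeeds and may have to be reconciled later, while the CP write is refused until enough replicas are reachable again.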
- Using Strong Consistency shows you how to enable Riak’s strong consistency subsystem and to apply strong consistency guarantees to data stored in specified buckets.
- Strong Consistency provides a theoretical treatment of how a strongly consistent system differs from an eventually consistent system, as well as details about how strong consistency is implemented in Riak.
- Managing Strong Consistency is a guide to strong consistency for Riak operators.
Bringing Consistency to Riak by Riak engineer Joseph Blomstedt. You should also check out part 2.
Security
Riak 2.0 enables you to manage:
- Authorization to perform specific tasks, from GETs and PUTs to running MapReduce jobs to administering Riak Search.
- Authentication of Riak clients seeking access to Riak.
Previously, securing Riak was restricted to the network level. Now, security measures can be applied to the internals of Riak itself and managed through a simple command-line interface.
- Authentication and Authorization explains how Riak Security can be enabled and disabled, how users and groups are managed, how authorization to perform certain operations can be granted and revoked, how security ciphers can be chosen, and more.
- Managing Security Sources is an in-depth tutorial on how to implement Riak’s four supported authentication sources: trusted networks, passwords, pluggable authentication modules, and certificates.
Locking the Distributed Chicken Coop by Riak engineer Andrew Thompson.
Simplified Configuration Management
In older versions of Riak, a Riak node’s configuration was determined by two separate files: app.config and vm.args. In Riak 2.0, you have the option of either continuing to use these files, which can be useful if you’re upgrading to 2.0, or managing configuration through a single riak.conf file in which parameters are set using the following syntax:
parameter.sub-parameter = setting
Based on Riak’s Cuttlefish project, the new system is much simpler, leaving behind the Erlang syntax required in app.config.
Version 2.0 will support both the old and the new configuration system, in case you’re upgrading. Please note, however, that if you use both systems side by side, all settings from the older, app.config/vm.args-based system will override any settings from the new system.
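The `parameter.sub-parameter = setting` syntax and the override rule described above can be sketched in Python. This is a toy reader for illustration, not Cuttlefish itself; the keys in `SAMPLE` are hypothetical placeholders, and treating lines that start with `#` as comments is an assumption of this sketch.

```python
def parse_conf(text: str) -> dict:
    """Parse `key = value` lines, skipping blank lines and `#` comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(new_style: dict, old_style: dict) -> dict:
    """Combine both systems; values from the older system win on conflict."""
    return {**new_style, **old_style}


# Hypothetical keys, used only to show the flat dotted syntax.
SAMPLE = """
# a comment line
parameter.sub-parameter = setting
storage.example = on
"""

parsed = parse_conf(SAMPLE)
combined = effective_settings({"a.b": "1", "c.d": "2"}, {"c.d": "9"})
```

The flat, dotted keys are what make the format easy to read and diff compared with nested Erlang terms.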
- Configuration Files lists and describes all of the configurable parameters available in Riak 2.0, from configuring your chosen storage backend(s) to setting default bucket properties to controlling Riak’s logging system and much more.
Lightning talk on Cuttlefish by Riak engineer Joe DeVivo.
Bucket Types
In older versions of Riak, bucket properties were managed on a bucket-by-bucket, ad hoc basis. With bucket types, you can create, manage, and apply whole configurations of bucket properties efficiently. Bucket types also act as a third namespace in addition to buckets and keys.
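The idea of bucket types as a third namespace can be sketched as follows: an object is addressed by a (type, bucket, key) triple rather than a (bucket, key) pair. The in-memory store and the type names below are hypothetical. The URL layout in `object_path` reflects Riak's HTTP interface as best understood here, so treat it as an assumption and confirm it against the HTTP API documentation.

```python
from urllib.parse import quote

# Hypothetical in-memory store keyed by (bucket_type, bucket, key).
store = {}


def put(bucket_type: str, bucket: str, key: str, value: str) -> None:
    store[(bucket_type, bucket, key)] = value


def get(bucket_type: str, bucket: str, key: str):
    return store.get((bucket_type, bucket, key))


def object_path(bucket: str, key: str, bucket_type: str = "default") -> str:
    """Build an object path with the type as the outermost namespace (assumed layout)."""
    parts = (quote(p, safe="") for p in (bucket_type, bucket, key))
    return "/types/{}/buckets/{}/keys/{}".format(*parts)


# The same bucket and key under two different types are distinct objects.
put("sessions", "users", "alice", "session-data")
put("profiles", "users", "alice", "profile-data")
```

Here `get("sessions", "users", "alice")` and `get("profiles", "users", "alice")` return different values even though the bucket and key are identical.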
- Using Bucket Types explains how to create, modify, and activate bucket types, as well as how the new system differs from the older, bucket properties-based system.
Bucket Types and Config hangout with Riak engineers Joe DeVivo and Jordan West.
Dotted Version Vectors
In prior versions of Riak, conflict resolution was managed using vector clocks, which track object update causality.
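As a rough illustration of how a vector clock tracks causality, the Python sketch below compares two clocks, each a mapping from actor to update count, and reports whether one supersedes the other or whether they are concurrent. It shows plain vector clocks only and is not Riak's implementation; the function names and actor labels are hypothetical.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relation(a: dict, b: dict) -> str:
    """Classify two clocks as equal, ordered, or concurrent."""
    a_ge_b = descends(a, b)
    b_ge_a = descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a supersedes b"
    if b_ge_a:
        return "b supersedes a"
    return "concurrent"


# Actor x wrote twice on top of a state that actor y had written once.
newer = {"x": 2, "y": 1}
older = {"x": 1, "y": 1}
# This clock has never seen y's update, so it conflicts with `older`.
diverged = {"x": 2}
```

When two clocks are concurrent, neither update can safely be discarded, so both versions of the object have to be kept until the conflict is resolved.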
Riak 2.0 has added support for dotted version vectors (DVVs). DVVs serve an analogous role to vector clocks but are more effective at containing sibling explosion and can reduce Riak cluster latency.
- Dotted Version Vectors explains some of the theoretical nuances behind the distinction between DVVs and vector clocks and offers instructions on implementing DVVs.
New Client Libraries
While Riak offered official client libraries for Java, Ruby, Python, .NET and Erlang for versions of Riak prior to 2.0, all clients have undergone major changes in anticipation of the 2.0 release.
You will also notice that our documentation now features a wide variety of code samples from all of the officially supported clients.
Some 2.0-specific features are currently not compatible with one another. Incompatibilities are marked with a ✗ in the table below.
|Search 2.0||Strong consistency||Data Types||Secondary indexes||Legacy Search|
† The data indexed by Riak Search can be stored in a strongly consistent fashion, but indexes themselves are eventually consistent.
‡ If secondary indexes are attached to an object, you can perform strongly consistent operations on the object, but the secondary indexes will be ignored.
* Legacy Search and Search 2.0 can be run side by side, but we do not recommend this.