Introduction to Riak KV 2.0

Riak version 2.0 includes deep changes and many new features affecting all facets of Riak. This article gives an overview of the new features and where you can learn more about using them in your Riak installation.

For more in-depth implementation details check out the version 2.0 release notes.

If you’re upgrading to Riak 2.0 from an earlier version, please be aware that all of the new features listed below are optional:

  • Riak Data Types — Riak’s new CRDT-based Data Types can simplify modeling data in Riak, but are only used in buckets explicitly configured to use them.
  • Strong Consistency, Riak Security, and the New Riak Search — These are subsystems in Riak that must be explicitly turned on to work. If not turned on, they will have no impact on performance. Furthermore, the older Riak Search will continue to be included with Riak.
  • SecurityAuthentication and authorization can be enabled or disabled at any time.
  • Configuration management — Riak’s configuration files have been streamlined into a single file named riak.conf. If you are upgrading, however, your existing app.config and vm.args files will still be recognized in version 2.0.
  • Bucket Types — While we strongly recommend using bucket types when creating new buckets, they are not required.
  • Dotted Version Vectors (DVVs) — This alternative to traditional vector clocks is enabled by default in all bucket types, but DVVs can be disabled by setting the dvv_enabled property to false on any bucket type.

In a nutshell, upgrading to 2.0 will change how you use Riak only if you want it to. But even if you don’t plan on using the new features, there are a number of improvements that make upgrading a good choice, including the following:

  • Cluster metadata — This is a subsystem of Riak added in 2.0 that reduces the amount of inter-node gossip in Riak clusters, which can reduce network congestion.
  • Active Anti-Entropy — While Riak has had an Active Anti-Entropy (AAE) feature that is turned on by default since version 1.3, AAE performance has been improved in version 2.0.
  • Bug patches — A variety of bugs present in earlier versions have been identified and patched.

More on upgrading can be found in our Riak 2.0 upgrade guide.

Riak Data Types

In distributed systems there is an unavoidable trade-off between consistency and availability. This can complicate some aspects of application design if you’re using Riak as a key/value store because the application is responsible for resolving conflicts between replicas of objects stored in different Riak nodes.

Riak 2.0 offers a new approach to this problem for a wide range of use cases in the form of Riak Data Types. Instead of forcing the application to resolve conflicts, Riak offers five Data Types that can reduce some of the complexities of developing using Riak: flags, registers, counters, sets, and maps.

Relevant Docs

  • Using Data Types explains how to use Riak Data Types on the application side, with usage examples for all five Data Types in all of Riak’s officially supported clients (Java, Ruby, Python, .NET and Erlang) and for Riak’s HTTP interface.
  • Data Types explains some of the theoretical concerns that drive Riak Data Types and shares details about how they are implemented in Riak.

Video

Data Structures in Riak by Riak engineers Sean Cribbs and Russell Brown.

Riak Search 2.0 (codename: Yokozuna)

Riak Search 2.0 is a complete, top-to-bottom replacement for Riak Search, integrating Riak with Apache Solr’s full-text search capabilities and supporting Solr’s client query APIs.

Relevant Docs

  • Using Search provides an overview of how to use the new Riak Search.
  • Search Schema shows you how to create and manage custom search schemas.
  • Search Details provides an in-depth look at the design considerations that went into the new Riak Search.

Video

Riak Search 2.0 by Riak engineer and documentarian Eric Redmond.

Strong Consistency

Riak is typically known as an AP system, favoring high availability and partition tolerance while sacrificing data consistency. In version 2.0, you have the option of applying strong consistency guarantees and thus of using Riak as a CP—consistent plus partition-tolerant—system for some (or perhaps all) of your data.

Relevant Docs

  • Using Strong Consistency shows you how to enable Riak’s strong consistency subsystem and to apply strong consistency guarantees to data stored in specified buckets.
  • Strong Consistency provides a theoretical treatment of how a strongly consistent system differs from an eventually consistent system, as well as details about how strong consistency is implemented in Riak.
  • Managing Strong Consistency is a guide to strong consistency for Riak operators.

Video

Bringing Consistency to Riak by Riak engineer Joseph Blomstedt. You should also check out part 2.

Security

Riak 2.0 enables you to manage:

  • Authorization to perform specific tasks, from GETs and PUTs to running MapReduce jobs to administering Riak Search.

  • Authentication of Riak clients seeking access to Riak.

Previously, securing Riak was restricted to the network level. Now, security measures can be applied to the internals of Riak itself and managed through a simple command-line interface.

Relevant Docs

  • Authentication and Authorization explains how Riak Security can be enabled and disabled, how users and groups are managed, how authorization to perform certain operations can be granted and revoked, how security ciphers can be chosen, and more.
  • Managing Security Sources is an in-depth tutorial on how to implement Riak’s four supported authentication sources: trusted networks, passwords, pluggable authentication modules, and certificates.

Video

Locking the Distributed Chicken Coop by Riak engineer Andrew Thompson.

Simplified Configuration Management

In older versions of Riak, a Riak node’s configuration was determined by two separate files: app.config and vm.args. In Riak 2.0, you have the option of either continuing to use these files, which can be useful if you’re upgrading to 2.0, or to manage configuration through a single riak.conf file in which parameters are set using the following syntax:

parameter.sub-parameter = setting

Based on Riak’s Cuttlefish project, the new system is much simpler, leaving behind the Erlang syntax required in app.config.

Note on upgrading

Version 2.0 will support both the old and the new configuration system, in case you’re upgrading. Please note, however, that if you use both systems side by side, all settings from the older, app.config/vm.args-based system will override any settings from the new system.

Relevant Docs

  • Configuration Files lists and describes all of the configurable parameters available in Riak 2.0, from configuring your chosen storage backend(s) to setting default bucket properties to controlling Riak’s logging system and much more.

Video

Lightning talk on Cuttlefish by Riak engineer Joe DeVivo.

Bucket Types

In older versions of Riak, bucket properties were managed on a bucket-by-bucket, ad hoc basis. With bucket types, you can create, manage, and apply whole configurations of bucket properties efficiently. Bucket types also act as a third namespace in addition to buckets and keys.

Relevant Docs

  • Using Bucket Types explains how to create, modify, and activate bucket types, as well as how the new system differs from the older, bucket properties-based system.

Video

Bucket Types and Config hangout with Riak engineers Joe DeVivo and Jordan West.

Dotted Version Vectors

In prior versions of Riak, conflict resolution was managed using vector clocks, which track object update causality.

Riak 2.0 has added support for dotted version vectors (DVVs). DVVs serve an analogous role to vector clocks but are more effective at containing sibling explosion and can reduce Riak cluster latency.

Relevant Docs

  • Dotted Version Vectors explains some of the theoretical nuances behind the distinction between DVVs and vector clocks and offers instructions on implementing DVVs.

New Client Libraries

While Riak offered official client libraries for Java, Ruby, Python, .NET and Erlang for versions of Riak prior to 2.0, all clients have undergone major changes in anticipation of the 2.0 release.

Language Docs
Java Javadoc
Ruby API
Python Sphinx
.NET wiki, API
Erlang EDocs

You will also notice that our documentation now features a wide variety of code samples from all four officially supported clients.

Incompatibilities

Some 2.0-specific features are currently not compatible with one another. Incompatibilities are marked with a in the table below.

Search 2.0 Strong consistency Data Types Secondary indexes Legacy Search
Strong consistency
Data Types
Secondary indexes
Legacy Search *
Security

    The data indexed by Riak Search can be stored in a strongly consistent fashion, but indexes themselves are eventually consistent
    If secondary indexes are attached to an object, you can perform strongly consistent operations on the object but the secondary indexes will be ignored
*     Legacy Search and Search 2.0 can be run side by side, but we do not recommend this