Riak Search Settings

This page covers how to use Riak Search (with Solr integration).

For a simple reference of the available configs and their defaults, see the configuration reference.

If you are looking to develop on or with Riak Search, take a look at:

Overview

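We’ll be walking through the prerequisites, how to enable Riak Search, the search configuration settings, and some additional notes on Solr.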

Prerequisites

Because Solr is a Java application, you will need to install Java 7 or later on every node. Installation packages are available from the Java SE Downloads page, and installation instructions from the Java SE documentation site.

Enabling Riak Search

Riak Search is not enabled by default, so you must enable it in every node’s configuration file as follows:

search = on

Search Config Settings

You will find all the Riak Search configuration settings in riak.conf. Setting search to on is required, but other search settings are optional. A handy reference list of these parameters can be found in our configuration files documentation.

`search`

Enable or disable search; defaults to off.

Valid values: on or off

`search.anti_entropy.data_dir`

The directory in which Riak Search stores files related to active anti-entropy; defaults to ./data/yz_anti_entropy.

Valid values: a directory

`search.anti_entropy.throttle`

Whether the throttle for Yokozuna active anti-entropy is enabled; defaults to on.

Valid values: on or off

You can read more about throttling here.

`search.anti_entropy.throttle.$tier.delay`

Set the delay for a throttling tier for active anti-entropy; no default.

Each tier is a minimum Solrq queue size and a time-delay that the throttle should observe at that size and above.

For example:

search.anti_entropy.throttle.tier1.solrq_queue_length = 0
search.anti_entropy.throttle.tier1.delay = 0ms
search.anti_entropy.throttle.tier2.solrq_queue_length = 40
search.anti_entropy.throttle.tier2.delay = 5ms

This configuration introduces a 5-millisecond sleep for any queue of length 40 or higher. If any tiers are configured, one of them must have a .solrq_queue_length of 0. Both .solrq_queue_length and .delay must be set for each tier. There is no limit to the number of tiers that may be specified. See search.anti_entropy.throttle.

Valid values: Non-negative integer
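As a further sketch of how tiers can be layered, the fragment below adds a third tier. The tier names (tier1 to tier3) and the specific queue lengths and delays are illustrative assumptions rather than recommended values, and the comments assume that the tier with the largest matching .solrq_queue_length is the one whose delay applies.

```riakconf
## No delay while the Solrq queue is short (a tier at 0 is required).
search.anti_entropy.throttle.tier1.solrq_queue_length = 0
search.anti_entropy.throttle.tier1.delay = 0ms
## Back off slightly once 40 or more items are queued.
search.anti_entropy.throttle.tier2.solrq_queue_length = 40
search.anti_entropy.throttle.tier2.delay = 5ms
## Back off harder once 100 or more items are queued.
search.anti_entropy.throttle.tier3.solrq_queue_length = 100
search.anti_entropy.throttle.tier3.delay = 20ms
```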

`search.anti_entropy.throttle.$tier.solrq_queue_length`

Set the throttling tiers for active anti-entropy; no default.

Each tier is a minimum Solrq queue size and a time-delay that the throttle should observe at that size and above.

For example:

search.anti_entropy.throttle.tier1.solrq_queue_length = 0
search.anti_entropy.throttle.tier1.delay = 0ms
search.anti_entropy.throttle.tier2.solrq_queue_length = 40
search.anti_entropy.throttle.tier2.delay = 5ms

Valid values: Non-negative integer

`search.dist_query`

Enable this node in distributed query plans; defaults to on.

If enabled, this node will participate in distributed Solr queries. If disabled, the node will be excluded from Riak Search cover plans, and will therefore never be consulted in a distributed query. Note that this node may still be used to execute a query. Use this flag if you have a long-running administrative operation (e.g. reindexing) which requires that the node be removed from query plans, and which would otherwise result in inconsistent search results.

This setting can also be changed via riak-admin by issuing one of the following commands:

riak-admin set search.dist_query=off

riak-admin set search.dist_query=on

Setting this value in riak.conf is useful when you are restarting a node that was removed from search queries with the riak-admin command above. Setting search.dist_query to off in riak.conf will prevent the node from being included in search queries until it is fully spun up.

Valid values: on or off
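For instance, to keep a node out of distributed query plans across a restart, the setting could be placed in that node's riak.conf. This is a minimal sketch of the scenario described above, not a required step.

```riakconf
## Exclude this node from Riak Search cover plans when it starts.
search.dist_query = off
```

Once the node is fully up, it can be returned to query plans with the riak-admin set search.dist_query=on command shown earlier.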

`search.index.error_threshold.failure_count`

The number of failures encountered while updating a search index within search.index.error_threshold.failure_interval before Riak KV will skip updates to that index; defaults to 3.

Valid values: Integer

`search.index.error_threshold.failure_interval`

The window of time during which search.index.error_threshold.failure_count failures will cause Riak KV to skip updates to a search index; defaults to 5000.

If search.index.error_threshold.failure_count errors have occurred within this interval on a given search index, then Riak will skip updates to that index until the search.index.error_threshold.reset_interval has passed.

Valid values: Milliseconds

`search.index.error_threshold.reset_interval`

The amount of time it takes for updates to a given search index to resume/refresh once Riak KV has started skipping update operations; defaults to 30000.

Valid values: Milliseconds

`search.queue.batch.flush_interval`

The maximum delay between notifications to flush batches to Solr; defaults to 1000 (milliseconds).

This setting is used to increase or decrease the frequency of batch delivery into Solr, specifically for relatively low-volume input into Riak KV. This setting ensures that data will be delivered into Solr in accordance with the search.queue.batch.minimum and search.queue.batch.maximum settings within the specified interval. Batches that are smaller than search.queue.batch.minimum will be delivered to Solr within this interval. This setting will generally have no effect on heavily loaded systems. You may use any time unit; the default is in milliseconds.

Valid values: ms, s, m, or h

`search.queue.batch.maximum`

The maximum batch size, in number of Riak objects; defaults to 500.

Any batches that are larger than this amount will be split, where the first search.queue.batch.maximum objects will be flushed to Solr and the remaining objects enqueued for that index will be retained until the next batch is delivered. This parameter ensures that at most search.queue.batch.maximum objects will be delivered into Solr in any given request.

Valid values: Integer

`search.queue.batch.minimum`

The minimum batch size, in number of Riak objects; defaults to 10.

Any batches that are smaller than this amount will not be immediately flushed to Solr, but are guaranteed to be flushed within the search.queue.batch.flush_interval.

Valid values: Integer
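To illustrate how the three batch settings fit together, the following sketch targets a low-volume cluster where indexing latency matters more than batching efficiency. The values are assumptions chosen for illustration, not tuned recommendations.

```riakconf
## Deliver any pending objects to Solr within 500 ms (default: 1000 ms).
search.queue.batch.flush_interval = 500ms
## Batches of fewer than 5 objects wait for the flush interval (default: 10).
search.queue.batch.minimum = 5
## Never send more than 200 objects to Solr in one request (default: 500).
search.queue.batch.maximum = 200
```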

`search.queue.high_watermark`

The queue high water mark; defaults to 1000.

If the total number of queued messages in a Solrq worker instance exceeds this limit, then the calling vnode will be blocked until the total number falls below this limit. This parameter exercises flow control between Riak KV and the Riak Search batching subsystem if writes into Solr start to fall behind.

Valid values: Integer

`search.queue.high_watermark.purge_strategy`

The strategy for how purging is handled when the search.queue.high_watermark is hit; defaults to purge_one.

Valid values: purge_one, purge_index, or off

purge_one removes the oldest item on the queue from an erroring index (one whose fuse has blown, in the code's terms) in order to get below the search.queue.high_watermark.
purge_index removes all items associated with one random erroring index in order to get below the search.queue.high_watermark.
off disables purging.
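A minimal sketch of the two high-watermark settings used together is shown below. The choice of purge_index is an assumption for illustration; it discards more queued updates per purge than purge_one does.

```riakconf
## Block the calling vnode once more than 1000 messages are queued (the default).
search.queue.high_watermark = 1000
## At the high watermark, drop all queued items for one erroring index.
search.queue.high_watermark.purge_strategy = purge_index
```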

`search.root_dir`

The root directory in which index data and configuration are stored; defaults to ./data/yz.

Valid values: a directory

`search.solr.jvm_options`

The options to pass to the Solr JVM; defaults to -d64 -Xms1g -Xmx1g -XX:+UseStringCache -XX:+UseCompressedOops.

Non-standard options (e.g. -XX) may not be portable across JVM implementations.

Valid values: Java command-line arguments
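For example, a node with memory to spare might raise the Solr heap from the default 1 GB to 2 GB. The 2g figure is an assumption for illustration; the remaining flags are the defaults listed above, unchanged.

```riakconf
## Default options, with the initial and maximum heap raised to 2 GB.
search.solr.jvm_options = -d64 -Xms2g -Xmx2g -XX:+UseStringCache -XX:+UseCompressedOops
```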

`search.solr.jmx_port`

The port number to which Solr JMX binds (note: binds on every interface); defaults to 8985.

Valid values: Integer

`search.solr.port`

The port number to which Solr binds (note: binds on every interface); defaults to 8093.

Valid values: Integer

`search.solr.start_timeout`

How long Riak KV will wait for Solr to start (attempts twice before shutdown); defaults to 30s.

Values lower than 1s will be rounded up to 1s.

Valid values: Integer with time units (e.g. 2m)
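The Solr port and timeout settings above could be written out in riak.conf as in the following sketch. The port numbers are the documented defaults; the 2m timeout is an illustrative value for a node where Solr is slow to start, not a recommendation.

```riakconf
## Solr HTTP and JMX ports (defaults shown).
search.solr.port = 8093
search.solr.jmx_port = 8985
## Allow Solr up to two minutes to start.
search.solr.start_timeout = 2m
```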

More on Solr

Solr JVM and Ports

Riak Search runs one Solr process per node to manage its indexing and search functionality. While the underlying project manages index distribution, node coverage for queries, active anti-entropy (AAE), and JVM process management, you should provide plenty of RAM and disk space for running both Riak and the JVM that runs Solr. We recommend a minimum of 6GB of RAM per node.

Concerning ports, be sure to take the necessary security precautions to prevent exposing the extra Solr and JMX ports to the outside world.

Solr for Operators

For further information on Solr monitoring, tuning, and performance, we recommend the following documents for getting started:

A wide variety of other documentation is available from the Solr OSS community.