Spark-Riak Connector Add-on (Riak TS)
    
    
    Table Range Query Partitioning
  
  
Riak TS range queries are limited to a maximum of 5 quanta (see Querying Data in Riak TS). To work around this limitation or simply achieve higher read performance, large ranges can be split into smaller sub-ranges at partitioning time.
To use this functionality, you must provide the following options:
- spark.riak.partitioning.ts-range-field-nameto identify quantized field
- spark.riak.input.split.countto identify number of partitions/subranges (default value is- 10)
For example:
   val df = sqlContext.read
      .option("spark.riak.input.split.count", "5")
      .option("spark.riak.partitioning.ts-range-field-name", "time")
      .format("org.apache.spark.sql.riak")
      .schema(schema)
      .load(ts_table_name)
      .filter(s"time >= CAST(111111 AS TIMESTAMP) AND time <= CAST(555555 AS TIMESTAMP) AND col1 = 'val1'")
df = sqlContext.read \
      .option("spark.riak.input.split.count", "5") \
      .option("spark.riak.partitioning.ts-range-field-name", "time") \
      .format("org.apache.spark.sql.riak") \
      .schema(schema) \
      .load(ts_table_name) \
      .filter(s"time >= CAST(111111 AS TIMESTAMP) AND time <= CAST(555555 AS TIMESTAMP) AND col1 = 'val1'")
The initial range query will be split into 5 subqueries (one per each partition) as follows:
- time >= CAST(111111 AS TIMESTAMP) AND time < CAST(222222 AS TIMESTAMP) AND col1 = 'val1'
- time >= CAST(222222 AS TIMESTAMP) AND time < CAST(333333 AS TIMESTAMP) AND col1 = 'val1'
- time >= CAST(333333 AS TIMESTAMP) AND time < CAST(444444 AS TIMESTAMP) AND col1 = 'val1'
- time >= CAST(444444 AS TIMESTAMP) AND time < CAST(555555 AS TIMESTAMP) AND col1 = 'val1'
- time >= CAST(555555 AS TIMESTAMP) AND time < CAST(555556 AS TIMESTAMP) AND col1 = 'val1'
Not providing the spark.riak.partitioning.ts-range-field-name property will default to having a single partition with initial query.
