Apache Ignite

Indexes

Setting Up SQL Indexes in Apache Ignite

Overview

Apache Ignite supports advanced indexing capabilities. You can define single-field (single-column) indexes or group indexes with various parameters, and you can control where indexes are stored, either in Java heap or in off-heap memory.

Indexes in Ignite are kept in a distributed fashion the same way as cache data sets. Each node that stores a specific subset of data keeps and maintains indexes corresponding to this data as well.
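The idea that every node indexes only its own share of the data can be pictured with a small plain-Java model. This is only a sketch: the class, the hash-based routing, and the method names are hypothetical and are not Ignite APIs or Ignite's actual affinity logic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of per-node indexes; all names are hypothetical, not Ignite APIs. */
class DistributedIndexSketch {
    /** One cluster node: its share of the data plus an index over that share only. */
    static final class Node {
        final Map<Long, Double> salaryByKey = new HashMap<>();
        final TreeMap<Double, TreeSet<Long>> salaryIndex = new TreeMap<>();

        void put(long key, double salary) {
            Double old = salaryByKey.put(key, salary);
            if (old != null) {
                // Drop the stale index entry before adding the new one.
                TreeSet<Long> keys = salaryIndex.get(old);
                keys.remove(key);
                if (keys.isEmpty())
                    salaryIndex.remove(old);
            }
            salaryIndex.computeIfAbsent(salary, s -> new TreeSet<>()).add(key);
        }

        /** Local lookup: keys on this node with salary strictly greater than min. */
        List<Long> keysWithSalaryAbove(double min) {
            List<Long> res = new ArrayList<>();
            for (TreeSet<Long> keys : salaryIndex.tailMap(min, false).values())
                res.addAll(keys);
            return res;
        }
    }

    final Node[] nodes;

    DistributedIndexSketch(int nodeCnt) {
        nodes = new Node[nodeCnt];
        for (int i = 0; i < nodeCnt; i++)
            nodes[i] = new Node();
    }

    /** Routes the entry to exactly one node; that node also updates its own index. */
    void put(long key, double salary) {
        nodes[Math.floorMod(Long.hashCode(key), nodes.length)].put(key, salary);
    }

    /** Fans the query out to every node and merges the local results. */
    TreeSet<Long> keysWithSalaryAbove(double min) {
        TreeSet<Long> res = new TreeSet<>();
        for (Node n : nodes)
            res.addAll(n.keysWithSalaryAbove(min));
        return res;
    }
}
```

In this model no node ever sees index entries for data it does not own, so a query has to visit every node and merge the partial results.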

This page explains how to define and manage indexes and queryable fields using the two available approaches, and how to switch between the indexing implementations supported by the data fabric.

Annotation Based Configuration

Indexes and queryable fields can be configured in code with the @QuerySqlField annotation. As shown in the example below, annotate every field that should be visible to the SQL engine.

import java.io.Serializable;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class Person implements Serializable {
  /** Indexed field. Visible to the SQL engine. */
  @QuerySqlField(index = true)
  private long id;

  /** Queryable field. Visible to the SQL engine. */
  @QuerySqlField
  private String name;

  /** NOT visible to the SQL engine. */
  private int age;

  /**
   * Indexed field sorted in descending order.
   * Visible to the SQL engine.
   */
  @QuerySqlField(index = true, descending = true)
  private float salary;
}

import org.apache.ignite.cache.query.annotations.QuerySqlField
import scala.annotation.meta.field

case class Person (
  /** Indexed field. Visible to the SQL engine. */
  @(QuerySqlField @field)(index = true) id: Long,

  /** Queryable field. Visible to the SQL engine. */
  @(QuerySqlField @field) name: String,

  /** NOT visible to the SQL engine. */
  age: Int,

  /**
   * Indexed field sorted in descending order.
   * Visible to the SQL engine.
   */
  @(QuerySqlField @field)(index = true, descending = true) salary: Float
) extends Serializable {
  ...
}

Both id and salary are indexed fields. The id field is sorted in ascending order (the default), while salary is sorted in descending order.

If you don't want to index a field but still need to use it in a SQL query, annotate the field as well, but omit the index = true parameter. Such a field is called a queryable field. In the example above, name is defined as a queryable field.

Finally, age is neither a queryable nor an indexed field and won't be accessible from SQL queries in Apache Ignite.

Scala Annotations

In Scala classes, the @QuerySqlField annotation must be accompanied by the @field annotation in order for a field to be visible to Ignite, like so: @(QuerySqlField @field).

Alternatively, you can use the @ScalarCacheQuerySqlField annotation from the ignite-scalar module, which is a type alias for QuerySqlField combined with the @field annotation.

Registering Indexed Types

After indexed and queryable fields are defined, they have to be registered in the SQL engine along with the object types they belong to.

To tell Ignite which types should be indexed, pass key-value class pairs to the CacheConfiguration.setIndexedTypes method, as shown in the example below.

import org.apache.ignite.configuration.CacheConfiguration;

// Preparing configuration.
CacheConfiguration<Long, Person> ccfg = new CacheConfiguration<>();

// Registering indexed type: key class first, then value class.
ccfg.setIndexedTypes(Long.class, Person.class);

Note that this method accepts only pairs of types: one for the key class and one for the value class. Primitives are passed as their boxed types.

Predefined Fields

In addition to the fields marked with the @QuerySqlField annotation, each table has two special predefined fields, _key and _val, which refer to the whole key and value objects. This is useful, for instance, when one of them is of a primitive type and you want to filter by its value. To do this, execute a query like SELECT * FROM Person WHERE _key = 100.

Because Ignite supports the Binary Marshaller, there is no need to add the classes of indexed types to the classpath of cluster nodes. The SQL query engine can read the values of indexed and queryable fields without deserializing objects.

Group Indexes

To set up a multi-field index that can accelerate queries with complex conditions, use the @QuerySqlField.Group annotation. If a field should be part of more than one group, you can put multiple @QuerySqlField.Group annotations into orderedGroups.

For instance, in the Person class below the field age belongs to an indexed group named "age_salary_idx" with group order 0 and descending sort order. The same group also contains the field salary with group order 3 and ascending sort order. In addition, salary has its own single-column index, because index = true is specified alongside the orderedGroups declaration. The group order does not have to be any particular number. It is used only to order the fields within a group relative to each other.

import java.io.Serializable;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class Person implements Serializable {
  /** Indexed in a group index with "salary". */
  @QuerySqlField(orderedGroups={@QuerySqlField.Group(
    name = "age_salary_idx", order = 0, descending = true)})
  private int age;

  /** Indexed separately and in a group index with "age". */
  @QuerySqlField(index = true, orderedGroups={@QuerySqlField.Group(
    name = "age_salary_idx", order = 3)})
  private double salary;
}

Note that annotating a field with @QuerySqlField.Group outside of @QuerySqlField(orderedGroups={...}) will have no effect.
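To make the effect of group order and sort direction concrete, the following plain-Java sketch sorts a few records the way the "age_salary_idx" group is described above: by age in descending order first, then by salary in ascending order. The Person stand-in and the helper names are hypothetical and only model the ordering. They are not Ignite code and say nothing about how Ignite stores the index internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Models the entry order of "age_salary_idx"; hypothetical names, not Ignite code. */
class GroupIndexOrderSketch {
    /** Minimal stand-in for the Person class shown above. */
    static final class Person {
        final int age;
        final double salary;

        Person(int age, double salary) {
            this.age = age;
            this.salary = salary;
        }
    }

    /**
     * age has group order 0 and salary has group order 3, so age comes first.
     * Only the relative order matters: 0 and 3 behave exactly like 1 and 2.
     */
    static final Comparator<Person> AGE_SALARY_IDX =
        Comparator.comparingInt((Person p) -> p.age).reversed() // descending = true
            .thenComparingDouble(p -> p.salary);                // ascending (default)

    /** Returns a copy of the input sorted the way the group index would order it. */
    static List<Person> sortLikeIndex(List<Person> people) {
        List<Person> sorted = new ArrayList<>(people);
        sorted.sort(AGE_SALARY_IDX);
        return sorted;
    }
}
```

With the input (30, 5000), (40, 3000), (40, 1000), (25, 9000), the sorted order is (40, 1000), (40, 3000), (30, 5000), (25, 9000): the oldest persons come first, and ties on age are broken by the lower salary.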

QueryEntity Based Configuration

Indexes and queryable fields can also be configured with the org.apache.ignite.cache.QueryEntity class, which is convenient for Spring XML based configuration.

All concepts discussed in the annotation based configuration section above also apply to the QueryEntity based approach. Furthermore, types whose fields are configured with @QuerySqlField and registered with the CacheConfiguration.setIndexedTypes method are internally converted into query entities.

The example below shows how to define a single-field index, a group index, and queryable fields.

<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="mycache"/>
    <!-- Configure query entities -->
    <property name="queryEntities">
        <list>
            <bean class="org.apache.ignite.cache.QueryEntity">
                <!-- Setting indexed type's key class -->
                <property name="keyType" value="java.lang.Long"/>
              
                <!-- Setting indexed type's value class -->
                <property name="valueType"
                          value="org.apache.ignite.examples.Person"/>

                <!--
                    Defining fields that will be either indexed or queryable.
                    Indexed fields are added to the 'indexes' list below.
                -->
                <property name="fields">
                    <map>
                        <entry key="id" value="java.lang.Long"/>
                        <entry key="name" value="java.lang.String"/>
                        <entry key="salary" value="java.lang.Long"/>
                    </map>
                </property>

                <!--
                    Defining which of the fields listed above will be
                    treated as indexed fields.
                -->
                <property name="indexes">
                    <list>
                        <!-- Single field (aka. column) index -->
                        <bean class="org.apache.ignite.cache.QueryIndex">
                            <constructor-arg value="id"/>
                        </bean>
                      
                        <!-- Group index. -->
                        <bean class="org.apache.ignite.cache.QueryIndex">
                            <constructor-arg>
                                <list>
                                    <value>id</value>
                                    <value>salary</value>
                                </list>
                            </constructor-arg>
                            <constructor-arg value="SORTED"/>
                        </bean>
                    </list>
                </property>
            </bean>
        </list>
    </property>
</bean>

SkipList Based and Snapshotable Indexes

Ignite SQL Grid provides two indexing implementations that can be used when indexes are stored in Java heap.

The first implementation is based on a skip list data structure and is enabled by default.

The second implementation is based on a modified version of an AVL tree with fast cloning. This implementation is known as snapshotable in Ignite and can be enabled with the CacheConfiguration.setSnapshotableIndex(...) method.

For off-heap mode, discussed below, Ignite provides only one indexing implementation which is a modified version of an AVL tree with fast cloning.

Off-Heap SQL Indexes

Ignite supports placing indexed data in off-heap memory. This makes sense for very large datasets since keeping data in Java heap can cause high GC activity and unacceptable response times.

By default, Ignite stores SQL indexes on heap. Ignite stores query indexes in off-heap memory if CacheConfiguration.setMemoryMode is set to one of the off-heap memory modes (OFFHEAP_TIERED or OFFHEAP_VALUES), or if the CacheConfiguration.setOffHeapMaxMemory property is set to a value >= 0.

To improve the performance of SQL queries when off-heap mode is enabled, try increasing the value of the CacheConfiguration.setSqlOnheapRowCacheSize property, which has a default value of 10000.

import org.apache.ignite.cache.eviction.random.RandomEvictionPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

import static org.apache.ignite.cache.CacheMemoryMode.ONHEAP_TIERED;

CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>();

// Set unlimited off-heap memory for cache and enable off-heap indexes.
ccfg.setOffHeapMaxMemory(0);

// Cache entries will be placed on heap and can be evicted to off-heap.
ccfg.setMemoryMode(ONHEAP_TIERED);
ccfg.setEvictionPolicy(new RandomEvictionPolicy(100_000));

// Increase size of SQL on-heap row cache for off-heap indexes.
ccfg.setSqlOnheapRowCacheSize(100_000);

Indexes Tradeoffs

There are multiple things you should consider when choosing indexes for your Ignite application.

  • Indexes are not free. They consume memory, and each index has to be updated separately, so cache update performance can degrade as you add more indexes. On top of that, the optimizer has more opportunities to make a mistake by choosing the wrong index for a query.

It is a bad strategy to index everything!

  • Indexes are just sorted data structures. If you define an index for the fields (a,b,c) then the records will be sorted first by a, then by b and only then by c.

Example of Sorted Index

| A | B | C |
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 4 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 5 |

Any condition like a = 1 and b > 3 can be viewed as a bounded range. Both bounds can be looked up quickly, in log(N) time, and the result is everything between them.

The following conditions will be able to use the index:

  • a = ?
  • a = ? and b = ?
  • a = ? and b = ? and c = ?

From the index point of view, the condition a = ? and c = ? is no better than a = ?, because c can only be checked row by row once the range for a has been found.
Half-bounded ranges like a > ? can use the index as well.

  • Indexes on single fields are no better than group indexes on multiple fields starting with the same field (index on (a) is no better than (a,b,c)). Thus it is preferable to use group indexes.
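The range reasoning above can be tried out with a plain-Java model of an index on (a, b, c), built from the rows of the example table. This is a sketch under simplifying assumptions: the index is a sorted list of int triples, all values are below Integer.MAX_VALUE, and the class and method names are hypothetical. It does not reflect how Ignite implements its indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of a group index on (a, b, c); hypothetical names, not Ignite code. */
class SortedIndexSketch {
    /** Orders rows by a, then by b, then by c. */
    static final Comparator<int[]> BY_A_B_C = (x, y) -> {
        for (int i = 0; i < 3; i++) {
            int cmp = Integer.compare(x[i], y[i]);
            if (cmp != 0)
                return cmp;
        }
        return 0;
    };

    /** Builds the index: the rows of the example table, kept in sorted order. */
    static List<int[]> buildIndex() {
        List<int[]> rows = new ArrayList<>(Arrays.asList(
            new int[] {2, 4, 5}, new int[] {1, 4, 4}, new int[] {2, 3, 5},
            new int[] {1, 2, 3}, new int[] {2, 4, 4}, new int[] {1, 4, 2}));
        rows.sort(BY_A_B_C);
        return rows;
    }

    /** Binary search: position of the first row that is >= probe. */
    static int lowerBound(List<int[]> index, int[] probe) {
        int lo = 0, hi = index.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (BY_A_B_C.compare(index.get(mid), probe) < 0)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    /** "a = ?": all matches form one contiguous slice, as with an index on (a) alone. */
    static List<int[]> aEquals(List<int[]> index, int a) {
        int from = lowerBound(index, new int[] {a, Integer.MIN_VALUE, Integer.MIN_VALUE});
        int to = lowerBound(index, new int[] {a + 1, Integer.MIN_VALUE, Integer.MIN_VALUE});
        return index.subList(from, to);
    }

    /** "a = ? AND b > ?": still one slice, found with two binary searches. */
    static List<int[]> aEqualsAndBGreater(List<int[]> index, int a, int b) {
        int from = lowerBound(index, new int[] {a, b + 1, Integer.MIN_VALUE});
        int to = lowerBound(index, new int[] {a + 1, Integer.MIN_VALUE, Integer.MIN_VALUE});
        return index.subList(from, to);
    }

    /** "a = ? AND c = ?": the index narrows only by a; c is checked row by row. */
    static List<int[]> aEqualsAndCEquals(List<int[]> index, int a, int c) {
        List<int[]> res = new ArrayList<>();
        for (int[] row : aEquals(index, a))
            if (row[2] == c)
                res.add(row);
        return res;
    }
}
```

For the table above, aEqualsAndBGreater(index, 1, 3) returns the rows (1, 4, 2) and (1, 4, 4) as a single slice. aEqualsAndCEquals has to walk every row with the matching a, which is why that condition gains nothing from the c part of the index.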
