
Distributed DML

Overview

Apache Ignite SQL Grid not only allows selecting data from the Data Grid, using SQL ANSI-99 syntax, it makes it possible to modify that data with well-known DML statements like INSERT, UPDATE, or DELETE. By taking advantage of this ability, you can work with Apache Ignite In-Memory Data Fabric as with an in-memory distributed database fully relying on its SQL capabilities.

SQL ANSI-99 Compliance

DML queries, as well as all the SELECT queries, are SQL ANSI-99 compliant.

Ignite stores all data in memory in the form of key-value pairs, and hence all DML operations are converted into corresponding key-value cache commands such as cache.put(...) or cache.invokeAll(...). Let's take a deeper look at how the DML statements are implemented in Ignite.

DML API

In general, all the DML statements can be divided into two groups: those that add new entries into a cache (INSERT and MERGE), and those that modify existing data (UPDATE and DELETE).

To execute DML statements in Java, you use the same Ignite API that serves SELECT queries: the SqlFieldsQuery API. DML operations use it the same way read-only queries do, with SqlFieldsQuery returning a QueryCursor&lt;List&lt;?&gt;&gt;. The only difference is the result. For a DML statement, QueryCursor&lt;List&lt;?&gt;&gt; contains a single-item List&lt;?&gt; holding a long that signifies the number of cache entries affected by the statement, whereas for a SELECT statement, QueryCursor&lt;List&lt;?&gt;&gt; contains the list of items retrieved from the cache.
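For illustration, the result shape described above can be shown with plain collections. This is not Ignite API; DmlResultSketch and its affectedRows helper are hypothetical names used only to make the single-row, single-column update count concrete.

```java
import java.util.Collections;
import java.util.List;

public class DmlResultSketch {
    // Extracts the affected-row count from a DML result of shape [[count]],
    // which is what QueryCursor<List<?>>.getAll() yields for
    // INSERT/UPDATE/DELETE/MERGE statements.
    static long affectedRows(List<List<?>> dmlResult) {
        return (Long) dmlResult.get(0).get(0);
    }

    public static void main(String[] args) {
        // Mimic a DML result that affected 3 cache entries.
        List<List<?>> result =
            Collections.singletonList(Collections.singletonList(3L));

        System.out.println("Affected rows: " + affectedRows(result));
    }
}
```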

Alternative APIs

The DML API is not limited to Java. You can connect to an Ignite cluster using the ODBC or JDBC drivers and execute DML queries from there. Learn more about these additional APIs on the pages listed below:

Basic Configuration

To start using DML operations in Ignite, you need to configure all queryable fields using the QueryEntity-based approach or @QuerySqlField annotations. For example:

public class Person {
  /** Field will be accessible from DML statements. */
  @QuerySqlField
  private final String firstName;

  /** Indexed field that will be accessible from DML statements. */
  @QuerySqlField(index = true)
  private final String lastName;

  /** Field will NOT be accessible from DML statements. */
  private int age;

  public Person(String firstName, String lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
  }
}
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="personCache"/>
    <!-- Configure query entities -->
    <property name="queryEntities">
        <list>
            <bean class="org.apache.ignite.cache.QueryEntity">
                <!-- Registering key's class. -->
                <property name="keyType" value="java.lang.Long"/>
              
                <!-- Registering value's class. -->
                <property name="valueType"
                          value="org.apache.ignite.examples.Person"/>

                <!-- 
                    Defining fields that will be accessible from DML side
                -->
                <property name="fields">
                    <map>
                        <entry key="firstName" value="java.lang.String"/>
                        <entry key="lastName" value="java.lang.String"/>
                    </map>
                </property>
              
               <!-- 
                    Defining which fields, listed above, will be treated as 
                    indexed fields as well.
                -->
                <property name="indexes">
                    <list>
                        <!-- Single field (aka. column) index -->
                        <bean class="org.apache.ignite.cache.QueryIndex">
                            <constructor-arg value="lastName"/>
                        </bean>
                    </list>
                </property>
            </bean>
        </list>
    </property>
</bean>

In addition to all the fields marked with the @QuerySqlField annotation or defined with QueryEntity, every object type registered in SQL Grid has two special predefined fields: _key and _val. These predefined fields reference the key and value of an entry stored in a cache and can be used directly inside DML statements.

// Preparing cache configuration.
CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>
    ("personCache");

// Registering indexed/queryable types.
cacheCfg.setIndexedTypes(Long.class, Person.class);

// Starting the cache.
IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cacheCfg);

// Inserting a new key-value pair referring to the predefined `_key` and `_val`
// fields for the Person type.
cache.query(new SqlFieldsQuery("INSERT INTO Person(_key, _val) VALUES(?, ?)")
    .setArgs(1L, new Person("John", "Smith")));

If you prefer to work with concrete fields rather than the whole object value, you can execute a query like the one shown below:

IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cacheCfg);

cache.query(new SqlFieldsQuery(
    "INSERT INTO Person(_key, firstName, lastName) VALUES(?, ?, ?)").
    setArgs(1L, "John", "Smith"));

Note that the DML engine will be able to recreate a Person object from firstName and lastName and put it into the cache, but those fields have to be defined using QueryEntity or the @QuerySqlField annotation as described above.

Advanced Configuration

Custom Keys

If you use only predefined SQL data types for cache keys, then there is no need for any additional DML-related configuration. Those data types are defined by the GridQueryProcessor.SQL_TYPES constant, listed below.

Predefined SQL Data Types

  • all the primitives and their wrappers except char and Character.
  • String.
  • BigDecimal.
  • byte[].
  • java.util.Date, java.sql.Date, java.sql.Timestamp.
  • java.util.UUID.

However, once you decide to introduce a custom complex key and refer to its fields from DML statements, you have to:

  • Define those fields in the QueryEntity the same way as you set fields for the value object.
  • Use the new configuration parameter QueryEntity.setKeyFields(..) to distinguish key fields from value fields.

The example below shows how to achieve this.

// Preparing cache configuration.
CacheConfiguration cacheCfg = new CacheConfiguration<>("personCache");

// Creating the query entity. 
QueryEntity entity = new QueryEntity("CustomKey", "Person");

// Listing all the queryable fields.
LinkedHashMap<String, String> flds = new LinkedHashMap<>();

flds.put("intKeyField", Integer.class.getName());
flds.put("strKeyField", String.class.getName());

flds.put("firstName", String.class.getName());
flds.put("lastName", String.class.getName());

entity.setFields(flds);

// Listing a subset of the fields that belong to the key.
Set<String> keyFlds = new HashSet<>();

keyFlds.add("intKeyField");
keyFlds.add("strKeyField");

entity.setKeyFields(keyFlds);

// End of new settings, nothing else here is DML related

entity.setIndexes(Collections.<QueryIndex>emptyList());

cacheCfg.setQueryEntities(Collections.singletonList(entity));

ignite.createCache(cacheCfg);
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="personCache"/>
    <!-- Configure query entities -->
    <property name="queryEntities">
        <list>
            <bean class="org.apache.ignite.cache.QueryEntity">
                <!-- Registering key's class. -->
                <property name="keyType" value="CustomKey"/>
              
                <!-- Registering value's class. -->
                <property name="valueType"
                          value="org.apache.ignite.examples.Person"/>

                <!-- 
                    Defining all the fields that will be accessible from DML.
                -->
                <property name="fields">
                    <map>
                        <entry key="firstName" value="java.lang.String"/>
                        <entry key="lastName" value="java.lang.String"/>
                        <entry key="intKeyField" value="java.lang.Integer"/>
                        <entry key="strKeyField" value="java.lang.String"/>
                    </map>
                </property>

                <!-- Defining the subset of key's fields -->
                <property name="keyFields">
                    <set>
                        <value>intKeyField</value>
                        <value>strKeyField</value>
                    </set>
                </property>
            </bean>
        </list>
    </property>
</bean>

HashCode Resolution and Equality Comparison for Custom Keys

After creating a custom key and defining its fields using QueryEntity, you need to take care of how the hash code is calculated for the key and how keys are compared for equality.

By default, BinaryArrayIdentityResolver is used for hash code calculation and equality comparison of all the objects that are serialized and stored or transferred in Ignite. The same resolver will be used for your custom complex keys unless you change it to BinaryFieldIdentityResolver which is more suitable for keys used in DML statements, or switch to your custom resolver.
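As a sketch, switching the CustomKey type to field-based identity could be wired in through the binary configuration roughly as follows. The bean classes and property names follow the Ignite 2.0-era binary API and should be verified against the Ignite version in use; CustomKey reuses the type name from the example above.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="binaryConfiguration">
        <bean class="org.apache.ignite.configuration.BinaryConfiguration">
            <property name="typeConfigurations">
                <list>
                    <bean class="org.apache.ignite.binary.BinaryTypeConfiguration">
                        <property name="typeName" value="CustomKey"/>
                        <!-- Hash code and equality are computed from the
                             listed key fields only. -->
                        <property name="identityResolver">
                            <bean class="org.apache.ignite.binary.BinaryFieldIdentityResolver">
                                <property name="fieldNames">
                                    <list>
                                        <value>intKeyField</value>
                                        <value>strKeyField</value>
                                    </list>
                                </property>
                            </bean>
                        </property>
                    </bean>
                </list>
            </property>
        </bean>
    </property>
</bean>
```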

DML Operations

MERGE

MERGE is one of the most straightforward operations because it is translated into cache.put(...) and cache.putAll(...) operations depending on the number of rows that need to be inserted or updated as part of the MERGE query.

The examples below show how to update the data set with a MERGE command by either providing a list of entries, or injecting a result of a subquery execution.

cache.query(new SqlFieldsQuery("MERGE INTO Person(_key, firstName, lastName) " +
    "VALUES (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));

cache.query(new SqlFieldsQuery("MERGE INTO someCache.Person(_key, firstName, lastName) " +
    "(SELECT _key + 1000, firstName, lastName " +
    "FROM anotherCache.Person WHERE _key > ? AND _key < ?)").setArgs(100, 200));

INSERT

The difference between MERGE and INSERT commands is that the latter adds only those entries into a cache whose keys are not there yet.

If a single key-value pair is being added into a cache then, eventually, an INSERT statement will be converted into a cache.putIfAbsent(...) operation. In other cases, when multiple key-value pairs are inserted, the DML engine creates an EntryProcessor for each pair and uses cache.invokeAll(...) to propagate the data into a cache.
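The single-pair case follows the contract of putIfAbsent: the entry is added only when the key is not present yet. The sketch below mirrors that behavior with a plain ConcurrentMap rather than an Ignite cache; InsertSemanticsSketch is a hypothetical name used only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InsertSemanticsSketch {
    // Returns the previous value when the key already exists (the insert
    // is a no-op), or null when the entry was actually added.
    static <K, V> V insert(ConcurrentMap<K, V> cache, K key, V value) {
        return cache.putIfAbsent(key, value);
    }

    public static void main(String[] args) {
        ConcurrentMap<Long, String> cache = new ConcurrentHashMap<>();

        insert(cache, 1L, "John Smith"); // added
        insert(cache, 1L, "Mary Jones"); // no-op: key 1 is already present
        insert(cache, 5L, "Mary Jones"); // added

        System.out.println(cache.get(1L)); // John Smith survives
        System.out.println(cache.get(5L)); // Mary Jones
    }
}
```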

The examples below show how to insert a data set with an INSERT command by either providing a list of entries or injecting a result of a subquery execution.

cache.query(new SqlFieldsQuery("INSERT INTO Person(_key, firstName, " +
    "lastName) VALUES (1, 'John', 'Smith'), (5, 'Mary', 'Jones')"));

cache.query(new SqlFieldsQuery("INSERT INTO someCache.Person(_key, firstName, lastName) " +
    "(SELECT _key + 1000, firstName, lastName " +
    "FROM anotherCache.Person WHERE _key > ? AND _key < ?)").setArgs(100, 200));

UPDATE

This operation updates values in a cache on a per-field basis.

Initially, the SQL engine generates and executes a SELECT query based on the UPDATE WHERE clause, and only then does it modify the existing values that satisfy the clause.

The modification is performed via cache.invokeAll(...) operation. Basically, it means that once the result of the SELECT query is ready, SQL Engine will prepare a number of EntryProcessors and will execute all of them using cache.invokeAll(...) operation. While the data is being modified using EntryProcessors, additional checks are performed to make sure that nobody has interfered between the SELECT and the actual update.
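The interference check can be illustrated with a plain ConcurrentMap. This is an analogy, not Ignite's actual code: replace(key, expected, updated) succeeds only if the value is still the one observed at "SELECT time".

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UpdateCheckSketch {
    // Applies the update only if the value is still the one observed when
    // the internal SELECT ran; returns false when a concurrent writer
    // interfered in between.
    static boolean updateIfUnchanged(ConcurrentMap<Long, String> cache,
                                     long key, String observed, String updated) {
        return cache.replace(key, observed, updated);
    }

    public static void main(String[] args) {
        ConcurrentMap<Long, String> cache = new ConcurrentHashMap<>();
        cache.put(2L, "Smith");

        // No interference: the update applies.
        System.out.println(updateIfUnchanged(cache, 2L, "Smith", "Jones"));
        // The value is now "Jones", so an update based on the stale
        // observation "Smith" is rejected.
        System.out.println(updateIfUnchanged(cache, 2L, "Smith", "Brown"));
    }
}
```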

The following example shows how to execute an UPDATE query in Apache Ignite.

cache.query(new SqlFieldsQuery("UPDATE Person set lastName = ? " +
         "WHERE _key >= ?").setArgs("Jones", 2L));

Inability to modify a key or its fields with an UPDATE query

The reason is that the state of the key determines the internal data layout and its consistency (key hashing and affinity, index integrity). Hence, there is no way to update a key without removing the entry from the cache. For example, the following query:

UPDATE Person SET _key = 11 WHERE _key = 10;

may result in the following cache operations:

val = get(10);
put(11, val);
remove(10);
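The same three-step sequence can be written as a small self-contained helper over a plain java.util.Map. This is an illustration of the mechanics, not Ignite API; KeyChangeSketch and moveKey are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyChangeSketch {
    // "Changing" a key is really three operations: read the old entry,
    // write it under the new key, remove the old key.
    static <K, V> void moveKey(Map<K, V> cache, K oldKey, K newKey) {
        V val = cache.get(oldKey);
        cache.put(newKey, val);
        cache.remove(oldKey);
    }

    public static void main(String[] args) {
        Map<Long, String> cache = new HashMap<>();
        cache.put(10L, "John");

        moveKey(cache, 10L, 11L);

        System.out.println(cache.containsKey(10L)); // old key is gone
        System.out.println(cache.get(11L));         // value now lives under 11
    }
}
```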

DELETE

The execution of DELETE statements is split into two phases and is similar to that of UPDATE statements.

First, using a SELECT query, the SQL engine gathers those keys that satisfy the WHERE clause in the DELETE statement. Next, after having all those keys in place, it creates a number of EntryProcessors and executes them with cache.invokeAll(...). While the data is being deleted, additional checks are performed to make sure that nobody has interfered between the SELECT and the actual removal of the data.

The following example shows how to execute a DELETE query in Apache Ignite.

cache.query(new SqlFieldsQuery("DELETE FROM Person " +
         "WHERE _key >= ?").setArgs(2L));

Modifications Order

If a DML statement inserts/updates the whole value by referring to the _val field and at the same time modifies a field that belongs to _val, then the changes are applied in the following order:

  • The _val is updated/inserted first.
  • The field gets updated.

The order never changes regardless of how it is written in the DML statement. For example, after the statement shown below is executed, the final Person value will be "Mike Smith", even though the _val field appears after firstName in the query.

cache.query(new SqlFieldsQuery("INSERT INTO Person(_key, firstName, _val)" +
           " VALUES(?, ?, ?)").setArgs(1L, "Mike", new Person("John", "Smith")));

This is equivalent to executing a query like the one below, where _val appears before firstName in the statement string.

cache.query(new SqlFieldsQuery("INSERT INTO Person(_key, _val, firstName)" +
           " VALUES(?, ?, ?)").setArgs(1L, new Person("John", "Smith"), "Mike"));

The order in which the changes are applied for _val and its fields is the same for INSERT, UPDATE and MERGE statements.
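The fixed ordering can be mimicked with plain Java. Person here is a stripped-down stand-in for the example class above, and ModificationOrderSketch is a hypothetical name; the point is only that the field update is applied after the whole-value write, so it always wins.

```java
public class ModificationOrderSketch {
    static class Person {
        String firstName;
        String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Step 1: the whole _val is written first. Step 2: the individual field
    // from the statement is applied on top, regardless of column order.
    static Person apply(Person wholeVal, String firstNameField) {
        Person result = new Person(wholeVal.firstName, wholeVal.lastName);
        result.firstName = firstNameField;
        return result;
    }

    public static void main(String[] args) {
        Person p = apply(new Person("John", "Smith"), "Mike");
        System.out.println(p.firstName + " " + p.lastName); // Mike Smith
    }
}
```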

Concurrent Modifications

As explained above, UPDATE and DELETE statements generate SELECT queries internally in order to get the set of cache entries that have to be modified. The keys from the set are not locked, so there is a chance that their values will be modified by other queries concurrently. The DML engine implements a special technique that, first, avoids locking keys and, second, guarantees that the values are up-to-date at the time they are updated by a DML statement.

Basically, the engine detects the subset of cache entries that were modified concurrently and re-executes the SELECT statement, limiting its scope to the modified keys only.

Let's say the following UPDATE statement is being executed.

// Adding the cache entry.
cache.put(1, new Person("John", "Smith"));

// Updating the entry.
cache.query(new SqlFieldsQuery("UPDATE Person set firstName = ? " +
    "WHERE lastName = ?").setArgs("Mike", "Smith"));

Before firstName and lastName are updated, the DML engine generates a SELECT query to get the cache entries that satisfy the UPDATE statement's WHERE clause. The statement will be the following.

SELECT _key, _val, 'Mike' FROM Person WHERE lastName = 'Smith'

Right after that, the entry retrieved by the SELECT query can be updated concurrently.

cache.put(1, new Person("Sarah", "Connor"));

The DML engine will detect that the entry with key 1 was modified during the update phase of the UPDATE query execution. It will then halt the update and re-execute a modified version of the SELECT query in order to get the latest values:

SELECT _key, _val, 'Mike' FROM Person WHERE lastName = 'Smith'
    AND _key IN (SELECT * FROM TABLE(KEY long = [ 1 ]))

This query will be executed only for the outdated keys; in our example, the only such key is 1.

This process repeats until the DML engine is sure, at the update phase, that all the entries to be updated are up-to-date. The maximum number of attempts is 4. Presently, there is no configuration parameter that changes this value.
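The retry policy can be sketched with a plain ConcurrentMap. Ignite's actual implementation works on cache entries via EntryProcessors and re-runs the generated SELECT, but the control flow is analogous: re-read, attempt the update, and retry on interference, up to 4 times. DmlRetrySketch is a hypothetical name used only for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DmlRetrySketch {
    static final int MAX_ATTEMPTS = 4;

    // Retries a value-based update until it succeeds or the attempt budget
    // is exhausted, re-reading ("re-selecting") the current value before
    // every attempt.
    static boolean updateWithRetry(ConcurrentMap<Long, String> cache,
                                   long key, String newValue) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            String observed = cache.get(key);   // re-run the "SELECT"
            if (observed == null)
                return false;                   // entry deleted concurrently
            if (cache.replace(key, observed, newValue))
                return true;                    // no interference, done
        }
        return false;                           // gave up after 4 attempts
    }

    public static void main(String[] args) {
        ConcurrentMap<Long, String> cache = new ConcurrentHashMap<>();
        cache.put(1L, "John");

        System.out.println(updateWithRetry(cache, 1L, "Mike")); // succeeds
        System.out.println(cache.get(1L));
    }
}
```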

The DML engine does not re-execute the SELECT query for entries that were deleted concurrently; the query is re-executed only for entries that are still in the cache.

Known Limitations

Subqueries in WHERE clause

SELECT queries used in INSERT and MERGE statements, as well as SELECT queries automatically generated by UPDATE and DELETE operations, will be distributed and executed in either collocated or non-collocated distributed mode as needed.

However, a subquery executed as part of a WHERE clause will not be executed in non-collocated distributed mode. Such a subquery is always executed in collocated mode over the local data set.

For example, in the following query:

DELETE FROM Person WHERE _key IN
    (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)

the DML engine will generate the SELECT query in order to get a list of entries that need to be deleted. The query will be distributed and executed across the cluster and will look like the one below:

SELECT _key, _val FROM Person WHERE _key IN
    (SELECT personId FROM "salary".Salary s WHERE s.amount > 2000)

However, the subquery from IN clause (SELECT personId FROM "salary".Salary ...) will not be distributed further and will be executed over the local data set present on a cluster node.

Transactional Support

Presently, DML is not transactional. If a DML operation is executed as a part of an Ignite transaction then it will not be enlisted in the transaction and will be executed right away.

Multiversion Concurrency Control (MVCC)

Once Apache Ignite SQL Grid is empowered with MVCC, DML will become fully transactional. MVCC development is tracked in this JIRA ticket.

EXPLAIN support for DML statements

Presently, EXPLAIN is not supported for DML operations.

One possible approach is to execute EXPLAIN for the SELECT query that is automatically generated (UPDATE, DELETE) or used (INSERT, MERGE) by a DML statement. This will give you insight into the indexes that are used when the DML operation is executed.

Example

The Ignite distribution includes a ready-to-run CacheQueryDmlExample as part of its sources. This example demonstrates the usage of all the above-mentioned DML operations.
