Ehcache(2.9.x) - API Developer Guide, Searching a Cache

About Searching

The Search API allows you to execute arbitrarily complex queries against caches. The development of alternative indexes on values provides the ability for data to be looked up based on multiple criteria instead of just keys.

Note: Terracotta BigMemory Go and BigMemory Max products use indexing. The Search API queries open-source Ehcache using a direct search method. For more information about indexing, see Best Practices for Optimizing Searches.

Searchable attributes can be extracted from both keys and values. Keys, values, or summary values (Aggregators) can all be returned. Here is a simple example: Search for 32-year-old males and return the cache values.

Results results = cache.createQuery().includeValues()

        .addCriteria(age.eq(32).and(gender.eq("male"))).execute();

You can formulate queries using the Search API.

// BigMemory Search API:

Attribute<Integer> age = cache.getSearchAttribute("age");

cache.createQuery().addCriteria(age.gt(30)).includeValues().execute();

Before creating a query, the cache configuration must be prepared as described in Making a Cache Searchable.

For more information about creating queries using the Search API, see Creating a Query.

What is Searchable?

Searches can be performed against Element keys and values, but they must be treated as attributes. Some Element keys and values are directly searchable and can simply be added to the search index as attributes. Some Element keys and values must be made searchable by extracting attributes with supported search types out of the keys and values. It is the attributes themselves that are searchable.

Making a Cache Searchable

Caches can be made searchable, on a per-cache basis, either by configuration or programmatically.

By Configuration

Caches are made searchable by adding a <searchable/> tag to the cache definition in the ehcache.xml file.

<cache name="cache2" maxBytesLocalHeap="16M" eternal="true" maxBytesLocalOffHeap="256M">

    <persistence strategy="localRestartable"/>

    <searchable/>

</cache>

This configuration will scan keys and values and, if they are of supported search types, add them as attributes called “key” and “value,” respectively. If you do not want automatic indexing of keys and values, you can disable it using:

<cache name="cacheName" ...>

    <searchable keys="false" values="false">

       ...

    </searchable>

</cache>

You might want to do this if you have a mix of types for your keys or values. The automatic indexing will throw an exception if types are mixed.

If you think that you will want to add search attributes after the cache is initialized, you can explicitly indicate the dynamic search configuration. Set the allowDynamicIndexing attribute to “true” to enable use of the Dynamic Attributes extractor. For more information about the Dynamic Attributes extractor, see Defining Attributes.

<cache name="cacheName" ...>

    <searchable allowDynamicIndexing="true">

       ...

    </searchable>

</cache>

Often keys or values will not be directly searchable and instead you will need to extract searchable attributes from the keys or values. The following example shows a more typical case. Attribute extractors are explained in more detail in Defining Attributes.

<cache name="cache3" maxEntriesLocalHeap="10000" eternal="true" maxBytesLocalOffHeap="10G">

    <persistence strategy="localRestartable"/>

    <searchable>

        <searchAttribute name="age" class="net.sf.ehcache.search.TestAttributeExtractor"/>

        <searchAttribute name="gender" expression="value.getGender()"/>

    </searchable>

</cache>

Programmatically

The following example shows how to programmatically create the cache configuration with search attributes.

Configuration cacheManagerConfig = new Configuration();

CacheConfiguration cacheConfig = new CacheConfiguration("myCache", 0).eternal(true);

Searchable searchable = new Searchable();

cacheConfig.addSearchable(searchable);

// Create attributes to use in queries.

searchable.addSearchAttribute(new SearchAttribute().name("age"));

// Use an expression for accessing values.

searchable.addSearchAttribute(new SearchAttribute().name("first_name").expression("value.getFirstName()"));

searchable.addSearchAttribute(new SearchAttribute().name("last_name").expression("value.getLastName()"));

searchable.addSearchAttribute(

        new SearchAttribute().name("zip_code").className("net.sf.ehcache.search.TestAttributeExtractor"));

CacheManager cacheManager = new CacheManager(cacheManagerConfig);

cacheManager.addCache(new Cache(cacheConfig));

Ehcache myCache = cacheManager.getEhcache("myCache");

// Now create the attributes and queries, then execute.

...

To learn more about the Search API, see the net.sf.ehcache.search* packages in the Ehcache Javadoc.

Defining Attributes

In addition to configuring a cache to be searchable, you must define the attributes to be used in searches.

Attributes are extracted from keys or values during search by using AttributeExtractors. An extracted attribute must be one of the following types:

Boolean
Byte
Character
Double
Float
Integer
Long
Short
String
java.util.Date
java.sql.Date
Enum

These types correspond to the AttributeType enum specified by the Ehcache Javadoc at http://ehcache.org/apidocs/2.9/net/sf/ehcache/search/attribute/AttributeType.html.

Type name matching is case sensitive. For example, Double resolves to the java.lang.Double class type, and double is interpreted as the primitive double type.

Search API Example

<searchable>

    <searchAttribute name="age" type="Integer"/>

</searchable>

If an attribute cannot be found or is of the wrong type, an AttributeExtractorException is thrown on search execution.

Note: On the first use of an attribute, the attribute type is detected, validated against supported types, and saved automatically. Once the type is established, it cannot be changed. For example, if an integer value was initially returned for an attribute named “Age” by the attribute extractor, it is an error for the extractor to return a float for this attribute later on.

Well-known Attributes

The parts of an Element that are well-known attributes can be referenced by predefined, well-known names. If a key and/or value is of a supported search type, it is added automatically as an attribute with the name “key” or “value.” For convenience, these well-known attributes are available as constants in the Query class. For example, the attribute for “key” can be referenced in a query as Query.KEY. For even greater readability, use a static import so that, in this example, you can write KEY.

Well-known Attribute Name	Attribute Constant
key	Query.KEY
value	Query.VALUE

Reflection Attribute Extractor

The ReflectionAttributeExtractor is a built-in search attribute extractor that uses JavaBean conventions and also understands a simple form of expression. Where a JavaBean property is available and it is of a searchable type, it can be declared:

<cache>

  <searchable>

    <searchAttribute name="age"/>

  </searchable>

</cache>

The expression language of the ReflectionAttributeExtractor also uses method/value dotted expression chains. The expression chain must start with “key”, “value”, or “element”. From the starting object, a chain of method calls or field names follows. Method calls and field names can be freely mixed in the chain:

<cache>

  <searchable>

    <searchAttribute name="age" expression="value.person.getAge()"/>

  </searchable>

</cache>

<cache>

  <searchable>

     <searchAttribute name="name" expression="element.toString()"/>

  </searchable>

</cache>

Note: The method and field name portions of the expression are case-sensitive.
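
To make the dotted-chain idea concrete, the following is a minimal sketch of how such an expression could be evaluated with plain Java reflection. It is not the ReflectionAttributeExtractor source. The class ChainEvaluator, its evaluate() method, and the sample Person and Holder classes are hypothetical names used only for illustration. The sketch starts from the object that “value” would refer to and evaluates the rest of the chain: a segment ending in "()" is treated as a no-argument method call, and any other segment as a field read.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical illustration only; this is not Ehcache code.
final class ChainEvaluator {

    // Sample types standing in for a cached value (assumed for this sketch).
    static final class Person {
        private final int age;
        Person(int age) { this.age = age; }
        public int getAge() { return age; }
    }

    static final class Holder {
        final Person person;
        Holder(Person person) { this.person = person; }
    }

    // Walks a chain such as "person.getAge()" starting from the given object.
    // Segments ending in "()" are no-argument method calls; others are field reads.
    static Object evaluate(Object start, String chain) {
        Object current = start;
        try {
            for (String part : chain.split("\\.")) {
                if (part.endsWith("()")) {
                    String name = part.substring(0, part.length() - 2);
                    Method method = current.getClass().getMethod(name);
                    method.setAccessible(true);
                    current = method.invoke(current);
                } else {
                    Field field = current.getClass().getDeclaredField(part);
                    field.setAccessible(true);
                    current = field.get(current);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + chain, e);
        }
        return current;
    }

    public static void main(String[] args) {
        Holder value = new Holder(new Person(32));
        System.out.println(evaluate(value, "person.getAge()")); // prints 32
    }
}
```

Because reflection looks up method and field names exactly, a chain such as person.getage() fails in this sketch, which is consistent with the case-sensitivity note above.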

Custom Attribute Extractor

In more complex situations, you can create your own attribute extractor by implementing the AttributeExtractor interface. The interface's attributeFor() method returns the attribute value for the element and attribute name you specify.

Note: These examples assume there are previously created Person objects containing attributes such as name, age, and gender.

Provide your extractor class:

<cache name="cache2" maxEntriesLocalHeap="0" eternal="true">

  <persistence strategy="none"/>

  <searchable>

     <searchAttribute name="age" class="net.sf.ehcache.search.TestAttributeExtractor"/>

  </searchable>

</cache>

A custom attribute extractor could be passed an Employee object to extract a specific attribute:

returnVal = employee.getDept();

If you need to pass state to your custom extractor, specify properties:

<cache>

  <searchable>

    <searchAttribute name="age" class="net.sf.ehcache.search.TestAttributeExtractor"

      properties="foo=this,bar=that,etc=12" />

  </searchable>

</cache>

If properties are provided, the attribute extractor implementation must have a public constructor that accepts a single java.util.Properties instance.
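
As a rough picture of what such a constructor might receive, the sketch below turns a comma-separated string like the one in the example into a java.util.Properties instance using only the JDK. This guide does not describe how Ehcache itself parses the properties attribute, so the parsing shown here is an assumption, and PropertiesSketch and parse() are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration only; the parsing approach is an assumption.
final class PropertiesSketch {

    // Splits "k1=v1,k2=v2" on the separator and loads the pairs into Properties.
    static Properties parse(String text, String separator) {
        Properties props = new Properties();
        try {
            // Properties.load expects one key=value pair per line.
            props.load(new StringReader(text.replace(separator, "\n")));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot parse properties: " + text, e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse("foo=this,bar=that,etc=12", ",");
        System.out.println(props.getProperty("etc")); // prints 12
    }
}
```

Note that all values arrive as strings, so an extractor that needs a number (such as etc=12) would have to convert it itself, for example with Integer.parseInt().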

Dynamic Attributes Extractor

The DynamicAttributesExtractor provides flexibility by allowing the search configuration to be changed after the cache is initialized. This is done with one method call, at the point of element insertion into the cache. The DynamicAttributesExtractor method returns a map of attribute names to index and their respective values. This method is called for every Ehcache.put() and replace() invocation.

Assuming that we have previously created Person objects containing attributes such as name, age, and gender, the following example shows how to create a dynamically searchable cache and register the DynamicAttributesExtractor:

Configuration config = new Configuration();

config.setName("default");

CacheConfiguration cacheCfg = new CacheConfiguration("PersonCache", 0);

cacheCfg.setEternal(true);

cacheCfg.terracotta(new TerracottaConfiguration().clustered(true));

Searchable searchable = new Searchable().allowDynamicIndexing(true);

cacheCfg.addSearchable(searchable);

config.addCache(cacheCfg);

CacheManager cm = new CacheManager(config);

Ehcache cache = cm.getCache("PersonCache");

final String attrNames[] = {"first_name", "age"};

// Now you can register a dynamic attribute extractor to index

// the cache elements, using a subset of known fields

cache.registerDynamicAttributesExtractor(new DynamicAttributesExtractor() {

    public Map<String, Object> attributesFor(Element element) {

        Map<String, Object> attrs = new HashMap<String, Object>();

        Person value = (Person)element.getObjectValue();

        // For example, extract first name only

        String fName = value.getName() == null ? null : value.getName().split("\\s+")[0];

        attrs.put(attrNames[0], fName);

        attrs.put(attrNames[1], value.getAge());

        return attrs;

    }

});

// Now add some data to the cache

cache.put(new Element(10, new Person("John Doe", 34, Person.Gender.MALE)));

Given the code above, the newly put element is indexed on the first_name and age attributes, but not on gender. If you later want to index the element data on gender as well, create and register a new DynamicAttributesExtractor instance that also extracts that field.

Dynamic Search Rules

To use the DynamicAttributesExtractor, the cache must be configured to be searchable and dynamically indexable. For information about making a cache searchable, see Making a Cache Searchable.
A dynamically searchable cache must have a dynamic extractor registered BEFORE data is added to it. (This is to prevent potential races between extractor registration and cache loading which might result in an incomplete set of indexed data, leading to erroneous search results.)
Each call to registerDynamicAttributesExtractor() replaces the previously registered extractor, because there can be at most one extractor instance configured for each such cache.
If a dynamically searchable cache is initially configured with a predefined set of search attributes, this set of attributes is always queried for extracted values, regardless of whether or not a dynamic search attribute extractor has been configured.
The initial search configuration takes precedence over dynamic attributes, so if the dynamic attribute extractor returns an attribute name already used in the initial searchable configuration, an exception is thrown.

Creating a Query

Ehcache uses a fluent, object-oriented query API, following the principles of a Domain-Specific Language (DSL), which should be familiar to Java programmers. For example:

Query query = cache.createQuery().addCriteria(age.eq(35)).includeKeys().end();

Results results = query.execute();

Using Attributes in Queries

If declared and available, the well-known attributes are referenced by their names or the convenience attributes are used directly:

Results results = cache.createQuery().addCriteria(Query.KEY.eq(35)).execute();

Results results = cache.createQuery().addCriteria(Query.VALUE.lt(10)).execute();

Other attributes are referenced by the names in the configuration:

Attribute<Integer> age = cache.getSearchAttribute("age");

Attribute<String> gender = cache.getSearchAttribute("gender");

Attribute<String> name = cache.getSearchAttribute("name");

Expressions

A Query is built up using Expressions. Expressions can include logical operators such as <and> and <or>, and comparison operators such as <ge> (>=), <between>, and <ilike>. The addCriteria(...) method is used to add a clause to a query. Adding a further clause automatically “<and>s” the clauses.

query = cache.createQuery().includeKeys()

        .addCriteria(age.le(65))

        .addCriteria(gender.eq("male"))

        .end();

Both logical and comparison operators implement the Criteria interface. To add a criterion with a different logical operator, explicitly nest it within a new logical operator Criteria object. For example, to check for age = 35 or gender = female:

query.addCriteria(new Or(age.eq(35), gender.eq(Gender.FEMALE)));

More complex compound expressions can be created through additional nesting. For a complete list of expressions, see the Expression Javadoc at http://ehcache.org/xref/net/sf/ehcache/search/expression/package-frame.html.

List of Operators

Operators are available as methods on attributes, so they are used by adding a “.”. For example, “lt” means “less than” and is used as age.lt(10), which is a shorthand way of saying age LessThan(10).

Shorthand	Criteria Class	Description
and	And	The Boolean AND logical operator.
between	Between	A comparison operator meaning between two values.
eq	EqualTo	A comparison operator meaning Java “equals to” condition.
gt	GreaterThan	A comparison operator meaning greater than.
ge	GreaterThanOrEqual	A comparison operator meaning greater than or equal to.
in	InCollection	A comparison operator meaning in the collection given as an argument.
lt	LessThan	A comparison operator meaning less than.
le	LessThanOrEqual	A comparison operator meaning less than or equal to.
ilike	ILike	A regular expression matcher. “?” and “*” may be used. Note that placing a wildcard in front of the expression will cause a table scan. ILike is always case insensitive.
isNull	IsNull	Tests whether the value of an attribute with given name is null.
notNull	NotNull	Tests whether the value of an attribute with given name is NOT null.
not	Not	The Boolean NOT logical operator.
ne	NotEqualTo	A comparison operator meaning not the Java “equals to” condition.
or	Or	The Boolean OR logical operator.

Note: For Strings, the operators are case-insensitive.
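
The wildcard behavior described for ilike can be pictured with the following JDK-only sketch, which translates a pattern containing “?” (assumed here to match exactly one character) and “*” (assumed to match any run of characters) into a case-insensitive java.util.regex.Pattern. It illustrates the semantics only and is not the ILike implementation. WildcardSketch and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not the Ehcache ILike class.
final class WildcardSketch {

    // Converts a wildcard string into a case-insensitive regular expression.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?' || c == '*') {
                // Flush any pending literal text, quoted so that regex
                // metacharacters in it are matched verbatim.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    static boolean matches(String wildcard, String candidate) {
        return toPattern(wildcard).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Jo*", "JOHN"));   // prints true
        System.out.println(matches("j?hn", "Joohn")); // prints false
    }
}
```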

Making Queries Immutable

By default, a query can be executed, modified, and re-executed. If end() is called, the query is made immutable.

Obtaining and Organizing Query Results

Queries return a Results object that contains a list of objects of class Result. Each element in the cache that a query finds is represented as a Result object. For example, if a query finds 350 elements, there will be 350 Result objects. However, if no keys or attributes are included but aggregators are included, there is exactly one Result present.

A Result object can contain:

The Element key - when includeKeys() is added to the query,
The Element value - when includeValues() is added to the query,
Predefined attribute(s) extracted from an Element value - when includeAttribute(...) is added to the query. To access an attribute from a Result, use getAttribute(Attribute<T> attribute).
Aggregator results - Aggregator results are summaries computed for the search. They are available through Result.getAggregatorResults(), which returns a list of aggregator results in the same order in which the aggregators were added to the query.

Aggregators

Aggregators are added with query.includeAggregator(<attribute>.<aggregator>). For example, to find the sum of the age attribute:

query.includeAggregator(age.sum());

For a complete list of aggregators, see Aggregators in the Ehcache Javadoc.

Ordering Results

Query results can be ordered in ascending or descending order by adding an addOrderBy clause to the query. This clause takes as parameters the attribute to order by and the ordering direction. For example, to order the results by ages in ascending order:

query.addOrderBy(age, Direction.ASCENDING);

Grouping Results

Query results can be grouped similarly to using an SQL GROUP BY statement. The GroupBy feature provides the option to group results according to specified attributes. You can add an addGroupBy clause to the query, which takes as parameters the attributes to group by. For example, you can group results by department and location:

Query q = cache.createQuery();

Attribute<String> dept = cache.getSearchAttribute("dept");

Attribute<String> loc = cache.getSearchAttribute("location");

q.includeAttribute(dept);

q.includeAttribute(loc);

q.addCriteria(cache.getSearchAttribute("salary").gt(100000));

q.includeAggregator(Aggregators.count());

q.addGroupBy(dept, loc);

The GroupBy clause groups the results from includeAttribute( ) and allows aggregate functions to be performed on the grouped attributes. To retrieve the attributes that are associated with the aggregator results, you can use:

String deptValue = singleResult.getAttribute(dept);

String locValue = singleResult.getAttribute(loc);

GroupBy Rules

Grouping query results adds another step to the query. First, results are returned, and second the results are grouped. Note the following rules and considerations:

In a query with a GroupBy clause, any attribute specified using includeAttribute( ) should also be included in the GroupBy clause.
Special KEY or VALUE attributes cannot be used in a GroupBy clause. This means that includeKeys( ) and includeValues( ) cannot be used in a query that has a GroupBy clause.
Adding a GroupBy clause to a query changes the semantics of any aggregators passed in, so that they apply only within each group.
As long as there is at least one aggregation function specified in a query, the grouped attributes are not required to be included in the result set, but they are typically requested anyway to make result processing easier.
An addCriteria clause applies to all results prior to grouping.
If OrderBy is used with GroupBy, the ordering attributes are limited to those listed in the GroupBy clause.
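
The semantics described by these rules resemble the following plain-Java stream pipeline. It is an analogy only and uses no Ehcache classes: the filter plays the role of addCriteria (applied before grouping), the classifier plays the role of addGroupBy, and the per-group counter plays the role of Aggregators.count(). GroupBySketch, Employee, and countHighEarners() are hypothetical names, and the sample data is invented for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical analogy only; this is not how Ehcache executes a query.
final class GroupBySketch {

    static final class Employee {
        final String dept;
        final String location;
        final int salary;

        Employee(String dept, String location, int salary) {
            this.dept = dept;
            this.location = location;
            this.salary = salary;
        }
    }

    // Filters first, then groups by dept and location, then counts within each group.
    static Map<String, Long> countHighEarners(List<Employee> employees) {
        return employees.stream()
                .filter(e -> e.salary > 100000)
                .collect(Collectors.groupingBy(
                        e -> e.dept + "/" + e.location,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("eng", "SF", 150000),
                new Employee("eng", "SF", 120000),
                new Employee("eng", "NY", 90000),
                new Employee("ops", "NY", 110000));
        System.out.println(countHighEarners(staff)); // prints {eng/SF=2, ops/NY=1}
    }
}
```

The eng/NY employee is removed by the filter before grouping, so no eng/NY group appears in the output at all, and each count applies only within its own group.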

Limiting the Size of Results

By default, a query can return an unlimited number of results. For example, the following query will return all keys in the cache.

Query query = cache.createQuery();

query.includeKeys();

query.execute();

If too many results are returned, it could cause an OutOfMemoryError. The maxResults clause is used to limit the size of the results. For example, to limit the above query to the first 100 elements found:

Query query = cache.createQuery();

query.includeKeys();

query.maxResults(100);

query.execute();

Note: When maxResults is used with GroupBy, it limits the number of groups.

When you are done with the results, call the discard() method to free up resources.

For additional information about managing large result sets, see the topics that relate to pagination in Best Practices for Optimizing Searches.

Interrogating Results

To determine what a query returned, use one of the interrogation methods on Results:

hasKeys( )
hasValues( )
hasAttributes( )
hasAggregators( )

Best Practices for Optimizing Searches

Construct searches by including only the data that is actually required.
- Only use includeKeys() and/or includeAttribute() if those values are required for your application logic.
- If you do not need values or attributes, be careful not to burden your queries with unnecessary work. For example, if result.getValue() is not called in the search results, do not use includeValues() in the query.
- Consider if it would be sufficient to get attributes or keys on demand. For example, instead of running a search query with includeValues() and then result.getValue(), run the query for keys and include cache.get() for each individual key.
Note: The includeKeys() and includeValues() methods have lazy deserialization, meaning that keys and values are deserialized only when result.getKey() or result.getValue() is called. However, calls to includeKeys() and includeValues() do take time, so consider carefully when constructing your queries.
Searchable keys and values are automatically indexed by default. If you are not including them in your query, turn off automatic indexing with the following:
```
<cache name="cacheName" ...>

  <searchable keys="false" values="false">

    ...

  </searchable>

</cache>
```
Limit the size of the result set. Depending on your use case, you might consider maxResults, an Aggregator, or pagination:
If getting a subset of all the possible results quickly is more important than receiving all the results, consider using query.maxResults(int number_of_results). Sometimes maxResults is useful where the result set is ordered such that the items you want most are included within the maxResults.
If all you want is a summary statistic, use a built-in Aggregator function, such as count(). For details, see the net.sf.ehcache.search.aggregator package in the Ehcache Javadoc.
Make your search as specific as possible.
Tip: If you want leading wildcard searches, you should create a <searchAttribute> with the string value reversed in it, so that your query can use the trailing wildcard instead.
When possible, use the query criteria “Between” instead of “LessThan” and “GreaterThan”, or “LessThanOrEqual” and “GreaterThanOrEqual”. For example, instead of using le(startDate) and ge(endDate), try not(between(startDate,endDate)).
Index dates as integers. This can save time and can also be faster if you have to do a conversion later on.
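
Two of the practices above, reversing a string to avoid leading wildcards and indexing dates as integers, can be sketched with JDK-only helpers. The class and method names are hypothetical, and the yyyyMMdd encoding is one possible choice assumed for illustration. Values produced by reverse() and toIntKey() would be returned from your own attribute extractor for an additional search attribute.

```java
import java.time.LocalDate;

// Hypothetical helpers; the names and encodings are assumptions for illustration.
final class SearchKeyHelpers {

    // Value to expose through an extra search attribute holding the reversed string.
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    // Rewrites a leading-wildcard pattern ("*son") into a trailing-wildcard
    // pattern ("nos*") for use against the reversed attribute.
    static String toTrailingWildcard(String pattern) {
        if (!pattern.startsWith("*")) {
            return pattern; // no leading wildcard, nothing to rewrite
        }
        return reverse(pattern.substring(1)) + "*";
    }

    // Encodes a date as yyyyMMdd so that integer ordering matches date ordering.
    static int toIntKey(LocalDate date) {
        return date.getYear() * 10000 + date.getMonthValue() * 100 + date.getDayOfMonth();
    }

    public static void main(String[] args) {
        System.out.println(reverse("Johnson"));                  // prints nosnhoJ
        System.out.println(toTrailingWildcard("*son"));          // prints nos*
        System.out.println(toIntKey(LocalDate.of(2015, 3, 7)));  // prints 20150307
    }
}
```

Because the yyyyMMdd encoding preserves chronological order, range criteria such as between can be applied to the integer attribute directly.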

Concurrency Considerations

Unlike cache operations, which have selectable concurrency control or transactions, queries are asynchronous and search results are “eventually consistent” with the caches.

Index Updating

Although indexes are updated synchronously, their state lags slightly behind that of the cache. The only exception is when the updating thread performs a search.

For caches with concurrency control, an index does not reflect the new state of the cache until:

The change has been applied to the cluster.
For a cache with transactions, when commit has been called.

Query Results

Unexpected results might occur if:

A search returns an Element reference that no longer exists.
Search criteria select an Element, but the Element has been updated.
Aggregators, such as sum(), disagree with the result you get by re-accessing the cache for each key and repeating the calculation yourself.
A value reference refers to a value that has been removed from the cache, and the cache has not yet been reindexed. If this happens, the value is null but the key and attributes supplied by the stale cache index are non-null. Because values in a cache are also allowed to be null, you cannot tell whether your value is null because it has been removed from the cache after the index was last updated or because it is a null value.

Recommendations

Because the state of the cache can change between search executions, the following is recommended:

Add all of the aggregators you want for a query at once, so that the returned aggregators are consistent.
Use null guards when accessing a cache with a key returned from a search.
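
The null-guard recommendation can be sketched as follows. To keep the example self-contained, a plain java.util.Map stands in for the cache and a list of keys stands in for the keys obtained from search results. Both stand-ins, and the NullGuardSketch and fetchLive() names, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; a Map stands in for the cache.
final class NullGuardSketch {

    // Looks up each key returned by a search and keeps only entries that still exist.
    static <K, V> List<V> fetchLive(Map<K, V> cache, List<K> keysFromSearch) {
        List<V> live = new ArrayList<V>();
        for (K key : keysFromSearch) {
            V value = cache.get(key);
            if (value != null) { // the entry may have been removed since the search ran
                live.add(value);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<Integer, String>();
        cache.put(1, "John Doe");
        // Key 2 was found by an earlier search but has since been removed.
        System.out.println(fetchLive(cache, Arrays.asList(1, 2))); // prints [John Doe]
    }
}
```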

Options for Working with Nulls

The Search API supports using the presence of a null as search criteria:

myQuery.addCriteria(cache.getSearchAttribute("middle_name").isNull());

It also supports using the absence of a null as search criteria:

myQuery.addCriteria(cache.getSearchAttribute("middle_name").notNull());

Which is equivalent to:

myQuery.addCriteria(cache.getSearchAttribute("middle_name").isNull().not());

Alternatively, you can call constructors to set up equivalent logic:

Criteria isNull = new IsNull("middle_name");

Criteria notNull = new NotNull("middle_name");