Wednesday, December 10, 2008

Apache Derby to Handle Case Insensitive Search

I have been using Apache Derby which can be bundled easily with the 'Lite' version of the product that I am developing in Flex/Java. The fact that by default Derby gives different result set for "where name='chua'" and "where name='CHUA'" has been an egg on my face. Using UPPER/LOWER with function index doesn't sounds good solution to me, neither to my boss. Derby folks are actually aware of the importance to support this, and several proposals have been made in the mailing list, but none have been implemented yet.

However, I am lucky enough to find a solution that I like with Collator here:-

http://blogs.sun.com/kah/entry/user_defined_collation_in_apache

The approach is to create a custom collator provider which returns collator with strength of SECONDARY, which gives case-insensitive comparison for EQUAL and LIKE.

Copied from the blog above:-

"
  • Create a class that extends java.text.spi.CollatorProvider and returns a collator that orders strings the way you want it to
  • Create a text file named META-INF/services/java.text.spi.CollatorProvider which contains one line with the name of your collator provider class
  • Put the compiled class file and the text file in a jar file which you drop into your JRE's lib/ext directory or in one of the directories specified by the java.ext.dirs property
"

And the sample custom collator provider:-

"
public class MyCollatorProvider extends CollatorProvider {
public Locale[] getAvailableLocales() {
return new Locale[] {
new Locale("en", "US", "caseinsensitive")
};
}
public Collator getInstance(Locale locale) {
Collator c = Collator.getInstance(Locale.US);
// Ignore tertiary differences (case differences)
c.setStrength(Collator.SECONDARY);
return c;
}
}
"
Make sure the custom collator provider jar is in java.ext.dirs for JRE that starts Derby. Create the database using connection string similar to following:-
'jdbc:derby:NoCaseDB;territory=en_US_caseinsensitive;
collation=TERRITORY_BASED;create=true'

Using territory based collation disables query optimization when doing LIKE query, I don't know the reason behind it. In the application that I am building, the user is allowed to use every single column to search and filter data. But, not every single column is indexed. ;-)

No comments:

Post a Comment