Friday, 6 June 2014

Java Trie - Ascii only

Just uploading an Ascii only version, uses a byte instead of a char. Also uses the top bit as an end of word flag so you can not populate it with Ansi (ie 8 bit characters).

https://drive.google.com/file/d/0B_YZvz83_le0alZ1OFNMVW56cWs/edit?usp=sharing

Also probably worth mentioning that this implementation (and the other full 'char' version) will only return the longest match when there are multiple overlapping terms within the Trie. For example a Trie containing "cap", "capitulate", "capitulated" and a search string of "capitulated" would only return the longest, "capitulated".

I haven't implemented the compressed version as it turns out the above version doesn't use that much memory.

Saturday, 31 May 2014

Trie - Java implementation

I have ported the C++ code over to Java so I can use it on an android application I am developing. I need to try and reduce the memory footprint and my requirement is only for lower case ASCII so whilst this version uses a Java 'char' I am re-factoring to use a byte (plus re-use some of the top bits). I was also going to try and convert it to a compressed trie. There are a few Java implementations out there and to be honest the only difference with this one is I have used a sorted ArrayList rather than a map within the nodes. I then do a binary search of the list. This probably reduces performance but it should use less memory.

https://drive.google.com/folderview?id=0B_YZvz83_le0VTdDSjF2OU9CTW8&usp=sharing