Thursday, May 20, 2010

XML Namepool 2

Five different new NamePool implementations try to handle XML names most efficiently and with the smallest memory footprint possible.

Note, that these implementations are mostly prove-of-concept and have not yet been tested really!

Saxon 9.2
The current Saxon implementation of a name pool.

LivCos 0.7
The current, quick&practical name pool implementation for LivCos.

QualifedName Object
Strait forward approach, modeling the qualified name as simple Java objects.

LivCos2
An optimized implementation with a fingerprint array and hash map.

Saxon2
The optimized implementation for the Saxon interface.

Int-Wrapper for QualifiedName
Int index for an array of QualifiedNames to reduce DOM memory foot print on 64bit systems. Would allow to be integrated into Saxon.

String Array
Implementation of QualifiedName objects with a "simple" String array.

Evaluation
To evaluate the performance of these approaches, I've written a small, simple console application, stopping the time between tasks and roughly measuring memory usage. For the first run, the application generates 100'000 qualified names in 500 namespaces, each with 1 prefix (=> 200 local names per namespace) and adds them to the chosen namepool. Then it reads the different name parts, gets codes and compares parts of the name.
Protocol           SAXON           QNAME           COSMOS 
                 [ms]  [kB]      [ms]  [kB]      [ms]  [kB]
ALLOCATE         1644  2374       146 22713     13841 10178
READ_PREFIX      3365     0        53     0        35 -8593
READ_URI         3284     0        53     0        23     0
READ_LOCALNAME   3296     0        56     0        34     0
READ_QNAME       7615  6235       191  6235        22     0
READ_CLARKNAME  11545 -6235        53     0      1599  8437
GET_URI_CODE    53640     0       N/A   N/A     17262     0
GET_NS_CODE     95210     0       N/A   N/A       N/A   N/A
GET_NS_CODE2     3318     0       N/A   N/A       N/A   N/A
GET_PREFIX_CODE 32992     0       N/A   N/A       N/A   N/A
COMPARE_FP         14     0        71     0        26     0
COMPARE_NAME     3352     0        91     0        30     0
COMPARE_QNAME    3390     0        81     0        24     0
Every name is read and compared 100 times, the heap is on 1.5GB and it runs with the "-server" option.

Since Saxon handles "StandardNames" differently, a second run measures this set of standard names (~170).
Protocol           SAXON           STANDARD        SAXON2          STRARRAY
[ms]  [kB]      [ms]  [kB]      [ms]  [kB]      [ms]  [kB]
ALLOCATE          765    49       579     0       174   108       775 40049
READ_PREFIX        22     0        18   -20        35     0         9     0
READ_URI           24     0        18     0        26     0         9     0
READ_LOCALNAME     22     0        15     0        26     0         9     0
READ_QNAME        201    10       175    10       203    10        10    10
READ_CLARKNAME    328    10       321    10        29    10        10     0
GET_URI_CODE       71     0       N/A   N/A        57     0       N/A   N/A
GET_NS_CODE       125     0       N/A   N/A        74     0       N/A   N/A
GET_NS_CODE2       30     0       N/A   N/A        25     0       N/A   N/A
GET_PREFIX_CODE    58     0       N/A   N/A        36     0       N/A   N/A
COMPARE_FP         44     0        41     0        36     0        53     0
COMPARE_NAME      166     0        44     0        64     0        52     0
COMPARE_QNAME     182     0        56     0        81     0        50     0
In this run the names are "allocated" and read 10'000 times and compared 100'000 times.

The values for StandardNames differ less for the different implementations. Maybe there is a chance for Saxon to simplify, optimize the handling for StandardNames.

Optimize Saxon 9.2
The namepool in Saxon focuses on fingerprint comparison. I guess, it is important for processing XSLT efficiently. The optimized implementation for Saxon also uses a fingerprint bit mask on the name code. It uses some additional arrays and hash maps.
Protocol           SAXON           SAXON2          STRARRAY
[ms]  [kB]      [ms]  [kB]      [ms]  [kB]
ALLOCATE         1644  2374       240  2620       143 14822
READ_PREFIX      3365     0        94     0        25     0
READ_URI         3284     0        79     0        25     0
READ_LOCALNAME   3296     0        76     0        26     0
READ_QNAME       7615  6235       923  6235       119  6235
READ_CLARKNAME  11545 -6235       148  2671        27     0
GET_URI_CODE    53640     0       786     0       N/A   N/A
GET_NS_CODE     95210     0      1385     0       N/A   N/A
GET_NS_CODE2     3318     0        58     0       N/A   N/A
GET_PREFIX_CODE 32992     0       553     0       N/A   N/A
COMPARE_FP         14     0        14     0        34     0
COMPARE_NAME     3352     0        29     0        38     0
COMPARE_QNAME    3390     0        90     0        35     0
The StringArray uses the index as the name code. Still the compares seem to be really fast. Theoretically it would also allow to replace the Saxon namepool (int name codes), but it needs double the time to compare fingerprints (how big of a drop in performance for Saxon would that be?).

To try the Saxon2 implementation with the Saxon processor you also need to replace the StandardNames class with this code.

QualifiedName Objects
The major drawback for the direct approach, modeling the names as individual objects, is the need for every DOM node to hold an object reference (64bit systems). Another disadvantage is the difficulty to integrate into the existing Saxon processor. Both could be worked around by putting the objects into an array and working with the array indexes as the name codes.
Protocol           QNAME           QNAMEINT        STRARRAY
[ms]  [kB]      [ms]  [kB]      [ms]  [kB]
ALLOCATE          146 22713        82 10890       143 14822
READ_PREFIX        53     0        46     0        25     0
READ_URI           53     0        45     0        25     0
READ_LOCALNAME     56     0        45     0        26     0
READ_QNAME        191  6235       113  6235       119  6235
READ_CLARKNAME     53     0       118  8906        27     0
GET_URI_CODE      N/A   N/A       N/A   N/A       N/A   N/A
GET_NS_CODE       N/A   N/A       N/A   N/A       N/A   N/A
GET_NS_CODE2      N/A   N/A       N/A   N/A       N/A   N/A
GET_PREFIX_CODE   N/A   N/A       N/A   N/A       N/A   N/A
COMPARE_FP         71     0        68     0        34     0
COMPARE_NAME       91     0        66     0        38     0
COMPARE_QNAME      81     0        66     0        35     0
While the String Array is not really stores QualifiedName objects, it similarly compounds the five strings of a name in an block within the array.

As expected the large number of objects use a lot of memory, but the large java.util.HashMaps are somehow missing in the measurement...

Optimize LivCos
In the context of LivCos we have one big namepool, we don't need to compare fingerprint a lot and there can be many local names in the Cosmos namespace. For the SAX interface the QName needs to be read a lot.
Protocol           COSMOS          COSMOS3         STRARRAY        SAXON2       
[ms]  [kB]      [ms]  [kB]      [ms]  [kB]      [ms]  [kB]     
ALLOCATE        13841 10178       272  2578       143 14822       240  2620     
READ_PREFIX        35 -8593        75     0        25     0        94     0     
READ_URI           23     0        35     0        25     0        79     0     
READ_LOCALNAME     34     0        34     0        26     0        76     0     
READ_QNAME         22     0       793  6235       119  6235       923  6235     
READ_CLARKNAME   1599  8437       828  2671        27     0       148  2671     
GET_URI_CODE    17262     0       792     0       N/A   N/A       786     0     
GET_NS_CODE       N/A   N/A       N/A   N/A       N/A   N/A      1385     0     
GET_NS_CODE2      N/A   N/A       N/A   N/A       N/A   N/A        58     0     
GET_PREFIX_CODE   N/A   N/A       564     0       N/A   N/A       553     0     
COMPARE_FP         26     0        14     0        34     0        14     0     
COMPARE_NAME       30     0        29     0        38     0        29     0     
COMPARE_QNAME      24     0        40     0        35     0        90     0     
While COSMOS3 is an adapted SAXON2 copy, the StringArray seems to be the way to go. The key for the main hash map could be optimized and the clark name, since not really needed, could be removed.

Notes
- optimal fingerprint compare requires encoded name codes
- name code constants for StandardNames to be used in switch statements
- index or reference to the full qualified name allows simple caching of qNames and clarkNames
- self-made hash-to-int maps need improvement
- server HotSpot JVM optimizes the test noticeably

some values for the "-client" JVM
Protocol           SAXON           QNAME           STRARRAY        SAXON2          COSMOS3 
                 [ms]  [kB]      [ms]  [kB]      [ms]  [kB]      [ms]  [kB]      [ms]  [kB]
ALLOCATE         1756  2374        96 22713        68 14822       163  2620       164  2578
READ_PREFIX      3517     0       155     0        61     0       122     0       115     0
READ_URI         3445     0       153     0        64     0       102     0        75     0
READ_LOCALNAME   3421     0       158     0        59     0        75     0        68     0
READ_QNAME       8094  6235       241  6235       122  6235      1211  6235      1230  6235
READ_CLARKNAME  11983 -6235       150     0       138     0       157  2671      1372  2671
GET_URI_CODE    78621     0       N/A   N/A       N/A   N/A       967     0       968     0
GET_NS_CODE    106389     0       N/A   N/A       N/A   N/A      1486     0       N/A   N/A
GET_NS_CODE2     3457     0       N/A   N/A       N/A   N/A        89     0       N/A   N/A
GET_PREFIX_CODE 26850     0       N/A   N/A       N/A   N/A       390     0       392     0
COMPARE_FP         30     0       130     0        54     0        30     0        30     0
COMPARE_NAME     3515     0       130     0        53     0        50     0        52     0
COMPARE_QNAME    3485     0       139     0        93     0        91     0        91     0

Saturday, May 15, 2010

XML Namepool

Following a conversation in the Open Discussion forum of Saxon, I've started to think about different aspects of a XML namepool again. I have implemented a quick@dirty one for LivCos quite a while ago already, but always kept in mind to optimize it, once I'll find the time. Well, current weather conditions in Switzerland tell me not to go hang gliding, so...

First a deeper study of the net.saxon.om.Namepool. Here some summary notes:
- int name code
- 10bit prefix per URI index | 10bit bucket index | 10bit local name hash
- special treatment for "StandardNames"
- name code 0-1023
- 3bit URI | 7bit localname
- special codes for URIs and prefixes
- namespace code: 16bit prefix code | 16bit URI code
- URI code: index to "uri" array
- prefix code: index to "prefixes" array
- NameEntry object for every localName/nsURI
- local name hash table to lookup NameEntry
- linked NameEntry list per hash slot
- QNames and clark-names are built on demand
- int[] for every URI to hold the prefixes-per-URI
- bit mask for fingerprints 0xfffff
- max. fingerprints: ~1024..~1000000 (depending on the hash distribution)
- max. URI codes: 32000
- max. prefix codes: 32000
- max. prefixes per URI: 1023
- String.intern() on NameEntry.localname
- String.intern() on clark names

I guess, the implementation focuses mostly on solving the concrete requirements of Saxon. Namespace URI lookups for XSLT, XML Schema,... are handled via switch cases, while others are scanned via equals (probably less important in the contexts used). The hash code / bucket index part of a non-standard name-code seems to be a compact way of addressing the NameEntries and also seems to be able to handle thousands of fingerprints efficiently. A simple String.intern() helps to compare clark-names quickly. Caches for qualified names don't seem necessary.
I like the idea of encoding the prefix index per namespace URI.

My older LivCos Namepool implementation is focusing on qualified names (for some reason the SAX interface asks for them a lot). The name code consist of 12bits for the namespace URI index and 20 bits for the qualified name index. So it can only handle 4000 namespaces. But it's fast to compare, lookup qualified names. For the fingerprint it needs to index one step deeper.

The QualifiedName namepool implementation, suggested in the Saxon discussion, models every name in a uri-localname-prefix Java object. Here the drawbacks are difficulty to integrate into Saxon, the memory usage (specially on 64bit systems) and the large number of small objects for the JVM to handle.

Yet another new Idea
We start with an int name code. We focus on the fingerprints and encode a simple index to a fingerprint array into the name code. Then we use 10 bits for the prefix-per-URI index. => 4 million fingerprints, 1000 prefixes per URI. A fingerprint entry contains an index to the local-name array and one to the URI array. The URI index is also used to address the list of prefix codes. Lookup and compares should be quite fast.

The fingerprint index could also be used to address a possible clark-name cache. Qualified name caches would become more difficult.

Wait for my findings and some comparison data in the next post.