Tuesday, June 1, 2010

XML Namepool 3

In the previous two posts about XML namepools I tried to find the optimal implementation for the data model. In this post I will collect my results from optimizing the StringArray (=> Cosmos4) and Saxon2 (=> Saxon5) implementations for concurrent name allocation.

Since XML documents often contain many elements with the same name (and type), a lookup of an existing name succeeds most of the time. For these "allocations" of already-added names, no synchronization is necessary. A second, synchronized lookup ensures that every name is added only once.
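This double-check pattern could be sketched roughly like this (a hypothetical sketch, not the actual Saxon or LivCos source; a ConcurrentHashMap stands in for the array-based lookup so that the unsynchronized first check is safe in Java):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of double-checked name allocation: the unsynchronized
// first lookup serves the common case of an already-allocated name; only a
// miss enters the synchronized block, which checks again before adding, so
// every name is added exactly once.
public class NamePool {
    private final ConcurrentHashMap<String, Integer> codes = new ConcurrentHashMap<>();
    private int nextCode = 0;

    public int allocate(String name) {
        Integer code = codes.get(name);     // first check, no lock
        if (code != null) {
            return code;
        }
        synchronized (this) {               // second, synchronized check
            code = codes.get(name);
            if (code != null) {
                return code;
            }
            codes.put(name, nextCode);
            return nextCode++;
        }
    }
}
```

Repeated allocations of the same name then take only the lock-free fast path, which is exactly the common case for element names in a typical document.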

Resizing the int and String arrays is probably a big time consumer. When one thread needs to resize an array, all following threads have to wait until it has allocated a new, bigger array and copied the old values to their new destination.
If the resizing is triggered before the old array is actually full, the following threads could at least keep adding to the old array while the new one is being allocated. The triggering thread would still be slowed down, but some of the later threads would not have to wait.
The part of the array written before the trigger could also be copied without synchronizing with the following threads, making the synchronized block even shorter.

The resizing process could also be placed into a separate "resizing" thread. An allocating thread would then, in the worst case, only be slowed down while the running tail of the array (the entries added since the pre-allocation trigger) is copied.
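A pre-emptive resize along these lines could look like this (a sketch under the assumptions above, not the actual implementation): the triggering thread allocates the bigger array and copies the stable prefix outside the lock while later threads keep adding to the old array; only the tail written since the trigger is copied inside a short synchronized block before the array reference is switched.

```java
// Hypothetical sketch of pre-emptive array resizing: growth is triggered at
// 3/4 capacity, the new array is allocated and its prefix copied without the
// lock, and only the tail added since the trigger is copied under the lock
// before the volatile reference is switched.
public class GrowableNamePool {
    private volatile String[] names = new String[1024];
    private int size = 0;              // guarded by this
    private boolean growing = false;   // guarded by this

    public int add(String name) {
        int code;
        boolean trigger = false;
        synchronized (this) {
            while (size == names.length) {   // worst case: old array filled up
                try { wait(); } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            names[size] = name;
            code = size++;
            if (!growing && size * 4 > names.length * 3) {
                growing = true;              // pre-allocation trigger at 3/4
                trigger = true;
            }
        }
        if (trigger) {
            grow();                          // runs outside the add() lock
        }
        return code;
    }

    public String get(int code) {
        return names[code];   // existing codes are valid in old and new array
    }

    private void grow() {
        String[] old = names;
        String[] bigger = new String[old.length * 2];
        int prefix;
        synchronized (this) { prefix = size; }
        // the prefix is write-once, so it can be copied without the lock
        System.arraycopy(old, 0, bigger, 0, prefix);
        synchronized (this) {
            // copy only the tail added while the prefix was being copied
            System.arraycopy(old, prefix, bigger, prefix, size - prefix);
            names = bigger;                  // atomic reference switch
            growing = false;
            notifyAll();                     // release threads that hit "full"
        }
    }
}
```

Only threads that manage to fill the old array before the switch ever have to wait; everyone else keeps adding to the old instance during the copy.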

The actual gain in performance of this approach is difficult to measure and depends strongly on the concrete usage scenario.

Concurrency Test
For a quick and easy test of the concurrent allocation of XML names I simply extended the tests with 16 threads, each allocating all the names.
Protocol           SAXON2       SAXON5       STRARRAY    COSMOS4 
                 [ms]  [kB]   [ms]  [kB]   [ms]  [kB]  [ms]  [kB]
ALLOCATE         4250  2623   1122  2623    948 11387   343  2761
READ_PREFIX        95     0    106     0     57     0    57 -8906
READ_URI           81     0     83     0     57     0    57     0
READ_LOCALNAME     87     0     84     0     52     0    52     0
READ_QNAME        905  6235    918  6235    136  6235    84  6235
READ_CLARKNAME    161  2671    175  2671    N/A   N/A   N/A   N/A
GET_URI_CODE      827     0    782     0    N/A   N/A   N/A   N/A
GET_NS_CODE      1512     0   1407     0    N/A   N/A   N/A   N/A
GET_NS_CODE2       65     0     85     0    N/A   N/A   N/A   N/A
GET_PREFIX_CODE   604     0    554     0    N/A   N/A   N/A   N/A
COMPARE_FP         15     0     15     0     78     0    89     0
COMPARE_NAME       31     0     77     0     82     0    82     0
COMPARE_QNAME      99     0    108     0     89     0    89     0
While this is not a particularly realistic scenario, it shows the improvement from the double-check approach.

Why does Saxon5 need double the time of Saxon2 to compare local names?

While the improvement from replacing the name pool in LivCos (Cosmos4) seems minor (concurrent access is minimal there), the implementation is simpler and thread-safe.
A Saxon version patched with the Saxon5 implementation shows a performance gain of about 5% when used in LivCos.


  1. If update can cause reorganization of the data structure (to expand it), then presumably read access needs to be disabled at least during this reorganization. So there is presumably a need for at least a shared read lock to be acquired. The current Saxon NamePool avoids this by not doing dynamic reorganization; it is effectively a fixed-size hash table extended only by adding things to hash overflow chains.

  2. Thanks for your comment!

    The idea is to allocate, initialize, and copy the new array before switching the actual array reference atomically. For existing nameCodes (array indexes), access through either the old or the new (copied) array instance is valid at any time.
    During allocation, temporary inconsistencies between the arrays, while the indexes for new nameCodes are being set, could produce a false-negative lookup in the first check. The second, synchronized check corrects these cases.
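The fixed-size, chain-extended table described in the first comment could be sketched roughly like this (hypothetical code, not Saxon's actual NamePool): buckets are never moved, entries are only prepended to a chain under the lock, and the immutable Entry fields keep the unsynchronized first scan safe, because a stale read simply falls through to the synchronized re-check.

```java
// Hypothetical sketch of a fixed-size hash table extended only through its
// overflow chains: no dynamic reorganization ever happens, so readers never
// have to wait for one.
public class ChainedNamePool {
    private static final class Entry {
        final String name; final int code; final Entry next;
        Entry(String name, int code, Entry next) {
            this.name = name; this.code = code; this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[1024];   // never resized
    private int nextCode = 0;                          // guarded by this

    public int allocate(String name) {
        int b = (name.hashCode() & 0x7fffffff) % buckets.length;
        for (Entry e = buckets[b]; e != null; e = e.next) {
            if (e.name.equals(name)) return e.code;    // first check, no lock
        }
        synchronized (this) {
            for (Entry e = buckets[b]; e != null; e = e.next) {
                if (e.name.equals(name)) return e.code; // re-check under lock
            }
            buckets[b] = new Entry(name, nextCode, buckets[b]); // prepend
            return nextCode++;
        }
    }
}
```

The final fields of Entry guarantee (by Java's final-field semantics) that a reader never sees a partially constructed chain node, even without synchronization.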