The Problem
Imaging a problem ;) You have hundreds of XML documents, containing elements of different types, defined in different namespaces. On average a document contains about 100 elements in 5 namespaces, while 50 elements belong to the namespace of the root element. Each of the namespaces is one out of 10000.
Then you want to transform these documents into an output, lets say into HTML. You have XSLT template-match definitions for every element in every namespace (~50 * 10000 = 50000 matches).
One XSLT stylesheet defining all the templates, importing all the definitions, probably wouldn't perform very good and won't even fit in memory, so you have to break it down somehow...
Solutions
1) The process first scans the input document for all the namespaces contained.
Then it dynamically builds a stylesheet, importing the according stylesheets
per namespace. The result gets compiled and the input transformed.
2) The process scans the input document for the namespaces contained and builds
a pipeline with the pre-compiled stylesheets, handling these namespaces. The
input gets transformed step-by-step.
3) The process starts with the stylesheet for the root namespace. For all the
unknown namespaces transforms the node with the pre-compiled stylesheet for
the according namespace via the saxon:transform() extension. Every "foreign"
node gets transformed separately.
Tests
To compare these approaches, I've written a small test:
http://code.google.com/p/livcos/source/browse/proto/OpenXslt/src/proto/
(tested with 2 namespaces, extended to 3 namespaces)
The input document contains 1..9999 elements per namespace for 2 namespaces. The root node is in another namespace and holds 0..100 content elements (with the same namespace) in addition to the "foreign" elements. These "foreign" elements also contain content elements with the same namespace.
Here the results:
0 content elements, 100 runs
nodes |composed | pre-compiled | process pipe | extension
1 | 231ms | 16ms (-93%) | 41ms (-82%) | 33ms (-86%)
2 | 130ms | 9ms (-93%) | 19ms (-86%) | 38ms (-71%)
4 | 110ms | 8ms (-93%) | 13ms (-88%) | 46ms (-58%)
6 | 102ms | 8ms (-92%) | 14ms (-87%) | 40ms (-62%)
8 | 97ms | 9ms (-91%) | 15ms (-84%) | 46ms (-53%)
10 | 85ms | 10ms (-88%) | 17ms (-80%) | 55ms (-36%)
12 | 85ms | 12ms (-86%) | 20ms (-76%) | 64ms (-26%)
15 | 85ms | 13ms (-85%) | 24ms (-72%) | 78ms ( -9%)
20 | 79ms | 16ms (-80%) | 29ms (-63%) | 106ms ( 33%)
40 | 89ms | 29ms (-68%) | 55ms (-39%) | 208ms (133%)
60 | 101ms | 42ms (-58%) | 84ms (-17%) | 303ms (199%)
80 | 117ms | 57ms (-52%) | 107ms ( -9%) | 400ms (241%)
100 | 127ms | 68ms (-47%) | 132ms ( 3%) | 503ms (293%)
200 | 194ms | 135ms (-31%) | 258ms ( 32%) | 995ms (411%)
400 | 330ms | 266ms (-20%) | 514ms ( 55%) | 1998ms (504%)
600 | 460ms | 397ms (-14%) | 774ms ( 67%) | 2992ms (549%)
800 | 590ms | 532ms (-10%) | 1029ms ( 74%) | 3985ms (575%)
1000 | 725ms | 658ms (-10%) | 1295ms ( 78%) | 5000ms (589%)
2000 | 1387ms | 1315ms ( -6%) | 2596ms ( 87%) | 10009ms (621%)
3000 | 2037ms | 1992ms ( -3%) | 3857ms ( 89%) | 15088ms (640%)
4000 | 2716ms | 2643ms ( -3%) | 5175ms ( 90%) | 20128ms (640%)
6000 | 4043ms | 3973ms ( -2%) | 7768ms ( 92%) | 30148ms (645%)
8000 | 5354ms | 5285ms ( -2%) | 10362ms ( 93%) | 40199ms (650%)
9999 | 6675ms | 6594ms ( -2%) | 12897ms ( 93%) | 50212ms (652%)
1 content element, 100 runs
nodes |composed | pre-compiled | process pipe | extension
1 | 236ms | 20ms (-92%) | 44ms (-82%) | 32ms (-87%)
2 | 135ms | 9ms (-93%) | 18ms (-87%) | 38ms (-72%)
4 | 110ms | 10ms (-91%) | 17ms (-84%) | 46ms (-58%)
6 | 105ms | 10ms (-90%) | 20ms (-81%) | 42ms (-60%)
8 | 99ms | 12ms (-88%) | 25ms (-75%) | 50ms (-50%)
10 | 90ms | 14ms (-84%) | 28ms (-69%) | 60ms (-34%)
12 | 89ms | 16ms (-82%) | 32ms (-64%) | 71ms (-21%)
15 | 89ms | 19ms (-79%) | 38ms (-58%) | 89ms ( -1%)
20 | 87ms | 23ms (-74%) | 46ms (-47%) | 117ms ( 35%)
40 | 105ms | 43ms (-59%) | 88ms (-16%) | 228ms (117%)
60 | 122ms | 62ms (-49%) | 131ms ( 7%) | 332ms (171%)
80 | 145ms | 82ms (-43%) | 176ms ( 21%) | 441ms (204%)
100 | 163ms | 103ms (-37%) | 212ms ( 30%) | 548ms (235%)
200 | 265ms | 202ms (-24%) | 423ms ( 59%) | 1111ms (318%)
400 | 464ms | 398ms (-15%) | 836ms ( 79%) | 2227ms (379%)
600 | 665ms | 602ms (-10%) | 1252ms ( 88%) | 3342ms (402%)
800 | 863ms | 801ms ( -8%) | 1676ms ( 94%) | 4430ms (413%)
1000 | 1073ms | 1004ms ( -7%) | 2087ms ( 94%) | 5524ms (414%)
2000 | 2065ms | 1994ms ( -4%) | 4182ms (102%) | 11119ms (438%)
3000 | 3065ms | 2999ms ( -3%) | 6286ms (105%) | 16660ms (443%)
4000 | 4073ms | 3966ms ( -3%) | 8363ms (105%) | 22165ms (444%)
6000 | 6071ms | 6003ms ( -2%) | 12598ms (107%) | 33311ms (448%)
8000 | 8086ms | 8006ms ( -1%) | 16718ms (106%) | 44384ms (448%)
9999 | 10052ms | 9989ms ( -1%) | 20959ms (108%) | 55413ms (451%)
2 content elements, 100 runs
nodes |composed | pre-compiled | process pipe | extension
1 | 240ms | 19ms (-93%) | 47ms (-81%) | 32ms (-87%)
2 | 130ms | 10ms (-92%) | 19ms (-85%) | 39ms (-70%)
4 | 111ms | 11ms (-90%) | 21ms (-81%) | 47ms (-58%)
6 | 107ms | 12ms (-89%) | 25ms (-77%) | 44ms (-59%)
8 | 103ms | 14ms (-86%) | 30ms (-71%) | 52ms (-50%)
10 | 93ms | 16ms (-83%) | 35ms (-62%) | 63ms (-33%)
12 | 91ms | 19ms (-79%) | 40ms (-56%) | 74ms (-19%)
15 | 94ms | 21ms (-77%) | 48ms (-49%) | 91ms ( -3%)
20 | 90ms | 28ms (-69%) | 60ms (-33%) | 123ms ( 36%)
40 | 113ms | 55ms (-51%) | 116ms ( 2%) | 237ms (109%)
60 | 139ms | 82ms (-42%) | 172ms ( 23%) | 353ms (153%)
80 | 169ms | 108ms (-36%) | 230ms ( 36%) | 473ms (179%)
100 | 198ms | 137ms (-31%) | 290ms ( 46%) | 596ms (201%)
200 | 381ms | 314ms (-18%) | 609ms ( 59%) | 1241ms (225%)
400 | 1007ms | 943ms ( -7%) | 1531ms ( 52%) | 2683ms (166%)
600 | 1887ms | 1846ms ( -3%) | 2742ms ( 45%) | 4407ms (133%)
800 | 3244ms | 3159ms ( -3%) | 4333ms ( 33%) | 6326ms ( 95%)
1000 | 4798ms | 4735ms ( -2%) | 6216ms ( 29%) | 8440ms ( 75%)
2000 | 30837ms | 30734ms ( -1%) | 33919ms ( 9%) | 21383ms (-31%)
3000 | 70254ms | 70443ms ( 0%) | 75177ms ( 7%) | 38438ms (-46%)
4000 |126738ms |126845ms ( 0%) |131896ms ( 4%) | 59297ms (-54%)
6000 |285009ms |286287ms ( 0%) |295802ms ( 3%) |112712ms (-61%)
8000 |506940ms |508315ms ( 0%) |520498ms ( 2%) |181728ms (-65%)
9999 |794292ms |793994ms ( -1%) |810903ms ( 2%) |267628ms (-67%)
4 content elements, 100 runs
nodes |composed | pre-compiled | process pipe | extension
1 | 248ms | 17ms (-94%) | 51ms (-80%) | 32ms (-88%)
2 | 133ms | 12ms (-91%) | 23ms (-83%) | 41ms (-69%)
4 | 114ms | 14ms (-88%) | 30ms (-74%) | 51ms (-56%)
6 | 110ms | 16ms (-85%) | 36ms (-68%) | 49ms (-56%)
8 | 107ms | 19ms (-82%) | 43ms (-60%) | 57ms (-48%)
10 | 101ms | 23ms (-78%) | 51ms (-50%) | 69ms (-32%)
12 | 102ms | 26ms (-75%) | 60ms (-42%) | 85ms (-18%)
15 | 103ms | 29ms (-72%) | 71ms (-32%) | 102ms ( -1%)
20 | 99ms | 39ms (-61%) | 91ms ( -9%) | 135ms ( 35%)
40 | 141ms | 74ms (-48%) | 179ms ( 26%) | 263ms ( 86%)
60 | 170ms | 116ms (-32%) | 261ms ( 53%) | 391ms (129%)
80 | 206ms | 147ms (-29%) | 346ms ( 67%) | 517ms (150%)
100 | 258ms | 186ms (-28%) | 440ms ( 70%) | 656ms (154%)
200 | 475ms | 419ms (-12%) | 914ms ( 92%) | 1361ms (186%)
400 | 1223ms | 1157ms ( -6%) | 2152ms ( 75%) | 2923ms (138%)
600 | 2196ms | 2159ms ( -2%) | 3655ms ( 66%) | 4635ms (111%)
800 | 3503ms | 3423ms ( -3%) | 5398ms ( 54%) | 6578ms ( 87%)
1000 | 5079ms | 5295ms ( 4%) | 7791ms ( 53%) | 8668ms ( 70%)
2000 | 25708ms | 23913ms ( -7%) | 28935ms ( 12%) | 22370ms (-13%)
3000 | 65818ms | 67420ms ( 2%) | 74858ms ( 13%) | 40017ms (-40%)
4000 |125688ms |126088ms ( 0%) |136408ms ( 8%) | 61669ms (-51%)
6000 |287168ms |288365ms ( 0%) |303169ms ( 5%) |117042ms (-60%)
8000 |512868ms |513733ms ( 0%) |533504ms ( 4%) |188468ms (-64%)
9999 |799576ms |802145ms ( 0%) |828192ms ( 3%) |275584ms (-66%)
10 content elements, 100 runs
nodes |composed | pre-compiled | process pipe | extension
1 | 255ms | 18ms (-93%) | 63ms (-76%) | 35ms (-86%)
2 | 137ms | 17ms (-88%) | 37ms (-73%) | 46ms (-66%)
4 | 123ms | 23ms (-82%) | 52ms (-58%) | 59ms (-53%)
6 | 122ms | 28ms (-78%) | 66ms (-47%) | 61ms (-50%)
8 | 122ms | 34ms (-73%) | 80ms (-35%) | 72ms (-42%)
10 | 117ms | 40ms (-66%) | 97ms (-17%) | 87ms (-26%)
12 | 119ms | 47ms (-61%) | 113ms ( -6%) | 104ms (-13%)
15 | 126ms | 54ms (-57%) | 138ms ( 9%) | 128ms ( 1%)
20 | 138ms | 72ms (-48%) | 183ms ( 32%) | 170ms ( 23%)
40 | 199ms | 138ms (-31%) | 340ms ( 70%) | 324ms ( 62%)
60 | 265ms | 204ms (-24%) | 514ms ( 93%) | 495ms ( 86%)
80 | 334ms | 275ms (-18%) | 677ms (102%) | 651ms ( 94%)
100 | 402ms | 345ms (-15%) | 854ms (112%) | 819ms (103%)
200 | 797ms | 734ms ( -8%) | 1735ms (117%) | 1693ms (112%)
400 | 1884ms | 1809ms ( -4%) | 3854ms (104%) | 3634ms ( 92%)
600 | 3247ms | 3177ms ( -3%) | 6219ms ( 91%) | 5827ms ( 79%)
800 | 4934ms | 4835ms ( -3%) | 8930ms ( 80%) | 8146ms ( 65%)
1000 | 7037ms | 6977ms ( -1%) | 12088ms ( 71%) | 10806ms ( 53%)
2000 | 32551ms | 32032ms ( -2%) | 42344ms ( 30%) | 27523ms (-16%)
3000 | 86523ms | 86692ms ( 0%) |102437ms ( 18%) | 49170ms (-44%)
4000 |157235ms |157528ms ( 0%) |178602ms ( 13%) | 75523ms (-52%)
6000 |355111ms |355885ms ( 0%) |387521ms ( 9%) |142279ms (-60%)
8000 |628937ms |630111ms ( 0%) |703281ms ( 11%) |228502ms (-64%)
9999 |979335ms |980668ms ( 0%) |1037671ms ( 5%) |334994ms (-66%)
100 content elements, 10 runs
nodes |composed | pre-compiled | process pipe | extension
1 | 75ms | 10ms (-86%) | 48ms (-36%) | 11ms (-86%)
2 | 33ms | 10ms (-70%) | 25ms (-25%) | 14ms (-58%)
4 | 36ms | 15ms (-58%) | 39ms ( 10%) | 22ms (-38%)
6 | 39ms | 20ms (-49%) | 54ms ( 37%) | 32ms (-19%)
8 | 43ms | 25ms (-42%) | 70ms ( 59%) | 41ms ( -7%)
10 | 47ms | 29ms (-38%) | 82ms ( 73%) | 47ms ( 0%)
12 | 50ms | 33ms (-34%) | 96ms ( 89%) | 54ms ( 7%)
15 | 57ms | 40ms (-30%) | 115ms ( 99%) | 63ms ( 10%)
20 | 67ms | 52ms (-23%) | 152ms (124%) | 73ms ( 7%)
40 | 116ms | 99ms (-14%) | 292ms (152%) | 128ms ( 11%)
60 | 164ms | 153ms ( -7%) | 434ms (163%) | 189ms ( 14%)
80 | 219ms | 200ms ( -9%) | 571ms (159%) | 255ms ( 16%)
100 | 269ms | 253ms ( -6%) | 711ms (164%) | 320ms ( 19%)
200 | 534ms | 528ms ( -2%) | 1449ms (171%) | 642ms ( 20%)
400 | 1161ms | 1157ms ( -1%) | 2994ms (157%) | 1329ms ( 14%)
600 | 1870ms | 1878ms ( 0%) | 4667ms (149%) | 2044ms ( 9%)
800 | 2699ms | 2690ms ( -1%) | 6400ms (137%) | 2789ms ( 3%)
1000 | 3630ms | 3643ms ( 0%) | 8303ms (128%) | 3578ms ( -2%)
2000 | 10629ms | 10900ms ( 2%) | 20206ms ( 90%) | 7980ms (-25%)
3000 | 21130ms | 21548ms ( 1%) | 35574ms ( 68%) | 13244ms (-38%)
4000 | 34769ms | 35397ms ( 1%) | 54171ms ( 55%) | 19251ms (-45%)
6000 | 70734ms | 72451ms ( 2%) |100466ms ( 42%) | 33602ms (-53%)
8000 |119603ms |122070ms ( 2%) |160432ms ( 34%) | 51418ms (-58%)
9999 |182144ms |185494ms ( 1%) |231385ms ( 27%) | 72163ms (-61%)
nodes: Number of input test elements 1..9999 (per namespace => 2..19998 child nodes).
composed: Approach 1 with one XSLT importing 2 others. Scanning the input for namespaces. (100%).
pre-compiled: Run 1a with one static, pre-compiled XSLT for comparison. Still scanning the input for namespaces.
process pipe: Approach 2 with 3-stage XSLTs. Scanning the input for namespaces.
extension: Approach 3 with one XSLT for the root namespace and with a separate transformation for every test element.
Noticed
As expected the static, pre-compiled XSLT performs generally best.
As expected the pre-compiled savings become relative as the input data size grows.
Multi-stage pipeline processing performs good for small documents and ok for bigger ones. As expected more namespaces -> more stylesheets -> more stages -> more time (test with the implementation, extended to 3 namespaces).
Compared to the other approaches the per-element transformations (approach 3) becomes expensive, when all the contained elements get transformed (> 100 "foreign" elements, 0 content elements), but improves rapidly with content elements.
What happens in the process with 2 content elements? With >2000 elements, suddenly the per-element transformation is faster, than a transformation with a pre-compiled stylesheet? Even though the per-namespaces stylesheets only need to process one element per transformation, the initial stylesheet still needs to match all the elements, much like the pre-compiled, all-in-one stylesheet. The tree walk to collect the namespaces (not needed in approach 3) cannot be the issue here, can it?
Also while it takes about 7s to process 9999 elements with no content and about 10s with 1 content element, it takes more than 13min for 9999 elements with 2 content elements!
Maybe with this size of the input document the garbage collector kicks in... But GC time seems to be constant and also there no significant change on a 64bit-VM with 6GB of heap.
Also, I guess, 9999 elements with one child element should need roughly the same or even a bigger amount of memory as 5000 elements with 2 child elements. To process the second it takes more than 10 times longer.
Conclusions
To generally solve our problem, the approach 2 seems to be most suitable, with a good performance also with larger documents.
Dynamically compound stylesheets would serve documents with more than 200 nodes better. Combined with a smart cache of pre-compiled stylesheets for often used namespace sets, it could also perform best for smaller documents.
The per-element transformation approach (3) works good for small documents and strangely for large documents (> 3000 nodes, foreign/root < 1/3) as well. Due to it's flexibility (individual stylesheet selection not only by namespace), it's also worth to be offered.
Update: please see
Open XSLT Processing 2 to answer the open questions.