Since the October 30, 2017 release of Analytica 5.0 is fast approaching, it seemed like a good time to re-run the benchmark timing tests from my earlier blog posting, “Faster evaluation in Analytica 4.6”, to see how the upcoming release compares.
Speed enhancements can vary a lot across different models. I already know that models with large arrays (including large Monte Carlo runs) often see sizeable speed-ups from Analytica 5.0’s new multithreaded evaluation capability, whereas models without large arrays usually don’t benefit, since the overhead of dividing up a computation can easily outweigh the gains from using multiple cores. Similarly, models that let array abstraction handle iteration the way it is intended are likely to benefit, whereas code with explicit FOR loops, which circumvents automatic array abstraction, has less opportunity to. Anecdotally, I have already seen individual cases ranging from no speed-up at all to a four-fold speed-up. To arrive at an average speed-up, we need some sort of “representative mix” of real problems. That was the idea behind the benchmark suite first reported in the “Faster evaluation in Analytica 4.6” blog post.
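To illustrate the FOR-loop point, here is a small sketch in Analytica syntax (the variable and index names `Price`, `Qty`, `Item`, and `Scenario` are made up for illustration). Both definitions compute the same result, but the first iterates explicitly over `Item`, giving the engine little opportunity to parallelize, while the second leaves the iteration to array abstraction:

```
{ Explicit iteration: circumvents array abstraction, little chance to parallelize }
Variable Total_slow := For i := Item Do Sum(Price[Item = i] * Qty[Item = i], Scenario)

{ Equivalent array expression: abstraction handles Item automatically }
Variable Total_fast := Sum(Price * Qty, Scenario)
```

In both cases the result is indexed by `Item`, but only the second form gives the evaluation engine a single large array operation that it can split across threads.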
I set those benchmark models aside several years ago for the sole purpose of benchmark testing, and excluded them from any profiling or speed measurements during code development to prevent intentional or unintentional tuning to the benchmark suite. In fact, this suite of models has been hidden away gathering dust since I published that blog article, except that at some point one more benchmark model (ASB) was added to the suite. All of them are real, substantial models, in most cases fairly large.
I ran all the benchmarks under four test conditions: Analytica 4.6, and then Analytica 5.0 with 1 thread, 4 threads, and 8 threads. I repeated all these tests 10 times and averaged the results. The speed-up percentages are relative to the Analytica 4.6 times. All timings were run on the same computer, which is in fact the same computer used for the tests in the previous article (Intel Core i7-2600 @ 3.4GHz, 4 cores, 8 logical processors, Windows 7). All tests used Analytica 64-bit.
| Benchmark | 4.6 time (s) | 5.0 (1) time (s) | 5.0 (4) time (s) | 5.0 (8) time (s) | 5.0 (1) speed-up | 5.0 (4) speed-up | 5.0 (8) speed-up |
|---|---|---|---|---|---|---|---|
| AM1 | 24.0 | 23.2 | 22.9 | 23.0 | 3% | 5% | 4% |
| AT1 | 10.2 | 10.1 | 9.5 | 9.6 | 1% | 7% | 6% |
| CA1 | 17.0 | 15.4 | 15.6 | 15.6 | 11% | 9% | 9% |
| CE3 | 68.7 | 56.6 | 60.7 | 61.2 | 21% | 13% | 12% |
| ES1 | 60.4 | 56.9 | 56.6 | 56.7 | 6% | 7% | 6% |
| KI3 | 14.0 | 12.8 | 11.5 | 11.9 | 10% | 22% | 18% |
| RP1 | 41.4 | 41.3 | 29.3 | 29.2 | 0% | 42% | 42% |
| PO5 | 0.183 | 0.45 | 0.45 / 0.161 | 0.37 / 0.160 | -59% / 9% | -60% / 14% | -51% / 14% |
| SE1 | 73.0 | 72.1 | 69.0 | 71.7 | 1% | 6% | 2% |
| SS1 | 11.4 | 11.6 | 11.7 | 11.7 | -2% | -3% | -2% |
| ASB | 63.3 | 60.4 | 32.4 | 29.1 | 5% | 95% | 117% |
| Ave (w/o PO5) | | | | | 6% | 20% | 21% |

(Numbers in parentheses are thread counts. The PO5 cells show the original value followed by the corrected value, as explained below.)
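For reference, the speed-up percentages in the table are consistent with computing the ratio of elapsed times relative to 4.6 (variable names here are just for illustration):

```
{ Percent speed-up relative to Analytica 4.6 }
Speedup := (Time_4_6 / Time_5_0) - 1

{ e.g., ASB with 8 threads: 63.3 / 29.1 - 1 = 1.175, i.e., the ~117% shown above }
```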
For these parallelizable models, it is interesting to see a performance boost going from 1 to 4 threads, but not much from 4 to 8 threads. I suspect this is related to the fact that my computer has 4 physical cores and 8 logical processors; the 4 physical cores seem to be the more important number.
The RP1 and ASB benchmarks show a strong improvement. RP1 does a fair amount of Monte Carlo simulation (which I suspect is a bit underrepresented by the other models in the suite), and ASB does a lot of arithmetic on very large arrays. The speed-up on those two is consistent with those types of models benefiting from the use of multiple cores.
My original automated run showed disappointing results for PO5. On closer inspection, I discovered that this very old model (created in Analytica 3.1) was using the + operator for text concatenation, and that this was responsible for the slow numbers. I went through the model, changed + to & everywhere it was being used to concatenate text, and reran that benchmark. In the PO5 row, the original numbers appear first and the corrected numbers second. The original numbers are not a fair comparison: in the 4.6 test, a lot of pre-computation of indexes was carried out during model load, before the timing test started, whereas in the 5.0 tests, errors due to the + operator meant the indexes had not been pre-computed, so much more computation was carried out under the timer. The corrected numbers appear to be the fair comparison.
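For anyone updating a similarly old model, the change itself is tiny (the identifier `scenario_name` here is made up):

```
{ Old Analytica 3.1-era style: '+' used for text concatenation }
Label := 'Scenario ' + scenario_name

{ Updated: '&' is the text concatenation operator }
Label := 'Scenario ' & scenario_name
```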
I have not yet reviewed the other benchmark models to see if something similar occurs. Hopefully I will eventually get to that, since it would make for a fairer comparison (and would improve the 5.0 numbers even more).
The conclusion of these benchmarks is that Analytica 5.0 seems to be about 20% faster on average with multithreading, but with actual results highly dependent on the specifics of the model.
I am sure that future releases will continue to see improvements in evaluation speed, so I’ll make an effort to repeat these benchmark measurements and report them in a blog posting with each new release.
Be sure to visit the What’s new in Analytica 5.0 page. There’s even a video there showcasing the huge number of new features!