Some BLAS Benchmarks
For the following benchmarks, FLENS was used with two different BLAS implementations as backends: ATLAS and MKL.
The benchmarks were performed using the Benchmark for Templated Libraries (BTL) project.
The BTL interface used for these benchmarks can be downloaded
here. The download includes the raw data of the benchmarks and the compile flags used,
which are essentially -O3 -DNDEBUG.
- Platform: Dual-Core AMD Opteron(tm) Processor 2218, 997.538 MHz, 1024 KB cache, 8 GB RAM
- Compiler: g++ (GCC) 4.2.1
- Date: 2007-07-30
The benchmarks also include a comparison with uBLAS. The uBLAS examples
do not use any external BLAS implementation; instead they rely on the generic BLAS implementation built into uBLAS.
So what is the purpose of these tests?
- Demonstrating that FLENS gives you the full performance of the underlying BLAS implementation. In some cases ATLAS,
in other cases MKL gives better results. This suggests implementing BLAS wrappers in FLENS that decide which underlying
BLAS implementation to use depending on matrix/vector sizes, the type of linear algebra operation, and so on.
- I do not want to bash uBLAS, but the tests show that in some cases you really have to use a native BLAS implementation
to get good performance. And in fact, uBLAS can be used with bindings to native BLAS implementations. But let's
be honest: you don't really wanna compare these bindings with the FLENS high-level interface for BLAS, do you?