Naive C++ Matrix Multiplication 100 times slower than BLAS?

I am looking at large matrix multiplication and ran the following experiment as a baseline: randomly generate two 4096x4096 matrices X and Y from the standard normal distribution (mean 0, standard deviation 1), then compute Z = X*Y ...
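For reference, a minimal sketch of what such a naive baseline might look like. This is not the code from the question; the flat row-major storage, the `Matrix` alias, and the function names `random_normal_matrix` and `naive_multiply` are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Assumption: an n x n matrix stored row-major in a flat vector.
// The question's actual storage layout is not shown.
using Matrix = std::vector<double>;

// Hypothetical helper: fill an n x n matrix with samples from N(0, 1).
Matrix random_normal_matrix(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> dist(0.0, 1.0);  // mean 0, stddev 1
    Matrix m(n * n);
    for (double& v : m) {
        v = dist(gen);
    }
    return m;
}

// Hypothetical helper: textbook triple-loop product Z = X * Y.
// The inner loop walks y with stride n, so it reads y column-wise.
Matrix naive_multiply(const Matrix& x, const Matrix& y, std::size_t n) {
    Matrix z(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += x[i * n + k] * y[k * n + j];
            }
            z[i * n + j] = sum;
        }
    }
    return z;
}
```

Calling `naive_multiply(random_normal_matrix(4096, 1), random_normal_matrix(4096, 2), 4096)` would reproduce the experiment's shape. The strided access to `y` in the inner loop is one commonly cited reason a loop like this trails a tuned BLAS, though how large the gap is depends on the compiler, flags, and hardware.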