
Matrix multiplication

As an intuitive explanation (if you don't know how asymptotic time complexity works): normal matrix multiplication is 3 nested for cycles, while addition is only 2. Matrix multiplications are asymptotically more expensive than additions (while the primitive multiplications are only constant-factor more expensive than additions). Therefore, if I double the side of the matrix, I expect the addition to take roughly 4 times as long, while matrix multiplication 8 times as long (if I ignore all the important details, like caches or the costs to start up and finish up the algorithm). So while addition and multiplication can take about the same time on a small matrix, there'll be a difference on large ones, and the bigger the matrix, the bigger the difference. Therefore, if we have a really big matrix, trading one multiplication for several additions on this big matrix pays off.

This is somewhat simplified (because with the recursive Strassen algorithm, the multiplication is faster than O(n³), so the difference between multiplication and addition is slightly smaller), but I think this helps to build the intuition behind the idea. That O(n^2.81) comes from rigorous mathematics around recurrences… something that I don't find as fun or enlightening, but it can be found on the Wikipedia page.

Because on small matrices the trade of one multiplication for several additions doesn't pay off, I switch to the normal multiplication at the bottom of the recursion ‒ which can happily use SIMD, possibly with the FMA instructions mentioned above.

[Fig. 2: Definition of the matrix multiplication tensor as a triad ‒ exactly the matrix multiplication, expressed as three elements ‒ as defined in the DeepMind paper.]

In other words, Strassen is about doing a smaller number of operations, not about doing them faster ‒ and it does a smaller number of both multiplications and additions.
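To make the two-versus-three-loops point concrete, here is a minimal sketch of both operations on square, row-major matrices (my own illustration in Rust, not code from the post):

```rust
// Addition touches each of the n^2 elements once: two nested loops, O(n^2).
fn add(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    let mut c = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            c[i * n + j] = a[i * n + j] + b[i * n + j];
        }
    }
    c
}

// Naive multiplication does n multiply-adds per output element:
// three nested loops, O(n^3).
fn mul(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    let mut c = vec![0.0; n * n];
    for i in 0..n {
        for k in 0..n {
            for j in 0..n {
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
    c
}

fn main() {
    // 2x2 smoke test: all-ones plus/times all-twos.
    let (a, b, n) = (vec![1.0; 4], vec![2.0; 4], 2);
    assert_eq!(add(&a, &b, n), vec![3.0; 4]);
    assert_eq!(mul(&a, &b, n), vec![4.0; 4]);
}
```

Doubling n quadruples the work in add but multiplies the work in mul by eight ‒ exactly the asymmetry the trade exploits.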

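For anyone who wants the recurrence behind that exponent without the Wikipedia detour, it is short enough to state here (standard textbook analysis, not something spelled out in the original post):

```latex
% Strassen replaces the 8 recursive block multiplications of the naive
% divide-and-conquer scheme with 7, at the price of a constant number
% of O(n^2) block additions:
T(n) = 7\,T\!\left(\tfrac{n}{2}\right) + O(n^2)
% By the master theorem, the multiplications dominate:
T(n) = O\!\left(n^{\log_2 7}\right) \approx O(n^{2.81})
```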

And yes, as pointed out, you seem to have gotten confused a bit there (or I haven't written it clearly enough). You speak about the primitive instructions, while Strassen cares about the matrix multiplications, on the big matrices.

I guess this part actually can use it pretty easily, though:
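(The snippet that originally followed the colon is missing from this copy. As a stand-in, here is my own sketch ‒ assuming Rust's standard f64::mul_add, not the original code ‒ of how the base case's inner loop can lean on FMA:)

```rust
/// Sketch of a base-case inner loop that can lower to FMA instructions.
/// `a_row` is row i of A (length n), `b` is all of B (n*n, row-major),
/// `c_row` is row i of C (length n), updated in place.
fn mul_row(a_row: &[f64], b: &[f64], c_row: &mut [f64], n: usize) {
    for (k, &a_ik) in a_row.iter().enumerate() {
        let b_row = &b[k * n..(k + 1) * n];
        for j in 0..n {
            // c[i][j] = a[i][k] * b[k][j] + c[i][j]; f64::mul_add does the
            // multiply and add with a single rounding, and compiles to a
            // fused multiply-add when built with FMA enabled (e.g.
            // RUSTFLAGS="-C target-feature=+fma" on x86_64).
            c_row[j] = a_ik.mul_add(b_row[j], c_row[j]);
        }
    }
}

fn main() {
    // Tiny smoke test: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]].
    let (a, b) = ([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]);
    let mut c = [0.0f64; 4];
    let n = 2;
    for i in 0..n {
        mul_row(&a[i * n..(i + 1) * n], &b, &mut c[i * n..(i + 1) * n], n);
    }
    assert_eq!(c, [19.0, 22.0, 43.0, 50.0]);
}
```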


Comparing matrix multiplication libraries in terms of FLOPs? How does that work, or say anything at all about the implementation? I mean, using the Strassen algorithm, you'll probably get lower FLOPs (because you do less work overall), but a faster implementation.

As for the MatrixMultiply crate ‒ I haven't looked into it much, but I think it already uses some of the same optimisations. As for Armadillo ‒ I think it uses its own implementation, but I haven't dug into it ‒ if it found and used something already on my system, I wouldn't have known.

First, I haven't checked the generated assembler (I don't really know assembler). Also, I aimed for high speed on really big dense matrices ‒ but how often do you need that in practice? They are great as an exercise, but I believe MatrixMultiply aims more for real-life usage (which means it must not make multiplying small ones slower, and it must handle matrices of non-square sizes). It explicitly does not use multiple threads. Also, at what size of matrix do you measure it? And how do you measure it (and on which hardware)?

But if you want to compare it, or add another library to the comparison, feel free to send merge requests :-).
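For what it's worth, the number most benchmarks print is not a count of what the algorithm really did ‒ it's an "effective" rate that just assumes the classical 2n³ operation count. A sketch of that convention (the formula is the common benchmarking one, not anything from the comparison in the post):

```rust
use std::time::Duration;

/// "Effective" GFLOP/s as benchmarks commonly report it: assume the
/// classical 2 * n^3 floating-point operations of the naive algorithm
/// and divide by wall time, whatever the implementation actually did.
fn effective_gflops(n: usize, elapsed: Duration) -> f64 {
    2.0 * (n as f64).powi(3) / elapsed.as_secs_f64() / 1e9
}

fn main() {
    // Hypothetical numbers, purely to illustrate the formula.
    let n = 4096;
    let t = Duration::from_secs_f64(2.5);
    println!("{:.2} effective GFLOP/s", effective_gflops(n, t));
}
```

Under this convention a faster Strassen run simply reports a higher effective rate; counting the operations it actually performed would instead give the "lower FLOPs, faster implementation" situation described above.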


The comparison to Armadillo was just a quick experiment to see that I'm not completely off the charts.


I haven't compared to anything else but Armadillo ‒ my intention wasn't exactly an arms race against another implementation, more like using matrix multiplication as a kind of optimisation case study ‒ it could have been sorting numbers, or something else.







