If you define a structure having members of fixed-size vectorizable Eigen types, you must ensure that calling operator new on it allocates properly aligned buffers. If you are compiling only in C++17 mode with a sufficiently recent compiler (e.g., GCC>=7, clang>=5, MSVC>=19.12), then everything is taken care of by the compiler and you can stop reading.
Otherwise, you have to overload its operator new so that it generates properly aligned pointers (e.g., 32-byte-aligned for Vector4d and AVX). Fortunately, Eigen provides the macro EIGEN_MAKE_ALIGNED_OPERATOR_NEW, which does that for you.
The kind of code that needs to be changed is code in which a class has a fixed-size vectorizable Eigen object as a member, and you then dynamically create an object of that class.
Very easy: you just need to put an EIGEN_MAKE_ALIGNED_OPERATOR_NEW macro in a public part of your class. This macro makes new Foo always return an aligned pointer.
In C++17, this macro is empty.
If this approach is too intrusive, see also the other solutions.
OK, let's say that your code looks like the case described above: a class Foo with an Eigen::Vector4d member v, and an object foo created with new Foo.
An Eigen::Vector4d consists of 4 doubles, which is 256 bits. This is exactly the size of an AVX register, which makes it possible to use AVX for all sorts of operations on this vector. But AVX instructions (at least the ones that Eigen uses, which are the fast ones) require 256-bit alignment. Otherwise you get a segmentation fault.
For this reason, Eigen takes care by itself to require 256-bit alignment for Eigen::Vector4d, by doing two things. First, it requires 256-bit alignment for the array of 4 doubles inside Eigen::Vector4d, through an alignment attribute on that array (the alignas keyword in C++11, or compiler extensions before that). Second, it overloads operator new of Eigen::Vector4d so it will always return 256-bit aligned pointers (removed in C++17).
Thus, normally, you don't have to worry about anything, Eigen handles alignment of operator new for you...
... except in one case. When you have a class Foo like above, and you dynamically allocate a new Foo as above, then, since Foo doesn't have an aligned operator new, the returned pointer foo is not necessarily 256-bit aligned.
The alignment attribute of the member v is then relative to the start of the class Foo. If the foo pointer wasn't aligned, then foo->v won't be aligned either!
The solution is to let class Foo have an aligned operator new, as we showed in the previous section.
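The reasoning above can be sketched without Eigen. The type Vec4dStandIn below is a hypothetical stand-in that only models the 32-byte requirement Eigen::Vector4d has under AVX, and the addresses are made-up numbers chosen to show both outcomes:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Eigen::Vector4d under AVX: 4 doubles, 32-byte aligned.
struct alignas(32) Vec4dStandIn { double data[4]; };

struct Foo { Vec4dStandIn v; };

constexpr bool is_aligned(std::uintptr_t address, std::size_t alignment) {
  return address % alignment == 0;
}

// Address of foo->v given the address of foo: a fixed offset from the start of Foo.
constexpr std::uintptr_t member_address(std::uintptr_t object_address) {
  return object_address + offsetof(Foo, v);
}

// If foo is 32-byte aligned, so is foo->v.
static_assert(is_aligned(member_address(0x1000), 32), "aligned object, aligned member");
// If foo is only 16-byte aligned, foo->v is misaligned too.
static_assert(!is_aligned(member_address(0x1010), 32), "misaligned object, misaligned member");
```

The alignment attribute fixes the offset of v inside Foo, but only the allocator decides where Foo itself starts.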
This explanation also holds for SSE/NEON/MSA/Altivec/VSX targets, which require 16-byte alignment, and for AVX512, which requires 64-byte alignment for fixed-size objects whose size is a multiple of 64 bytes (e.g., Eigen::Matrix4d).
You are not required to put all the members of Eigen types at the beginning of your class. Since Eigen takes care of declaring adequate alignment, all members that need it are automatically aligned relative to the class. So a class that declares a double x followed by an Eigen::Vector4d v works fine.
That said, as usual, it is recommended to sort the members so that alignment does not waste memory. In that example, with AVX, the compiler will have to reserve 24 empty bytes between x and v.
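The padding can be checked with a small Eigen-free sketch. Vec4dStandIn is a hypothetical stand-in for Eigen::Vector4d under AVX (32-byte alignment), and the second scalar y in the last two structs is an assumption added to make the effect of member ordering visible:

```cpp
#include <cstddef>

// Hypothetical stand-in for Eigen::Vector4d under AVX: 4 doubles, 32-byte aligned.
struct alignas(32) Vec4dStandIn { double data[4]; };

// Scalar first: v must start at the next 32-byte boundary.
struct Foo {
  double x;
  Vec4dStandIn v;
};
static_assert(offsetof(Foo, v) == 32, "v starts at offset 32");
static_assert(offsetof(Foo, v) - sizeof(double) == 24, "24 padding bytes after x");

// With a second scalar, member order changes the total size.
struct Unsorted { double x; Vec4dStandIn v; double y; };
struct Sorted   { Vec4dStandIn v; double x; double y; };
static_assert(sizeof(Unsorted) == 96, "padding after x and after y");
static_assert(sizeof(Sorted) == 64, "scalars share one padded tail");
```

These numbers assume an 8-byte double; with a 16-byte alignment requirement (SSE, for example) the gaps would be smaller.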
Dynamic-size matrices and vectors, such as Eigen::VectorXd, dynamically allocate their own array of coefficients, so they take care of requiring absolute alignment automatically and do not cause this issue. The issue discussed here only concerns fixed-size vectorizable matrices and vectors.
No, this is not a bug in Eigen. It is more of an inherent problem of the C++ language specification, which has been solved in C++17 through the feature known as dynamic memory allocation for over-aligned data.
If the alignment should depend on a condition (for example, on template parameters), we offer the macro EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign). It generates aligned operators like EIGEN_MAKE_ALIGNED_OPERATOR_NEW if NeedsToAlign is true, and operators with the default alignment if NeedsToAlign is false. In C++17, this macro is empty.
Example:
In case putting the EIGEN_MAKE_ALIGNED_OPERATOR_NEW macro everywhere is too intrusive, there exist at least two other solutions.
The first is to disable the alignment requirement for the fixed-size members, for example by declaring v with the Eigen::DontAlign option. Such a v is fully compatible with aligned Eigen::Vector4d. The only effect is to make loads and stores to v more expensive (usually slightly, but that's hardware dependent).
The second consists in storing the fixed-size objects in a private struct which is dynamically allocated at construction time of the main object. The clear advantage here is that the class Foo remains unchanged regarding alignment issues. The drawback is that an additional heap allocation is always required.