Skip to content

Conversation

@cjones051073
Copy link
Contributor

This PR extends the type templation that was already present in some classes in the GenVector library to improve support for using vectorised (Vc) types. Specifically it :-

  1. Extends the templation to the Plane3D, Transformation3D and Translation3D types.
  2. Where necessary provides specialised methods for the vector types, when the original code was not generic enough to work in both scalar and vector scenarios. Typically this happens in the case of conditionals, where the differences required (booleans versus masks) are difficult to avoid.

This PR is not complete, in that there are still some classes in GenVector that still do not support Vc types, as they are still not templated, such as the Rotation like transformations. It would be nice to add this at some point, but the code associated to these is more extensive (3DConversions.cxx for instance) and that will require some work.

My changes pass the built in ROOT tests. In addition I have prepared a simple test case for the Vc types (attached) that I have used to check the scalar and vector types give equivalent results. Tested on OS X with the compilation command

clang++ -O3 -mavx2 -mfma root-config --cflags -I/Users/chris/Projects/Vc/install/include main.cpp root-config --libs -lGenVector /Users/chris/Projects/Vc/install/lib/libVc.a

main-cpp.txt

…ests. Start with a limited scope and just update Plane3D, Translation3D and Transformation3D
Scalar Mag2() const { return fX*fX + fY*fY; }
Scalar R() const { return std::sqrt( Mag2());}
Scalar Phi() const { return (fX==0 && fY==0) ? 0.0 : atan2(fY,fX);}
Scalar R() const { using namespace std; return sqrt( Mag2()); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what is the advantage of the using namespace std; pattern, instead of using return std::sqrt...?

Copy link
Contributor Author

@cjones051073 cjones051073 Mar 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required to support Vc types. For Vc types you need to use the versions of sqrt etc. shipped as part of that library. Using std::sqrt does not allow this. The namespace trick means you use the std:: versions when appropriate, but also allows other implementations when not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, thanks for explaining! Neat and subtle at the same time. Btw, there is an inherited extra space after the open paren.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, it's @lmoneta's call but I'd prefer to have a short comment saying what the pattern does, or maybe a #define R__enable_vc_types using namespace std; This would make this easier to read by an unarmed eye.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike preprocessor directives unless absolutely required, and I don't think here adding one really helps. To my eye the pattern as is is clear, but as you say its up to @lmoneta . I would be OK with adding some sort of comment somewhere ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment would be as clear and more concise than a ifdef

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

I don't understand this. Are you sure it is needed for Vc ? In the past I have seen Vc replacing automatically its vectorised function implementations in the std namespace

Copy link
Contributor Author

@cjones051073 cjones051073 Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure I ran into some issue that lead me to doing this... Also note the templation is in principle there for other types than Vc. This change in principle supports types that do not extend the math functions under std::, as well as those that do.

*begin++ = a;
*begin++ = b;
*begin = c;
Scalar a,b,c = Scalar(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that leave a and b uninitialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite right.... Will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix pushed.

@cjones051073
Copy link
Contributor Author

enable_if implementations updated.

*/
template <typename SCALAR = Scalar>
typename std::enable_if<std::is_arithmetic<SCALAR>::value, DisplacementVector3D>::type Unit() const
template <typename SCALAR = Scalar, typename std::enable_if<std::is_arithmetic<SCALAR>::value>::type * = nullptr>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you transform this:
template <typename SCALAR = Scalar, typename std::enable_if<std::is_arithmetic<SCALAR>::value>::type * = nullptr>

into

template <typename SCALAR = Scalar, typename = std::enable_if<std::is_arithmetic<SCALAR>::value>::type>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Forget my last comment... having a bad comment day).

You are missing one typename, but yes, I can make that change..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it fails to compile when there are two methods. i.e.

 /**
         return unit vector parallel to this (scalar)
      */
      template <typename SCALAR = Scalar, typename = typename std::enable_if<std::is_arithmetic<SCALAR>::value>::type>
      DisplacementVector3D Unit() const
      {
         const auto tot = R();
         return tot == 0 ? *this : DisplacementVector3D(*this) / tot;
      }

      /**
         return unit vector parallel to this (vector)
      */
      template <typename SCALAR = Scalar, typename = typename std::enable_if<!std::is_arithmetic<SCALAR>::value>::type>
      DisplacementVector3D Unit() const
      {
         SCALAR tot            = R();
         tot(tot == SCALAR(0)) = SCALAR(1);
         return DisplacementVector3D(*this) / tot;
      }

gives

In file included from /Users/chris/Projects/ROOT/source/math/genvector/src/Boost.cxx:16:
In file included from /Users/chris/Projects/ROOT/build/include/Math/GenVector/Boost.h:20:
In file included from /Users/chris/Projects/ROOT/build/include/Math/GenVector/LorentzVector.h:23:
/Users/chris/Projects/ROOT/build/include/Math/GenVector/DisplacementVector3D.h:351:28: error: class member cannot be redeclared
      DisplacementVector3D Unit() const
                           ^
/Users/chris/Projects/ROOT/build/include/Math/GenVector/DisplacementVector3D.h:341:28: note: previous declaration is here
      DisplacementVector3D Unit() const
                           ^

I'll stick with what is currently there and works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed.

… expect speed up based on the SIMD register size is observed (with 10% safety factor)
@cjones051073
Copy link
Contributor Author

I have extended the Vc GenVector test as requested to include a speed test. The test asserts that, for an equal number of 'photons' the processing time for the vectorised test should be a factor of Vc::double_v::Size faster than the scalar test (to within 10% to account for fluctuating machine conditions etc.). Output looks like

Scalar best time        = 0.00278807
Vectorised Vc best time = 0.00138211
Vectorised Vc SIMD size = 2
Vectorised Vc speedup   = 2.01725

So indeed for the default (SSE4) vectorisation level with double a factor of 2 is observed.

Chris

@vgvassilev
Copy link
Member

@phsft-bot build!

Thanks for the updated test Chris!

@cjones051073
Copy link
Contributor Author

running the test a few times it seems 10% is too tight a margin. The fluctuations from just rerunning the test a few times can be quite sizeable. To avoid false positives I have extended the margin to 25%.

@vgvassilev
Copy link
Member

Hm..., now I am confused, according to the test results, the vectorisation on slc6 and centos7 has a negative effect...

@vgvassilev
Copy link
Member

@phsft-bot build!

@cjones051073
Copy link
Contributor Author

why do you say that ?

@cjones051073
Copy link
Contributor Author

I know for a fact the results are as expected on SLC6. The results in the talk I posted where run on this platform.

@vgvassilev
Copy link
Member

I am puzzled why this happens, do you have an idea (the invariant is the gcc49).

You say results on SLC6 are expected: you mean the ones we observed in the jenkins build?

@cjones051073
Copy link
Contributor Author

which jenkins build ?

Note that to get good SIMD results requires the test to run on a machine that properly supports SIMD instructions. As I mention in my talk if the machine you use is a VM, the results could be way off.

@cjones051073
Copy link
Contributor Author

The only Vc results I really trust are those run on a 'real' CPU, not a VM.

@vgvassilev
Copy link
Member

vgvassilev commented Mar 14, 2017

Ok, then we should not assert failure from in test case. Let's just print out the information for reference.

@cjones051073
Copy link
Contributor Author

What sort of a machine do the tests

https://phsft-jenkins.cern.ch/job/root-pullrequests/127/BUILDTYPE=Debug,COMPILER=gcc49,LABEL=slc6/testReport/junit/projectroot/test/test_GenVector_Vc/

run on ?

My bet is this is not a real CPU that properly supports SIMD instructions (SSE4 in this case) or a VM machine where the timings cannot be trusted.

Sorry, but if you want a timing test for the Vc types you have to run this on a machine where the SIMD types are properly supported, and looking at the results above I would say this is currently not the case.

Chris

@cjones051073
Copy link
Contributor Author

I agree then. If you run the jenkins tests on VM then the timing tests cannot be trusted. I will disable the failure mode from them.

@vgvassilev
Copy link
Member

Thanks! With that I think we are ready to land this nice piece of work given we get green light from jenkins.

@cjones051073
Copy link
Contributor Author

I've turned off the failure return value from the timing tests, as if they are run from VMs the results cannot really be trusted.

@vgvassilev
Copy link
Member

@phsft-bot build!

@vgvassilev vgvassilev merged commit 8c30c8d into root-project:master Mar 15, 2017
@vgvassilev
Copy link
Member

Finally merged! Thanks for the contribution. I am looking forward to reviewing more PR going in this direction!

@cjones051073
Copy link
Contributor Author

Many thanks for your help as well. I may well return to some of the other genvector classes at some point, the rotations for instance, as I think I will eventually have a use for them. That will be more work though... Hopefully next time things will be smoother as I now know a lot more about what is expected in a ROOT PR... ;)

Copy link
Member

@vgvassilev vgvassilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some post-commit review comments. Could you address them please?


template <typename POINT, typename VECTOR, typename FTYPE>
inline
typename std::enable_if<std::is_arithmetic<typename POINT::Scalar>::value &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjones051073 it seems that we omitted transforming this code here as a default template parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I over looked these instances of enable_if. will change them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I guess I should open a new PR for this small fix, as this one is closed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. see PR #427.


template <typename POINT, typename VECTOR, typename FTYPE>
inline
typename std::enable_if<!std::is_arithmetic<typename POINT::Scalar>::value &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

}

template <typename POINT, typename VECTOR, typename PLANE>
inline typename std::enable_if<std::is_arithmetic<typename POINT::Scalar>::value &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

}

template <typename POINT, typename VECTOR, typename PLANE, typename FTYPE = typename POINT::Scalar>
inline typename std::enable_if<!std::is_arithmetic<typename POINT::Scalar>::value &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants