#include <TensorReductionSycl.h>

Public Types
typedef Self::CoeffReturnType	CoeffReturnType

Static Public Member Functions
static void	run (const Self &self, Op &reducer, const Eigen::SyclDevice &dev, CoeffReturnType *output)

Static Public Attributes
static const bool	HasOptimizedImplementation = false

Detailed Description

template<typename Self, typename Op, bool Vectorizable>
struct Eigen::internal::FullReducer< Self, Op, const Eigen::SyclDevice, Vectorizable >

For now, let's start with a full reducer. Self is useless here because, during expression construction, the reduction is treated as a leaf node. We want to take the child of the reduction, build a construction from it, and apply the full reducer function to it. FullReducer applies the reduction operation to the child of the reduction. Once that is done, the reduction is an empty shell and can be thrown away and treated as ...

Definition at line 103 of file TensorReductionSycl.h.

Member Typedef Documentation

template<typename Self , typename Op , bool Vectorizable>

typedef Self::CoeffReturnType Eigen::internal::FullReducer< Self, Op, const Eigen::SyclDevice, Vectorizable >::CoeffReturnType

Definition at line 105 of file TensorReductionSycl.h.

Member Function Documentation

template<typename Self , typename Op , bool Vectorizable>

static void Eigen::internal::FullReducer< Self, Op, const Eigen::SyclDevice, Vectorizable >::run (const Self &self, Op &reducer, const Eigen::SyclDevice &dev, CoeffReturnType *output)

inline static

This is the child of the reduction.

Initial reduction. If the size is less than red_factor, only one thread is created.
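The rule above can be pictured with a minimal sketch. The helper name `initial_thread_count` is hypothetical, and the use of integer division to derive the count is an assumption; only the "fewer than red_factor elements means one thread" behaviour comes from the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Eigen: picks how many threads start the
// reduction. Assumption: each thread handles red_factor input elements.
inline std::size_t initial_thread_count(std::size_t total_size,
                                        std::size_t red_factor) {
  assert(red_factor > 0);
  // When total_size < red_factor the quotient is 0, so clamp to one thread.
  return std::max<std::size_t>(1, total_size / red_factor);
}
```

With a red_factor of 256, for example, an input of 100 elements would be handled by a single thread under this assumption.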

If the shared memory is smaller than GRange, the shared-memory size is set to TotalSize; in this case a single kernel is created for the recursion to reduce everything to one value.

Create the shared memory used to calculate the reduction. This buffer collects all the reduced values of the shared memory, because there is no global barrier on the GPU. Once they are saved, the reduction can be applied to them recursively in order to reduce the whole input.

A reduction cannot be captured automatically by our device conversion recursion, because it has two behaviours. The first is when it is used as a root to launch the sub-kernel. The second is when it is treated as a leaf node that passes the calculated result to its parent kernel. The latter is detected automatically by our device expression generator; the former is created here.

This is the evaluator for device_self_expr. It is equivalent to the self passed to the run function; the difference is that the device evaluator is detectable and recognisable on the device.

A const_cast is added as a naive workaround for the qualifier-drop error.

This is used to recursively reduce the temporary values to a single element.
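The scheme described in the comments above (reduce per block into a temporary buffer, then reduce that buffer again until one element remains) can be illustrated on the CPU. This is a sketch under stated assumptions, not Eigen's SYCL kernel: `reduce_tiles` and `full_reduce` are hypothetical names, a "tile" stands in for one work-group's share of the input, and `init` is assumed to be the identity of `op`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1 (hypothetical): each tile of the input is reduced to one partial
// value, mimicking what one work-group would write to the temporary buffer.
template <typename T, typename Op>
std::vector<T> reduce_tiles(const std::vector<T>& in, std::size_t tile,
                            T init, Op op) {
  assert(tile > 0);
  std::vector<T> partial;
  for (std::size_t start = 0; start < in.size(); start += tile) {
    const std::size_t end = std::min(in.size(), start + tile);
    T acc = init;
    for (std::size_t i = start; i < end; ++i) acc = op(acc, in[i]);
    partial.push_back(acc);
  }
  return partial;
}

// Recursively re-apply the tiled pass to the partial results until a single
// element is left. Without a global barrier, each level is a separate pass.
template <typename T, typename Op>
T full_reduce(std::vector<T> values, std::size_t tile, T init, Op op) {
  assert(tile > 1);  // a tile of 1 would never shrink the buffer
  if (values.empty()) return init;
  if (values.size() == 1) return values.front();
  return full_reduce(reduce_tiles(values, tile, init, op), tile, init, op);
}
```

Each recursion level shrinks the buffer by roughly a factor of `tile`, so the number of passes grows only logarithmically with the input size.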

Definition at line 108 of file TensorReductionSycl.h.

Member Data Documentation

template<typename Self , typename Op , bool Vectorizable>

const bool Eigen::internal::FullReducer< Self, Op, const Eigen::SyclDevice, Vectorizable >::HasOptimizedImplementation = false

static

Definition at line 106 of file TensorReductionSycl.h.

The documentation for this struct was generated from the following file:

TensorReductionSycl.h
