#include <TensorReductionSycl.h>
Public Types | |
typedef Self::CoeffReturnType | CoeffReturnType |
Static Public Member Functions | |
static bool | run (const Self &self, Op &reducer, const Eigen::SyclDevice &dev, CoeffReturnType *output, typename Self::Index, typename Self::Index num_coeffs_to_preserve) |
Static Public Attributes | |
static const bool | HasOptimizedImplementation = false |
Definition at line 182 of file TensorReductionSycl.h.
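The sketch below is a minimal host-side C++ struct with the same member layout as above: a CoeffReturnType typedef, a static run function, and a static HasOptimizedImplementation flag. It is not Eigen code and launches no SYCL kernel. Several details are assumptions: the struct name CpuInnerReducerSketch, the init parameter, the name num_values_to_reduce for the unnamed fifth parameter of run, and the layout in which each preserved coefficient reduces one contiguous run of inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical CPU stand-in mirroring the documented members.
// Assumption: output[i] reduces input[i * num_values_to_reduce + j] for j in [0, num_values_to_reduce).
template <typename T, typename BinaryOp>
struct CpuInnerReducerSketch {
  typedef T CoeffReturnType;
  // Compile-time flag, as in the documented specialisation.
  static const bool HasOptimizedImplementation = false;

  static bool run(const std::vector<T>& input, BinaryOp reducer, T init,
                  CoeffReturnType* output, std::size_t num_values_to_reduce,
                  std::size_t num_coeffs_to_preserve) {
    assert(input.size() >= num_values_to_reduce * num_coeffs_to_preserve);
    for (std::size_t i = 0; i < num_coeffs_to_preserve; ++i) {
      T accum = init;
      for (std::size_t j = 0; j < num_values_to_reduce; ++j) {
        accum = reducer(accum, input[i * num_values_to_reduce + j]);
      }
      output[i] = accum;
    }
    // This sketch always reports completion; it does not model the
    // meaning of the bool returned by Eigen's run.
    return true;
  }
};
```

For example, reducing {1, 2, 3, 4, 5, 6} with std::plus<int>, a run length of 3, and 2 preserved coefficients writes {6, 15} to output. Because the flag is a compile-time constant, a caller can test it before choosing a specialised path over a generic fallback.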
typedef Self::CoeffReturnType Eigen::internal::InnerReducer< Self, Op, const Eigen::SyclDevice >::CoeffReturnType |
Definition at line 184 of file TensorReductionSycl.h.
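static bool Eigen::internal::InnerReducer< Self, Op, const Eigen::SyclDevice >::run (const Self &self, Op &reducer, const Eigen::SyclDevice &dev, CoeffReturnType *output, typename Self::Index, typename Self::Index num_coeffs_to_preserve) |
inline static |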
This is the child expression of the reduction.
Creates the shared memory used to calculate the reduction. This buffer collects all the values reduced in shared memory, because there is no global barrier on the GPU. Once they are saved, the reduction can be applied to the buffer recursively to reduce the whole input.
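The recursive strategy described above can be sketched on the host in plain C++. Each fixed-size chunk stands in for one work-group reducing its values in shared memory. The per-chunk results go into a partial buffer, and that buffer is reduced again, pass after pass, until one value remains. The sketch uses a sum, not SYCL and not Eigen's kernel; the names reduce_groups, multi_pass_sum, and group_size are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One "launch": every group of group_size elements is summed into a single
// partial result. On a GPU this step would happen in shared (local) memory.
std::vector<float> reduce_groups(const std::vector<float>& in, std::size_t group_size) {
  assert(group_size > 0);
  std::vector<float> partials;  // collects one reduced value per group
  for (std::size_t start = 0; start < in.size(); start += group_size) {
    std::size_t end = std::min(start + group_size, in.size());
    float acc = 0.0f;
    for (std::size_t i = start; i < end; ++i) acc += in[i];
    partials.push_back(acc);
  }
  return partials;
}

// Groups cannot synchronise with each other inside a single launch (no global
// barrier), so each pass is a separate launch over the previous partials.
float multi_pass_sum(std::vector<float> data, std::size_t group_size) {
  assert(group_size > 1);  // a group size of 1 would never shrink the buffer
  if (data.empty()) return 0.0f;
  while (data.size() > 1) {
    data = reduce_groups(data, group_size);
  }
  return data[0];
}
```

With the values 1 to 10 and a group size of 4, the first pass yields the partials {10, 26, 19} and the second pass yields 55.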
The reduction cannot be captured automatically through our device-conversion recursion, because a reduction has two behaviours. In the first, it is used as the root that launches the sub-kernel. In the second, it is treated as a leaf node that passes the calculated result to its parent kernel. The latter is detected automatically by our device expression generator; the former is created here.
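The following small host-side model pictures the two behaviours. The same node first acts as a root that runs its own computation and stores the result. It then acts as a leaf whose precomputed value a parent expression reads. ReductionNodeSketch, run_as_root, and parent_scale are invented names for this illustration, and Eigen's device expression generator and sub-kernel launch are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical reduction node with both roles.
struct ReductionNodeSketch {
  std::vector<double> child;  // the expression being reduced (already materialised here)
  double result = 0.0;        // written by the "sub-kernel"
  bool evaluated = false;

  // Root role: run the sub-computation and store its result.
  void run_as_root() {
    result = std::accumulate(child.begin(), child.end(), 0.0);
    evaluated = true;
  }

  // Leaf role: only read the value that was already computed.
  double coeff() const {
    assert(evaluated);
    return result;
  }
};

// Hypothetical parent expression that consumes the reduced value as a leaf.
double parent_scale(const ReductionNodeSketch& leaf, double factor) {
  return factor * leaf.coeff();
}
```

With child set to {1, 2, 3}, run_as_root stores 6, and parent_scale(node, 2.0) then returns 12 without recomputing the reduction.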
This is the evaluator for device_self_expr. It is equivalent to the self that was passed to the run function; the difference is that device_evaluator is detectable and recognisable on the device.
A const_cast is added as a naive workaround for the qualifier-drop error.
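A qualifier-drop error is the usual C++ complaint when a const-qualified pointer or object is passed where a non-const one is expected. The source does not say which call needed the cast, so the example below is hypothetical: sum_buffer and ReadOnlyView are invented names that show the error and how const_cast sidesteps it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical API that takes a non-const pointer but only reads through it.
float sum_buffer(float* data, std::size_t n) {
  float s = 0.0f;
  for (std::size_t i = 0; i < n; ++i) s += data[i];
  return s;
}

struct ReadOnlyView {
  const float* data;
  std::size_t size;

  // Only a const pointer is available here, so passing it directly to
  // sum_buffer would drop the const qualifier and fail to compile.
  // const_cast removes the qualifier; this is safe only because
  // sum_buffer never writes through the pointer.
  float total() const {
    return sum_buffer(const_cast<float*>(data), size);
  }
};
```

The cast is a workaround rather than a fix. Writing through the resulting pointer is undefined behaviour if the underlying object is itself const.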
Definition at line 187 of file TensorReductionSycl.h.
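const bool Eigen::internal::InnerReducer< Self, Op, const Eigen::SyclDevice >::HasOptimizedImplementation = false |
static |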
Definition at line 185 of file TensorReductionSycl.h.