Custom kernels#

cupy.ElementwiseKernel(in_params, ...[, ...])

User-defined elementwise kernel.

cupy.ReductionKernel(unicode in_params, ...)

User-defined reduction kernel.

cupy.RawKernel(unicode code, unicode name, ...)

User-defined custom kernel.

cupy.RawModule(unicode code=None, *, ...[, ...])

User-defined custom module.

cupy.fuse(*args, **kwargs)

Decorator that fuses a function.
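
For illustration, a minimal sketch of how these classes are typically used (the kernel names and bodies here are made up for the example):

    import cupy as cp

    # Elementwise kernel: z = (x - y) ** 2, computed per element.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',    # input parameters
        'float32 z',               # output parameter
        'z = (x - y) * (x - y)',   # per-element operation
        'squared_diff')            # kernel name

    # Raw kernel: plain CUDA C++ source, compiled on first launch.
    add_kernel = cp.RawKernel(r'''
    extern "C" __global__
    void my_add(const float* x1, const float* x2, float* y) {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        y[tid] = x1[tid] + x2[tid];
    }
    ''', 'my_add')

    # Fused kernel: an ordinary Python function fused into one elementwise kernel.
    @cp.fuse()
    def fused_squared_diff(x, y):
        return (x - y) * (x - y)

    x = cp.arange(25, dtype=cp.float32).reshape(5, 5)
    y = cp.ones((5, 5), dtype=cp.float32)

    z1 = squared_diff(x, y)             # called like a ufunc
    z2 = fused_squared_diff(x, y)
    z3 = cp.zeros((5, 5), dtype=cp.float32)
    add_kernel((5,), (5,), (x, y, z3))  # launched as kernel(grid, block, args)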

JIT kernel definition#

Supported Python built-in functions include range(), len(), max(), and min().

Note

If loop unrolling is needed, use cupyx.jit.range() instead of the built-in range.

cupyx.jit.rawkernel(*[, mode, device])

A decorator that compiles a Python function into a CUDA kernel.
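
A minimal sketch of a JIT kernel; the function name and launch configuration are illustrative, and the body uses the CUDA built-in variables listed below:

    import cupy as cp
    from cupyx import jit

    @jit.rawkernel()
    def elementwise_copy(x, y, size):
        # Global thread index and total thread count.
        tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
        ntid = jit.gridDim.x * jit.blockDim.x
        for i in range(tid, size, ntid):    # grid-stride loop
            y[i] = x[i]

    size = cp.uint32(1 << 22)
    x = cp.random.normal(size=(size,), dtype=cp.float32)
    y = cp.empty((size,), dtype=cp.float32)
    elementwise_copy((128,), (1024,), (x, y, size))  # kernel(grid, block, args)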

cupyx.jit.threadIdx

dim3 threadIdx

cupyx.jit.blockDim

dim3 blockDim

cupyx.jit.blockIdx

dim3 blockIdx

cupyx.jit.gridDim

dim3 gridDim

cupyx.jit.grid(ndim)

Compute the thread index in the grid.

cupyx.jit.gridsize(ndim)

Compute the total number of threads in the grid.

cupyx.jit.laneid()

Returns the lane ID of the calling thread, ranging in [0, jit.warpsize).

cupyx.jit.warpsize

Returns the number of threads in a warp.

cupyx.jit.range(*args[, unroll])

Range with loop unrolling support.
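
The same grid-stride pattern written with jit.grid() / jit.gridsize(), and jit.range() requesting loop unrolling; a sketch, assuming a 1-D launch:

    import cupy as cp
    from cupyx import jit

    @jit.rawkernel()
    def axpy(a, x, y, size):
        tid = jit.grid(1)       # flattened thread index within the grid
        ntid = jit.gridsize(1)  # total number of threads in the grid
        for i in jit.range(tid, size, ntid, unroll=True):  # unrolled grid-stride loop
            y[i] = a * x[i] + y[i]

    n = 1 << 20
    x = cp.random.normal(size=(n,), dtype=cp.float32)
    y = cp.random.normal(size=(n,), dtype=cp.float32)
    axpy((256,), (256,), (cp.float32(2.0), x, y, cp.uint32(n)))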

cupyx.jit.syncthreads()

Calls __syncthreads().

cupyx.jit.syncwarp(*[, mask])

Calls __syncwarp().

cupyx.jit.shfl_sync(mask, var, val_id, *[, ...])

Calls the __shfl_sync function.

cupyx.jit.shfl_up_sync(mask, var, val_id, *)

Calls the __shfl_up_sync function.

cupyx.jit.shfl_down_sync(mask, var, val_id, *)

Calls the __shfl_down_sync function.

cupyx.jit.shfl_xor_sync(mask, var, val_id, *)

Calls the __shfl_xor_sync function.

cupyx.jit.shared_memory(dtype, size[, alignment])

Allocates shared memory and returns it as a 1-D array.
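
A sketch of a two-stage block reduction using shared memory and a barrier; the block size of 256 is an assumption that must match both the shared-memory size and the launch configuration:

    import cupy as cp
    from cupyx import jit

    @jit.rawkernel()
    def block_sum(x, partial, size):
        tid = jit.threadIdx.x
        ntid = jit.blockDim.x

        # Each thread accumulates a grid-strided partial sum in a register.
        value = cp.float32(0)
        for i in range(jit.grid(1), size, jit.gridsize(1)):
            value += x[i]

        # Stage per-thread partials in shared memory; the size is a compile-time constant.
        smem = jit.shared_memory(cp.float32, 256)
        smem[tid] = value
        jit.syncthreads()

        # Thread 0 of each block writes that block's partial sum.
        if tid == 0:
            total = cp.float32(0)
            for i in range(ntid):
                total += smem[i]
            partial[jit.blockIdx.x] = total

    x = cp.random.normal(size=(1 << 20,), dtype=cp.float32)
    partial = cp.zeros(64, dtype=cp.float32)
    block_sum((64,), (256,), (x, partial, cp.uint32(x.size)))
    result = partial.sum()   # finish the reduction with a library call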

cupyx.jit.atomic_add(array, index, value[, ...])

Calls the atomicAdd function to operate atomically on array[index].

cupyx.jit.atomic_sub(array, index, value[, ...])

Calls the atomicSub function to operate atomically on array[index].

cupyx.jit.atomic_exch(array, index, value[, ...])

Calls the atomicExch function to operate atomically on array[index].

cupyx.jit.atomic_min(array, index, value[, ...])

Calls the atomicMin function to operate atomically on array[index].

cupyx.jit.atomic_max(array, index, value[, ...])

Calls the atomicMax function to operate atomically on array[index].

cupyx.jit.atomic_inc(array, index, value[, ...])

Calls the atomicInc function to operate atomically on array[index].

cupyx.jit.atomic_dec(array, index, value[, ...])

Calls the atomicDec function to operate atomically on array[index].

cupyx.jit.atomic_cas(array, index, value[, ...])

Calls the atomicCAS function to operate atomically on array[index].

cupyx.jit.atomic_and(array, index, value[, ...])

Calls the atomicAnd function to operate atomically on array[index].

cupyx.jit.atomic_or(array, index, value[, ...])

Calls the atomicOr function to operate atomically on array[index].

cupyx.jit.atomic_xor(array, index, value[, ...])

Calls the atomicXor function to operate atomically on array[index].
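
A sketch of a simple histogram built with jit.atomic_add; the binning rule and sizes are arbitrary for the example:

    import cupy as cp
    from cupyx import jit

    @jit.rawkernel()
    def histogram(x, bins, size, nbins):
        tid = jit.grid(1)
        ntid = jit.gridsize(1)
        for i in range(tid, size, ntid):
            b = x[i] % nbins                      # pick a bin
            jit.atomic_add(bins, b, cp.int32(1))  # bins[b] += 1, atomically across threads

    x = cp.arange(1 << 20, dtype=cp.uint32)
    bins = cp.zeros(16, dtype=cp.int32)
    histogram((128,), (256,), (x, bins, cp.uint32(x.size), cp.uint32(16)))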

cupyx.jit.cg.this_grid()

Returns the current grid group (_GridGroup).

cupyx.jit.cg.this_thread_block()

Returns the current thread block group (_ThreadBlockGroup).

cupyx.jit.cg.sync(group)

Calls cg::sync().

cupyx.jit.cg.memcpy_async(group, dst, ...[, ...])

Calls cg::memcpy_async().

cupyx.jit.cg.wait(group)

Calls cg::wait().

cupyx.jit.cg.wait_prior(group)

Calls cg::wait_prior<N>().
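
A sketch using the cooperative-groups helpers, assuming a CUDA toolkit recent enough to support them in cupyx.jit; grid-wide synchronization via this_grid() additionally requires a cooperative launch and is not shown:

    import cupy as cp
    from cupyx import jit

    @jit.rawkernel()
    def rotate_within_block(x, y):
        block = jit.cg.this_thread_block()
        tid = jit.threadIdx.x
        i = jit.grid(1)

        smem = jit.shared_memory(cp.float32, 128)
        smem[tid] = x[i]
        jit.cg.sync(block)   # same effect as __syncthreads(), via cg::sync()

        y[i] = smem[(tid + 1) % jit.blockDim.x]

    x = cp.arange(1024, dtype=cp.float32)
    y = cp.empty_like(x)
    rotate_within_block((8,), (128,), (x, y))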

cupyx.jit._interface._JitRawKernel(func, ...)

JIT CUDA kernel object.

Kernel binary memoization#

cupy.memoize(bool for_each_device=False)

Makes a function memoize its results for each combination of arguments (and for each device, when for_each_device is True).

cupy.clear_memo()

Clears the memoized results for all functions decorated by memoize.
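
A sketch of the typical memoization pattern, caching a compiled kernel once per device; the kernel source and helper name are made up for the example:

    import cupy as cp

    _source = r'''
    extern "C" __global__
    void scale(float* x, float a, int n) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < n) {
            x[i] *= a;
        }
    }
    '''

    @cp.memoize(for_each_device=True)
    def _get_scale_kernel():
        # Built on the first call; later calls on the same device
        # return the cached cupy.RawKernel without recompiling.
        return cp.RawKernel(_source, 'scale')

    x = cp.arange(1024, dtype=cp.float32)
    _get_scale_kernel()((4,), (256,), (x, cp.float32(2.0), cp.int32(x.size)))

    cp.clear_memo()   # discard every result memoized through cupy.memoize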