Low-Level CUDA Support

Device management

cupy.cuda.Device

Object that represents a CUDA device.
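
A minimal sketch of how `cupy.cuda.Device` might be used to enumerate devices. The helper names `cuda_ready` and `describe_devices` are illustrative and not part of CuPy; the guard is only there so the snippet degrades gracefully on a machine without CuPy or a GPU.

```python
def cuda_ready():
    """True only if CuPy imports and at least one CUDA device is visible."""
    try:
        import cupy
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False


def describe_devices():
    """Return (device id, compute capability) for every visible device."""
    import cupy
    found = []
    for dev_id in range(cupy.cuda.runtime.getDeviceCount()):
        # Device works as a context manager: inside the block, dev_id is
        # the current device for this thread.
        with cupy.cuda.Device(dev_id) as dev:
            found.append((dev.id, dev.compute_capability))
    return found


if cuda_ready():
    for dev_id, cc in describe_devices():
        print(f"device {dev_id}: compute capability {cc}")
else:
    print("No usable CUDA device; skipping the demo.")
```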

Memory management

cupy.get_default_memory_pool

Returns CuPy default memory pool for GPU memory.

cupy.get_default_pinned_memory_pool

Returns CuPy default memory pool for pinned memory.

cupy.cuda.Memory

Memory allocation on a CUDA device.

cupy.cuda.UnownedMemory

CUDA memory that is not owned by CuPy.

cupy.cuda.PinnedMemory

Pinned memory allocation on host.

cupy.cuda.MemoryPointer

Pointer to a location in device memory.

cupy.cuda.PinnedMemoryPointer

Pointer to pinned memory.

cupy.cuda.alloc

Calls the current allocator for GPU memory.

cupy.cuda.alloc_pinned_memory

Calls the current allocator for pinned memory.

cupy.cuda.get_allocator

Returns the current allocator for GPU memory.

cupy.cuda.set_allocator

Sets the current allocator for GPU memory.

cupy.cuda.using_allocator

Sets a thread-local allocator for GPU memory inside a context manager.

cupy.cuda.set_pinned_memory_allocator

Sets the current allocator for the pinned memory.

cupy.cuda.MemoryPool

Memory pool for all GPU devices on the host.

cupy.cuda.PinnedMemoryPool

Memory pool for pinned memory on the host.
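
The following sketch shows one way the default memory pool and `cupy.cuda.using_allocator` could be combined. It assumes the default allocator has not been replaced elsewhere in the process; `cuda_ready` and `pool_demo` are hypothetical helper names, not CuPy APIs.

```python
def cuda_ready():
    """True only if CuPy imports and at least one CUDA device is visible."""
    try:
        import cupy
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False


def pool_demo():
    """Return bytes in use in the default pool and in a private pool."""
    import cupy

    default_pool = cupy.get_default_memory_pool()
    x = cupy.zeros(1024, dtype=cupy.float32)  # 4096 bytes of GPU memory
    used_with_x = default_pool.used_bytes()
    del x  # the block goes back to the pool, not to the driver
    default_pool.free_all_blocks()  # hand unused cached blocks to the driver

    # Route allocations made inside the block to a separate pool.
    private_pool = cupy.cuda.MemoryPool()
    with cupy.cuda.using_allocator(private_pool.malloc):
        y = cupy.arange(10)
        private_used = private_pool.used_bytes()
    return used_with_x, private_used


if cuda_ready():
    used_with_x, private_used = pool_demo()
    print("default pool bytes in use:", used_with_x)
    print("private pool bytes in use:", private_used)
else:
    print("No usable CUDA device; skipping the demo.")
```

A private pool of this kind can help isolate the allocations of one component so that its usage can be inspected or released independently.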

Memory hook

cupy.cuda.MemoryHook

Base class of hooks for Memory allocations.

cupy.cuda.memory_hooks.DebugPrintHook

Memory hook that prints debug information.

cupy.cuda.memory_hooks.LineProfileHook

Line-by-line memory profiler for CuPy code.
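
As a rough illustration, a memory hook can be activated as a context manager around the code of interest. The sketch below captures the `LineProfileHook` report into a string; `cuda_ready` and `profile_allocations` are hypothetical helper names, and the exact report layout may differ between CuPy versions.

```python
import io


def cuda_ready():
    """True only if CuPy imports and at least one CUDA device is visible."""
    try:
        import cupy
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False


def profile_allocations():
    """Run a few allocations under LineProfileHook and return its report."""
    import cupy
    from cupy.cuda import memory_hooks

    hook = memory_hooks.LineProfileHook()
    with hook:  # allocations made inside the block are recorded
        a = cupy.arange(1000)
        b = a * 2
    report = io.StringIO()
    hook.print_report(file=report)
    return report.getvalue()


if cuda_ready():
    print(profile_allocations())
else:
    print("No usable CUDA device; skipping the demo.")
```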

Streams and events

cupy.cuda.Stream

CUDA stream.

cupy.cuda.ExternalStream

CUDA stream not managed by CuPy.

cupy.cuda.get_current_stream

Gets current CUDA stream.

cupy.cuda.Event

CUDA event, a synchronization point of CUDA streams.

cupy.cuda.get_elapsed_time

Gets the elapsed time between two events.
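
A minimal sketch of timing GPU work with a stream and two events. The matrix size and the helper names `cuda_ready` and `time_matmul_ms` are assumptions made for illustration only.

```python
def cuda_ready():
    """True only if CuPy imports and at least one CUDA device is visible."""
    try:
        import cupy
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False


def time_matmul_ms(n=512):
    """Time an n x n matrix product on a dedicated stream."""
    import cupy

    stream = cupy.cuda.Stream(non_blocking=True)
    start = cupy.cuda.Event()
    end = cupy.cuda.Event()
    with stream:  # makes `stream` the current stream inside the block
        is_current = cupy.cuda.get_current_stream().ptr == stream.ptr
        a = cupy.ones((n, n), dtype=cupy.float32)
        start.record()  # recorded on the current stream
        b = a @ a
        end.record()
    end.synchronize()  # wait until the work before `end` has finished
    elapsed_ms = cupy.cuda.get_elapsed_time(start, end)
    return elapsed_ms, float(b[0, 0]), is_current


if cuda_ready():
    elapsed_ms, corner, _ = time_matmul_ms()
    print(f"matmul took {elapsed_ms:.3f} ms, b[0, 0] = {corner}")
else:
    print("No usable CUDA device; skipping the demo.")
```

Events are recorded on the GPU timeline, so this measures device time rather than the time the host spends launching the kernels.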

Texture and surface memory

cupy.cuda.texture.ChannelFormatDescriptor

A class that holds the channel format description.

cupy.cuda.texture.CUDAarray

Allocate a CUDA array (cudaArray_t) that can be used as texture memory.

cupy.cuda.texture.ResourceDescriptor

A class that holds the resource description.

cupy.cuda.texture.TextureDescriptor

A class that holds the texture description.

cupy.cuda.texture.TextureObject

A class that holds a texture object.

cupy.cuda.texture.SurfaceObject

A class that holds a surface object.

cupy.cuda.texture.TextureReference

A class that holds a texture reference.

Profiler

cupy.cuda.profile

Enables CUDA profiling within a with statement.

cupy.cuda.profiler.initialize

Initialize the CUDA profiler.

cupy.cuda.profiler.start

Enable profiling.

cupy.cuda.profiler.stop

Disable profiling.

cupy.cuda.nvtx.Mark

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.MarkC

Marks an instantaneous event (marker) in the application.

cupy.cuda.nvtx.RangePush

Starts a nested range.

cupy.cuda.nvtx.RangePushC

Starts a nested range.

cupy.cuda.nvtx.RangePop

Ends a nested range.

NCCL

cupy.cuda.nccl.NcclCommunicator

Initialize an NCCL communicator for one device controlled by one process.

cupy.cuda.nccl.get_build_version

cupy.cuda.nccl.get_version

Returns the runtime version of NCCL.

cupy.cuda.nccl.get_unique_id

cupy.cuda.nccl.groupStart

Start a group of NCCL calls.

cupy.cuda.nccl.groupEnd

End a group of NCCL calls.

Runtime API

CuPy wraps the CUDA Runtime API to expose native CUDA operations. Refer to the original CUDA Runtime API documentation for details on how to use these functions.

cupy.cuda.runtime.driverGetVersion

cupy.cuda.runtime.runtimeGetVersion

cupy.cuda.runtime.getDevice

cupy.cuda.runtime.deviceGetAttribute

cupy.cuda.runtime.deviceGetByPCIBusId

cupy.cuda.runtime.deviceGetPCIBusId

cupy.cuda.runtime.getDeviceCount

cupy.cuda.runtime.setDevice

cupy.cuda.runtime.deviceSynchronize

cupy.cuda.runtime.deviceCanAccessPeer

cupy.cuda.runtime.deviceEnablePeerAccess

cupy.cuda.runtime.malloc

cupy.cuda.runtime.mallocManaged

cupy.cuda.runtime.malloc3DArray

cupy.cuda.runtime.mallocArray

cupy.cuda.runtime.hostAlloc

cupy.cuda.runtime.hostRegister

cupy.cuda.runtime.hostUnregister

cupy.cuda.runtime.free

cupy.cuda.runtime.freeHost

cupy.cuda.runtime.freeArray

cupy.cuda.runtime.memGetInfo

cupy.cuda.runtime.memcpy

cupy.cuda.runtime.memcpyAsync

cupy.cuda.runtime.memcpyPeer

cupy.cuda.runtime.memcpyPeerAsync

cupy.cuda.runtime.memcpy2D

cupy.cuda.runtime.memcpy2DAsync

cupy.cuda.runtime.memcpy2DFromArray

cupy.cuda.runtime.memcpy2DFromArrayAsync

cupy.cuda.runtime.memcpy2DToArray

cupy.cuda.runtime.memcpy2DToArrayAsync

cupy.cuda.runtime.memcpy3D

cupy.cuda.runtime.memcpy3DAsync

cupy.cuda.runtime.memset

cupy.cuda.runtime.memsetAsync

cupy.cuda.runtime.memPrefetchAsync

cupy.cuda.runtime.memAdvise

cupy.cuda.runtime.pointerGetAttributes

cupy.cuda.runtime.streamCreate

cupy.cuda.runtime.streamCreateWithFlags

cupy.cuda.runtime.streamDestroy

cupy.cuda.runtime.streamSynchronize

cupy.cuda.runtime.streamAddCallback

cupy.cuda.runtime.streamQuery

cupy.cuda.runtime.streamWaitEvent

cupy.cuda.runtime.eventCreate

cupy.cuda.runtime.eventCreateWithFlags

cupy.cuda.runtime.eventDestroy

cupy.cuda.runtime.eventElapsedTime

cupy.cuda.runtime.eventQuery

cupy.cuda.runtime.eventRecord

cupy.cuda.runtime.eventSynchronize
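
As a hedged illustration, a few of the wrappers listed above can be called directly to query the environment. The helper names `cuda_ready` and `runtime_summary` are hypothetical; the `runtime` functions used are those named in the list.

```python
def cuda_ready():
    """True only if CuPy imports and at least one CUDA device is visible."""
    try:
        import cupy
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        return False


def runtime_summary():
    """Collect basic facts about the CUDA environment via the runtime API."""
    from cupy.cuda import runtime

    free_bytes, total_bytes = runtime.memGetInfo()  # for the current device
    runtime.deviceSynchronize()  # block until pending device work is done
    return {
        "driver_version": runtime.driverGetVersion(),
        "runtime_version": runtime.runtimeGetVersion(),
        "device_count": runtime.getDeviceCount(),
        "current_device": runtime.getDevice(),
        "free_bytes": free_bytes,
        "total_bytes": total_bytes,
    }


if cuda_ready():
    for key, value in runtime_summary().items():
        print(f"{key}: {value}")
else:
    print("No usable CUDA device; skipping the demo.")
```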