Palabos  Version 1.1

plb::CoProcessor3D< T > Class Template Reference

#include <coProcessor3D.h>


Public Member Functions

virtual ~CoProcessor3D ()
virtual int addDomain (plint nx, plint ny, plint nz, T omega, int &domainHandle)=0
 Add a domain for which the co-processor will perform computations.
virtual int send (int domainHandle, Box3D const &subDomain, std::vector< char > const &data)=0
 Copy data from Palabos' CPU memory to the co-processor's device memory.
virtual int receive (int domainHandle, Box3D const &subDomain, std::vector< char > &data) const =0
 Copy data from the co-processor's device memory to Palabos' CPU memory.
virtual int collideAndStream (int domainHandle)=0
 Execute a collision step on each cell, followed by a streaming step over the full domain.

Detailed Description

template<typename T>
class plb::CoProcessor3D< T >

A co-processor provides access to a computational hardware unit, such as a GPU or an FPGA. An instance of the CoProcessor3D class represents a single hardware unit. In a multi-GPU machine, for instance, a separate CoProcessor3D is instantiated for each GPU.

A co-processor acts exclusively on rectangular domains and can be responsible for more than one domain. The method addDomain() is used to add new domains for which the co-processor is responsible.

The memory is duplicated: it is allocated once on the CPU by Palabos and once on the device by the co-processor. The send() and receive() methods handle communication between the two memory spaces, while the collideAndStream() method works on device memory only.
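To make the duplicated-memory model concrete, here is a toy sketch in which a std::vector<char> stands in for device memory. The class ToyDevice and its update() method are hypothetical illustrations, not part of Palabos: update() plays the role of collideAndStream() only in that it touches device memory alone, and it does not implement BGK.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy "device" (not a Palabos class): it owns a private copy
// of the data, so changes on one side are invisible to the other side until
// the next send() or receive().
class ToyDevice {
public:
    // Host-to-device transfer: deep-copy the CPU buffer into device storage.
    int send(std::vector<char> const& hostData) {
        deviceData_ = hostData;
        return 1;  // 1 = success, following the interface's convention
    }
    // Device-to-host transfer: deep-copy device storage into the CPU buffer.
    int receive(std::vector<char>& hostData) const {
        hostData = deviceData_;
        return 1;
    }
    // Stand-in for collideAndStream(): modifies device memory only.
    // It just increments every byte; it does NOT implement BGK.
    int update() {
        for (char& c : deviceData_) {
            ++c;
        }
        return 1;
    }
private:
    std::vector<char> deviceData_;
};
```

After send() followed by update(), the CPU-side vector still holds its old values; only a subsequent receive() makes the device-side result visible to Palabos.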

At this stage, co-processors implement only BGK dynamics on a D3Q19 lattice, and collide-and-stream is the only operation performed by the device. Both aspects will be generalized in the future.


Constructor & Destructor Documentation

template<typename T >
virtual plb::CoProcessor3D< T >::~CoProcessor3D (  )  [inline, virtual]

Member Function Documentation

template<typename T >
virtual int plb::CoProcessor3D< T >::addDomain ( plint nx, plint ny, plint nz, T omega, int & domainHandle ) [pure virtual]

Add a domain for which the co-processor will perform computations.

In the coordinate system of this interface, every domain spans the indices 0 to nx-1, 0 to ny-1, and 0 to nz-1, regardless of where it is actually placed in physical space.

The relaxation parameter omega is used to implement the BGK collision rule on the device.
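For reference, the textbook form of the BGK collision that omega parameterizes is written below, together with the subsequent streaming step, in lattice units with sound speed c_s^2 = 1/3. This is a sketch of the standard D3Q19 scheme; an actual device implementation may organize or represent the populations differently.

```latex
\begin{aligned}
\rho &= \sum_{i=0}^{18} f_i, \qquad \rho\,\mathbf{u} = \sum_{i=0}^{18} f_i\,\mathbf{c}_i,\\
f_i^{\mathrm{eq}} &= t_i\,\rho\left(1 + 3\,\mathbf{c}_i\cdot\mathbf{u} + \tfrac{9}{2}\,(\mathbf{c}_i\cdot\mathbf{u})^2 - \tfrac{3}{2}\,\mathbf{u}\cdot\mathbf{u}\right),\\
f_i^{\mathrm{out}}(\mathbf{x}, t) &= f_i(\mathbf{x}, t) - \omega\left(f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t)\right),\\
f_i(\mathbf{x} + \mathbf{c}_i, t + 1) &= f_i^{\mathrm{out}}(\mathbf{x}, t).
\end{aligned}
```

On D3Q19 the weights are t_i = 1/3 for the rest population, 1/18 for the six axis-aligned velocities, and 1/36 for the twelve diagonal velocities, so that they sum to 1.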

The co-processor returns a handle through the reference parameter domainHandle. This handle subsequently identifies the domain in calls to send(), receive(), and collideAndStream().

The method returns an error code: 1=success, 0=failure.
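One way an implementation might manage handles is sketched below, assuming the handle is simply the position of the domain in a vector of records. DomainRecord, DomainRegistry, and the rejection of empty domains are hypothetical choices for illustration, and plint is replaced by a stand-in alias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using plint = std::ptrdiff_t;  // stand-in for Palabos' plint (assumption)

// Hypothetical per-domain record kept by a co-processor implementation.
template<typename T>
struct DomainRecord {
    plint nx, ny, nz;  // indices run from 0 to n-1 in each direction
    T omega;           // BGK relaxation parameter
};

// Hypothetical bookkeeping: the handle is an index into domains_.
// Return codes follow the interface: 1 = success, 0 = failure.
template<typename T>
class DomainRegistry {
public:
    int addDomain(plint nx, plint ny, plint nz, T omega, int& domainHandle) {
        if (nx <= 0 || ny <= 0 || nz <= 0) {
            return 0;  // failure: empty domain; domainHandle is left untouched
        }
        domainHandle = static_cast<int>(domains_.size());
        domains_.push_back(DomainRecord<T>{nx, ny, nz, omega});
        return 1;
    }
    // Lookup for send(), receive(), collideAndStream(); nullptr if unknown.
    DomainRecord<T> const* find(int domainHandle) const {
        if (domainHandle < 0 ||
            static_cast<std::size_t>(domainHandle) >= domains_.size()) {
            return nullptr;
        }
        return &domains_[static_cast<std::size_t>(domainHandle)];
    }
private:
    std::vector<DomainRecord<T>> domains_;
};
```

Using a plain index keeps handle lookup O(1); a real implementation would additionally allocate device memory for each domain inside addDomain().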

Implemented in plb::D3Q19ExampleCoProcessor3D< T >, and plb::D3Q19CudaCoProcessor3D< T >.

template<typename T >
virtual int plb::CoProcessor3D< T >::collideAndStream ( int  domainHandle  )  [pure virtual]

Execute a collision step on each cell, followed by a streaming step over the full domain.

Note that the result of the streaming step is undefined in a one-cell layer at the outer border of the domain. The method collideAndStream() is free to produce whatever result it wishes inside this layer.
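Calling code therefore should not rely on the outermost layer of cells after collideAndStream(). The hypothetical helpers below, written with a stand-in plint alias, identify that layer in the 0 to n-1 coordinates used by addDomain() and count the cells whose results remain well defined.

```cpp
#include <cassert>
#include <cstddef>

using plint = std::ptrdiff_t;  // stand-in for Palabos' plint (assumption)

// True if cell (ix, iy, iz) of an nx-by-ny-by-nz domain lies in the
// one-cell outer layer where the result of collideAndStream() is undefined.
bool inUndefinedBorder(plint ix, plint iy, plint iz,
                       plint nx, plint ny, plint nz)
{
    return ix == 0 || iy == 0 || iz == 0 ||
           ix == nx - 1 || iy == ny - 1 || iz == nz - 1;
}

// Number of interior cells whose post-step values are well defined.
plint numReliableCells(plint nx, plint ny, plint nz)
{
    if (nx < 3 || ny < 3 || nz < 3) {
        return 0;  // the border layer covers the whole domain
    }
    return (nx - 2) * (ny - 2) * (nz - 2);
}
```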

Note also that collideAndStream() is blocking: it does not return until the operation has fully completed. To overlap computations, you must use Palabos' MPI-based parallel execution mechanism.

Implemented in plb::D3Q19ExampleCoProcessor3D< T >, and plb::D3Q19CudaCoProcessor3D< T >.

template<typename T >
virtual int plb::CoProcessor3D< T >::receive ( int domainHandle, Box3D const & subDomain, std::vector< char > & data ) const [pure virtual]

Copy data from the co-processor's device memory to Palabos' CPU memory.

The method returns an error code: 1=success, 0=failure. Further information on the memory layout is available in the documentation of the method send().

Attention: receive() is itself responsible for resizing the data vector so that it is large enough to hold the requested data.
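A sketch of the resizing step is shown below, assuming the buffer holds exactly the 19 populations of every cell in subDomain and nothing else. Box3D here is a minimal stand-in with inclusive bounds, modeled on (but not identical to) Palabos' class; requiredBytes and prepareReceiveBuffer are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using plint = std::ptrdiff_t;  // stand-in for Palabos' plint (assumption)

// Minimal stand-in for Palabos' Box3D: inclusive bounds in each direction.
struct Box3D {
    plint x0, x1, y0, y1, z0, z1;
    plint nCells() const {
        return (x1 - x0 + 1) * (y1 - y0 + 1) * (z1 - z0 + 1);
    }
};

// Bytes needed for all D3Q19 populations of every cell in subDomain.
template<typename T>
std::size_t requiredBytes(Box3D const& subDomain) {
    return static_cast<std::size_t>(subDomain.nCells()) * 19 * sizeof(T);
}

// The resizing a receive() implementation performs before copying device
// memory into data; the actual device-to-host copy is omitted here.
template<typename T>
void prepareReceiveBuffer(Box3D const& subDomain, std::vector<char>& data) {
    data.resize(requiredBytes<T>(subDomain));
}
```

For a 2 x 3 x 4 sub-domain in double precision, the buffer needs 24 cells x 19 populations x 8 bytes = 3648 bytes.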

Implemented in plb::D3Q19ExampleCoProcessor3D< T >, and plb::D3Q19CudaCoProcessor3D< T >.

template<typename T >
virtual int plb::CoProcessor3D< T >::send ( int domainHandle, Box3D const & subDomain, std::vector< char > const & data ) [pure virtual]

Copy data from Palabos' CPU memory to the co-processor's device memory.

The method returns an error code: 1=success, 0=failure. Please note that the memory of a std::vector is always contiguous, which means that you can obtain a C-array representation of the data through the syntax char const* carray = &data[0], and view it as an array of T through reinterpret_cast<T const*>(carray).
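As an alternative to the pointer cast, values of type T can be copied into and out of the byte vector with std::memcpy, which sidesteps C++ strict-aliasing concerns. The helper names packValues and valueAt below are hypothetical; this is a sketch, not part of the Palabos API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy an array of T into a byte vector of the kind passed to send().
template<typename T>
std::vector<char> packValues(std::vector<T> const& values) {
    std::vector<char> data(values.size() * sizeof(T));
    if (!data.empty()) {
        std::memcpy(data.data(), values.data(), data.size());
    }
    return data;
}

// Read the i-th value of type T from a contiguous byte vector.
template<typename T>
T valueAt(std::vector<char> const& data, std::size_t i) {
    T value;
    std::memcpy(&value, data.data() + i * sizeof(T), sizeof(T));
    return value;
}
```

Compilers typically turn these fixed-size copies into plain loads and stores, so the safety comes at little or no run-time cost.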

The memory layout must respect the following ordering:

  • The fastest running index is for the 19 populations, with an ordering specified in the structure "D3Q19Constants" in the file "latticeBoltzmann/nearestNeighborLattices3D.hh".
  • The space indices are ordered according to the C convention, meaning that, if you take the space matrix to be declared as matrix[nx][ny][nz], then the z-index is fastest running.
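The ordering above translates into a simple index formula. The sketch assumes that the buffer covers exactly the cells of subDomain, with (ix, iy, iz) measured from its lower corner and ny, nz being its extents in y and z; bufferIndex is a hypothetical name, and the population index iPop follows whatever numbering D3Q19Constants defines. The result counts elements of type T, so the byte offset is bufferIndex(...) * sizeof(T).

```cpp
#include <cassert>
#include <cstddef>

using plint = std::ptrdiff_t;  // stand-in for Palabos' plint (assumption)

constexpr plint numPop = 19;   // populations per cell on D3Q19

// Element index of population iPop of cell (ix, iy, iz): the population
// index runs fastest, then z, then y, then x (C order of matrix[nx][ny][nz]).
// nx is not needed because x is the slowest-running index.
plint bufferIndex(plint ix, plint iy, plint iz, plint iPop,
                  plint ny, plint nz)
{
    return ((ix * ny + iy) * nz + iz) * numPop + iPop;
}
```

For a block with ny = 4 and nz = 5, stepping one cell in z advances the index by 19, one cell in y by 5 x 19 = 95, and one cell in x by 4 x 5 x 19 = 380.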

Implemented in plb::D3Q19ExampleCoProcessor3D< T >, and plb::D3Q19CudaCoProcessor3D< T >.


The documentation for this class was generated from the following file: