Tutorial 2: Understanding the multi-block structure

The code structure of Palabos programs is driven by the duality between atomic-blocks, which implement regular data-arrays, and complex multi-blocks. Thanks to a practically identical interface, they appear to the user on a seemingly equal footing. In reality, however, there exists a hierarchical relationship bewteen them. The multi-block is actually a composite construct of several adjacent blocks and uses itself atomic-blocks for its internal implementation. Atomic- and multi-blocks come in three flavors: the (multi-) block-lattice which holds lattice Boltzmann variables, the (multi-) scalar-field for scalar data-arrays, and the (multi-) tensor-field for vector- or tensor-valued data-arrays.

Tutorial 2.1 exploits the similarity of interface between atomic-blocks and multi-blocks, and rewrites a program from Tutorial 1, replacing the multi block-lattice by a plain block-lattice. Tutorial 2.2 provides further insight in the multi-block and gives the user the possibility to construct such an object manually by explicitly specifying the position of the consisting atomic-blocks. In tutorial 2.3, a multi-block (or for all means, an atomic-block) is manipulated through the explicit use of so-called data processors. They provide explicit access to atomic-block cells inside a multi-block, even if the internal structure of the multi-block is unknown to the user. Tutorial 2.4 finally explains how a program is parallelized (again manually for didactic purposes) by attributing the components of a multi-block to different threads.

Attention: The purpose of Tutorial 2 is to provide deeper insight into the mechanisms of Palabos, and not to establish good coding practice. As already mentioned, you should in practice always prefer multi-blocks over atomic-blocks at the end-user level, in spite of the counter-example shown in Tutorial 2.1.

Tutorial 2.1: Formulating a program with an atomic-block

In Tutorial 1.5: Boundary conditions, a simulation for a 2D Poiseuille flow was presented, which used the data structure MultiBlock2D to hold the data. This is the right way of doing, because it is recommended to use multi-block structures for practically all purposes in end-user programs. Strictly speaking, a non-parallel version of this program could however also be coded using a simpler data structure, because of the simple, rectangular shape of the domain. Such a conversion is straightforward in Palabos, because most of the code is generic and works identically for atomic-blocks and multi-blocks. As it can be seen on the following code, also found in the file tutorial_2_1.cpp, it is sufficient to replace the keyword MultiBlockLattice2D by BlockLattice2D at two places:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
/* Code 2.1 in the Palabos tutorial
 */

// ... Definition of the Poiseuille profile, and instantiation of the geometry are
//     exactly identical with a multi-block or an atomic-block


// Multi-Block version of tutorial 1.5:
//   void writeGifs(MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)
//   { ...

// Atomic-Block version of tutorial 2.1:
void writeGifs(BlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)
{
// ...

int main(int argc, char* argv[])
{
    // ... Definition of the parameters are identical in both versions

    // Multi-Block version of tutorial 1.5:
    // MultiBlockLattice2D<T, DESCRIPTOR> lattice (
    //          parameters.getNx(), parameters.getNy(),
    //          new BGKdynamics<T,DESCRIPTOR>(parameters.getOmega()) );

    // Atomic-Block version of tutorial 2.1:
    BlockLattice2D<T, DESCRIPTOR> lattice (
             parameters.getNx(), parameters.getNy(),
             new BGKdynamics<T,DESCRIPTOR>(parameters.getOmega()) );

    // ... Creating the initial condition and running the simulation is
    //      identical in both versions
}

As you progress with your work with Palabos, you will learn to appreciate the abstraction mechanism through which the complex multi-block behaves like a regular construct. In practical work, it offers a relatively simple way to cope with complicated constructs.

Tutorial 2.2: Creating a multi-block structure manually

One of the uses of the multi-block consists in the memory-efficient implmenetation of sparse domains. To understand this, consider the following 2D fluid problem:
_images/halfCircle.gif
The fluid is confined in a channel shaped like a half-circle. Pressure boundary conditions are used with an appropriate difference between inlet and outlet, responsible for driving the fluid through the channel. It is obviously memory wasting to represent this problem by a regular matrix structure, because a large area in the interior of the half-circle is unused. This is therefore a typical candidate for a sparse domain implementation. While the sparse structure of a multi-block is usually automatically created by an appropriate tool, it is in the following created manually for didactical reasons. We choose an approach in which the channel is covered by three block, while the fourth block, in the center of the half-circle, is left void, as indicated by the black area on the following image:
_images/halfCircleBlocks.gif

The construction of this sparse domain is done in the code tutorial_2_2.cpp, it is integrally printed here (without the usual header lines):

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
/// Describe the geometry of the half-circular channel, used in tutorial 2.
template<typename T>
class HalfCircleDomain2D : public DomainFunctional2D {
public:
    HalfCircleDomain2D(plint N, plint radius)
        : cx(N/2),
          cy(N/2),
          innerR(radius),
          outerR(N/2)
    { }
    /// Return true for all cells outside the channel, on which bounce-back
    ///  dynamics must be instantiated.
    virtual bool operator() (plint iX, plint iY) const {
        T rSqr = util::sqr(iX-cx) + util::sqr(iY-cy);
        return rSqr <= innerR*innerR || rSqr >= outerR*outerR;
    }
    virtual HalfCircleDomain2D<T>* clone() const {
        return new HalfCircleDomain2D<T>(*this);
    }
private:
    plint cx;      //< X-position of the center of the half-circle.
    plint cy;      //< Y-position of the center of the half-circle.
    plint innerR;  //< Outer radius of the half-circle.
    plint outerR;  //< Inner radius of the half-circle.
};


void halfCircleSetup (
        MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius,
        OnLatticeBoundaryCondition2D<T,DESCRIPTOR>& boundaryCondition )
{
    // The channel is pressure-driven, with a difference deltaRho
    //   between inlet and outlet.
    T deltaRho = 1.e-2;
    T rhoIn  = 1. + deltaRho/2.;
    T rhoOut = 1. - deltaRho/2.;

    Box2D inlet (0,     N/2, N/2, N/2);
    Box2D outlet(N/2+1, N,   N/2, N/2);

    boundaryCondition.addPressureBoundary1P(inlet, lattice);
    boundaryCondition.addPressureBoundary1P(outlet, lattice);

    // Specify the inlet and outlet density.
    setBoundaryDensity (lattice, inlet, rhoIn);
    setBoundaryDensity (lattice, outlet, rhoOut);

    // Create the initial condition.
    Array<T,2> zeroVelocity(0.,0.);
    T constantDensity = (T)1;
    initializeAtEquilibrium (
       lattice, lattice.getBoundingBox(), constantDensity, zeroVelocity );

    defineDynamics(lattice, lattice.getBoundingBox(),
                   new HalfCircleDomain2D<T>(N, radius),
                   new BounceBack<T,DESCRIPTOR>);

    lattice.initialize();
}

void writeGifs(MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)
{
    const plint imSize = 600;
    ImageWriter<T> imageWriter("leeloo");
    imageWriter.writeScaledGif(createFileName("u", iter, 6),
                               *computeVelocityNorm(lattice),
                               imSize, imSize );
}

int main(int argc, char* argv[]) {
    plbInit(&argc, &argv);

    global::directories().setOutputDir("./tmp/");

    // Parameters of the simulation
    plint N         = 400;    // Use a 400x200 domain.
    plint maxT      = 20001;
    plint imageIter = 1000;
    T omega        = 1.;
    plint radius    = N/3;    // Inner radius of the half-circle.

    // Parameters for the creation of the multi-block.

    // d is the width of the block which is exempted from the full domain.
    plint d = (plint) (2.*std::sqrt(util::sqr(radius)-util::sqr(N/4.)));
    plint x0 = (N-d)/2 + 1;  // Begin of the exempted block.
    plint x1 = (N+d)/2 - 1;  // End of the exempted block.

    // Create a block distribution with the three added blocks.
    plint envelopeWidth = 1;
    SparseBlockStructure2D sparseBlock(N+1, N/2+1);
    sparseBlock.addBlock(Box2D(0, x0,      0, N/2),   sparseBlock.nextIncrementalId());
    sparseBlock.addBlock(Box2D(x0+1, x1-1, 0, N/4+1), sparseBlock.nextIncrementalId());
    sparseBlock.addBlock(Box2D(x1, N,      0, N/2),   sparseBlock.nextIncrementalId());

    // Instantiate the multi-block, based on the created block distribution and
    // on default parameters.
    MultiBlockLattice2D<T, DESCRIPTOR> lattice (
        MultiBlockManagement2D(
            blockDistribution,
            defaultMultiBlockPolicy2D().getThreadAttribution(), envelopeWidth ),
        defaultMultiBlockPolicy2D().getBlockCommunicator(),
        defaultMultiBlockPolicy2D().getCombinedStatistics(),
        defaultMultiBlockPolicy2D().getMultiCellAccess<T,DESCRIPTOR>(),
        new BGKdynamics<T,DESCRIPTOR>(omega)
    );

    OnLatticeBoundaryCondition2D<T,DESCRIPTOR>*
        boundaryCondition = createLocalBoundaryCondition2D<T,DESCRIPTOR>();

    halfCircleSetup(lattice, N, radius, *boundaryCondition);

    // Main loop over time iterations.
    for (plint iT=0; iT<maxT; ++iT) {
        if (iT%imageIter==0) {
            pcout << "Saving Gif at time step " << iT << endl;
            writeGifs(lattice, iT);
        }
        lattice.collideAndStream();
    }

    delete boundaryCondition;
}
Line 3
The half-circle shape of the simulation is described by the function class HalfCircleDomain2D. Later in the program, this object is use to define all cells outside the cell as bounce-back nodes, and inside the channel as BGK dynamics nodes. The mechanisms involved here are further investigated in Tutorial 2.3 (technically, we will refer to class HalfCircleDomain2D as a data-processor), while the present tutorial concentrates on sparse domain implementations.
Line 85-87
In this part, the coordinates of the three blocks which cover the channel are computed.
Line 91
A MultiBlockDistribution2D describes the arrangement of atomic-blocks inside a multi-block. Three blocks are added. Each of them gets an integer ID, which in this case is incremental (number 0, 1, an 2). As explained in Tutorial 2.4: Parallelism in a manually created multi-block, this ID is used in parallel programs to associate a block to a MPI process.
Line 98
Instead of the usual default constructor of the MultiBlockLattice2D, a more detailed version is used which offers the possibility to control various aspects of the multi-block instantiation, and in particular to specify the arrangement of the internal atomic-blocks. The other arguments of this constructors are not discussed here. As it is shown in the code, one can always refer to the object returned by defaultMultiBlockPolicy2D() to use a default choice for these parameters. The argument envelopeWidth=1 hints at the fact that the dynamics executed on this lattice uses a nearest-neighbor communication pattern, and an overlap of one-cell width between atomic-blocks is needed to express the streaming step inside an atomic-block consistently.

Tutorial 2.3: Understanding data processors

Both atomic-blocks and multi-blocks are most often manipulated with help of constructs known under the name of “data processor” in Palabos. A data processor specifies an operation to be executed on each cell of a chosen domain on the block. In case a multi-block is used, the desired domain is intersected with the region occupied by the internal atomic-blocks, and then executed on the area of intersection of each atomic-block. Data processors are often used indirectly, through higher-level user functions. In the code of Tutorial 2.2 for example, bounce-back nodes are defined by calling the function defineDynamics. This function in its turn instantiates a data processor, responsible for re-defining the collision step on all the chosen nodes. Other types of data processors are used for example to attribute initial values to the simulation, define boundary conditions, or compute data for post-processing. The user functions offered in Palabos for all these operations are listed in the appendix of the user guide. It should also be mentioned that data processors can act simultaneously on several blocks, and in this way create a coupling between two block-lattices, or between a block-lattice and a scalar-field or tensor-field. This behavior is for example used to compute the velocity in a block-lattice and store the result in a three-component tensor-field, or to create the coupling between two lattices for the implementation of multi-component fluids.

In order to obtain a better understanding of how such a data processor works, the call to the function defineDynamics in the previous tutorial is now replaced successively by two alternative, explicit ways of attributing a BounceBack dynamics to the chosen wall cells. All discussed code constructs can also be found in the file tutorial_2_3.cpp.

The first approach is easiest to understand, because it is entirely manual. A loop over all space directions is written manually and, with help of the function class HalfCircleDomain2D from the previous tutorial, it is decided to re-define the dynamics of chosen nodes through the interface of the multi-block:

/// Manual instantiation of the bounce-back nodes for the boundary.
/** Attention: This is NOT the right way of doing, because it is slow and
 *  non-parallelizable. Instead, use a data-processor.
 */
void createHalfCircleManually (
        MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius )
{
    HalfCircleDomain2D<T> domain(N,radius);
    for (plint iX=0; iX<=N; ++iX) {
        for (plint iY=0; iY<=N/2; ++iY) {
            if (domain(iX,iY)) {
                defineDynamics(lattice, iX, iY, new BounceBack<T,DESCRIPTOR>);
            }
        }
    }
}

This approach is really self-explaining. It is however crucial to understand that such an approach should never be chosen in practice, for efficiency considerations. This approach is slow, because for each access to a cell through the function defineDynamics, the atomic-block on which the cell is located must first be determined. Even worse yet, this way of accessing a multi-block is not properly parallelized (the result of the operation is correct in parallel, but the speed of the operation is equal or even inferior to the serial execution speed). As it is shown in the next tutorial, parallelism is implemented in Palabos by attributing the constituent atomic-blocks of a multi-block to different processors. During collective operations like the definition of the dynamics on a sub-domain, it is expected that each processor works only with the atomic-block assigned to it. Writing out explicitly a loop over space directions however means to assign work for the whole domain to all processors. This can impact the parallel performance of a program significantly, even if it is only part of the setup of a simulation. In programs which are massively parallelized on hundreds or thousands of processors/cores, a badly parallelized stage of initial program setup can represent a bottleneck for the whole simulation. One of the most important rules in Palabos is therefore to never write loops over space dimensions in end-user applications. Instead, space loops are only written inside data processors, as shown below.

Here’s how the construction of the half-circle domain is achieved with help of a data processor, without resorting to the helper function defineDynamics:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/// This functional defines a data processor for the instantiation
///   of bounce-back nodes following the half-circle geometry.
template<typename T, template<typename U> class Descriptor>
class HalfCircleInstantiateFunctional
    : public BoxProcessingFunctional2D_L<T,Descriptor>
{
public:
    HalfCircleInstantiateFunctional(plint N_, plint radius_)
        : N(N_), radius(radius_)
    { }
    virtual void process(Box2D domain, BlockLattice2D<T,Descriptor>& lattice) {
        HalfCircleDomain2D<T> bbDomain(N,radius);
        Dot2D relativeOffset = lattice.getLocation();
        for (plint iX=domain.x0; iX<=domain.x1; ++iX) {
            for (plint iY=domain.y0; iY<=domain.y1; ++iY) {
                if (bbDomain(iX+relativeOffset.x,iY+relativeOffset.y)) {
                    lattice.attributeDynamics (
                            iX,iY, new BounceBack<T,DESCRIPTOR> );
                }
            }
        }
    }
    virtual BlockDomain::DomainT appliesTo() const {
        // Dynamics needs to be instantiated everywhere, including envelope.
        return BlockDomain::bulkAndEnvelope;
    }
    virtual void getModificationPattern(std::vector<bool>& isWritten) const {
        isWritten[0] = true;
    }
    virtual HalfCircleInstantiateFunctional<T,Descriptor>* clone() const
    {
        return new HalfCircleInstantiateFunctional<T,Descriptor>(*this);
    }
private:
    plint N;
    plint radius;
};

/// Automatic instantiation of the bounce-back nodes for the boundary,
///   using a data processor.
void createHalfCircleFromDataProcessor (
        MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius )
{
    applyProcessingFunctional (
            new HalfCircleInstantiateFunctional<T,DESCRIPTOR>(N,radius),
            lattice.getBoundingBox(), lattice );
}
Line 4
The class HalfCircleInstantiateFunctional represents the data processor for the instantiation of the half-circle geometry. It inherits from BoxProcessingFunctional2D_L, where the L stands for “lattice”, in reference to the fact that the processor acts on a lattice. The two other possibilities are S for scalar-field, and T for tensor-field. Similarly, a coupling between a block-lattice and a scalar-field is obtained by inheriting from BoxProcessingFunctional2D_LS.
Line 11
The method process defines the heart of the data processor. It is invoked repeatedly on all atomic blocks, after intersection of the original domain with the domain of the individual blocks: Palabos computes the sub-division, and you write the code to be executed on the sub-domains. The parameters delivered to the method process are an atomic-block and the domain on which the operation is executed, and it is your responsibility to write a loop over all cells of this domain. It should be pointed out that Palabos could have had a simpler interface, in which the data processor acts on a single cell, and the space loop is written somewhere in Palabos’ library code. The present choice is however preferrable for efficiency reasons, because it avoids the costs of a function call on each treated cell. While such a function call is acceptable for computationally intensive operations (such as, computing the collision step for a cell), in can have an important relative performance impact on lightweight data processors which execute, say, a simple arithmetic operation. As it is written currently, the overhead due to the data processor’s interface is practically invisible, and the data is processed with the same efficiency as manually programmed, raw data constructs. When this raw efficiency is not required (for example during an initialization step which takes little time), it is possible to use more elegant interfaces to the data-processor which free you from the burden of writing loops manually. For more information, refer to the constructs OneCellFunctionalXD and OneCellIndexedFunctionalXD in the user guide (appendix-functions). An example usage of class OneCellIndexedFunctional2D is presented in the program multiComponent2d/rayleighTaylor.cpp.
Line 13
The domain delivered by parameter to the method process uses coordinates relative to the atomic-block. They are not compatible with the global coordinate system of the multi-block, and can therefore not be used directly to decide whether a given cell is part of the fluid channel or not. First, they need to be converted to global coordinates by accessing the relative position of the atomi-block within the multi-block, through the command lattice.getLocation(). Again, this coordinate transformation is done manually here for efficiency reasons; you can also chose to use a slightly less efficient, but more elegant automatic mechanism by using the construct OneCellIndexedFunctionalXD (see appendix-functions in the user guide).
Line 23-26
All atomic-blocks exceed the domain assigned to them by a cell layer, with a width of one cell for nearest-neighbor lattice Boltzmann models, or more for higher order models. This excess, referred to as “the envelope”, is required to separate conceptually the collision/streaming cycle within atomic blocks from a communication step between them. A data processor can either act on the bulk (the domain without the envelope), as indicated by the keyword BlockDomain::bulk, or on the entire domain (BlockDomain::bulkAndEnvelope). The former approach is necessary if the data processor is non-local, which happens for example when you evaluate gradients through a finite difference scheme, and need to read values located outside the area domain provided to the data processor, with an excess of maximally one cell. In this case, Palabos will instantiate a communication between atomic-blocks after the execution of the data processor, to assign values to the envelope nodes. If on the other hand the operation of the data processor is local, you can chose to act on the envelope as well, and to save Palabos the burden of a time consuming communication step.
Line 27-29
Tell Palabos that the data processor performs write-accesses to the block-lattice. This information is needed internally by Palabos in order to construct a communication pattern, and to only communicate values which have actually been modified.
Line 30-33
The definition of the method clone is paradigmatic in Palabos: it is required for internal memory management mechanisms. All classes situated in an Palabos inheritance hierarchy have such a method, and this method has always exactly the same content. You’ll get used to writing this method without even thinking about it.
Line 44
Through a function call to applyProcessingFunctional, a data processor is executed exactly once on a multi-block or an atomic-block. Alternatively, the function integrateProcessingFunctional is used to add a data processor to a block, and have it executed periodically at each time iteration cycle. The latter approach is chosen for data processors which are part of the dynamics of the simulation. Examples are non-local boundary conditions, or couplings between lattices in multi-component fluids.

Tutorial 2.4: Parallelism in a manually created multi-block

If you use multi-blocks as your basic data type, if you construct them in a standard way, as in all examples of Tutorial 1, and if you compile the program with the MPI_PARALLEL flag, then the program is automatically parallelized. This means that the multi-block is automatically subdivided into components which are associated to the different processors.

In the previous tutorials, the multi-block structure was created manually, which means that parallelism must be handled manually too. Let’s take for example the code from Tutorial 2.2: Creating a multi-block structure manually, edit the Makefile, set MPI_PARALLEL=true, and recompile. Because the domain is manually created with three blocks in this example, it can only be parallelized by means of exactly three processes: mpirun -np 3 ./tutorial_2_2. The algorithm which associates blocks to MPI processes is provided as an argument to the constructor of the multi-block, in the listing of Tutorial 2.2: Creating a multi-block structure manually above, at line 101, an argument defaultMultiBlockPolicy2D().getThreadAttribution(), which imposes a default parallelization strategy: the block with ID 0 go to the MPI thread with ID 0, block #1 goes to MPI thread #1, and so on. The block-to-thread attribution can however also be directed manually, by creating an object of type ExplicitThreadAttribution, configuring it, and providing it to the constructor of the multi-block (see file multiBlock/threadAttribution.h for a definition of the class ExplicitThreadAttribution).

We leave it as an exercice to modify the code of tutorial 2.2 in such a way that it can be executed on exactly two MPI processes, by assigning two blocks to the first process, and the third block to the second process.