# Tutorial 2: Understanding the multi-block structure¶

The code structure of Palabos programs is driven by the duality between atomic-blocks, which represent regular data-arrays, and complex multi-blocks. Thanks to a practically identical interface, they appear to the user on a seemingly equal footing. In reality, however, there exists a hierarchical relationship bewteen them. The multi-block is actually a composite construct of several adjacent blocks and uses itself atomic-blocks for its internal implementation. Atomic- and multi-blocks come in three flavors: the (multi-) block-lattice which holds lattice Boltzmann variables, the (multi-) scalar-field for scalar data-arrays, and the (multi-) tensor-field for vector- or tensor-valued data-arrays.

Tutorial 2.1 exploits the similarity of interface between atomic-blocks and multi-blocks, and rewrites a program from Tutorial 1, replacing the multi block-lattice by a plain block-lattice. Tutorial 2.2 provides further insight in the multi-block and gives the user the possibility to construct such an object manually by explicitly specifying the position of the consisting atomic-blocks. In tutorial 2.3, a multi-block (or for all means, an atomic-block) is manipulated through the explicit use of so-called data processors. They provide explicit access to atomic-block cells inside a multi-block, even if the internal structure of the multi-block is unknown to the user. Tutorial 2.4 finally explains how a program is parallelized (again manually for didactic purposes) by attributing the components of a multi-block to different threads.

Warning

The purpose of Tutorial 2 is to provide deeper insight into the mechanisms of Palabos, and not to establish good coding practice. As already mentioned, you should in practice always prefer multi-blocks over atomic-blocks at the end-user level, in spite of the counter-example shown in Tutorial 2.1.

## Tutorial 2.1: Formulating a program with an atomic-block¶

In Tutorial 1.5: Boundary conditions, a simulation for a 2D Poiseuille flow was presented, which used the data structure MultiBlock2D to hold the data. This is the right way of doing, because it is recommended to use multi-block structures for practically all purposes in end-user programs. Strictly speaking, a non-parallel version of this program could however also be coded using a simpler data structure, because of the simple, rectangular shape of the domain. Such a conversion is straightforward in Palabos, because most of the code is generic and works identically for atomic-blocks and multi-blocks. As it can be seen on the following code, also found in the file tutorial_2_1.cpp, it is sufficient to replace the keyword MultiBlockLattice2D by BlockLattice2D at two places:

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33``` ```/* Code 2.1 in the Palabos tutorial */ // ... Definition of the Poiseuille profile, and instantiation of the geometry are // exactly identical with a multi-block or an atomic-block // Multi-Block version of tutorial 1.5: // void writeGifs(MultiBlockLattice2D& lattice, plint iter) // { ... // Atomic-Block version of tutorial 2.1: void writeGifs(BlockLattice2D& lattice, plint iter) { // ... int main(int argc, char* argv[]) { // ... Definition of the parameters are identical in both versions // Multi-Block version of tutorial 1.5: // MultiBlockLattice2D lattice ( // parameters.getNx(), parameters.getNy(), // new BGKdynamics(parameters.getOmega()) ); // Atomic-Block version of tutorial 2.1: BlockLattice2D lattice ( parameters.getNx(), parameters.getNy(), new BGKdynamics(parameters.getOmega()) ); // ... Creating the initial condition and running the simulation is // identical in both versions } ```

Although very little has changed on the surface, the algorithm instantiated in the above code is much simpler than in the multi-block case. You can view the BlockLattice2D as a simple nx-by-ny-by-9 value array, just as you would use it in a straightforward example lattice Boltzmann code.

As you progress with your work with Palabos, you will learn to appreciate the abstraction mechanism through which the complex multi-block behaves like a regular construct. In practical work, it offers a relatively simple way to cope with complicated constructs.

## Tutorial 2.2: Creating a multi-block structure manually¶

One of the uses of the multi-block consists in the memory-efficient implementation of sparse domains. To understand this, consider the following 2D fluid problem:
The fluid is confined in a channel shaped like a half-circle. Pressure boundary conditions are used with an appropriate difference between inlet and outlet, responsible for driving the fluid through the channel. It is obviously memory wasting to represent this problem by a regular matrix structure, because a large area in the interior of the half-circle is unused. This is therefore a typical candidate for a sparse domain implementation. While the sparse structure of a multi-block is usually automatically created by an appropriate tool, it is in the following created manually for didactical reasons. We choose an approach in which the channel is covered by three block, while the fourth block, in the center of the half-circle, is left void, as indicated by the black area on the following image:

The construction of this sparse domain is done in the code tutorial_2_2.cpp, it is integrally printed here (without the usual header lines):

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125``` ```/// Describe the geometry of the half-circular channel, used in tutorial 2. template class BounceBackNodes : public DomainFunctional2D { public: BounceBackNodes(plint N, plint radius) : cx(N/2), cy(N/2), innerR(radius), outerR(N/2) { } /// Return true for all cells outside the channel, on which bounce-back /// dynamics must be instantiated. virtual bool operator() (plint iX, plint iY) const { T rSqr = util::sqr(iX-cx) + util::sqr(iY-cy); return rSqr <= innerR*innerR || rSqr >= outerR*outerR; } virtual BounceBackNodes* clone() const { return new BounceBackNodes(*this); } private: plint cx; //< X-position of the center of the half-circle. plint cy; //< Y-position of the center of the half-circle. plint innerR; //< Outer radius of the half-circle. plint outerR; //< Inner radius of the half-circle. }; void halfCircleSetup ( MultiBlockLattice2D& lattice, plint N, plint radius, OnLatticeBoundaryCondition2D& boundaryCondition ) { // The channel is pressure-driven, with a difference deltaRho // between inlet and outlet. T deltaRho = 1.e-2; T rhoIn = 1. + deltaRho/2.; T rhoOut = 1. - deltaRho/2.; Box2D inlet (0, N/2, N/2, N/2); Box2D outlet(N/2+1, N, N/2, N/2); boundaryCondition.addPressureBoundary1P(inlet, lattice); boundaryCondition.addPressureBoundary1P(outlet, lattice); // Specify the inlet and outlet density. setBoundaryDensity (lattice, inlet, rhoIn); setBoundaryDensity (lattice, outlet, rhoOut); // Create the initial condition. Array zeroVelocity(0.,0.); T constantDensity = (T)1; initializeAtEquilibrium ( lattice, lattice.getBoundingBox(), constantDensity, zeroVelocity ); defineDynamics(lattice, lattice.getBoundingBox(), new BounceBackNodes(N, radius), new BounceBack); lattice.initialize(); } void writeGifs(MultiBlockLattice2D& lattice, plint iter) { const plint imSize = 600; ImageWriter imageWriter("leeloo"); imageWriter.writeScaledGif(createFileName("u", iter, 6), *computeVelocityNorm(lattice), imSize, imSize ); } int main(int argc, char* argv[]) { plbInit(&argc, &argv); global::directories().setOutputDir("./tmp/"); // Parameters of the simulation plint N = 400; // Use a 400x200 domain. plint maxT = 20001; plint imageIter = 1000; T omega = 1.; plint radius = N/3; // Inner radius of the half-circle. // Parameters for the creation of the multi-block. // d is the width of the block which is exempted from the full domain. plint d = (plint) (2.*std::sqrt(util::sqr(radius)-util::sqr(N/4.))); plint x0 = (N-d)/2 + 1; // Begin of the exempted block. plint x1 = (N+d)/2 - 1; // End of the exempted block. // Create a block distribution with the three added blocks. plint envelopeWidth = 1; SparseBlockStructure2D sparseBlock(N+1, N/2+1); sparseBlock.addBlock(Box2D(0, x0, 0, N/2), sparseBlock.nextIncrementalId()); sparseBlock.addBlock(Box2D(x0+1, x1-1, 0, N/4+1), sparseBlock.nextIncrementalId()); sparseBlock.addBlock(Box2D(x1, N, 0, N/2), sparseBlock.nextIncrementalId()); // Instantiate the multi-block, based on the created block distribution and // on default parameters. MultiBlockLattice2D lattice ( MultiBlockManagement2D( blockDistribution, defaultMultiBlockPolicy2D().getThreadAttribution(), envelopeWidth ), defaultMultiBlockPolicy2D().getBlockCommunicator(), defaultMultiBlockPolicy2D().getCombinedStatistics(), defaultMultiBlockPolicy2D().getMultiCellAccess(), new BGKdynamics(omega) ); pcout << getMultiBlockInfo(lattice) << std::endl; OnLatticeBoundaryCondition2D* boundaryCondition = createLocalBoundaryCondition2D(); halfCircleSetup(lattice, N, radius, *boundaryCondition); // Main loop over time iterations. for (plint iT=0; iT

Among other information, the program prints the following lines to the screen, as a result of the instruction on line 108:

```Size of the multi-block:     401-by-201
Number of atomic-blocks:     3
Smallest atomic-block:       172-by-102
Largest atomic-block:        115-by-201
Number of allocated cells:   0.063573 million
Fraction of allocated cells: 78.8737 percent
```

This behavior, as well as other parts of the code, are commented in the following:

Line 3
The half-circle shape of the simulation is described by the function class BounceBackNodes. Later in the program, this object is use to define all cells outside the cell as bounce-back nodes, and inside the channel as BGK dynamics nodes. The mechanisms involved here are further investigated in Tutorial 2.3 (technically, we will refer to class BounceBackNodes as a data-processor), while the present tutorial concentrates on sparse domain implementations.
Line 85-87
In this part, the coordinates of the three blocks which cover the channel are computed.
Line 91
A MultiBlockDistribution2D describes the arrangement of atomic-blocks inside a multi-block. Three blocks are added. Each of them gets an integer ID, which in this case is incremental (number 0, 1, an 2). As explained in Tutorial 2.4: Generate the sparse-block structure automatically, this ID is used in parallel programs to associate a block to a MPI process.
Line 98
Instead of the usual default constructor of the MultiBlockLattice2D, a more detailed version is used which offers the possibility to control various aspects of the multi-block instantiation, and in particular to specify the arrangement of the internal atomic-blocks. The other arguments of this constructors are not discussed here. As it is shown in the code, one can always refer to the object returned by defaultMultiBlockPolicy2D() to use a default choice for these parameters. The argument envelopeWidth=1 hints at the fact that the dynamics executed on this lattice uses a nearest-neighbor communication pattern, and an overlap of one-cell width between atomic-blocks is needed to express the streaming step inside an atomic-block consistently.
Line 108
The getMultiBlockInfo function provides some insight in the internal structure of a multi-block. You learn for example that the block added in the middle has dimensions 172-by-102, while the two later blocks have size 115-by-201. Switching from a regular to a sparse block structure obviously provided a gain of 21% in memory usage.

## Tutorial 2.3: Understanding data processors¶

Both atomic-blocks and multi-blocks are most often manipulated with help of constructs known under the name of “data processor” in Palabos. A data processor specifies an operation to be executed on each cell of a chosen domain on the block. In case a multi-block is used, the desired domain is intersected with the region occupied by the internal atomic-blocks, and then executed on the area of intersection of each atomic-block. Data processors are often used indirectly, through higher-level user functions. In the code of Tutorial 2.2 for example, bounce-back nodes are defined by calling the function defineDynamics. This function in its turn instantiates a data processor, responsible for re-defining the collision step on all the chosen nodes. Other types of data processors are used for example to attribute initial values to the simulation, define boundary conditions, or post-process data. The user functions offered in Palabos for all these operations are listed in the appendix of the user guide. It should also be mentioned that data processors can act simultaneously on several blocks, and in this way create a coupling between two block-lattices, or between a block-lattice and a scalar-field or tensor-field. This behavior is for example used to compute the velocity in a block-lattice and store the result in a three-component tensor-field, or to create the coupling between two lattices for the implementation of multi-component fluids.

In order to obtain a better understanding of how such a data processor works, the call to the function defineDynamics in the previous tutorial is now replaced successively by two alternative, explicit ways of attributing a BounceBack dynamics to the chosen wall cells. All discussed code constructs can also be found in the file tutorial_2_3.cpp.

The first approach is easiest to understand, because it is entirely manual. A loop over all space directions is written manually and, with help of the function class BounceBackNodes from the previous tutorial, it is decided to re-define the dynamics of chosen nodes through the interface of the multi-block:

```/// Manual instantiation of the bounce-back nodes for the boundary.
/** Attention: This is NOT the right way of doing, because it is slow and
*  non-parallelizable. Instead, use a data-processor.
*/
void createHalfCircleManually (
MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius )
{
for (plint iX=0; iX<=N; ++iX) {
for (plint iY=0; iY<=N/2; ++iY) {
if (domain(iX,iY)) {
defineDynamics(lattice, iX, iY, new BounceBack<T,DESCRIPTOR>);
}
}
}
}
```

This approach is really self-explaining. It is however crucial to understand that such an approach should never be chosen in practice, for efficiency considerations. This approach is slow, because for each access to a cell through the function defineDynamics, the atomic-block on which the cell is located must first be determined. Even worse yet, this way of accessing a multi-block is not properly parallelized (the result of the operation is correct in parallel, but the speed of the operation is equal or even inferior to the serial execution speed). As it is shown in the next tutorial, parallelism is implemented in Palabos by assigning each atomic-block inside a multi-block to a given processor (CPU). During operations that are executed on an extended domain, such as, the definition of the collision rule to a given part of the simulation, it is expected that each processor works only with the atomic-block assigned to it. Writing out explicitly a loop over space directions however means to assign work for the whole domain to all processors. This can impact the parallel performance of a program significantly, even if it is only part of the setup of a simulation. In programs which are massively parallelized on hundreds or thousands of processors/cores, a badly parallelized initial stage can represent a bottleneck for the whole simulation. One of the most important rules in Palabos is therefore to never write loops over space dimensions in end-user applications. Instead, space loops are only written inside data processors, as shown below.

Here’s how you can manually write a data processor which creates the half-circle domain, without resorting to the helper function defineDynamics (the function defineDynamics is of course nothing else than a wrapper which instantiates a data processor for you):

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43``` ```/// This functional defines a data processor for the instantiation /// of bounce-back nodes following the half-circle geometry. template class Descriptor> class HalfCircleInstantiateFunctional : public BoxProcessingFunctional2D_L { public: HalfCircleInstantiateFunctional(plint N_, plint radius_) : N(N_), radius(radius_) { } virtual void process(Box2D domain, BlockLattice2D& lattice) { BounceBackNodes bbDomain(N,radius); Dot2D relativeOffset = lattice.getLocation(); for (plint iX=domain.x0; iX<=domain.x1; ++iX) { for (plint iY=domain.y0; iY<=domain.y1; ++iY) { if (bbDomain(iX+relativeOffset.x,iY+relativeOffset.y)) { lattice.attributeDynamics ( iX,iY, new BounceBack ); } } } } virtual void getTypeOfModification(std::vector& modified) const { modified[0] = modif::dynamicVariables; } virtual HalfCircleInstantiateFunctional* clone() const { return new HalfCircleInstantiateFunctional(*this); } private: plint N; plint radius; }; /// Automatic instantiation of the bounce-back nodes for the boundary, /// using a data processor. void createHalfCircleFromDataProcessor ( MultiBlockLattice2D& lattice, plint N, plint radius ) { applyProcessingFunctional ( new HalfCircleInstantiateFunctional(N,radius), lattice.getBoundingBox(), lattice ); } ```
Line 4
The class HalfCircleInstantiateFunctional represents the data processor for the instantiation of the half-circle geometry. It inherits from BoxProcessingFunctional2D_L, where the L stands for “lattice”, in reference to the fact that the processor acts on a lattice. The two other possibilities are S for scalar-field, and T for tensor-field. Similarly, a coupling between a block-lattice and a scalar-field is obtained by inheriting from BoxProcessingFunctional2D_LS.
Line 11
The method process defines the heart of the data processor. It is invoked repeatedly on all atomic blocks, after intersection of the original domain with the domain of the individual blocks: Palabos computes the sub-division, and you write the code to be executed on the sub-domains. The parameters delivered to the method process are an atomic-block and the domain on which the operation is executed, and it is your responsibility to write a loop over all cells of this domain. It should be pointed out that Palabos could have had a simpler interface, in which the data processor acts on a single cell, and the space loop is written somewhere in Palabos’ library code. The present choice is however preferrable for efficiency reasons, because it avoids the costs of a function call on each treated cell. While such a function call is acceptable for computationally intensive operations (such as, computing the collision step for a cell), in can have an important relative performance impact on lightweight data processors which execute, say, a simple arithmetic operation. As it is written currently, the overhead due to the data processor’s interface is practically invisible, and the data is processed with the same efficiency as manually programmed, raw data constructs. When this raw efficiency is not required (for example during an initialization step which takes little time), it is possible to use more elegant interfaces to the data-processor which free you from the burden of writing loops manually. For more information, refer to the constructs OneCellFunctionalXD and OneCellIndexedFunctionalXD in the user guide (appendix-functions). An example usage of class OneCellIndexedFunctional2D is presented in the program multiComponent2d/rayleighTaylor.cpp.
Line 13
The domain delivered by parameter to the method process uses coordinates relative to the atomic-block. They are not compatible with the global coordinate system of the multi-block, and can therefore not be used directly to decide whether a given cell is part of the fluid channel or not. First, they need to be converted to global coordinates by accessing the relative position of the atomi-block within the multi-block, through the command lattice.getLocation(). Again, this coordinate transformation is done manually here for efficiency reasons; you can also chose to use a slightly less efficient, but more elegant automatic mechanism by using the construct OneCellIndexedFunctionalXD (see appendix-functions in the user guide).
Line 23-25
Here, you inform Palabos what type of modification your data processor has performed on the block-lattice. This information is needed internally for Palabos to set up its parallel communication pattern, and to communicate only data that has been modified. Use the identifier modif::nothing to say that the data processor performed read-only accesses, modif::staticVariables if it modified the populations (or external scalars), modif::dynamicVariables if it changed the content of the dynamics objects (for example, when it locally changed the relaxation parameter), modif::allVariables when it made changes of both kinds, and, finally, modif::dataStructure when it assigned a new dynamics object to the cells (as it did in the present case).
Line 26-29
The definition of the method clone is paradigmatic in Palabos: it is required for Palabos’ internal memory management. All classes that are placed in a Palabos inheritance hierarchy have such a method, and the definition of this method always has the exact same shape. You’ll get used to writing this method without even thinking about it.
Line 40
Through a function call to applyProcessingFunctional, a data processor is executed exactly once on a multi-block or an atomic-block. Alternatively, the function integrateProcessingFunctional is used to add a data processor to a block, and have it executed periodically at each time iteration cycle. The latter approach is chosen for data processors which are part of the dynamics of the simulation. Examples are non-local boundary conditions, or couplings between lattices in multi-component fluids.

## Tutorial 2.4: Generate the sparse-block structure automatically¶

As previously pointed out, you will practically never want to compute the internal structure of a sparse multi-block manually, and instead use one of Palabos’ automatic domain generators. For example, the Palabos application in the directory examples/showCases/aneurysm shows how you can automatically read a geometry description from an STL file, “voxelize” the domain (decide which cells are part of the fluid), and then generate the sparse structue. By the way, Palabos’ automatic domain generation algorithms go even a step further than what is done in the present tutorial, as they also instantiate sophisticated algorithms for curved walls, in which the wall location is represented with second-order accuracy (while in this tutorial the wall is represented through a stair-case shape).

Anyway, let’s get back to tutorial 2.2, and modify the code in such a way that the sparse block structure is generated automatically by Palabos:

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34``` ```... template class FluidNodes { public: FluidNodes(plint N_, plint radius_) : N(N_), radius(radius_) { } bool operator() (plint iX, plint iY) const { return ! BounceBackNodes(N,radius)(iX,iY); } private: plint N, radius; }; ... MultiScalarField2D flagMatrix(N+1, N/2+1); setToFunction(flagMatrix, flagMatrix.getBoundingBox(), FluidNodes(N, radius)); plint blockSize = 15; plint envelopeWidth = 1; MultiBlockManagement2D sparseBlockManagement = computeSparseManagement ( *plb::reparallelize(flagMatrix, blockSize,blockSize), envelopeWidth ); // Instantiate the multi-block, based on the created block distribution and // on default parameters. MultiBlockLattice2D lattice ( sparseBlockManagement, defaultMultiBlockPolicy2D().getBlockCommunicator(), defaultMultiBlockPolicy2D().getCombinedStatistics(), defaultMultiBlockPolicy2D().getMultiCellAccess(), new BGKdynamics(omega) ); ```
Line 3-13
In order to create the sparse block structure, Palabos needs to know which lattice cells are part of the fluid. A specific functional is implemented here which returns this information, which is, the negation of BounceBackNodes.
Line 17 and 18
An integer-valued flag matrix is constructed which is zero valued on the bounce-back domain, and non-zero on fluid nodes.
Line 19
Let’s cover the domain with blocks of size (approximately) 15x15. The smaller these blocks, and the more memory efficient a simulation you get. If the blocks are chosen too small, you loose however efficiency, as the overhead of communicating between blocks is too large. The optimal atomic block size depends on your hardware and must be determined experimentally.
Line 23
To start with, the multi-block is restructured to be covered by blocks of size 15x15.
Line 22
As a next step, all blocks that have no fluid cell are removed. The result is saved in a data structure of type MultiBlockManagement
Line 28
The created multi-block management structure is now used to create a block-lattice with the sparse block-structure.
As it can be seen from the program’s output, this code is more efficient than the one in tutorial 2.2, as it removes 45% of the cells in the original multi-block (against 11% in tutorial 2.2), due to a more efficient coverage by small blocks. The following image shows, in red, the domain for which memory is allocated in this program:

## Tutorial 2.5: Parallelism in a manually created multi-block¶

If you use multi-blocks as your basic data type, if you construct them in a standard way, as in all examples of Tutorial 1, and if you compile the program with the MPI_PARALLEL flag, then the program is automatically parallelized. This means that the multi-block is automatically subdivided into components which are associated to the different processors.