The code structure of Palabos programs is driven by the duality between atomic-blocks, which represent regular data-arrays, and complex multi-blocks. Thanks to a practically identical interface, they appear to the user on a seemingly equal footing. In reality, however, there exists a hierarchical relationship bewteen them. The multi-block is actually a composite construct of several adjacent blocks and uses itself atomic-blocks for its internal implementation. Atomic- and multi-blocks come in three flavors: the (multi-) block-lattice which holds lattice Boltzmann variables, the (multi-) scalar-field for scalar data-arrays, and the (multi-) tensor-field for vector- or tensor-valued data-arrays.
Tutorial 2.1 exploits the similarity of interface between atomic-blocks and multi-blocks, and rewrites a program from Tutorial 1, replacing the multi block-lattice by a plain block-lattice. Tutorial 2.2 provides further insight in the multi-block and gives the user the possibility to construct such an object manually by explicitly specifying the position of the consisting atomic-blocks. In tutorial 2.3, a multi-block (or for all means, an atomic-block) is manipulated through the explicit use of so-called data processors. They provide explicit access to atomic-block cells inside a multi-block, even if the internal structure of the multi-block is unknown to the user. Tutorial 2.4 finally explains how a program is parallelized (again manually for didactic purposes) by attributing the components of a multi-block to different threads.
Warning
The purpose of Tutorial 2 is to provide deeper insight into the mechanisms of Palabos, and not to establish good coding practice. As already mentioned, you should in practice always prefer multi-blocks over atomic-blocks at the end-user level, in spite of the counter-example shown in Tutorial 2.1.
In Tutorial 1.5: Boundary conditions, a simulation for a 2D Poiseuille flow was presented, which used the data structure MultiBlock2D to hold the data. This is the right way of doing, because it is recommended to use multi-block structures for practically all purposes in end-user programs. Strictly speaking, a non-parallel version of this program could however also be coded using a simpler data structure, because of the simple, rectangular shape of the domain. Such a conversion is straightforward in Palabos, because most of the code is generic and works identically for atomic-blocks and multi-blocks. As it can be seen on the following code, also found in the file tutorial_2_1.cpp, it is sufficient to replace the keyword MultiBlockLattice2D by BlockLattice2D at two places:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | /* Code 2.1 in the Palabos tutorial
*/
// ... Definition of the Poiseuille profile, and instantiation of the geometry are
// exactly identical with a multi-block or an atomic-block
// Multi-Block version of tutorial 1.5:
// void writeGifs(MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)
// { ...
// Atomic-Block version of tutorial 2.1:
void writeGifs(BlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)
{
// ...
int main(int argc, char* argv[])
{
// ... Definition of the parameters are identical in both versions
// Multi-Block version of tutorial 1.5:
// MultiBlockLattice2D<T, DESCRIPTOR> lattice (
// parameters.getNx(), parameters.getNy(),
// new BGKdynamics<T,DESCRIPTOR>(parameters.getOmega()) );
// Atomic-Block version of tutorial 2.1:
BlockLattice2D<T, DESCRIPTOR> lattice (
parameters.getNx(), parameters.getNy(),
new BGKdynamics<T,DESCRIPTOR>(parameters.getOmega()) );
// ... Creating the initial condition and running the simulation is
// identical in both versions
}
|
Although very little has changed on the surface, the algorithm instantiated in the above code is much simpler than in the multi-block case. You can view the BlockLattice2D as a simple nx-by-ny-by-9 value array, just as you would use it in a straightforward example lattice Boltzmann code.
As you progress with your work with Palabos, you will learn to appreciate the abstraction mechanism through which the complex multi-block behaves like a regular construct. In practical work, it offers a relatively simple way to cope with complicated constructs.
The construction of this sparse domain is done in the code tutorial_2_2.cpp, it is integrally printed here (without the usual header lines):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 | /// Describe the geometry of the half-circular channel, used in tutorial 2.
template<typename T>
class BounceBackNodes : public DomainFunctional2D {
public:
BounceBackNodes(plint N, plint radius)
: cx(N/2),
cy(N/2),
innerR(radius),
outerR(N/2)
{ }
/// Return true for all cells outside the channel, on which bounce-back
/// dynamics must be instantiated.
virtual bool operator() (plint iX, plint iY) const {
T rSqr = util::sqr(iX-cx) + util::sqr(iY-cy);
return rSqr <= innerR*innerR || rSqr >= outerR*outerR;
}
virtual BounceBackNodes<T>* clone() const {
return new BounceBackNodes<T>(*this);
}
private:
plint cx; //< X-position of the center of the half-circle.
plint cy; //< Y-position of the center of the half-circle.
plint innerR; //< Outer radius of the half-circle.
plint outerR; //< Inner radius of the half-circle.
};
void halfCircleSetup (
MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius,
OnLatticeBoundaryCondition2D<T,DESCRIPTOR>& boundaryCondition )
{
// The channel is pressure-driven, with a difference deltaRho
// between inlet and outlet.
T deltaRho = 1.e-2;
T rhoIn = 1. + deltaRho/2.;
T rhoOut = 1. - deltaRho/2.;
Box2D inlet (0, N/2, N/2, N/2);
Box2D outlet(N/2+1, N, N/2, N/2);
boundaryCondition.addPressureBoundary1P(inlet, lattice);
boundaryCondition.addPressureBoundary1P(outlet, lattice);
// Specify the inlet and outlet density.
setBoundaryDensity (lattice, inlet, rhoIn);
setBoundaryDensity (lattice, outlet, rhoOut);
// Create the initial condition.
Array<T,2> zeroVelocity(0.,0.);
T constantDensity = (T)1;
initializeAtEquilibrium (
lattice, lattice.getBoundingBox(), constantDensity, zeroVelocity );
defineDynamics(lattice, lattice.getBoundingBox(),
new BounceBackNodes<T>(N, radius),
new BounceBack<T,DESCRIPTOR>);
lattice.initialize();
}
void writeGifs(MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint iter)
{
const plint imSize = 600;
ImageWriter<T> imageWriter("leeloo");
imageWriter.writeScaledGif(createFileName("u", iter, 6),
*computeVelocityNorm(lattice),
imSize, imSize );
}
int main(int argc, char* argv[]) {
plbInit(&argc, &argv);
global::directories().setOutputDir("./tmp/");
// Parameters of the simulation
plint N = 400; // Use a 400x200 domain.
plint maxT = 20001;
plint imageIter = 1000;
T omega = 1.;
plint radius = N/3; // Inner radius of the half-circle.
// Parameters for the creation of the multi-block.
// d is the width of the block which is exempted from the full domain.
plint d = (plint) (2.*std::sqrt(util::sqr(radius)-util::sqr(N/4.)));
plint x0 = (N-d)/2 + 1; // Begin of the exempted block.
plint x1 = (N+d)/2 - 1; // End of the exempted block.
// Create a block distribution with the three added blocks.
plint envelopeWidth = 1;
SparseBlockStructure2D sparseBlock(N+1, N/2+1);
sparseBlock.addBlock(Box2D(0, x0, 0, N/2), sparseBlock.nextIncrementalId());
sparseBlock.addBlock(Box2D(x0+1, x1-1, 0, N/4+1), sparseBlock.nextIncrementalId());
sparseBlock.addBlock(Box2D(x1, N, 0, N/2), sparseBlock.nextIncrementalId());
// Instantiate the multi-block, based on the created block distribution and
// on default parameters.
MultiBlockLattice2D<T, DESCRIPTOR> lattice (
MultiBlockManagement2D(
blockDistribution,
defaultMultiBlockPolicy2D().getThreadAttribution(), envelopeWidth ),
defaultMultiBlockPolicy2D().getBlockCommunicator(),
defaultMultiBlockPolicy2D().getCombinedStatistics(),
defaultMultiBlockPolicy2D().getMultiCellAccess<T,DESCRIPTOR>(),
new BGKdynamics<T,DESCRIPTOR>(omega)
);
pcout << getMultiBlockInfo(lattice) << std::endl;
OnLatticeBoundaryCondition2D<T,DESCRIPTOR>*
boundaryCondition = createLocalBoundaryCondition2D<T,DESCRIPTOR>();
halfCircleSetup(lattice, N, radius, *boundaryCondition);
// Main loop over time iterations.
for (plint iT=0; iT<maxT; ++iT) {
if (iT%imageIter==0) {
pcout << "Saving Gif at time step " << iT << endl;
writeGifs(lattice, iT);
}
lattice.collideAndStream();
}
delete boundaryCondition;
}
|
Among other information, the program prints the following lines to the screen, as a result of the instruction on line 108:
Size of the multi-block: 401-by-201
Number of atomic-blocks: 3
Smallest atomic-block: 172-by-102
Largest atomic-block: 115-by-201
Number of allocated cells: 0.063573 million
Fraction of allocated cells: 78.8737 percent
This behavior, as well as other parts of the code, are commented in the following:
Both atomic-blocks and multi-blocks are most often manipulated with help of constructs known under the name of “data processor” in Palabos. A data processor specifies an operation to be executed on each cell of a chosen domain on the block. In case a multi-block is used, the desired domain is intersected with the region occupied by the internal atomic-blocks, and then executed on the area of intersection of each atomic-block. Data processors are often used indirectly, through higher-level user functions. In the code of Tutorial 2.2 for example, bounce-back nodes are defined by calling the function defineDynamics. This function in its turn instantiates a data processor, responsible for re-defining the collision step on all the chosen nodes. Other types of data processors are used for example to attribute initial values to the simulation, define boundary conditions, or post-process data. The user functions offered in Palabos for all these operations are listed in the appendix of the user guide. It should also be mentioned that data processors can act simultaneously on several blocks, and in this way create a coupling between two block-lattices, or between a block-lattice and a scalar-field or tensor-field. This behavior is for example used to compute the velocity in a block-lattice and store the result in a three-component tensor-field, or to create the coupling between two lattices for the implementation of multi-component fluids.
In order to obtain a better understanding of how such a data processor works, the call to the function defineDynamics in the previous tutorial is now replaced successively by two alternative, explicit ways of attributing a BounceBack dynamics to the chosen wall cells. All discussed code constructs can also be found in the file tutorial_2_3.cpp.
The first approach is easiest to understand, because it is entirely manual. A loop over all space directions is written manually and, with help of the function class BounceBackNodes from the previous tutorial, it is decided to re-define the dynamics of chosen nodes through the interface of the multi-block:
/// Manual instantiation of the bounce-back nodes for the boundary.
/** Attention: This is NOT the right way of doing, because it is slow and
* non-parallelizable. Instead, use a data-processor.
*/
void createHalfCircleManually (
MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius )
{
BounceBackNodes<T> domain(N,radius);
for (plint iX=0; iX<=N; ++iX) {
for (plint iY=0; iY<=N/2; ++iY) {
if (domain(iX,iY)) {
defineDynamics(lattice, iX, iY, new BounceBack<T,DESCRIPTOR>);
}
}
}
}
This approach is really self-explaining. It is however crucial to understand that such an approach should never be chosen in practice, for efficiency considerations. This approach is slow, because for each access to a cell through the function defineDynamics, the atomic-block on which the cell is located must first be determined. Even worse yet, this way of accessing a multi-block is not properly parallelized (the result of the operation is correct in parallel, but the speed of the operation is equal or even inferior to the serial execution speed). As it is shown in the next tutorial, parallelism is implemented in Palabos by assigning each atomic-block inside a multi-block to a given processor (CPU). During operations that are executed on an extended domain, such as, the definition of the collision rule to a given part of the simulation, it is expected that each processor works only with the atomic-block assigned to it. Writing out explicitly a loop over space directions however means to assign work for the whole domain to all processors. This can impact the parallel performance of a program significantly, even if it is only part of the setup of a simulation. In programs which are massively parallelized on hundreds or thousands of processors/cores, a badly parallelized initial stage can represent a bottleneck for the whole simulation. One of the most important rules in Palabos is therefore to never write loops over space dimensions in end-user applications. Instead, space loops are only written inside data processors, as shown below.
Here’s how you can manually write a data processor which creates the half-circle domain, without resorting to the helper function defineDynamics (the function defineDynamics is of course nothing else than a wrapper which instantiates a data processor for you):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | /// This functional defines a data processor for the instantiation
/// of bounce-back nodes following the half-circle geometry.
template<typename T, template<typename U> class Descriptor>
class HalfCircleInstantiateFunctional
: public BoxProcessingFunctional2D_L<T,Descriptor>
{
public:
HalfCircleInstantiateFunctional(plint N_, plint radius_)
: N(N_), radius(radius_)
{ }
virtual void process(Box2D domain, BlockLattice2D<T,Descriptor>& lattice) {
BounceBackNodes<T> bbDomain(N,radius);
Dot2D relativeOffset = lattice.getLocation();
for (plint iX=domain.x0; iX<=domain.x1; ++iX) {
for (plint iY=domain.y0; iY<=domain.y1; ++iY) {
if (bbDomain(iX+relativeOffset.x,iY+relativeOffset.y)) {
lattice.attributeDynamics (
iX,iY, new BounceBack<T,DESCRIPTOR> );
}
}
}
}
virtual void getTypeOfModification(std::vector<modif::ModifT>& modified) const {
modified[0] = modif::dynamicVariables;
}
virtual HalfCircleInstantiateFunctional<T,Descriptor>* clone() const
{
return new HalfCircleInstantiateFunctional<T,Descriptor>(*this);
}
private:
plint N;
plint radius;
};
/// Automatic instantiation of the bounce-back nodes for the boundary,
/// using a data processor.
void createHalfCircleFromDataProcessor (
MultiBlockLattice2D<T,DESCRIPTOR>& lattice, plint N, plint radius )
{
applyProcessingFunctional (
new HalfCircleInstantiateFunctional<T,DESCRIPTOR>(N,radius),
lattice.getBoundingBox(), lattice );
}
|
As previously pointed out, you will practically never want to compute the internal structure of a sparse multi-block manually, and instead use one of Palabos’ automatic domain generators. For example, the Palabos application in the directory examples/showCases/aneurysm shows how you can automatically read a geometry description from an STL file, “voxelize” the domain (decide which cells are part of the fluid), and then generate the sparse structue. By the way, Palabos’ automatic domain generation algorithms go even a step further than what is done in the present tutorial, as they also instantiate sophisticated algorithms for curved walls, in which the wall location is represented with second-order accuracy (while in this tutorial the wall is represented through a stair-case shape).
Anyway, let’s get back to tutorial 2.2, and modify the code in such a way that the sparse block structure is generated automatically by Palabos:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | ...
template<typename T>
class FluidNodes {
public:
FluidNodes(plint N_, plint radius_) : N(N_), radius(radius_)
{ }
bool operator() (plint iX, plint iY) const {
return ! BounceBackNodes<T>(N,radius)(iX,iY);
}
private:
plint N, radius;
};
...
MultiScalarField2D<int> flagMatrix(N+1, N/2+1);
setToFunction(flagMatrix, flagMatrix.getBoundingBox(), FluidNodes<T>(N, radius));
plint blockSize = 15;
plint envelopeWidth = 1;
MultiBlockManagement2D sparseBlockManagement =
computeSparseManagement (
*plb::reparallelize(flagMatrix, blockSize,blockSize),
envelopeWidth );
// Instantiate the multi-block, based on the created block distribution and
// on default parameters.
MultiBlockLattice2D<T, DESCRIPTOR> lattice (
sparseBlockManagement,
defaultMultiBlockPolicy2D().getBlockCommunicator(),
defaultMultiBlockPolicy2D().getCombinedStatistics(),
defaultMultiBlockPolicy2D().getMultiCellAccess<T,DESCRIPTOR>(),
new BGKdynamics<T,DESCRIPTOR>(omega)
);
|
If you use multi-blocks as your basic data type, if you construct them in a standard way, as in all examples of Tutorial 1, and if you compile the program with the MPI_PARALLEL flag, then the program is automatically parallelized. This means that the multi-block is automatically subdivided into components which are associated to the different processors.
In the previous tutorials, the multi-block structure was created manually, which means that parallelism must be handled manually too. Let’s take for example the code from Tutorial 2.2: Creating a multi-block structure manually, edit the Makefile, set MPI_PARALLEL=true, and recompile. Because the domain is manually created with three blocks in this example, it can only be parallelized by means of exactly three processes: mpirun -np 3 ./tutorial_2_2. The algorithm which associates blocks to MPI processes is provided as an argument to the constructor of the multi-block, in the listing of Tutorial 2.2: Creating a multi-block structure manually above, at line 101, an argument defaultMultiBlockPolicy2D().getThreadAttribution(), which imposes a default parallelization strategy: the block with ID 0 go to the MPI thread with ID 0, block #1 goes to MPI thread #1, and so on. The block-to-thread attribution can however also be directed manually, by creating an object of type ExplicitThreadAttribution, configuring it, and providing it to the constructor of the multi-block (see file multiBlock/threadAttribution.h for a definition of the class ExplicitThreadAttribution).
We leave it as an exercice to modify the code of tutorial 2.2 in such a way that it can be executed on exactly two MPI processes, by assigning two blocks to the first process, and the third block to the second process.