Parallelism

This benchmark measures the parallel efficiency of Palabos for both the pre-processing stage (reading an input file, voxelization, and setting up the domain) and the simulation of fluid flow in complex geometries. To this end, we used a real giant aneurysm geometry, depicted in the first figure. In situations like this one, where the majority of the domain contains no fluid cells, a mono-block approach would be prohibitively expensive. The multi-block approach allows for much more efficient memory management.

The first figure also depicts the multi-block structure used. Blocks of about 25³ cells cover only the regions containing fluid cells, which dramatically limits the computational overhead. For example, in the cases presented below, the virtual box covering the complete domain measures 1825 × 1921 × 1494 cells (5.2 × 10⁹ bounding-box cells). Covering it with 65'336 blocks of approximately 25³ cells reduces the number of allocated cells to about 10⁹, a gain of a factor of five in memory use. Since the number of fluid nodes is about 0.9 × 10⁹, the overhead is a bit more than 10%.
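The memory figures quoted above can be re-derived with a few lines of arithmetic. The following is a minimal sketch, not part of the benchmark itself; it assumes that every block holds exactly 25³ cells, whereas the text only says "about", so the results are approximate.

```python
# Figures quoted in the text; the exact 25^3 block size is an assumption.
nx, ny, nz = 1825, 1921, 1494   # bounding box, in cells
num_blocks = 65_336             # number of blocks covering the fluid
block_edge = 25                 # assumed edge length of every block
fluid_cells = 0.9e9             # approximate number of fluid nodes

bounding_box_cells = nx * ny * nz
allocated_cells = num_blocks * block_edge ** 3

# Memory saved relative to allocating the whole bounding box.
memory_gain = bounding_box_cells / allocated_cells
# Allocated cells that are not fluid, relative to the fluid cells.
overhead = allocated_cells / fluid_cells - 1.0

print(f"bounding-box cells: {bounding_box_cells:.2e}")
print(f"allocated cells:    {allocated_cells:.2e}")
print(f"memory gain:        {memory_gain:.1f}x")
print(f"overhead:           {overhead:.0%}")
```

Under this assumption the gain comes out close to five and the overhead slightly above 10%, in line with the values stated in the text.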


The giant aneurysm geometry from two different angles (left column) and the multi-block structure for each view (right column).
The grey scale represents the force norm applied from the fluid on the wall, from white (low) to black (high).
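The idea behind the block structure in the figure can be illustrated on a toy geometry. The sketch below is not Palabos code and does not reproduce its actual algorithm; the helper `covering_blocks`, the tube geometry, and the block size of 10 are all hypothetical and chosen only to keep the example small. It keeps just those blocks that contain at least one fluid cell and compares the resulting allocation with a single block covering the whole bounding box.

```python
# Toy illustration only: not Palabos code, and not its actual algorithm.
from itertools import product

def covering_blocks(fluid_cells, block_size):
    """Return the indices of all blocks that contain at least one fluid cell."""
    return {(x // block_size, y // block_size, z // block_size)
            for (x, y, z) in fluid_cells}

# Hypothetical geometry: a thin straight tube along x in a 40 x 40 x 40 box.
box = 40
fluid_cells = {(x, y, z)
               for x, y, z in product(range(box), repeat=3)
               if (y - 20) ** 2 + (z - 20) ** 2 <= 16}

block_size = 10
blocks = covering_blocks(fluid_cells, block_size)

mono_block_cells = box ** 3                        # whole bounding box
multi_block_cells = len(blocks) * block_size ** 3  # only blocks touching fluid

print(f"fluid cells:       {len(fluid_cells)}")
print(f"mono-block cells:  {mono_block_cells}")
print(f"multi-block cells: {multi_block_cells}")
```

In this toy case the tube touches only 16 of the 64 possible blocks, so the multi-block allocation is a quarter of the mono-block one. Smaller blocks would follow the geometry more closely at the price of more blocks to manage.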


The two main components of the simulation were tested separately: first the efficiency of the grid generation from the triangular mesh, then the efficiency of the fluid flow simulation. The tests were performed on the Blue-Gene/P computer of the CADMOS project, on a number of cores ranging from 128 to 128² = 16'384. In order to have a grid that still fits on 128 cores while being large enough to parallelize on 16'384 cores, we chose the geometry described above, which contains roughly 10⁹ allocated cells. Results are depicted in the following figure:



Speedup of the fluid simulation between 128 and 16'384 processors on a Blue-Gene/P in log-log scale.
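To give a feeling for the problem size at both ends of this range, the per-core load can be estimated as follows. This is a rough sketch that assumes the roughly 10⁹ allocated cells are spread perfectly evenly over the cores, which a real load balancer only approximates.

```python
# Approximate cell count from the text; perfect load balance is an assumption.
allocated_cells = 1e9

cells_per_core = {}
for cores in (128, 16_384):
    cells_per_core[cores] = allocated_cells / cores
    # Edge length of a cube holding the same number of cells, for intuition.
    edge = cells_per_core[cores] ** (1.0 / 3.0)
    print(f"{cores:>6} cores: {cells_per_core[cores]:.3g} cells per core "
          f"(a cube of about {edge:.0f} cells per side)")
```

Each core thus handles several million cells at 128 cores and several tens of thousands at 16'384 cores.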

The pre-processing stage (everything from program start to the execution of the first iteration, including reading the STL file, voxelizing, and setting up the domain) shows acceptable scalability, as seen in the last figure. For this benchmark, pre-processing takes 65 minutes on 128 cores and 4.5 minutes on 16'384 cores. The speedup in this figure is only on the order of 10% of the ideal linear speedup, because the complex and iterative nature of the voxelization algorithm induces a relatively high penalty. Nevertheless, the performance does not saturate and keeps increasing with the number of processors. It must also be pointed out that in typical applications this stage takes very little time compared with the actual simulation.


Speedup of the mesh generation between 128 and 16'384 processors in log-log scale.
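The pre-processing timings quoted above can be converted into a relative speedup and a parallel efficiency. The helper below is a minimal sketch; the function name `speedup_and_efficiency` is introduced here for illustration, and the 128-core run is taken as the baseline, since no single-core timing is given.

```python
def speedup_and_efficiency(t_base, t_scaled, cores_base, cores_scaled):
    """Relative speedup and parallel efficiency with respect to a baseline run."""
    speedup = t_base / t_scaled          # how much faster the larger run is
    ideal = cores_scaled / cores_base    # speedup under perfect linear scaling
    return speedup, speedup / ideal

# Pre-processing timings from the text, in minutes.
speedup, efficiency = speedup_and_efficiency(65.0, 4.5, 128, 16_384)

print(f"speedup:    {speedup:.1f}x (ideal: {16_384 // 128}x)")
print(f"efficiency: {efficiency:.0%}")
```

With these two timings the relative speedup is around 14 against an ideal factor of 128, that is, on the order of 10% of linear scaling.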

 

Copyright © 2011-2012 FlowKit Ltd.
