A three-level parallelisation scheme and application to the Nelder-Mead algorithm
Abstract
We consider a three-level parallelisation scheme. The second and third levels define a classical two-level parallelisation scheme and some load balancing algorithm is used to distribute tasks among processes. It is well-known that for many applications the efficiency of parallel algorithms of these two levels starts to drop down after some critical parallelisation degree is reached. This weakness of the twolevel template is addressed by introduction of one additional parallelisation level. As an alternative to the basic solver some new or modified algorithms are considered on this level. The idea of the proposed methodology is to increase the parallelisation degree by using possibly less efficient algorithms in comparison with the basic solver. As an example we investigate two modified Nelder-Mead methods. For the selected application, a Schro¨dinger equation is solved numerically on the second level, and on the third level the parallel Wang’s algorithm is used to solve systems of linear equations with tridiagonal matrices. A greedy workload balancing heuristic is proposed, which is oriented to the case of a large number of available processors. The complexity estimates of the computational tasks are model-based, i.e. they use empirical computational data.
Keyword : multi-level parallelisation, load balancing and task assignment, parallel optimisation, Nelder-Mead algorithm, Wang's algorithm, model-based parallelisation, finite difference methods, Schrödinger equation
This work is licensed under a Creative Commons Attribution 4.0 International License.
References
J. Bilmes, K. Asanovic, Ch.-W. Chin and J. Demmel. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In ACM International Conference on Supercomputing 25th Anniversary Volume, pp. 253–260, New York, NY, USA, 1997. Association for Computing Machinery. ISBN 9781450328401. https://doi.org/10.1145/2591635.2667174
A. Bugajev, R. Čiegis, R. Kriauzienė, T. Leonavičienė and J. Žilinskas. On the accuracy of some absorbing boundary conditions for the Schrödinger equation. Mathematical Modelling and Analysis, 22(3):408–423, 2017. ISSN 1392-6292. https://doi.org/10.3846/13926292.2017.1306725
J. Carretero, S. Distefano, D. Petcu, D. Pop, Th. Rauber, G. Rünger and D.E. Singh. Energy-efficient algorithms for ultrascale systems. Supercomputing frontiers and innovations, 2(2):77–104, 2015.
R. Ciegis and M. Baravykaite. Implementation of a black-box global optimization algorithm with a parallel branch and bound template. Applied Parallel Computing: State of the Art in Scientific Computing, 4699:1115–1125, 2007. ISSN 0302-9743. https://doi.org/10.1007/978-3-540-75755-9 129
R. Čiegis and G. Šilko. A scheme for partitioning regular graphs. In R. Wyrzykowski, E. Deelman, J. Dongarra, K. Karczewski, J. Kitowski and K. Wiatr (Eds.), Proc. 4th International Conference on Parallel Processing and Applied Mathematics (PPAM2001, Naleczsow, Poland, September9-12, 2001), volume 2328 of Lecture Notes in Computer Science, pp. 404–409, Berlin, Germany, 2002. Springer.
A. Datta, A. Kaur, T. Lauer and S. Chabbouh. Exploiting multi–core and many– core parallelism for subspace clustering. International Journal of Applied Mathematics and Computer Science, 29(1):81–91, 2019. https://doi.org/10.2478/amcs2019-0006
I. Fajfar, Á. Bürmen and J. Puhan. The Nelder–Mead simplex algorithm with perturbed centroid for high-dimensional function optimization. Optimization Letters, 13(5):1011–1025, 2019. https://doi.org/10.1007/s11590-018-1306-2
D.E. Finkel. DIRECT optimization algorithm user guide. Center for Research in Scientific Computation, North Carolina State University, 2:1–14, 2003.
M. Frigo and S.G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216–231, 2005. https://doi.org/10.1109/JPROC.2004.840301
M. Furuichi and D. Nishiura. Iterative load-balancing method with multigrid level relaxation for particle simulation with short-range interactions. Computer Physics Communications, 219:135–148, 2017. ISSN 0010-4655. https://doi.org/10.1016/j.cpc.2017.05.015
F. Glover. Tabu search–part I. ORSA Journal on Computing, 1(3):190–206, 1989. https://doi.org/10.1287/ijoc.1.3.190
F. Glover. Tabu search–part II. ORSA Journal on Computing, 2(1):4–32, 1990. https://doi.org/10.1287/ijoc.2.1.4
I. Huismann, J. Stiller and J. Frohlich. Two-level parallelization of a fluid mechanics algorithm exploiting hardware heterogeneity. Computers & Fluids, 117:114–124, 2015. ISSN 0045-7930. https://doi.org/10.1016/j.compfluid.2015.05.012
E.-J. Im and K. Yelick. Optimizing sparse matrix computations for register reuse in SPARSITY. In V.N. Alexandrov, J.J. Dongarra, B.A. Juliano, R.S. Renner and C.J.K. Tan(Eds.), Computational Science – ICCS 2001, volume 2073, pp. 127–136, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
https://doi.org/10.1007/3-540-45545-0_22
S. Imam and V. Sarkar. Load balancing prioritized tasks via work-stealing. Euro-Par 2015: Parallel Processing, 9233:222–234, 2015. ISSN 0302-9743. https://doi.org/10.1007/978-3-662-48096-0_18
C.T. Kelley. Iterative methods for optimization. Society for Industrial and Applied Mathematics, Philadelphia, 1999. https://doi.org/10.1137/1.9781611970920
S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983. https://doi.org/10.1126/science.220.4598.671
K. Klein and J. Neira. Nelder-Mead simplex optimization routine for large-scale problems: A distributed memory implementation. Computational Economics, 43(4):447–461, Apr 2014. ISSN 1572-9974. https://doi.org/10.1007/s10614-013-9377-8
A. Lastovetsky and R.R. Manumachu. New model-based methods and algorithms for performance and energy optimization of data parallel applications on homogeneous multicore clusters. IEEE Transactions on Parallel and Distributed Systems, 28(4):1119–1133, 2017. ISSN 1045-9219. https://doi.org/10.1109/tpds.2016.2608824
A. Lastovetsky, L. Szustak and R. Wyrzykowski. Model-based optimization of EULAG kernel on Intel Xeon Phi through load imbalancing. IEEE Transactions on Parallel and Distributed Systems, 28(3):787–797, 2017. ISSN 1045-9219. https://doi.org/10.1109/tpds.2016.2599527
D. Lee and M. Wiswall. A parallel implementation of the simplex function minimization routine. Computational Economics, 30(2):171–187, 2007. https://doi.org/10.1007/s10614-007-9094-2
A. Llanes, J.M. Cecilia, A. Sanchez, J.M. Garcia, M. Amos and M. Ujaldon. Dynamic load balancing on heterogeneous clusters for parallel ant colony optimization. Cluster Computing-the Journal of Networks Software Tools and Applications, 19(1):1–11, 2016. ISSN 1386-7857. https://doi.org/10.1007/s10586-016-0534-4
K.I.M. McKinnon. Convergence of the Nelder–Mead simplex method to a nonstationary point. SIAM Journal on Optimization, 9(1):148–158, 1998. https://doi.org/10.1137/S1052623496303482
J.A. Nelder and R. Mead. A simplex method for function minimization. The computer journal, 7(4):308–313, 1965. https://doi.org/10.1093/comjnl/7.4.308
B.S. Nemalikanti, P. Sindhura, P.K. Tumalla and S. Vemuru. Achieving green computing through algorithmic efficiency. i-Manager’s Journal on Information Technology, 1(1):39, 2011.
B. Perez, E. Stafford, J. Bosque and R. Beivide. Energy efficiency of load balancing for data-parallel applications in heterogeneous systems. Journal of Supercomputing, 73(1):330–342, 2017. ISSN 0920-8542. https://doi.org/10.1007/s11227-016-1864-y
M. Radziunas, R. Čiegis and A. Mirinavičius. On compact high order finite difference schemes for linear Schrödinger problem on non-uniform meshes. International Journal of Numerical Analysis and Modelling, 11(2):303–314, 2014.
J.A. Rico-Gallego and J.C. Diaz-Martin. τ-Lop: Modeling performance of shared memory MPI. Parallel Computing, 46:14–31, 2015. ISSN 0167-8191. https://doi.org/10.1016/j.parco.2015.02.006
J.A. Rico-Gallego, A.L. Lastovetsky and J.C. Diaz-Martin. Modelbased estimation of the communication cost of hybrid data-parallel applications on heterogeneous clusters. Ieee Transactions on Parallel and Distributed Systems, 28(11):3215–3228, 2017. ISSN 1045-9219. https://doi.org/10.1109/tpds.2017.2715809
R.D. Righi, R.D. Gomes, V.F. Rodrigues, C.A. da Costa, A.M. Alberti, L.L. Pilla and P.O.A. Navaux. MigPF: Towards on self-organizing process rescheduling of bulk-synchronous parallel applications. Future Generation Computer Systems-the International Journal of Escience, 78:272–286, 2018. ISSN 0167739X. https://doi.org/10.1016/j.future.2016.05.004
B. Saha. Green computing: Current research trends. International Journal of Computer Sciences and Engineering, 6, 05, 2018. https://doi.org/10.26438/ijcse/v6i3.467469
A. Sharma and M. Kaur. An efficient task scheduling of multiprocessor using genetic algorithm based on task height. Journal of Information Technology & Software Engineering, 5(2):1000151, 2015. https://doi.org/10.4172/2165-7866.1000151
R. Singh. Task scheduling in parallel systems using genetic algorithm. International Journal of Computer Applications, 108(16):34–40, 2014. https://doi.org/10.5120/18999-0470
W. Spendley, G.R. Hext and F.R. Himsworth. Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics, 4(4):441– 461, 1962. https://doi.org/10.1080/00401706.1962.10490033
L. Stripinis, R. Paulavičius and J. Žilinskas. Improved scheme for selection of potentially optimal hyper-rectangles in DIRECT. Optimization Letters, 12(7):1699–1712, Oct 2018. ISSN 1862-4480. https://doi.org/10.1007/s11590-017-1228-4
J. Szeftel. Design of absorbing boundary conditions for Schrödinger equations in Rd. SIAM Journal on Numerical Analysis, 42:1527–1551, 2004. https://doi.org/10.1137/S0036142902418345
R. Vuduc, J.W. Demmel and K.A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series, 16:521–530, 2005. https://doi.org/10.1088/1742-6596/16/1/071
H.H. Wang. A parallel method for tridiagonal equations. ACM Transactions on Mathematical Software (TOMS), 7(2):170–183, 1981.
Th. Weise. Global optimization algorithms. Theory and application. Self-Published Thomas Weise, 2009. Available from Internet: http://www.it-weise.de
R.C. Whaley and J.J. Dongarra. Automatically tuned linear algebra software. In Supercomputing, 1998. SC98. IEEE/ACM Conference on, pp. 38–38. IEEE, 1998.
A. Zlotnik and I. Zlotnik. Remarks on discrete and semi-discrete transparent boundary conditions for solving the time-dependent Schro¨dinger equation on the half-axis. Russian Journal of Numerical Analysis and Mathematical Modelling, 31(1):51–64, 2016. https://doi.org/10.1515/rnam-2016-0005