First International Workshop on Deepening Performance Models for Automatic Tuning (DPMAT)

September 7th (Wed), 2016, 13:00-17:40

Room IB-014, IB Building (Integrated Building)

Higashiyama Campus, Nagoya University

Main Sponsorship:
Japan Society for the Promotion of Science (JSPS) Bilateral Joint Research Projects (Open Partnership), “Deepening Performance Models for Automatic Tuning with International Collaboration”

Grant-in-Aid for challenging Exploratory Research, “Development of Technologies of High Performance Computing for Accuracy Assurance”

Grant-in-Aid for Scientific Research (B), “A Novel Development of Auto-tuning Technologies for Communication Avoiding and Reducing Algorithms”

“Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures” and “High Performance Computing Infrastructure” in Japan (JHPCN), “High-performance Randomized Matrix Computations for Big Data Analytics and Applications”

Aim of the workshop:
Advanced computer architectures toward exascale computing are becoming increasingly complex, featuring many-core processors, deep memory hierarchies, and heterogeneous components. The development of high-performance software is also becoming costly. One candidate technology to address this complexity is automatic performance tuning (AT).

In this first workshop, we discuss emerging problems and state-of-the-art AT technologies with researchers from Japan and Taiwan. The speakers of the workshop are as follows:

Invited Speakers:

Prof. Weichung Wang (National Taiwan University, Taiwan)
Prof. Feng-Nan Hwang (National Central University, Taiwan)

Domestic Speakers:
Prof. Takahiro Katagiri (Nagoya University, Japan)
Prof. Kengo Nakajima (The University of Tokyo, Japan)
Prof. Satoshi Ohshima (The University of Tokyo, Japan)
Prof. Reiji Suda  (The University of Tokyo, Japan)
Prof. Akihiro Fujii (Kogakuin University, Japan)


* 11:00 - 12:45 Lunch Meeting
 @Seminar room, 5F ITC

* 13:00 - 13:10 Opening Talk
Takahiro Katagiri (Information Technology Center, Nagoya University, Japan)

* Invited Talk (1)  13:10 - 13:55
(Chair: Takahiro Katagiri)


A Development of Integrated Singular Value Decomposition for Large Matrices

Weichung Wang
(Institute of Applied Mathematical Sciences, National Taiwan University, Taiwan)


The latest CPUs and GPUs are equipped with advanced features, and more and more (co-)processors are connected to form large parallel systems. On the other hand, fast-growing data sets are collected, sensed, or computed ubiquitously. To take full advantage of these modern and powerful computer systems for large-scale data analysis or scientific computation, we need to design novel algorithms that explore and utilize the properties of the target problems, the matrix structures, and the hardware architectures. Focusing on numerical linear algebra problems arising in large-scale data analysis and scientific computing, we consider Monte Carlo algorithms based on multiple random sketches and an integration process. Such approaches have the potential to be broadly applied to various problems such as singular value decomposition, eigenvalue problems, nonnegative matrix factorization, tensor decomposition, and others. In this talk, we take integrated singular value decomposition (iSVD) as an example, introducing the iSVD algorithm and its theoretical and statistical properties. We will also sketch the plan to port iSVD to parallel computers and the possible needs for auto-tuning. More importantly, we look forward to developing interdisciplinary collaborations to achieve these goals and make an impact on real-world applications.
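The random-sketching idea above can be illustrated with a minimal single-sketch randomized SVD in NumPy. This is a generic sketch of the technique, not the iSVD algorithm itself, which integrates multiple independent sketches; all names and parameters below are illustrative:

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD of A via one Gaussian random sketch.

    Illustrates the general sketching idea only; NOT the iSVD
    algorithm, which integrates multiple independent sketches.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch: compress the column space of A with a random test matrix.
    Omega = rng.standard_normal((n, k + oversample))
    Y = A @ Omega                       # m x (k+p), captures range(A)
    Q, _ = np.linalg.qr(Y)              # orthonormal basis of the sketch
    # Project A onto the subspace and take a small dense SVD.
    B = Q.T @ A                         # (k+p) x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Usage: an exactly rank-5 matrix is recovered to roundoff accuracy.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err < 1e-8)
```

Because the sketch dimension (k plus oversampling) exceeds the true rank, the sketch captures the range of A and the reconstruction is essentially exact here; for general matrices the approximation quality depends on singular value decay.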

* 13:55 – 14:00 Break

(Chair: Kengo Nakajima)
* 14:00-14:30
Auto-tuning Toward the Post-Moore Era: Adapting a New Concept from FLOPS to BYTES

Takahiro Katagiri
(Information Technology Center, Nagoya University, Japan)

It is said that Moore’s law will be broken within the next two decades due to the physical performance limits of semiconductors. On the other hand, some scientists note that the capability of data access (BYTES) will increase relative to the capability of computation (FLOPS) as upcoming technologies, such as 3D stacking for memory, are adopted.

In this presentation, we focus on this statement and discuss how numerical algorithms for matrix computations may change. We also argue that auto-tuning (AT) will remain one of the key technologies for achieving high performance in the post-Moore era. We survey state-of-the-art AT technologies to enumerate the issues current AT faces toward the post-Moore era. Supporting evidence is presented using current computers equipped with 3D-stacked memory.

* 14:30 – 15:00
Fast d-Spline parameter estimation for multi-dimensional parameter space

Teruo Tanaka, Masayoshi Mochizuki, Akihiro Fujii
(Kogakuin University, Japan)

We have proposed an efficient parameter estimation method, Incremental Performance Parameter Estimation (IPPE), based on a discretized spline function (d-Spline). d-Spline is highly adaptable and requires little estimation time. In this study, IPPE is extended to the simultaneous estimation of multi-dimensional parameters. As the dimension of the parameter space increases, the computational cost of d-Spline grows rapidly. We propose a new method that reduces this cost significantly by defining the relationships among multi-dimensional parameters using linear equations. In our method, the estimation of multi-dimensional parameters is performed by applying a one-dimensional d-Spline repeatedly.
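The one-dimensional building block can be sketched as follows: a least-squares fit that combines a data-fitting term at sampled grid points with a second-difference smoothness penalty. This is a simplified, assumed formulation for illustration only, not the authors' IPPE implementation; the cost curve and all parameters are hypothetical:

```python
import numpy as np

def fit_dspline(n, sample_idx, sample_val, alpha=1.0):
    """Fit a 1-D discretized-spline-like function on n grid points.

    Minimizes  sum_j (d[i_j] - y_j)^2 + alpha^2 * sum_i (2nd diff)^2
    as a single stacked least-squares problem. A simplified sketch of
    the idea, NOT the authors' IPPE/d-Spline implementation.
    """
    # Data-fitting rows: pick out the sampled grid points.
    E = np.zeros((len(sample_idx), n))
    E[np.arange(len(sample_idx)), sample_idx] = 1.0
    # Smoothness rows: second differences d[i] - 2 d[i+1] + d[i+2].
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    Z = np.vstack([E, alpha * D])
    rhs = np.concatenate([sample_val, np.zeros(n - 2)])
    d, *_ = np.linalg.lstsq(Z, rhs, rcond=None)
    return d

# Usage: estimate the best of 32 parameter values from 6 samples of a
# hypothetical cost curve with its minimum at parameter 20.
truth = (np.arange(32) - 20.0) ** 2
idx = np.array([0, 6, 12, 18, 24, 31])
d = fit_dspline(32, idx, truth[idx])
print(int(np.argmin(d)))   # estimated best parameter, near 20
```

In a multi-dimensional setting, one would fix all but one coordinate and apply such a 1-D fit along each axis in turn, which is the flavor of the repeated one-dimensional estimation described above.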

* 15:00-15:30
The Auto-tuning Problem for Code Modified or Generated by Optimization and Auto-tuning

Satoshi Ohshima
(Information Technology Center, The University of Tokyo, Japan)

There are several tuning techniques that modify and/or generate code, such as loop unrolling, loop fusion, and loop collapsing.
Many compilers support these techniques today. Because these optimizations do not always improve performance, deciding when to apply them is an AT issue. Moreover, the structure of code written by users and of code optimized by compilers can differ.

In this talk, some AT issues caused by these optimization techniques are shown and discussed.

* 15:30 - 15:40 Break (Taking a Group Photo)

* Invited Talk (2) 15:40 -16:25
(Chair: Takahiro Katagiri)

Parallel performance study of Newton-Krylov-Schwarz algorithm with applications in incompressible fluid flows and colloidal particle interaction

Feng-Nan Hwang
(Department of Mathematics, National Central University, Taiwan)

We will report preliminary parallel performance studies of both a parallel unstructured-mesh finite element incompressible fluid solver and a parallel Poisson-Boltzmann solver developed by our research group. The test platforms include the ALPS supercomputer at the National Center for High-performance Computing in Taiwan and the FX10 at the University of Tokyo, Japan. Both codes are implemented based on the fully parallel Newton-Krylov-Schwarz (NKS) algorithm. The one-level method is nonlinearly scalable, i.e., the number of Newton iterations is independent of the number of processors (np), but it is not linearly scalable, i.e., the number of Krylov subspace iterations grows as np increases. To improve global communication among subdomains in the Schwarz method, a coarse grid space is needed. In our implementation, we employ a one-level Schwarz-type preconditioner in conjunction with a smoothed aggregation coarsening scheme. Several benchmark problems are tested, such as the lid-driven cavity problem, flow past an object, and the colloidal particle interaction problem.

* 16:25 – 16:30 Break

(Chair: Akihiro Fujii)
* 16:30-17:00
Diamond Tiling Extended to General Sparse Matrix Powers Kernel

Reiji Suda
(Graduate School of Information Science and Technology, The University of Tokyo, Japan)

In stencil computing, various temporal blocking techniques for improving computational intensity have been developed and are well known. However, most of them depend on the coordinates of grid points and have not been extended to the general sparse Matrix Powers Kernel (MPK). In this talk we propose an extension of "diamond tiling" to general sparse matrices. Results with small test matrices are shown.
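For reference, the untiled matrix powers kernel that such tiling schemes optimize can be sketched as follows; the diamond-tiled version is the subject of the talk and is not reproduced here, and the tridiagonal test matrix is purely illustrative:

```python
import numpy as np

def matrix_powers_kernel(A, x, k):
    """Naive MPK: return [x, Ax, A^2 x, ..., A^k x] as rows.

    Temporal blocking schemes such as diamond tiling reorder these k
    matrix-vector products so that matrix and vector data are reused
    in cache; this plain loop is the baseline they optimize.
    """
    V = np.empty((k + 1, x.size))
    V[0] = x
    for j in range(k):
        V[j + 1] = A @ V[j]
    return V

# Usage: a 1-D 3-point stencil (tridiagonal matrix) as a toy example
# of a sparse matrix; dense storage is used only for brevity.
n, k = 8, 3
A = np.eye(n) * 2 - np.eye(n, k=1) - np.eye(n, k=-1)
V = matrix_powers_kernel(A, np.ones(n), k)
print(V.shape)   # (4, 8)
```

For a stencil matrix like this one, each row of V depends only on a small neighborhood of the previous row, which is exactly the locality that coordinate-based tilings exploit and that a general sparse extension must recover from the sparsity pattern alone.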

* 17:00 – 17:30

Parallel Iterative Solvers with Preconditioning in the Post-Moore Era

Kengo Nakajima
(Information Technology Center, The University of Tokyo, Japan)

Preconditioned iterative solvers are widely used for solving linear equations with sparse matrices derived from various types of scientific and engineering applications. In this talk, we introduce recent developments in this area, such as pipelined algorithms and loop scheduling. Moreover, results using Oakleaf-FX (Fujitsu PRIMEHPC FX10) with up to 4,800 nodes (76,800 cores) and Reedbush-U (Intel Broadwell cluster) with up to 384 nodes (12,288 cores) are presented.
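As background, a textbook Jacobi-preconditioned conjugate gradient sketch shows the inner products and matrix-vector product whose data dependencies pipelined variants reorganize; this is a generic illustration, not the solvers used in the talk:

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, maxiter=200):
    """Jacobi-preconditioned Conjugate Gradient (textbook variant).

    Pipelined CG variants rearrange these recurrences so the two
    global inner products can be overlapped with the SpMV; this plain
    version exposes the baseline dependency chain they hide.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r                 # apply diagonal preconditioner
    p = z.copy()
    rz = r @ z
    for it in range(maxiter):
        Ap = A @ p                     # the SpMV
        alpha = rz / (p @ Ap)          # inner product 1
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z                 # inner product 2
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, it + 1

# Usage: an SPD tridiagonal system (1-D Poisson problem).
n = 50
A = np.eye(n) * 2 - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pcg(A, b, 1.0 / np.diag(A))
print(np.linalg.norm(A @ x - b) < 1e-8)
```

At scale, each of the two inner products is a global reduction over all processes, so hiding them behind the SpMV is what motivates the pipelined reorganizations mentioned above.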

* 17:30 – 17:40 Closing Remarks
Takahiro Katagiri (Information Technology Center, Nagoya University, Japan)

* 18:00 - Reception


* Reception:
We are planning a reception with the speakers in Nagoya after the workshop.

If you are interested in joining the reception, please contact Prof. Katagiri. His e-mail address is “katagiri _AT_”, where “_AT_” should be replaced with “@”.

The reception fee is around 5,000 yen per person. The venue is still being decided, but it will be inside or near Nagoya University.


Number of attendees: 16 (including 3 foreign nationals)


[Last updated] September 8th, 2016