Azzam Haidar
Department of Electrical Engineering and Computer Science, University of Tennessee
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers
A Haidar, S Tomov, J Dongarra, NJ Higham
SC18: International Conference for High Performance Computing, Networking …, 2018
Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA
G Bosilca, A Bouteiller, A Danalis, M Faverge, A Haidar, T Herault, ...
2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011
Performance, design, and autotuning of batched GEMM for GPUs
A Abdelfattah, A Haidar, S Tomov, J Dongarra
High Performance Computing: 31st International Conference, ISC High …, 2016
Accelerating numerical dense linear algebra calculations with GPUs
J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, S Tomov, ...
Numerical computations with GPUs, 3-28, 2014
Seismic wave modeling for seismic imaging
J Virieux, S Operto, H Ben-Hadj-Ali, R Brossier, V Etienne, F Sourbier, ...
The Leading Edge 28 (5), 538-544, 2009
The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale
J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, S Tomov, ...
SIAM Review 60 (4), 808-865, 2018
Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels
A Haidar, H Ltaief, J Dongarra
Proceedings of 2011 International Conference for High Performance Computing …, 2011
Investigating half precision arithmetic to accelerate dense linear system solvers
A Haidar, P Wu, S Tomov, J Dongarra
Proceedings of the 8th workshop on latest advances in scalable algorithms …, 2017
RETRACTED: Batched matrix computations on hardware accelerators based on GPUs
A Haidar, T Dong, P Luszczek, S Tomov, J Dongarra
The International Journal of High Performance Computing Applications 29 (2 …, 2015
High-performance tensor contractions for GPUs
A Abdelfattah, M Baboulin, V Dobrev, J Dongarra, C Earl, J Falcou, ...
Procedia Computer Science 80, 108-118, 2016
High-performance matrix-matrix multiplications of very small matrices
I Masliah, A Abdelfattah, A Haidar, S Tomov, M Baboulin, J Falcou, ...
Euro-Par 2016: Parallel Processing: 22nd International Conference on …, 2016
LU factorization of small matrices: Accelerating batched DGETRF on the GPU
T Dong, A Haidar, P Luszczek, JA Harris, S Tomov, J Dongarra
2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 …, 2014
PLASMA: Parallel linear algebra software for multicore using OpenMP
J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek, P Wu, I Yamazaki, ...
ACM Transactions on Mathematical Software (TOMS) 45 (2), 1-35, 2019
Parallel programming models for dense linear algebra on heterogeneous systems
J Dongarra, M Abalenkovs, A Abdelfattah, M Gates, A Haidar, J Kurzak, ...
Supercomputing frontiers and innovations 2 (4), 67-86, 2015
heFFTe: Highly Efficient FFT for Exascale
A Ayala, S Tomov, A Haidar, J Dongarra
International Conference on Computational Science, 262-275, 2020
Unified development for mixed multi-gpu and multi-coprocessor environments using a lightweight runtime environment
A Haidar, C Cao, A YarKhan, P Luszczek, S Tomov, K Kabir, J Dongarra
2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014
Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
A Haidar, H Bayraktar, S Tomov, J Dongarra, NJ Higham
Proceedings of the Royal Society A 476 (2243), 20200110, 2020
A framework for batched and GPU-resident factorization algorithms applied to block householder transformations
A Haidar, TT Dong, S Tomov, P Luszczek, J Dongarra
High Performance Computing: 30th International Conference, ISC High …, 2015
Investigating power capping toward energy‐efficient scientific applications
A Haidar, H Jagode, P Vaccaro, A YarKhan, S Tomov, J Dongarra
Concurrency and Computation: Practice and Experience 31 (6), e4485, 2019
An improved parallel singular value algorithm and its implementation for multicore hardware
A Haidar, J Kurzak, P Luszczek
Proceedings of the International Conference on High Performance Computing …, 2013