Nvidia CUDA QCD project
Presentations
-
2010/06/18(Talk) : Yusuke Osaki* and Ken-Ichi Ishikawa, "Domain decomposition method on GPU cluster for Lattice", The XXVIII International Symposium on Lattice Field Theory,
Villasimius, Cagliari, Italy
[Presentation material]
[Proceeding,arXiv:1011.3318]
-
2010/06/15(Poster) : M.Hayakawa, K.-I.Ishikawa*, Y.Osaki, S.Takeda, S.Uno, N.Yamada, "Improving many flavor QCD simulations using multiple GPU's", The XXVIII International Symposium on Lattice Field Theory, Villasimius, Cagliari, Italy
[Presentation material]
[Proceeding,arXiv:1009.5169]
-
2010/05/27 : Ken-Ichi Ishikawa, "Accelerating lattice QCD simulations using multiple GPUs", Multi-core and GPU computing Workshop, KIAS, Korea, 2010 May 27-28.
[Presentation material]
-
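2010/03/20 : Ken-Ichi Ishikawa and 5 others, "On GPU acceleration of the Schrödinger functional method in many-flavor lattice gauge theory calculations", 65th Annual Meeting of the Physical Society of Japan, Okayama University
[Presentation material]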
-
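2010/03/20 : Yusuke Osaki, Ken-Ichi Ishikawa, "Lattice QCD calculations on a GPU cluster", 65th Annual Meeting of the Physical Society of Japan, Okayama University
[Presentation material]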
-
2009/03/28 : Yusuke Osaki, "Lattice calculations using GPUs", 64th Annual Meeting of the Physical Society of Japan, Rikkyo University
[Presentation material]
Nvidia CUDA QCD programs (ABSOLUTELY NO WARRANTY!)
Versions with HMC algorithm (K.-I. Ishikawa and Y. Ozaki)
-
2009/03/30: HMC algorithm with CUDA mixed precision solver.
[program]
-
Even-odd site preconditioned. Written in FORTRAN90. Intel compiler is recommended.
Dirac representation for Dirac-Gamma matrices (gamma_4 is diagonal).
Periodic boundary condition.
-
The CUDA solver does not work in a parallel (MPI) environment. Run it as a single process.
-
Due to a limitation of the reduction algorithm (used for inner products),
the lattice size is restricted to 2^x (powers of 2).
-
config.h : lattice size parameters, make.inc : compiler settings.
-
HMCLIB/bicgstab_hmc_mpgpu.F : double precision outer solver which calls the CUDA single precision solver. (written in FORTRAN90)
-
GPUSolverLIB/ : CUDA single precision BiCGStab solver directory. (written in CUDA, an extension to the C/C++ languages for GPGPU)
-
GPUSolverLIB/Makefile : CUDA compiler settings.
-
GPUSolverLIB/gpu_config.h : CUDA threads, blocks parameter settings (sub-lattice mapping).
-
GPUSolverLIB/s_bicgstab_hmc_gpu.cu : CUDA BiCGStab code.
-
GPUSolverLIB/cumult_wd.h : CUDA hopping matrix multiplication code (odd->even hopping).
-
GPUSolverLIB/cuWqfAlgebra.h : CUDA Wilson fermion field linear algebra code.
-
GPUSolverLIB/conv.h, GPUSolverLIB/conv.c : data format converters for link, fermion and clover term fields
[double precision CPU data format -> single precision GPU data format].
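The power-of-2 restriction noted above is typical of a halving (tree) reduction for inner products. The following is a minimal sketch of that pattern, written in Python rather than CUDA for brevity. It assumes a standard halving scheme; the actual kernel in GPUSolverLIB may be organized differently, and the names `tree_reduce_sum` and `inner_product` are hypothetical.

```python
# Illustrative sketch only: a halving (tree) reduction of the kind commonly
# used for GPU inner products. This is an assumption about the general
# technique, not a transcription of the code in GPUSolverLIB.

def tree_reduce_sum(values):
    """Sum `values` by repeatedly folding the upper half onto the lower half.

    Mirrors the in-place pattern of a shared-memory reduction in which the
    stride is halved at each step. Requires len(values) to be a power of two.
    """
    n = len(values)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    buf = list(values)           # stands in for the shared-memory buffer
    stride = n // 2
    while stride > 0:
        for i in range(stride):  # on a GPU these iterations run as parallel threads
            buf[i] += buf[i + stride]
        stride //= 2
    return buf[0]


def inner_product(x, y):
    """Complex inner product <x, y> = sum_i conj(x_i) * y_i via tree reduction."""
    if len(x) != len(y):
        raise ValueError("length mismatch")
    return tree_reduce_sum([xi.conjugate() * yi for xi, yi in zip(x, y)])
```

With a length that is not a power of two, some stride would leave an element unpaired, which is why this simple form rejects such inputs instead of padding them.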
Solver Benchmark (CPU vs CPU+GPU) (K.-I. Ishikawa and Y. Ozaki)
-
[SolverBench-v0.0.tar.gz](2009/4/2)
Even-odd site preconditioned solver for clover fermion,
BiCGStab benchmark (BiCGStab[CPU] vs Nested BiCGStab[CPU+GPU]) program.
Gauge links are generated randomly in the program.
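The nested scheme being benchmarked (a double precision outer solver that calls a single precision inner solver) can be sketched as follows. This is a toy illustration under stated assumptions, not the benchmark code: it uses a small dense matrix, rounds to single precision with `struct` to imitate the GPU data format, and swaps in Jacobi sweeps for the inner BiCGStab to keep the example short. All function names are hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (double) to IEEE single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec(a, x):
    """Double precision matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def inner_solve_single(a, r, sweeps=20):
    """Approximate A e = r with Jacobi sweeps, storing all data in single precision.

    Stand-in for the single precision GPU solver. Jacobi replaces BiCGStab
    here only to keep the sketch short (assumption of this example).
    """
    n = len(r)
    a32 = [[to_single(v) for v in row] for row in a]  # double -> single conversion
    r32 = [to_single(v) for v in r]
    e = [0.0] * n
    for _ in range(sweeps):
        e = [
            to_single(
                (r32[i] - sum(a32[i][j] * e[j] for j in range(n) if j != i))
                / a32[i][i]
            )
            for i in range(n)
        ]
    return e


def mixed_precision_solve(a, b, tol=1e-12, max_outer=50):
    """Solve A x = b with a double precision outer loop and a single precision inner solver.

    Returns the solution and the number of inner-solver calls that were applied.
    """
    x = [0.0] * len(b)
    b_norm = sum(v * v for v in b) ** 0.5
    for outer in range(max_outer):
        r = [bi - axi for bi, axi in zip(b, matvec(a, x))]  # true residual in double
        if sum(v * v for v in r) ** 0.5 <= tol * b_norm:
            return x, outer
        e = inner_solve_single(a, r)                        # low precision correction
        x = [xi + ei for xi, ei in zip(x, e)]               # accumulate in double
    return x, max_outer
```

For example, with `a = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]` and `b = [1.0, 2.0, 3.0]`, the outer loop reaches double precision accuracy in a few corrections, while a single call to `inner_solve_single` alone is limited to single precision accuracy.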
OLD versions (single prec. part only)
-
[CUDA_QCDmult_0.07.SH.tar.gz]
Lattice QCD (clover+)hopping matrix kernel for Nvidia CUDA (A test version,
Shared memory used for y (fermion vec) 2008/01/20)
-
[CUDA_QCDmult_0.08.SH.tar.gz]
Lattice QCD (clover+)hopping matrix kernel for Nvidia CUDA (A test version,
Texture fetching used for U (link), 2008/03/03)
-
[CudaQCDSolver_0.02.tar.gz]
Lattice QCD Wilson/Clover-Dirac quark solver for Nvidia CUDA (A test version, 2008/01/28),
Dirichlet boundary condition, no-preconditioning, BiCGStab.
Lattice size should be 2^x.
-
[CudaQCDSolver_0.06.tar.gz]
Lattice QCD Wilson/Clover-Dirac quark solver for Nvidia CUDA (A test version, 2008/04/09),
Dirichlet boundary condition, block domain decomposition preconditioning, BiCGStab,
One physical domain = One thread block. Texture fetching for U, Shared memory for y.
Lattice size should be 2^x.
"00TEST.log" was produced with {NTX_=8,NTY_=8,NTZ_=8,NTT_=8,kappa=0.1500} using the following single precision lattice config.
-
Sample link/clover term fields data,(16^3x32 lattice) :
-