Ken-Ichi Ishikawa's web page

Nvidia CUDA QCD project

2010/06/18 (Talk) : Yusuke Osaki* and Ken-Ichi Ishikawa, "Domain decomposition method on GPU cluster for Lattice", The XXVIII International Symposium on Lattice Field Theory, Villasimius, Cagliari, Italy [Slides] [Proceedings, arXiv:1011.3318]
2010/06/15 (Poster) : M.Hayakawa, K.-I.Ishikawa*, Y.Osaki, S.Takeda, S.Uno, N.Yamada, "Improving many flavor QCD simulations using multiple GPU's", The XXVIII International Symposium on Lattice Field Theory, Villasimius, Cagliari, Italy [Slides] [Proceedings, arXiv:1009.5169]
2010/05/27 : Ken-Ichi Ishikawa, "Accelerating lattice QCD simulations using multiple GPUs", Multi-core and GPU computing Workshop, KIAS, Korea, 2010 May 27-28. [Slides]
2010/03/20 : Ken-Ichi Ishikawa and 5 others, "On GPU acceleration of the Schrödinger functional method in many-flavor lattice gauge theory calculations", The 65th Annual Meeting of the Physical Society of Japan, Okayama University [Slides]
2010/03/20 : Yusuke Osaki, Ken-Ichi Ishikawa, "Lattice QCD calculations on a GPU cluster", The 65th Annual Meeting of the Physical Society of Japan, Okayama University [Slides]
2009/03/28 : Yusuke Osaki, "Lattice calculations using GPUs", The 64th Annual Meeting of the Physical Society of Japan, Rikkyo University [Slides]

[SolverBench-v0.0.tar.gz] (2009/4/2) Benchmark program for an even-odd site preconditioned solver for clover fermions, comparing BiCGStab [CPU] against nested BiCGStab [CPU+GPU]. Gauge links are generated randomly within the program.
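
As a rough illustration of the even-odd site ordering that such a preconditioner relies on, the sketch below splits the sites of a small lattice by parity and checks that every nearest-neighbour hop connects sites of opposite parity. It is not taken from SolverBench: the lattice extents, the periodic boundary, and the function names `parity` and `neighbours` are assumptions made for this example only.

```python
from itertools import product

def parity(site):
    """Return 0 for an even site and 1 for an odd site."""
    return sum(site) % 2

def neighbours(site, dims):
    """Nearest neighbours of `site`, assuming periodic boundaries."""
    result = []
    for mu in range(len(dims)):
        for step in (+1, -1):
            shifted = list(site)
            shifted[mu] = (shifted[mu] + step) % dims[mu]
            result.append(tuple(shifted))
    return result

# Hypothetical small lattice; every extent must be even so that the
# periodic wrap-around also flips the parity.
dims = (4, 4, 4, 4)
sites = list(product(*(range(n) for n in dims)))

even = [s for s in sites if parity(s) == 0]
odd = [s for s in sites if parity(s) == 1]

# A hopping term only couples even sites to odd sites and vice versa.
hops_change_parity = all(
    parity(n) != parity(s) for s in sites for n in neighbours(s, dims)
)
```

Because hops always change parity, the hopping matrix has no even-even or odd-odd blocks in this ordering, which is the property that lets a solver work on one parity only.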

Old versions (single-precision part only)

[CUDA_QCDmult_0.07.SH.tar.gz] Lattice QCD (clover+)hopping matrix kernel for Nvidia CUDA (test version; shared memory used for y (fermion vector), 2008/01/20)
[CUDA_QCDmult_0.08.SH.tar.gz] Lattice QCD (clover+)hopping matrix kernel for Nvidia CUDA (test version; texture fetching used for U (link), 2008/03/03)
[CudaQCDSolver_0.02.tar.gz] Lattice QCD Wilson/Clover-Dirac quark solver for Nvidia CUDA (test version, 2008/01/28). Dirichlet boundary condition, no preconditioning, BiCGStab. Lattice size should be 2^x.
[CudaQCDSolver_0.06.tar.gz] Lattice QCD Wilson/Clover-Dirac quark solver for Nvidia CUDA (test version, 2008/04/09). Dirichlet boundary condition, block domain decomposition preconditioning, BiCGStab. One physical domain = one thread block. Texture fetching for U, shared memory for y. Lattice size should be 2^x.
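
The "one physical domain = one thread block" layout and the 2^x size restriction can be pictured with the following sketch. It is not the CudaQCDSolver code: the block extents, the lexicographic ordering, the reading of "2^x" as "every extent is a power of two", and the names `is_power_of_two`, `linearize` and `domain_of` are all assumptions for illustration.

```python
def is_power_of_two(n):
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0

def linearize(coords, extents):
    """Lexicographic index with the first coordinate running fastest."""
    index = 0
    for c, n in zip(reversed(coords), reversed(extents)):
        index = index * n + c
    return index

def domain_of(site, lattice, block):
    """Map a site to (domain index, index of the site inside its domain).

    In a one-domain-per-thread-block layout the first number would play
    the role of the block index and the second that of a local index.
    """
    if not all(is_power_of_two(n) for n in lattice + block):
        raise ValueError("all extents are assumed to be powers of two")
    grid = tuple(n // b for n, b in zip(lattice, block))
    domain_coords = tuple(x // b for x, b in zip(site, block))
    local_coords = tuple(x % b for x, b in zip(site, block))
    return linearize(domain_coords, grid), linearize(local_coords, block)

lattice = (16, 16, 16, 32)   # matches the 16^3x32 sample configuration
block = (4, 4, 4, 4)         # hypothetical domain extents
domain_index, local_index = domain_of((9, 0, 0, 31), lattice, block)
```

With these illustrative extents the lattice splits into 4 x 4 x 4 x 8 = 512 domains of 256 sites each; power-of-two extents guarantee that the blocks tile the lattice without remainder.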

"00TEST.log" was produced with {NTX_=8,NTY_=8,NTZ_=8,NTT_=8,kappa=0.1500} using the following single-precision lattice configuration.
Sample link and clover term field data (16^3x32 lattice):
- LINK field (single prec.): ./CUDA/LINK.dat.gz
- Inverse Clover term field (single prec.): ./CUDA/CLVINV.dat.gz