Benchmark: dense GPU computing -> ~98% GPU time
- Llama-3.2-3B
- 50 training steps
- with FSDP
- with torch.compile
- Download model: Llama-3.2-3B-Instruct
- Download dataset: hieunguyenminh/roleplay
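Assuming the Hugging Face CLI is available (and, for the gated Llama model, that an access token is configured), both artifacts can be fetched ahead of time. This is a sketch, not part of the benchmark scripts; the `meta-llama/` org prefix is the usual repo id for this model and is an assumption here:

```shell
# Sketch: pre-download the model and dataset with huggingface-cli.
# Guarded so it only prints a hint when the CLI is not installed.
if command -v huggingface-cli >/dev/null 2>&1; then
  huggingface-cli download meta-llama/Llama-3.2-3B-Instruct
  huggingface-cli download hieunguyenminh/roleplay --repo-type dataset
  STATUS="downloaded"
else
  STATUS="huggingface-cli not found (pip install -U huggingface_hub)"
  echo "$STATUS"
fi
```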
- Run the benchmark with:
sbatch slurm/bench_h100_cap.slurm
sbatch slurm/bench_h100_nocap.slurm
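The two .slurm scripts are not reproduced here; as a rough sketch (every #SBATCH directive, the GPU count, and the `train.py` entry point are assumptions, not the actual script contents), a submission script for this setup typically looks like:

```shell
#!/bin/bash
#SBATCH --job-name=bench_h100    # hypothetical directives; see slurm/ for the real scripts
#SBATCH --gres=gpu:4             # assumed GPU count per node
#SBATCH --time=01:00:00

module purge
module load pytorch-gpu/py3/2.5.0    # the module named below

srun python train.py             # assumed entry point
```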
- using the
pytorch-gpu/py3/2.5.0
module
(see requirements.txt
for the module's library equivalence)
The list of imported libraries should be:
- torch==2.5.0
- transformers==4.46.0
- datasets==3.0.2
- idr_torch==2.2.0
- torchmetrics==1.5.1
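To check that the loaded module actually provides these pinned versions, a small helper can compare them against what is installed. This is a sketch; it assumes `python3` is on PATH and that each distribution name matches its import name (e.g. idr_torch):

```shell
# Sketch: report any package whose installed version differs from the pinned one.
check_versions() {
  for spec in "$@"; do
    pkg=${spec%%==*}     # e.g. torch
    want=${spec#*==}     # e.g. 2.5.0
    got=$(python3 -c "from importlib.metadata import version; print(version('${pkg}'))" 2>/dev/null || echo missing)
    [ "$got" = "$want" ] || echo "$pkg: expected $want, found $got"
  done
}
check_versions torch==2.5.0 transformers==4.46.0 datasets==3.0.2 \
               idr_torch==2.2.0 torchmetrics==1.5.1
```

Run it inside the loaded module environment; no output means every pin matches.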