Dense GPU compute benchmark - FSDP, torch.compile, Llama3.2

idriscnrs/bench_fsdp_dlojz

Benchmark FSDP DLO-JZ

Dense GPU computing benchmark, with roughly 98% of the run time spent on the GPU

  • Llama3.2-3B
  • 50 training steps
  • with FSDP
  • with torch.compile
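As a rough illustration of how these four ingredients fit together, the sketch below wraps a causal language model in FSDP, compiles it, and runs 50 optimizer steps on random token ids. It is not the repository's training script: the Hugging Face model id, the `torchrun`-style `LOCAL_RANK` variable, the batch shape, the learning rate, and the single-unit FSDP wrapping (no auto-wrap policy) are all assumptions made for the example.

```python
"""Sketch: FSDP + torch.compile on a causal LM for 50 steps (assumptions noted inline)."""
import os

NUM_STEPS = 50                          # number of training steps stated in the README
MODEL_NAME = "meta-llama/Llama-3.2-3B"  # assumed Hugging Face id (gated model)
SEQ_LEN = 512                           # hypothetical sequence length


def local_rank() -> int:
    # Assumes a torchrun-style launcher that exports LOCAL_RANK;
    # falls back to 0 for a single-process run.
    return int(os.environ.get("LOCAL_RANK", "0"))


def main() -> None:
    # Heavy imports live here so the module can be imported without a GPU stack.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from transformers import AutoModelForCausalLM

    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank())

    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
    vocab_size = model.config.vocab_size  # read before wrapping the model

    # Shard parameters across ranks; use_orig_params=True is the setting
    # generally recommended when combining FSDP with torch.compile.
    model = FSDP(model, device_id=local_rank(), use_orig_params=True)
    model = torch.compile(model)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    for _ in range(NUM_STEPS):
        # Random token ids stand in for a real dataset.
        input_ids = torch.randint(0, vocab_size, (1, SEQ_LEN), device="cuda")
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Under these assumptions the script would be started with one process per GPU, for example through `torchrun --nproc_per_node=<gpus> script.py`, where `script.py` is a placeholder name.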

Prerequisites

Environment & Running

On Jean-Zay

  • Run the benchmark with either command:
    • sbatch slurm/bench_h100_cap.slurm
    • sbatch slurm/bench_h100_nocap.slurm
  • The benchmark uses the pytorch-gpu/py3/2.5.0 module

Other systems

(See requierements.txt for the packages equivalent to the Jean-Zay module.)

The libraries imported by the benchmark should be:

  • torch==2.5.0
  • transformers==4.46.0
  • datasets==3.0.2
  • idr_torch==2.2.0
  • torchmetrics==1.5.1
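On a system other than Jean-Zay, it can help to confirm that the installed packages match these pins before launching a run. The helper below is a minimal sketch using only the standard library; the function name `check_versions` is hypothetical, and it assumes each distribution is registered under the name shown in the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
PINNED = {
    "torch": "2.5.0",
    "transformers": "4.46.0",
    "datasets": "3.0.2",
    "idr_torch": "2.2.0",
    "torchmetrics": "1.5.1",
}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu124" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the expected version.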
