Issue search results

65 results

databricks/megablocks
MOE uses more memory than dense model and is slower

I am training a ~520M model, but I have found that the megablocks MoE version uses substantially more memory and takes longer to train than a dense model of corresponding size. I am using a model embedding ...

samuelwheeler

Opened
9 days ago

#166

databricks/megablocks
bump version for PyTorch 2.6

Same as https://github.com/databricks/megablocks/issues/159 ~Solution is to install megablocks without causing a torch reinstall. (In particular, changing to torch =2.0.0 doesn't work, but torch =2.6.0 ...

ad8e

Opened
15 days ago

#165

databricks/megablocks
[rank5]: TypeError: SortOp.forward() takes from 2 to 3 positional arguments but 5 were given When running moe script

[rank5]: Traceback (most recent call last): [rank5]: File "/root/Stanford-Megatron-LM/pretrain_gpt.py", line 154, in <module> [rank5]: pretrain(train_valid_test_datasets_provider, model_provider, ...

rtmadduri

Opened
on Feb 8

#164

databricks/megablocks
Grouped GEMM execution not possible with HW

When running the grouped GEMM implementation with expert parallelism, I get the following error: [rank5]: File /env/lib/python3.11/site-packages/megablocks-0.8.0.dev0-py3.11-linux-x86_64.egg/megablocks/layers/glu.py ...

cassanof

Opened
on Jan 3

#163

databricks/megablocks
CUDA Error When Running Single GPU Experiment

I have been trying to run some of the exp training code on nvcr.io/nvidia/pytorch:23.09-py3. However, I seem to keep getting errors regardless of the scripts. After some testing, it seems that even running ...

kevin3567

Opened
on Nov 14, 2024

#161

databricks/megablocks
Is current megablocks compatible with distributed optimizer in Megatron-LM?

Hi there, thanks for the amazing work! I found expert parallel is not compatible with the distributed optimizer in the fork version of Megatron-LM here: https://github.com/stanford-futuredata/Megatron-LM/blob/85f95aef3b648075fe6f291c86714fdcbd9cd1f5/megatron/arguments.py#L352-L356 ...

Spico197

Opened
on Nov 11, 2024

#160

databricks/megablocks
Pytorch 2.5.0 + Megablocks undefined symbol error

Hi there, there seems to be an error with the newer pytorch (I already relaxed the 2.4.1 constraint in setup.py): 10 # Wrap this in a try-block with better error message and 11 # instructions ...

jramapuram

Opened
on Oct 19, 2024

#159

databricks/megablocks
amp_C undefined symbol after installing Megablocks

I am trying to set up and use megablocks to train MoE models, but I see the following error: Traceback (most recent call last): File /n/holyscratch01/dam_lab/brachit/moes/megablocks/third_party/Megatron-LM/pretrain_gpt.py ...

RachitBansal

Opened
on Oct 11, 2024

#157

databricks/megablocks
Do you support the bias of mlp？

Thanks for your work. I checked the code and found that the MLP only has w1 and w2. Does the sparse MLP support bias? Thanks a lot! I want to initialize the MLP from our original MLP with bias. Do you have ...

maobenz

Opened
on Oct 9, 2024

#156

databricks/megablocks
what devices are supported?

Hello, I have tried to use megablocks on a V100 with pytorch2.4.0+cu121, but I get the error "cannot support bf16". If I use megablocks in fp32, I get the error "group gemm must use bf16". So I changed my environment ...

Guodanding

Opened
on Oct 8, 2024

#155

issues Search Results · repo:databricks/megablocks language:Python
