add flux support #356

leejet · 2024-08-21T13:27:47Z

Although the architecture is similar to sd3, flux actually has a lot of additional things to implement, so adding flux support took me a bit longer. After merging this pr, I will take some time to merge the PRs of other contributors.

How to Use

Download weights

Download flux-dev from https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors
Download flux-schnell from https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/flux1-schnell.safetensors
Download vae from https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors
Download clip_l from https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/clip_l.safetensors
Download t5xxl from https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp16.safetensors

Convert flux weights

Using fp16 will lead to overflow, but ggml's support for bf16 is not yet fully developed. Therefore, we need to convert flux to gguf format here, which also saves VRAM. For example:

.\bin\Release\sd.exe -M convert -m ..\..\ComfyUI\models\unet\flux1-dev.sft -o ..\models\flux1-dev-q8_0.gguf -v --type q8_0

Run

--cfg-scale is recommended to be set to 1.

Flux-dev q8_0

 .\bin\Release\sd.exe --diffusion-model  ..\models\flux1-dev-q8_0.gguf --vae ..\..\ComfyUI\models\vae\ae.sft --clip_l ..\..\ComfyUI\models\clip\clip_l.safetensors --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v

Flux-dev q4_0

.\bin\Release\sd.exe --diffusion-model  ..\models\flux1-dev-q4_0.gguf --vae ..\..\ComfyUI\models\vae\ae.sft --clip_l ..\..\ComfyUI\models\clip\clip_l.safetensors --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v

Flux-dev q3_k

.\bin\Release\sd.exe --diffusion-model  ..\models\flux1-dev-q3_k.gguf --vae ..\..\ComfyUI\models\vae\ae.sft --clip_l ..\..\ComfyUI\models\clip\clip_l.safetensors --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v

Flux-dev q2_k

.\bin\Release\sd.exe --diffusion-model  ..\models\flux1-dev-q2_k.gguf --vae ..\..\ComfyUI\models\vae\ae.sft --clip_l ..\..\ComfyUI\models\clip\clip_l.safetensors --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v

Flux-schnell q8_0

 .\bin\Release\sd.exe --diffusion-model  ..\models\flux1-schnell-q8_0.gguf --vae ..\..\ComfyUI\models\vae\ae.sft --clip_l ..\..\ComfyUI\models\clip\clip_l.safetensors --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --steps 4

Run with LoRA

Since many flux LoRA training libraries have used various LoRA naming formats, it is possible that not all flux LoRA naming formats are supported. It is recommended to use LoRA with naming formats compatible with ComfyUI.

Flux dev q8_0 with LoRA

LoRA model from https://huggingface.co/XLabs-AI/flux-lora-collection/tree/main (using comfy converted version!!!)

.\bin\Release\sd.exe --diffusion-model  ..\..\flux-gguf\flux1-dev-q8_0.gguf --vae ..\..\ComfyUI\models\vae\ae.sft --clip_l ..\..\ComfyUI\models\clip\clip_l.safetensors --t5xxl ..\..\ComfyUI\models\clip\t5xxl_fp16.safetensors  -p "a lovely cat holding a sign says 'flux.cpp'<lora:realism_lora_comfy_converted:1>" --cfg-scale 1.0 --sampling-method euler -v --lora-model-dir ../models

stduhpf · 2024-08-21T16:04:30Z

Doesn't compile for me. Somthing about ggml_group_norm expecting 4 arguments in ggml_extend.hpp, but only 3 are passed. If I set the 4th argument to some float it does compile, but I'm not sure if it will work.

~~EDIT: I set the 4th argument to EPS, the compilation goes fine, but it consitently crashes when loading the flux model.~~

EDIT 2: I was just being stupid, nevermind

phudtran · 2024-08-21T16:51:13Z

Doesn't compile for me. Somthing about ggml_group_norm expecting 4 arguments in ggml_extend.hpp, but only 3 are passed. If I set the 4th argument to some float it does compile, but I'm not sure if it will work.

EDIT: I set the 4th argument to EPS, the compilation goes fine, but it consitently crashes when loading the flux model.

Which model did you use and how much VRAM does your GPU have? Could be a memory issue since these models are pretty large.

stduhpf · 2024-08-21T16:52:51Z

Doesn't compile for me. Somthing about ggml_group_norm expecting 4 arguments in ggml_extend.hpp, but only 3 are passed. If I set the 4th argument to some float it does compile, but I'm not sure if it will work.
EDIT: I set the 4th argument to EPS, the compilation goes fine, but it consitently crashes when loading the flux model.

Which model did you use and how much VRAM does your GPU have? Could be a memory issue since these models are pretty large.

I was trying to run it on CPU. With 32 GB of RAM + lots of swap

stduhpf · 2024-08-21T17:44:47Z

Anyways, I figured out I was just on the wrong commit for the ggml submodule, checking out the correct one fixed the compilation and now it works!

stduhpf · 2024-08-21T19:36:24Z

It's almost twice as fast as ComfyUI's implementation of GGUF support for Flux.

On my Ryzen9 5900x (no GPU) with q4_1 model (512² resolution):

sd.cpp: 52.83 s/it
ComfyUI: 97.16 s/it

Great job @leejet !

Green-Sky · 2024-08-22T01:18:28Z

Looks like conversion on cpu with 32gigs or ram + swap is not enough.

[1171961.971637] Out of memory: Killed process 1686847 (sd) total-vm:25179064kB, anon-rss:23787920kB, file-rss:768kB, shmem-rss:0kB, UID:1000 pgtables:46624kB oom_score_adj:0

[INFO ] model.cpp:737  - load models/flux1-schnell.safetensors using safetensors format
[DEBUG] model.cpp:803  - init from 'models/flux1-schnell.safetensors'
[INFO ] model.cpp:1665 - model tensors mem size: 12050.42MB
[DEBUG] model.cpp:1459 - loading tensors from models/flux1-schnell.safetensors
[INFO ] model.cpp:1704 - load tensors done
[INFO ] model.cpp:1705 - trying to save tensors to models/flux1-schnell-q8_0.gguf
Killed

update: added a 32gig swapfile, conversion works now

Green-Sky · 2024-08-22T10:26:10Z

Ok, took a stab at it on cpu only for now.

here q8_0:

and here q2_k:

while I am amazed that q2_k works this well, its obv not good rn. also its way slower on cpu, but also only uses 4.5gig of ram !

edit: redid q2_k with the other prompt and simd (was running pure scalar before, slow af)

on my system with avx2 q2_k gives 84.02s/it and for q8_0 gives 84.69s/it, so its memory bottlenecked.

edit2: q2_k with cuda on gpu is somehow slightly better

cheeseng · 2024-08-22T14:51:21Z

Worked with CPU-only build, crashed with core dump when built with cuda, probably due to low 4GB VRAM on my GTX GPU.

Thanks for the nice work!

Green-Sky · 2024-08-22T14:58:03Z

q3_k looks a lot better compared to q2_k while still being sized resonable for my 8gig vram.

Green-Sky · 2024-08-22T17:25:42Z

@leejet it would be nice if sd.cpp supported llama.cpp tensor naming conventions. Since text encoders exploded in size and now consume substantial amounts of resources, making use of ggml quantizations would be very usable. So I went and tried loading the q8_0 t5xxl from here https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main but it does not load.

looking at the log it becomes obvious quick:

[DEBUG] model.cpp:1459 - loading tensors from models/flux-extra/t5-v1_1-xxl-encoder-Q8_0.gguf
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_k.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_o.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_q.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_rel_b.weight | f32 | 2 [64, 32, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_v.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_norm.weight | f32 | 1 [4096, 1, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_gate.weight | q8_0 | 2 [4096, 10240, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_up.weight | q8_0 | 2 [4096, 10240, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_down.weight | q8_0 | 2 [10240, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_norm.weight | f32 | 1 [4096, 1, 1, 1, 1]' in model file
...

[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.k.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.o.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.q.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.v.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.layer_norm.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wi_0.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wi_1.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wo.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.layer_norm.weight' not in model file
...

I think city96's conversion is using llama.cpp's tensor name convention.

stduhpf · 2024-08-22T21:25:33Z

@leejet it would be nice if sd.cpp supported llama.cpp tensor naming conventions. Since text encoders exploded in size and now consume substantial amounts of resources, making use of ggml quantizations would be very usable. So I went and tried loading the q8_0 t5xxl from here https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main but it does not load.

looking at the log it becomes obvious quick:

[DEBUG] model.cpp:1459 - loading tensors from models/flux-extra/t5-v1_1-xxl-encoder-Q8_0.gguf
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_k.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_o.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_q.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_rel_b.weight | f32 | 2 [64, 32, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_v.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_norm.weight | f32 | 1 [4096, 1, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_gate.weight | q8_0 | 2 [4096, 10240, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_up.weight | q8_0 | 2 [4096, 10240, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_down.weight | q8_0 | 2 [10240, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_norm.weight | f32 | 1 [4096, 1, 1, 1, 1]' in model file
...

[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.k.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.o.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.q.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.v.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.layer_norm.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wi_0.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wi_1.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wo.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.layer_norm.weight' not in model file
...

I think city96's conversion is using llama.cpp's tensor name convention.

I agree being able to use llama.cpp quants could be great, though you can always quantize the t5 encoder with stable-diffusion.cpp yourself and get a working gguf.

MGTRIDER · 2024-08-23T00:10:27Z

Something seems to be really wrong with flux rendering on stable-diffusion.cpp backend. With q4_0 quantization i run out of vram and the program crashes. I have 8gb of vram and on ComfyUI i can render resolutions of 1152x896 without problems at 6s/it, with out crashes. I can even use q5_0 quantized flux model without crashing. The weird thing is that with sd models including sdxl, everything renders fine and fast with memory efficiency on stable-diffusion.cpp, so i really wonder why flux acts this way on this backend. And another thing i have noticed is that when the nvidia driver tries to send a part of the model to shared vram, it looks like the clip models get unloaded from ram, causing the program to just hang at he sampling stage.

leejet · 2024-08-23T00:37:05Z

@leejet it would be nice if sd.cpp supported llama.cpp tensor naming conventions. Since text encoders exploded in size and now consume substantial amounts of resources, making use of ggml quantizations would be very usable. So I went and tried loading the q8_0 t5xxl from here https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main but it does not load.

looking at the log it becomes obvious quick:

[DEBUG] model.cpp:1459 - loading tensors from models/flux-extra/t5-v1_1-xxl-encoder-Q8_0.gguf
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_k.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_o.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_q.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_rel_b.weight | f32 | 2 [64, 32, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_v.weight | q8_0 | 2 [4096, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.attn_norm.weight | f32 | 1 [4096, 1, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_gate.weight | q8_0 | 2 [4096, 10240, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_up.weight | q8_0 | 2 [4096, 10240, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_down.weight | q8_0 | 2 [10240, 4096, 1, 1, 1]' in model file
[INFO ] model.cpp:1605 - unknown tensor 'text_encoders.t5xxl.enc.blk.0.ffn_norm.weight | f32 | 1 [4096, 1, 1, 1, 1]' in model file
...

[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.k.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.o.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.q.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.SelfAttention.v.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.0.layer_norm.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wi_0.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wi_1.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.DenseReluDense.wo.weight' not in model file
[ERROR] model.cpp:1649 - tensor 'text_encoders.t5xxl.encoder.block.0.layer.1.layer_norm.weight' not in model file
...

I think city96's conversion is using llama.cpp's tensor name convention.

You can perform the quantization yourself.

Green-Sky · 2024-08-23T07:13:05Z

You can perform the quantization yourself.

You are right I tried it the wrong way first.

Here is q3_k flux with q8_0 t5xxl

I can not spot a difference to f16 t5xxl, so I recommend this over f16 in any case.

However, it does look like it is not using less memory.

[DEBUG] ggml_extend.hpp:1019 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
vs f16:
[DEBUG] ggml_extend.hpp:1019 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)

edit: using q4_k for t5xxl does indeed change the result a bit

bit still acceptable.
but also
[DEBUG] ggml_extend.hpp:1019 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)

Green-Sky · 2024-08-23T16:25:10Z

I wanted to test a fine tune and merge(?) of flux.1 schnell and dev (??), but it contained f8_e4m3 tensors. So I went ahead and wrote an inplace up-convert similar to how it is done to bf16.

will make a pr in a bit.

edit: pr here #359

in the same way it is done for bf16 like how bf16 converts losslessly to fp32, f8_e4m3 converts losslessly to fp16

leejet · 2024-08-24T04:26:51Z

I tried uploading some quantized models to Hugging Face, but no matter which network I use, the upload speed is limited to 3 Mbps.

leejet · 2024-08-24T05:33:33Z

LoRA support has been added!

Green-Sky · 2024-08-24T06:10:39Z

I tried uploading some quantized models to Hugging Face, but no matter which network I use, the upload speed is limited to 3 Mbps.

Yea I also had issues with uploads being canceled all the time... no idea why.
Anyway some files got though and live here https://huggingface.co/Green-Sky/flux.1-schnell-GGUF/tree/main

Green-Sky · 2024-08-24T08:18:00Z

I also uploaded a f16 conversion of the vae, it looks almost lossless to me.

stduhpf · 2024-08-24T09:10:45Z

I also uploaded a f16 conversion of the vae, it looks almost lossless to me.

Even a q2_k vae looks good enough.

grigio · 2024-08-24T09:57:37Z

Does this Also work with AMD rocm ?

Green-Sky · 2024-08-24T10:40:01Z

I also uploaded a f16 conversion of the vae, it looks almost lossless to me.

Even a q2_k vae looks good enough.

If you look at the file sizes, it blocks anything lower than f16, so you are looking at f16.

Green-Sky · 2024-08-24T10:40:59Z

Does this Also work with AMD rocm ?

Not sure if anyone tried yet, but you can grab a build from here https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-64d231f (if you run windows)

JohnClaw · 2024-08-24T11:05:36Z

I tried uploading some quantized models to Hugging Face, but no matter which network I use, the upload speed is limited to 3 Mbps.

Yea I also had issues with uploads being canceled all the time... no idea why. Anyway some files got though and live here https://huggingface.co/Green-Sky/flux.1-schnell-GGUF/tree/main

Thank you very much for uploading flux schnell gguf. Could you upload clip_l.safetensors or clip_l.gguf for this model, please?

Green-Sky · 2024-08-24T11:34:19Z

I tried uploading some quantized models to Hugging Face, but no matter which network I use, the upload speed is limited to 3 Mbps.

Yea I also had issues with uploads being canceled all the time... no idea why. Anyway some files got though and live here https://huggingface.co/Green-Sky/flux.1-schnell-GGUF/tree/main

Thank you very much for uploading flux schnell gguf. Could you upload clip_l.safetensors or clip_l.gguf for this model, please?

Sure, I uploaded gguf f16 (same as source safetensors) and q8_0.

If you want the safetensors, check the op for a link.

edit: I am not seeing much of a difference between f16 and q8_0 either.

JohnClaw · 2024-08-24T13:27:54Z

I tried uploading some quantized models to Hugging Face, but no matter which network I use, the upload speed is limited to 3 Mbps.

Yea I also had issues with uploads being canceled all the time... no idea why. Anyway some files got though and live here https://huggingface.co/Green-Sky/flux.1-schnell-GGUF/tree/main

Thank you very much for uploading flux schnell gguf. Could you upload clip_l.safetensors or clip_l.gguf for this model, please?

Sure, I uploaded gguf f16 (same as source safetensors) and q8_0.

If you want the safetensors, check the op for a link.

edit: I am not seeing much of a difference between f16 and q8_0 either.

Thanks. Tested it with this command-line parameters: sd.exe --diffusion-model ./models/flux1-schnell-q2_k.gguf --vae ./models/ae-f16.gguf --clip_l ./models/clip_l-f16.gguf --t5xxl ./models/t5xxl_q2_k.gguf -p "a lovely cat holding a sign says 'flux.cpp'" -t 8 --steps 4 --cfg-scale 1.0 --sampling-method euler -v

My system configuration: Ryzen 7 4700u, igpu Vega 7, 16gb ram, ssd, Windows 11. Image generation took 520 seconds or so. Each step took 110 seconds. Hope that kobold.cpp will upgrade it's stable-duffusion plugin to support flux, because kobold.cpp uses Vulkan acceleration which makes generation much faster. Are there any plans to add Vulkan build to next releases of stable-diffusion.cpp? By the way, i recently downloaded Amuse windows app (https://www.amuse-ai.com/) and it generates images very fast because it uses DirectML acceleration, onnx and SD-turbo technologies. 512x512, 4 steps image generation takes only 7 seconds! I'm very sad that there is no DirectML acceleration in stable-diffusion.cpp and llama.cpp. Another thing which makes me cry is the fact that flux onnx model can't be quantatized to fit into my 16gb ram. Or i don't know something and it can be done?

stduhpf · 2024-08-24T13:53:49Z

I tried uploading some quantized models to Hugging Face, but no matter which network I use, the upload speed is limited to 3 Mbps.

Yea I also had issues with uploads being canceled all the time... no idea why. Anyway some files got though and live here https://huggingface.co/Green-Sky/flux.1-schnell-GGUF/tree/main

Thank you very much for uploading flux schnell gguf. Could you upload clip_l.safetensors or clip_l.gguf for this model, please?

Sure, I uploaded gguf f16 (same as source safetensors) and q8_0.
If you want the safetensors, check the op for a link.
edit: I am not seeing much of a difference between f16 and q8_0 either.

Thanks. Tested it with this command-line parameters: sd.exe --diffusion-model ./models/flux1-schnell-q2_k.gguf --vae ./models/ae-f16.gguf --clip_l ./models/clip_l-f16.gguf --t5xxl ./models/t5xxl_q2_k.gguf -p "a lovely cat holding a sign says 'flux.cpp'" -t 8 --steps 4 --cfg-scale 1.0 --sampling-method euler -v

My system configuration: Ryzen 7 4700u, igpu Vega 7, 16gb ram, ssd, Windows 11. Image generation took 520 seconds or so. Each step took 110 seconds. Hope that kobold.cpp will upgrade it's stable-duffusion plugin to support flux, because kobold.cpp uses Vulkan acceleration which makes generation much faster. Are there any plans to add Vulkan build to next releases of stable-diffusion.cpp? By the way, i recently downloaded Amuse windows app (https://www.amuse-ai.com/) and it generates images very fast because it uses DirectML acceleration, onnx and SD-turbo technologies. 512x512, 4 steps image generation takes only 7 seconds! I'm very sad that there is no DirectML acceleration in stable-diffusion.cpp and llama.cpp. Another thing which makes me cry is the fact that flux onnx model can't be quantatized to fit into my 16gb ram. Or i don't know something and it can be done?

If you want Vulkan support, take a look at the discussion here: #291
(Especially this part: #291 (comment))

stduhpf · 2024-08-24T14:34:54Z

I also uploaded a f16 conversion of the vae, it looks almost lossless to me.

Even a q2_k vae looks good enough.

If you look at the file sizes, it blocks anything lower than f16, so you are looking at f16.

Interestingly, the file sizes are very close, but still slightly different.

f16: 163728 kB
q8_0: 163669 kB
q2_k: 163630 kB

There's also some very slight artifacting (a bit like jpeg) with the q2_k auto encoder that isn't noticable with the other quants I tested (q8 and f16):

Comparison

q2_k ae:

q8_0 ae:

f16 ae:

original floats:

I'm not sure if saving only a few kilobytes is worth a barely noticable difference in output. That's a strange dilemma. Quantized is definitely worth it compared to full size though.

leejet · 2024-08-24T16:18:26Z

Only a very small number of tensors of ae will be quantized.

phudtran · 2024-08-24T17:00:30Z

Would it be possible to package and use the flux unet, clip, ae, etc into a single file like with the SD models?

city96 · 2024-08-24T18:12:57Z

Figured I'd chime in. I've been doing some work over at ComfyUI-GGUF to support flux quantization for image gen.

I've noticed some differences between my version and this version by @Green-Sky higher up in the thread.

The most obvious thing is that the bias weights are quantized. These can be kept in FP32 without adding more than at most 40MBs to the final model file, and doing this should increase both quality and speed (since less tensors have to be dequantized overall, though this should be relatively fast on small tensors like that).

The second issue I noticed is that there's no logic for keeping more vital tensors in higher precision the same way llama.cpp does with LLMs. From my short tests, these benefit the most from doing so while only adding ~100MB:

For the text encoder, I've used the default llama.cpp binary to create them as both the full encoder/decoder as well as the encoder only model is supported natively now. Assuming your code can handle mixed quantization, I recommend using this method since keeping the token_embed and the norm/biases in higher precisions makes the effects of quantization a lot less severe.

Mapping the keys back to the original names is fairly straight forward. This is the mapping I ended up with for the replacement:

clip_sd_map = {
    "enc.": "encoder.",
    ".blk.": ".block.",
    "token_embd": "shared",
    "output_norm": "final_layer_norm",
    "attn_q": "layer.0.SelfAttention.q",
    "attn_k": "layer.0.SelfAttention.k",
    "attn_v": "layer.0.SelfAttention.v",
    "attn_o": "layer.0.SelfAttention.o",
    "attn_norm": "layer.0.layer_norm",
    "attn_rel_b": "layer.0.SelfAttention.relative_attention_bias",
    "ffn_up": "layer.1.DenseReluDense.wi_1",
    "ffn_down": "layer.1.DenseReluDense.wo",
    "ffn_gate": "layer.1.DenseReluDense.wi_0",
    "ffn_norm": "layer.1.layer_norm",
}

for k,v in state_dict.items():
    for s,d in clip_sd_map.items():
        k = k.replace(s,d)
    ...

Hope this helps!

leejet · 2024-08-25T06:10:53Z

In my tests, not converting the bias didn't make something better. Moreover, if I convert the txt_in/img_in layers, I even get worse results.

Green-Sky · 2024-08-25T09:49:20Z

like for sd3, there exists an tiny auto encoder that does not work with sd.cpp yet
https://huggingface.co/madebyollin/taef1

edit: this is not a priority, since vae speed has improved since taesd was first implemented in sd.cpp and uses less compute compared to flux diffusion anyway.

Green-Sky · 2024-08-25T11:40:12Z

q2_k	q3_k

flux.1-schnell 1024x1024 4step using the new q2_k and q3_k variants converted using 5c561ea

also using quants for:

ae f16
clip_l q8_0
t5xxl q8_0

q3_k looks like a real winner here. It looks ok with small imperfections but still very small.

(also i hate comic sans 🙈 )

bssrdf · 2024-09-01T17:05:49Z

I am seeing "unknown tensor " using 58d5473

[INFO ] model.cpp:829  - load ..\models\ae.safetensors using safetensors format
[DEBUG] model.cpp:897  - init from '..\models\ae.safetensors'
[INFO ] stable-diffusion.cpp:237  - Version: Flux Dev
[INFO ] stable-diffusion.cpp:268  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:269  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:270  - Diffusion model weight type: q8_0
[INFO ] stable-diffusion.cpp:271  - VAE weight type:             f32
[DEBUG] stable-diffusion.cpp:273  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:312  - set clip_on_cpu to true
[INFO ] stable-diffusion.cpp:315  - CLIP: Using CPU backend
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1050 - clip params backend buffer size =  235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1050 - t5 params backend buffer size =  9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1050 - flux params backend buffer size =  12068.09 MB(VRAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1050 - vae params backend buffer size =  94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:414  - loading weights
[DEBUG] model.cpp:1568 - loading tensors from ..\models\clip_l.safetensors
[DEBUG] model.cpp:1568 - loading tensors from ..\models\t5xxl_fp16.safetensors
[INFO ] model.cpp:1723 - unknown tensor 'text_encoders.t5xxl.encoder.embed_tokens.weight | f16 | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1568 - loading tensors from ..\models\flux1-dev-q8_0.gguf
[DEBUG] model.cpp:1568 - loading tensors from ..\models\ae.safetensors
[INFO ] stable-diffusion.cpp:513  - total params memory size = 21481.50MB (VRAM 12162.66MB, RAM 9318.83MB): clip 9318.83MB(RAM), unet 12068.09MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:517  - loading model from '' completed, taking 36.52s
[INFO ] stable-diffusion.cpp:534  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:588  - finished loaded file
[DEBUG] stable-diffusion.cpp:1405 - txt2img 832x1216
[DEBUG] stable-diffusion.cpp:1146 - prompt after extract and remove lora: "model as a navy officer on a ship, colorful, perfect face, natural skin, hard shadows, highly detail"

I am using t5xxl from https://huggingface.co/comfyanonymous/flux_text_encoders/blob/main/t5xxl_fp16.safetensors

ayttop · 2024-09-29T03:28:49Z

.\bin\Release\sd.exe
Where is the path?

CrushDemo01 · 2024-10-28T06:04:34Z

Hello, I want the quantized versions of T5 and CLIP to also use video memory. Is there any way to achieve this?

stduhpf · 2024-10-28T09:40:38Z

Hello, I want the quantized versions of T5 and CLIP to also use video memory. Is there any way to achieve this?

Yeah, sure. Just remove those lines and compile it again.
https://github.com/leejet/stable-diffusion.cpp/blob/master/stable-diffusion.cpp#L317C1-L320C17

CrushDemo01 · 2024-10-28T10:41:36Z

Hello, I want the quantized versions of T5 and CLIP to also use video memory. Is there any way to achieve this?

Yeah, sure. Just remove those lines and compile it again. https://github.com/leejet/stable-diffusion.cpp/blob/master/stable-diffusion.cpp#L317C1-L320C17
I commented out the code and recompiled it. But it has such errors.

root@ucloud-wlcb-gpu-010:/text2img/stable-diffusion.cpp/build#  ./bin/sd --diffusion-model /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/flux1-schnell-q2_k.gguf --vae /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/ae-f16.gguf --clip_l /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/clip_l-q8_0.gguf --t5xxl /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/t5xxl_q2_k.gguf  -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o flux_schenll.png
Option: 
    n_threads:         64
    mode:              txt2img
    model_path:        
    wtype:             unspecified
    clip_l_path:       /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/clip_l-q8_0.gguf
    clip_g_path:       
    t5xxl_path:        /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/t5xxl_q2_k.gguf
    diffusion_model_path:   /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/flux1-schnell-q2_k.gguf
    vae_path:          /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/ae-f16.gguf
    taesd_path:        
    esrgan_path:       
    controlnet_path:   
    embeddings_path:   
    stacked_id_embeddings_path:   
    input_id_images_path:   
    style ratio:       20.00
    normalize input image :  false
    output_path:       flux_schenll.png
    init_img:          
    control_image:     
    clip on cpu:       false
    controlnet cpu:    false
    vae decoder on cpu:false
    strength(control): 0.90
    prompt:            a lovely cat holding a sign says 'flux.cpp'
    negative_prompt:   
    min_cfg:           1.00
    cfg_scale:         1.00
    guidance:          3.50
    clip_skip:         -1
    width:             512
    height:            512
    sample_method:     euler
    schedule:          default
    sample_steps:      4
    strength(img2img): 0.75
    rng:               cuda
    seed:              42
    batch_count:       1
    vae_tiling:        false
    upscale_repeats:   1
System Info: 
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 1
    AVX512_VBMI = 1
    AVX512_VNNI = 1
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:159  - Using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
  Device 0: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
  Device 1: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
  Device 2: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
  Device 3: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
  Device 4: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
  Device 5: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
  Device 6: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
  Device 7: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
[INFO ] stable-diffusion.cpp:204  - loading clip_l from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/clip_l-q8_0.gguf'
[INFO ] model.cpp:801  - load /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/clip_l-q8_0.gguf using gguf format
[DEBUG] model.cpp:818  - init from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/clip_l-q8_0.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:218  - loading t5xxl from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/t5xxl_q2_k.gguf'
[INFO ] model.cpp:801  - load /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/t5xxl_q2_k.gguf using gguf format
[DEBUG] model.cpp:818  - init from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/t5xxl_q2_k.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:225  - loading diffusion model from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/flux1-schnell-q2_k.gguf'
[INFO ] model.cpp:801  - load /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/flux1-schnell-q2_k.gguf using gguf format
[DEBUG] model.cpp:818  - init from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/flux1-schnell-q2_k.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:232  - loading vae from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/ae-f16.gguf'
[INFO ] model.cpp:801  - load /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/ae-f16.gguf using gguf format
[DEBUG] model.cpp:818  - init from '/mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/ae-f16.gguf'
WARNING: Behavior may be unexpected when allocating 0 bytes for ggml_calloc!
[INFO ] stable-diffusion.cpp:244  - Version: Flux Schnell 
[INFO ] stable-diffusion.cpp:275  - Weight type:                 q8_0
[INFO ] stable-diffusion.cpp:276  - Conditioner weight type:     q8_0
[INFO ] stable-diffusion.cpp:277  - Diffusion model weight type: q2_K
[INFO ] stable-diffusion.cpp:278  - VAE weight type:             f16
[DEBUG] stable-diffusion.cpp:280  - ggml tensor size = 400 bytes
[DEBUG] clip.hpp:171  - vocab size: 49408
[DEBUG] clip.hpp:182  -  trigger word img already in vocab
[DEBUG] ggml_extend.hpp:1045 - clip params backend buffer size =  125.22 MB(VRAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1045 - t5 params backend buffer size =  4826.11 MB(VRAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1045 - flux params backend buffer size =  3732.51 MB(VRAM) (776 tensors)
[DEBUG] ggml_extend.hpp:1045 - vae params backend buffer size =  94.57 MB(VRAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:407  - loading weights
[DEBUG] model.cpp:1548 - loading tensors from /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/clip_l-q8_0.gguf
[DEBUG] model.cpp:1548 - loading tensors from /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/t5xxl_q2_k.gguf
[INFO ] model.cpp:1703 - unknown tensor 'text_encoders.t5xxl.transformer.encoder.embed_tokens.weight | q2_K | 2 [4096, 32128, 1, 1, 1]' in model file
[DEBUG] model.cpp:1548 - loading tensors from /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/flux1-schnell-q2_k.gguf
[DEBUG] model.cpp:1548 - loading tensors from /mnt/data/xxxyyyzzz/flux.1-schnell-GGUF/ae-f16.gguf
[INFO ] stable-diffusion.cpp:491  - total params memory size = 8778.42MB (VRAM 8778.42MB, RAM 0.00MB): clip 4951.33MB(VRAM), unet 3732.51MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:510  - loading model from '' completed, taking 45.05s
[INFO ] stable-diffusion.cpp:527  - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:581  - finished loaded file
[DEBUG] stable-diffusion.cpp:1390 - txt2img 512x512
[DEBUG] stable-diffusion.cpp:1139 - prompt after extract and remove lora: "a lovely cat holding a sign says 'flux.cpp'"
[INFO ] stable-diffusion.cpp:664  - Attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:1144 - apply_loras completed, taking 0.00s
[DEBUG] conditioner.hpp:1036 - parse 'a lovely cat holding a sign says 'flux.cpp'' to [['a lovely cat holding a sign says 'flux.cpp'', 1], ]
[DEBUG] clip.hpp:311  - token length: 77
[DEBUG] t5.hpp:397  - token length: 256
[DEBUG] ggml_extend.hpp:997  - t5 compute buffer size: 68.25 MB(VRAM)
ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: invalid configuration argument
  current device: 0, in function ggml_cuda_compute_forward at /text2img/stable-diffusion.cpp/ggml/src/ggml-cuda.cu:2326
  err
/text2img/stable-diffusion.cpp/ggml/src/ggml-cuda.cu:102: CUDA error
Aborted (core dumped)

stduhpf · 2024-10-28T14:05:45Z

Ah i thought it would work, I guess I was wrong.

Glancing at the code of the Cuda backend it looks like the GET_ROWS operation isn't supported for k quants? Do you have enough VRAM to test with a q4_0 quant instead?

CrushDemo01 · 2024-10-29T03:02:00Z

Thanks. I've tried it all, and many types of quant have the same error.

leejet added 2 commits August 21, 2024 21:14

add flux support

00b542d

avoid build failures in non-CUDA environments

8650b87

leejet mentioned this pull request Aug 21, 2024

Support for Flux #323

Open

fix schnell support

77ca8e3

SkutteOleg mentioned this pull request Aug 21, 2024

Support providing diffusion models and text encoders separately? #352

Closed

add k quants support

e91ce4f

SkutteOleg mentioned this pull request Aug 21, 2024

Add vulkan backend #291

Merged

add support for applying lora to quantized tensors

29ec316

add inplace conversion support for f8_e4m3 (#359)

d8c65b4

in the same way it is done for bf16 like how bf16 converts losslessly to fp32, f8_e4m3 converts losslessly to fp16

add xlabs flux comfy converted lora support

46eeff5

update docs

d04248f

leejet merged commit 64d231f into master Aug 24, 2024
8 checks passed

add flux support #356

add flux support #356

Conversation

leejet commented Aug 21, 2024 • edited Loading

How to Use

Download weights

Convert flux weights

Run

Flux-dev q8_0

Flux-dev q4_0

Flux-dev q3_k

Flux-dev q2_k

Flux-schnell q8_0

Run with LoRA

Flux dev q8_0 with LoRA

stduhpf commented Aug 21, 2024 • edited Loading

phudtran commented Aug 21, 2024

stduhpf commented Aug 21, 2024

stduhpf commented Aug 21, 2024

stduhpf commented Aug 21, 2024

Green-Sky commented Aug 22, 2024 • edited Loading

Green-Sky commented Aug 22, 2024 • edited Loading

cheeseng commented Aug 22, 2024

Green-Sky commented Aug 22, 2024

Green-Sky commented Aug 22, 2024

stduhpf commented Aug 22, 2024

MGTRIDER commented Aug 23, 2024 • edited Loading

leejet commented Aug 23, 2024

Green-Sky commented Aug 23, 2024 • edited Loading

Green-Sky commented Aug 23, 2024 • edited Loading

leejet commented Aug 24, 2024

leejet commented Aug 24, 2024

Green-Sky commented Aug 24, 2024

Green-Sky commented Aug 24, 2024

stduhpf commented Aug 24, 2024

grigio commented Aug 24, 2024

Green-Sky commented Aug 24, 2024

Green-Sky commented Aug 24, 2024

JohnClaw commented Aug 24, 2024

Green-Sky commented Aug 24, 2024 • edited Loading

JohnClaw commented Aug 24, 2024 • edited Loading

stduhpf commented Aug 24, 2024

stduhpf commented Aug 24, 2024

leejet commented Aug 24, 2024

phudtran commented Aug 24, 2024

city96 commented Aug 24, 2024

leejet commented Aug 25, 2024

Green-Sky commented Aug 25, 2024 • edited Loading

Green-Sky commented Aug 25, 2024

bssrdf commented Sep 1, 2024

ayttop commented Sep 29, 2024

CrushDemo01 commented Oct 28, 2024

stduhpf commented Oct 28, 2024 • edited Loading

CrushDemo01 commented Oct 28, 2024

stduhpf commented Oct 28, 2024

CrushDemo01 commented Oct 29, 2024

leejet commented Aug 21, 2024 •

edited

Loading

stduhpf commented Aug 21, 2024 •

edited

Loading

Green-Sky commented Aug 22, 2024 •

edited

Loading

Green-Sky commented Aug 22, 2024 •

edited

Loading

MGTRIDER commented Aug 23, 2024 •

edited

Loading

Green-Sky commented Aug 23, 2024 •

edited

Loading

Green-Sky commented Aug 23, 2024 •

edited

Loading

Green-Sky commented Aug 24, 2024 •

edited

Loading

JohnClaw commented Aug 24, 2024 •

edited

Loading

Green-Sky commented Aug 25, 2024 •

edited

Loading

stduhpf commented Oct 28, 2024 •

edited

Loading