DeepSpeed #1139

Conversation

…ccelerate settings
support DeepSpeed
I tested the new branch with some of the settings. It seems that even if SD variants (Cascade, SD3, etc.) come out later, they will work well with this wrapping. |
|
Hey @BootsofLagrangian |
Can you attach your bash script or toml config file? |
|
@BootsofLagrangian

```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

Here is how I run finetuning:

```shell
accelerate launch --gpu_ids="0,1" --multi_gpu --num_processes=2 --num_cpu_threads_per_process=2 "./sdxl_train.py" \
  --ddp_timeout='1000' \
  --bucket_no_upscale \
  --bucket_reso_steps=64 \
  --cache_latents \
  --cache_latents_to_disk \
  --caption_extension=".txt" \
  --dataset_repeats="20" \
  --enable_bucket \
  --min_bucket_reso=64 \
  --max_bucket_reso=1024 \
  --in_json="/home/storuky/ml/train/meta_cap.json" \
  --gradient_checkpointing \
  --learning_rate="1.2e-06" \
  --learning_rate_te1="5e-07" \
  --learning_rate_te2="5e-07" \
  --logging_dir="/home/storuky/ml/train/log" \
  --lr_scheduler="constant" \
  --lr_scheduler_args \
  --lr_scheduler_type "CosineAnnealingLR" \
  --lr_scheduler_args "T_max=10" \
  --max_data_loader_n_workers="0" \
  --resolution="1024,1024" \
  --max_timestep=900 \
  --max_token_length=225 \
  --max_train_epochs=10 \
  --max_train_steps="979575" \
  --min_snr_gamma=5 \
  --min_timestep=100 \
  --mixed_precision="bf16" \
  --no_half_vae \
  --noise_offset=0.0375 \
  --adaptive_noise_scale=0.00375 \
  --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 \
  --optimizer_type="Adafactor" \
  --output_dir="/home/storuky/ml/out" \
  --output_name="TrainingModel" \
  --pretrained_model_name_or_path="/home/storuky/ml/sd_xl_base_1.0.safetensors" \
  --save_every_n_epochs="1" \
  --save_model_as=safetensors \
  --save_precision="bf16" \
  --save_state \
  --seed="1234" \
  --train_batch_size="1" \
  --train_data_dir="/home/storuky/ml/train/dataset" \
  --train_text_encoder \
  --v_pred_like_loss="0.5" \
  --xformers \
  --deepspeed \
  --zero_stage 2 \
  --offload_optimizer_device cpu
```
|
|
When you want to use CPU offloading with offload_optimizer_device=cpu, DeepSpeed will build and use CPUAdam. It is also a kind of Adam. Can you change the optimizer? |
|
When I use Adafactor, I get another error. No error with AdamW. |
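For reference, this is the shape of the ZeRO section of a raw DeepSpeed JSON config that enables optimizer offloading (and hence makes DeepSpeed build CPUAdam). Field names follow the DeepSpeed config schema; the values here are illustrative, not taken from this PR:

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

With accelerate, the same setting is expressed as the `offload_optimizer_device: cpu` key shown in the config above.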
|
@BootsofLagrangian Yeah, I tried AdamW as well but no luck so far... Here is the full trace of the issue with AdamW as the optimizer (spoiler: it happens with any kind of offload_optimizer_device; none, nvme, cpu, it doesn't matter): |
|
@BootsofLagrangian Even if I copy your toml config from here, change only the paths, and run it as you described, I still get this error. I tried reconfiguring accelerate and reinstalling/installing other versions of DeepSpeed, with no effect. |
|
@BootsofLagrangian Ah, I just switched to your version and it's working! The issue is only with this branch. |
Thanks for your report! |
- we have to prepare the optimizer and ds_model at the same time. - pull/1139#issuecomment-1986790007
Signed-off-by: BootsofLagrangian <hard2251@yonsei.ac.kr>
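The reasoning behind that fix can be sketched in plain Python. These classes are hypothetical stand-ins, not the real accelerate/DeepSpeed objects: engine wrapping replaces a model's parameters, so an optimizer prepared beforehand keeps references to the stale, pre-wrapping parameters, while preparing both at the same time keeps them consistent.

```python
# Hypothetical stand-ins for illustration only; not the actual
# accelerate/DeepSpeed classes.
class Model:
    def __init__(self, params):
        self.params = params

class Optimizer:
    def __init__(self, params):
        self.params = params  # the optimizer steps over exactly these objects

def wrap(model):
    # Engine wrapping replaces the parameters (e.g. ZeRO partitions them).
    return Model([p + "_partitioned" for p in model.params])

# Prepared separately: the optimizer keeps references to the old parameters.
model = Model(["w"])
opt = Optimizer(model.params)
model = wrap(model)
print(opt.params is model.params)    # False: optimizer would update stale tensors

# Prepared at the same time: both see the wrapped parameters.
def prepare(model, make_optimizer):
    model = wrap(model)
    return model, make_optimizer(model.params)

model2, opt2 = prepare(Model(["w"]), Optimizer)
print(opt2.params is model2.params)  # True
```

This is why the optimizer and ds_model must go through preparation together rather than in separate steps.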
Fix sdxl_train.py in deepspeed branch
|
Edit: see comment below for the reason (missing …).

config.toml:

```toml
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
dataset_config = "/home/ml/checkpoints/sd15/dataset.toml"
xformers = true
deepspeed = true
zero_stage = 2
mixed_precision = "bf16"
save_precision = "bf16"
full_bf16 = true
no_half_vae = true
train_batch_size = 24
max_data_loader_n_workers = 4
persistent_data_loader_workers = true
optimizer_type = "AdamW8bit"
optimizer_args = [ "weight_decay=1e-1", ]
lr_scheduler = "constant"
max_train_steps = 78452
gradient_checkpointing = true
gradient_accumulation_steps = 16
learning_rate = 4e-5
unet_lr = 4e-5
text_encoder_lr = 2e-5
max_grad_norm = 1.0
max_token_length = 225
network_alpha = 64
network_dim = 128
network_module = "networks.lora"
cache_latents = true
cache_latents_to_disk = true
```
|
Original PR #1101
I think it is not necessary to set `unet` or `text_encoder` back to the result of `prepare_deepspeed_model`. Because the model is not a `list`, they are not changed inside the function.
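That point is a consequence of Python's argument-passing semantics, sketched below with a hypothetical function (not the real `prepare_deepspeed_model`): rebinding a plain parameter inside a function leaves the caller's variable untouched, whereas mutating a list argument in place is visible to the caller.

```python
def wrap_models(models):
    # Illustrative only: mimics in-place handling of a list argument
    # versus rebinding of a single model object.
    if isinstance(models, list):
        for i, m in enumerate(models):
            models[i] = ("wrapped", m)   # mutates the caller's list
        return models
    models = ("wrapped", models)         # rebinds the local name only
    return models

unet = "unet"
returned = wrap_models(unet)
print(unet)              # unet -- caller's variable unchanged; use `returned`

text_encoders = ["te1", "te2"]
wrap_models(text_encoders)
print(text_encoders[0])  # ('wrapped', 'te1') -- list changed in place
```

So for a non-list argument, assigning the return value back is the only way the caller would ever see the wrapped object; if the return value is unused, there is nothing to set back.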