
Applying ZeRO-3 with LoRA yields empty LoRA weights [0] #969

Open

Description

System Info

accelerate 1.6.0
peft 0.15.0
transformers 4.51.3
deepspeed 0.16.5

Information

- The official example scripts
- My own modified scripts

Tasks

- An officially supported task in the examples folder
- My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from accelerate import Accelerator
import torch
from torch.utils.data import Dataset, DataLoader


class DummyDataset(Dataset):
    """Dataset that repeats a single tokenized string num_samples times."""

    def __init__(self, tokenizer, dummy_text="Hello, world!", num_samples=100):
        self.tokenizer = tokenizer
        self.dummy_text = dummy_text
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        encoded = self.tokenizer(self.dummy_text, return_tensors="pt")
        item = {key: val.squeeze(0) for key, val in encoded.items()}
        return item


accelerator = Accelerator()

model_name = "/home/clouduser/jxk/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
dummy_dataset = DummyDataset(tokenizer, dummy_text="Hello, world!", num_samples=100)
dataloader = DataLoader(dummy_dataset, batch_size=4, shuffle=True)

# LoRA weight shapes before accelerator.prepare()
print("++++" * 100)
policy_state_dict = model.state_dict()
for key, value in policy_state_dict.items():
    if "lora_A" in key or "lora_B" in key:
        print(f"{key}: {value.shape}")
print("++++" * 100)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# LoRA weight shapes after accelerator.prepare() (DeepSpeed ZeRO-3 wraps the model here)
print("++++" * 100)
policy_state_dict = model.state_dict()
for key, value in policy_state_dict.items():
    if "lora_A" in key or "lora_B" in key:
        print(f"{key}: {value.shape}")
print("++++" * 100)

The printed results (LoRA weight shapes) are:

Before using ZeRO-3:

base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight: torch.Size([8, 1536])
base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight: torch.Size([1536, 8])
base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight: torch.Size([8, 1536])
base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight: torch.Size([256, 8])
base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: torch.Size([8, 1536])
base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: torch.Size([1536, 8])
base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: torch.Size([8, 1536])

After using ZeRO-3:

module.base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight: torch.Size([0])
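
For reference, under ZeRO-3 each rank materializes only its own shard of a parameter, so torch.Size([0]) is what state_dict() on the wrapped module reports for partitioned weights; the logical shape is still kept on the parameter. A minimal sketch of how the full shapes can be inspected anyway (assuming deepspeed is importable in the repro above; ds_shape and deepspeed.zero.GatheredParameters are DeepSpeed's ZeRO-3 utilities):

import deepspeed

# Sketch: "model" is the engine returned by accelerator.prepare().
# Under ZeRO-3 a shard numel of 0 is expected for parameters whose
# data lives elsewhere; ds_shape preserves the full logical shape.
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        full_shape = getattr(param, "ds_shape", param.shape)
        print(f"{name}: shard numel={param.numel()}, full shape={tuple(full_shape)}")

# GatheredParameters temporarily all-gathers the listed parameters so
# they can be read at full shape (modifier_rank=None means read-only).
lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
with deepspeed.zero.GatheredParameters(lora_params, modifier_rank=None):
    for name, param in model.named_parameters():
        if "lora_A" in name or "lora_B" in name:
            print(f"{name}: {param.shape}")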

This is my ZeRO-3 accelerate config file:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
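
For reference, a config like this is saved once and then passed at launch time, along the lines of:

accelerate launch --config_file zero3_config.yaml repro.py

(zero3_config.yaml and repro.py are placeholder names for the config above and the reproduction script.)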

My model is:

name: "Qwen/Qwen/Qwen1.5-0.5B-Chat"
# name: "Qwen/Qwen2.5-7B-Instruct"
# name: "Qwen/Qwen2.5-32B-Instruct"
# name: "Qwen/Qwen2.5-14B-Instruct"
# name: "internlm/internlm2_5-1_8b"
# name: "meta-llama/Llama-3.1-8B-Instruct"

This is my LoRA config:

lora_config:
  r: 8
  lora_alpha: 32
  target_modules:
    - "q_proj"  # qwen
    - "v_proj"  # qwen
  lora_dropout: 0.1
  bias: "none"
  task_type: "CAUSAL_LM"
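
For reference, a minimal sketch of how this YAML maps onto peft.LoraConfig (the file name lora.yaml is a placeholder; pyyaml is assumed to be available):

import yaml
from peft import LoraConfig

with open("lora.yaml") as f:  # placeholder path for the YAML above
    cfg = yaml.safe_load(f)["lora_config"]

# The YAML keys line up one-to-one with LoraConfig's arguments.
lora_config = LoraConfig(**cfg)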

Expected behavior

After applying DeepSpeed ZeRO-3 together with LoRA, the LoRA weights come out as torch.Size([0]); if I use ZeRO-2 instead, the same script does not have this problem. Can you help me?
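
For completeness, a hedged sketch of gathering a consolidated state dict through Accelerate instead of calling state_dict() on the wrapped module (accelerator.get_state_dict is Accelerate's API for consolidating ZeRO-3 partitions and relies on zero3_save_16bit_model: true from the config above; I have not verified it changes the output for this repro):

# Sketch: get_state_dict must be called on all ranks; the consolidated
# dict is only guaranteed to be populated on the main process.
full_state_dict = accelerator.get_state_dict(model)
if accelerator.is_main_process:
    for key, value in full_state_dict.items():
        if "lora_A" in key or "lora_B" in key:
            print(f"{key}: {value.shape}")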
