
Applying ZeRO-3 with LoRA yields empty LoRA weights [0] #969

Open

Description

System Info

accelerate 1.6.0
peft 0.15.0
transformers 4.51.3
deepspeed 0.16.5

Information

- The official example scripts
- My own modified scripts

Tasks

- An officially supported task in the examples folder
- My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from accelerate import Accelerator
import torch
from torch.utils.data import Dataset, DataLoader


class DummyDataset(Dataset):
    """Dataset that repeats a single tokenized string num_samples times."""

    def __init__(self, tokenizer, dummy_text="Hello, world!", num_samples=100):
        self.tokenizer = tokenizer
        self.dummy_text = dummy_text
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        encoded = self.tokenizer(self.dummy_text, return_tensors="pt")
        item = {key: val.squeeze(0) for key, val in encoded.items()}
        return item


accelerator = Accelerator()

model_name = "/home/clouduser/jxk/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
dummy_dataset = DummyDataset(tokenizer, dummy_text="Hello, world!", num_samples=100)
dataloader = DataLoader(dummy_dataset, batch_size=4, shuffle=True)

# LoRA weight shapes before accelerator.prepare()
print("++++" * 100)
policy_state_dict = model.state_dict()
for key, value in policy_state_dict.items():
    if "lora_A" in key or "lora_B" in key:
        print(f"{key}: {value.shape}")
print("++++" * 100)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# LoRA weight shapes after accelerator.prepare() (DeepSpeed ZeRO-3 wraps the model here)
print("++++" * 100)
policy_state_dict = model.state_dict()
for key, value in policy_state_dict.items():
    if "lora_A" in key or "lora_B" in key:
        print(f"{key}: {value.shape}")
print("++++" * 100)

The printed results (LoRA weight shapes) are:

Before using ZeRO-3:

base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight: torch.Size([8, 1536])
base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight: torch.Size([1536, 8])
base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight: torch.Size([8, 1536])
base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight: torch.Size([256, 8])
base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: torch.Size([8, 1536])
base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: torch.Size([1536, 8])
base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: torch.Size([8, 1536])

After using ZeRO-3:

module.base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight: torch.Size([0])
module.base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight: torch.Size([0])
module.base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight: torch.Size([0])
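
For reference, under ZeRO-3 each rank materializes only its own shard of a parameter, so torch.Size([0]) is what state_dict() on the wrapped module reports for partitioned weights; the logical shape is still kept on the parameter. A minimal sketch of how the full shapes can be inspected anyway (assuming deepspeed is importable in the repro above; ds_shape and deepspeed.zero.GatheredParameters are DeepSpeed's ZeRO-3 utilities):

import deepspeed

# Sketch: "model" is the engine returned by accelerator.prepare().
# Under ZeRO-3 a shard numel of 0 is expected for parameters whose
# data lives elsewhere; ds_shape preserves the full logical shape.
for name, param in model.named_parameters():
    if "lora_A" in name or "lora_B" in name:
        full_shape = getattr(param, "ds_shape", param.shape)
        print(f"{name}: shard numel={param.numel()}, full shape={tuple(full_shape)}")

# GatheredParameters temporarily all-gathers the listed parameters so
# they can be read at full shape (modifier_rank=None means read-only).
lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
with deepspeed.zero.GatheredParameters(lora_params, modifier_rank=None):
    for name, param in model.named_parameters():
        if "lora_A" in name or "lora_B" in name:
            print(f"{name}: {param.shape}")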

This is my ZeRO-3 accelerate config file:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
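
For reference, a config like this is saved once and then passed at launch time, along the lines of:

accelerate launch --config_file zero3_config.yaml repro.py

(zero3_config.yaml and repro.py are placeholder names for the config above and the reproduction script.)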

My model is:

name: "Qwen/Qwen/Qwen1.5-0.5B-Chat"
# name: "Qwen/Qwen2.5-7B-Instruct"
# name: "Qwen/Qwen2.5-32B-Instruct"
# name: "Qwen/Qwen2.5-14B-Instruct"
# name: "internlm/internlm2_5-1_8b"
# name: "meta-llama/Llama-3.1-8B-Instruct"

This is my LoRA config:

lora_config:
  r: 8
  lora_alpha: 32
  target_modules:
    - "q_proj"  # qwen
    - "v_proj"  # qwen
  lora_dropout: 0.1
  bias: "none"
  task_type: "CAUSAL_LM"
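
For reference, a minimal sketch of how this YAML maps onto peft.LoraConfig (the file name lora.yaml is a placeholder; pyyaml is assumed to be available):

import yaml
from peft import LoraConfig

with open("lora.yaml") as f:  # placeholder path for the YAML above
    cfg = yaml.safe_load(f)["lora_config"]

# The YAML keys line up one-to-one with LoraConfig's arguments.
lora_config = LoraConfig(**cfg)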

Expected behavior

After applying DeepSpeed ZeRO-3 together with LoRA, the LoRA weights come out as torch.Size([0]); if I use ZeRO-2 instead, the same script does not have this problem. Can you help me?
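
For completeness, a hedged sketch of gathering a consolidated state dict through Accelerate instead of calling state_dict() on the wrapped module (accelerator.get_state_dict is Accelerate's API for consolidating ZeRO-3 partitions and relies on zero3_save_16bit_model: true from the config above; I have not verified it changes the output for this repro):

# Sketch: get_state_dict must be called on all ranks; the consolidated
# dict is only guaranteed to be populated on the main process.
full_state_dict = accelerator.get_state_dict(model)
if accelerator.is_main_process:
    for key, value in full_state_dict.items():
        if "lora_A" in key or "lora_B" in key:
            print(f"{key}: {value.shape}")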
