FIX Failing target_parameters param usage count
For testing target_parameters, we use a tiny Llama4 model. This model
was refactored in
huggingface/transformers#39501, resulting in one
parameter being accessed an additional time:

https://github.com/huggingface/transformers/pull/39501/files#diff-e668ec07f78afdb2cb805d939e47453757f0b9437436cb860fcb7cb2431c9cf5R69

Therefore, a unit test that relied on how often this parameter was
accessed started failing. This PR updates the count to the correct
number.
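
For context, here is a minimal sketch of the call-count arithmetic behind the fix; the numbers below are hypothetical placeholders, not the values used by the actual test:

```python
# Minimal sketch of the call-count arithmetic; the values below are
# hypothetical and do not come from the actual test fixture.
num_steps = 3             # hypothetical number of training steps
num_layers = 2            # hypothetical number of targeted layers
num_params = 1            # hypothetical number of targeted parameters per layer
num_forward_per_step = 2  # the test calls forward twice per step

# Expected count before the transformers refactor:
old_count = num_steps * num_layers * num_params * num_forward_per_step

# After huggingface/transformers#39501, one parameter is accessed one extra
# time, which the test accounts for with the +1 inside the parentheses:
new_count = num_steps * num_layers * (1 + num_params * num_forward_per_step)

print(old_count, new_count)  # 12 18 with the values above
```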

Additionally, debug print statements that were accidentally left over are
now removed.
BenjaminBossan committed Jul 28, 2025
commit 27860c719779ea7f5944f7720673f9f4a280991d
6 changes: 3 additions & 3 deletions tests/test_target_parameters.py
@@ -370,7 +370,9 @@ def mock_forward(self, W):
 # Note: We call forward twice per step, once to create the parametrization and once for the actual forward
 # step. This may be a bit wasteful but it's not clear how to prevent this and overall is probably negligible
 num_forward_per_step = 2
-expected_call_count = num_steps * num_layers * num_params * num_forward_per_step
+# Since https://github.com/huggingface/transformers/pull/39501, one of the parameters is accessed twice per
+# forward call, so add +1.
+expected_call_count = num_steps * num_layers * (1 + num_params * num_forward_per_step)
 assert actual_call_count == expected_call_count

 actual_shapes = {W.shape for W in weights}
@@ -382,7 +384,6 @@ def mock_forward(self, W):
 lora_weights_before = {
     k: v.clone() for k, v in model.named_parameters() if "lora_A.default" in k or "lora_B.default" in k
 }
-print(lora_weights_before)
 # sanity check:
 assert len(lora_weights_before) == 2 * num_layers * num_params
 # train
@@ -394,7 +395,6 @@ def mock_forward(self, W):
 loss.backward()
 optim.step()

-print(lora_weights_before)
 lora_weights_after = {
     k: v for k, v in model.named_parameters() if "lora_A.default" in k or "lora_B.default" in k
 }