+ deepspeed
[rank7]:[W528 20:36:03.452299243 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
[... the same ProcessGroupNCCL warning is emitted by ranks 0-6, each using GPU N for rank N]
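The warning above names two remedies. A minimal sketch of the `init_process_group()` variant, assuming PyTorch 2.3+ (where `device_id` is accepted); the gloo branch is only a CPU fallback for local testing, not part of this run:

```python
import os
import torch
import torch.distributed as dist

def init_distributed(local_rank: int) -> None:
    """Bind this rank to its GPU at init time so NCCL collectives such as
    barrier() do not have to guess the rank -> GPU mapping."""
    if torch.cuda.is_available():
        torch.cuda.set_device(local_rank)
        dist.init_process_group(
            backend="nccl",
            # Passing device_id silences the "devices ... currently unknown" warning.
            device_id=torch.device(f"cuda:{local_rank}"),
        )
    else:
        # CPU-only fallback for local testing; gloo has no device binding.
        dist.init_process_group(backend="gloo")
```

Alternatively, the mapping can be forced per call with `dist.barrier(device_ids=[local_rank])`, as the warning also suggests.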
loading configuration file /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k/config.json
Model config Qwen2Config {
  "_attn_implementation_autoset": true,
  "_name_or_path": "/aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 128245,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151646
}
loading weights file /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k/pytorch_model.bin
Will use torch_dtype=torch.bfloat16 as defined in model's config object
Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
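zero.init() is activated because a ZeRO stage-3 DeepSpeed config is live while the model is being constructed; stage 3 makes parameters get allocated directly in partitioned form across ranks. A minimal sketch of such a config fragment (illustrative values, not this run's actual ds_config):

```python
# Illustrative ZeRO-3 config fragment; "stage": 3 is what triggers zero.init().
ds_config = {
    "zero_optimization": {"stage": 3},
    "bf16": {"enabled": True},                # matches torch_dtype=bfloat16 above
    "gradient_clipping": 1.0,
    "train_micro_batch_size_per_gpu": "auto",
}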
Generate config GenerationConfig {
  "bos_token_id": 128245,
  "eos_token_id": 151643,
  "pad_token_id": 151643
}
Sliding Window Attention is enabled but not implemented for `eager`; unexpected results may be encountered.
All model checkpoint weights were used when initializing Qwen2ForCausalLM.

All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
Generation config file not found, using a generation config created from the model config.
loading file vocab.json
loading file merges.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading file chat_template.jinja
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using /home/hansirui_1st/.cache/torch_extensions/py311_cu124 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/hansirui_1st/.cache/torch_extensions/py311_cu124/fused_adam/build.ninja...
/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/torch/utils/cpp_extension.py:2059: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
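The UserWarning above can be silenced by pinning the target architectures before the extension build starts. A sketch; the "9.0" value is a placeholder and should match the compute capability of the GPUs actually in the node:

```python
import os

# Compile fused_adam only for the compute capability actually present,
# instead of for every visible card. "9.0" is a placeholder value.
os.environ["TORCH_CUDA_ARCH_LIST"] = "9.0"
```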
Loading extension module fused_adam...
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
wandb: Currently logged in as: xtom to https://api.wandb.ai. Use `wandb login
wandb: Tracking run with wandb version 0.19.8
wandb: Run data is saved locally in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k-Q2-2k/wandb/run-20250528_203628-hrtj7fnx
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run qwen-7b-s3-Q1-40k-Q2-2k
wandb: ⭐️ View project at https://wandb.ai/xtom/Inverse_Alignment
wandb: 🚀 View run at https://wandb.ai/xtom/Inverse_Alignment/runs/hrtj7fnx
Training 1/1 epoch:   0%| | 0/63 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Training 1/1 epoch (loss 1.6059):   2% | 1/63 [00:05<05:53, 5.70s/it]
Training 1/1 epoch (loss 1.6308):   3% | 2/63 [00:06<02:58, 2.93s/it]
Training 1/1 epoch (loss 1.6666):   5% | 3/63 [00:07<01:50, 1.85s/it]
Training 1/1 epoch (loss 1.5329):   6% | 4/63 [00:07<01:16, 1.30s/it]
Training 1/1 epoch (loss 1.5457):   8% | 5/63 [00:08<00:57, 1.01it/s]
Training 1/1 epoch (loss 1.6443):  10% | 6/63 [00:08<00:46, 1.24it/s]
Training 1/1 epoch (loss 1.6131):  11% | 7/63 [00:09<00:38, 1.45it/s]
Training 1/1 epoch (loss 1.6665):  13% | 8/63 [00:09<00:36, 1.52it/s]
Training 1/1 epoch (loss 1.6096):  14% | 9/63 [00:10<00:32, 1.68it/s]
Training 1/1 epoch (loss 1.5994):  16% | 10/63 [00:10<00:29, 1.82it/s]
Training 1/1 epoch (loss 1.6340):  17% | 11/63 [00:11<00:27, 1.91it/s]
Training 1/1 epoch (loss 1.6695):  19% | 12/63 [00:11<00:25, 1.98it/s]
Training 1/1 epoch (loss 1.5639):  21% | 13/63 [00:11<00:24, 2.05it/s]
Training 1/1 epoch (loss 1.5918):  22% | 14/63 [00:12<00:23, 2.10it/s]
Training 1/1 epoch (loss 1.6043):  24% | 15/63 [00:12<00:22, 2.14it/s]
Training 1/1 epoch (loss 1.6275):  25% | 16/63 [00:13<00:22, 2.09it/s]
Training 1/1 epoch (loss 1.7586):  27% | 17/63 [00:13<00:21, 2.14it/s]
Training 1/1 epoch (loss 1.6914):  29% | 18/63 [00:14<00:20, 2.16it/s]
Training 1/1 epoch (loss 1.7466):  30% | 19/63 [00:14<00:19, 2.20it/s]
Training 1/1 epoch (loss 1.7239):  32% | 20/63 [00:15<00:19, 2.22it/s]
Training 1/1 epoch (loss 1.6923):  33% | 21/63 [00:15<00:18, 2.22it/s]
Training 1/1 epoch (loss 1.6782):  35% | 22/63 [00:15<00:18, 2.22it/s]
Training 1/1 epoch (loss 1.6388):  37% | 23/63 [00:16<00:17, 2.23it/s]
Training 1/1 epoch (loss 1.6102):  38% | 24/63 [00:16<00:18, 2.15it/s]
Training 1/1 epoch (loss 1.5856):  40% | 25/63 [00:17<00:17, 2.19it/s]
Training 1/1 epoch (loss 1.5694):  41% | 26/63 [00:17<00:16, 2.21it/s]
Training 1/1 epoch (loss 1.6984):  43% | 27/63 [00:18<00:16, 2.21it/s]
Training 1/1 epoch (loss 1.6798):  44% | 28/63 [00:18<00:16, 2.15it/s]
Training 1/1 epoch (loss 1.6722):  46% | 29/63 [00:19<00:15, 2.18it/s]
Training 1/1 epoch (loss 1.5383):  48% | 30/63 [00:19<00:15, 2.18it/s]
Training 1/1 epoch (loss 1.6506):  49% | 31/63 [00:20<00:14, 2.20it/s]
Training 1/1 epoch (loss 1.7083):  51% | 32/63 [00:20<00:14, 2.12it/s]
Training 1/1 epoch (loss 1.6033):  52% | 33/63 [00:21<00:14, 2.14it/s]
Training 1/1 epoch (loss 1.5984):  54% | 34/63 [00:21<00:13, 2.17it/s]
Training 1/1 epoch (loss 1.5998):  56% | 35/63 [00:21<00:12, 2.20it/s]
Training 1/1 epoch (loss 1.6650):  57% | 36/63 [00:22<00:12, 2.22it/s]
Training 1/1 epoch (loss 1.5882):  59% | 37/63 [00:22<00:11, 2.25it/s]
Training 1/1 epoch (loss 1.5717):  60% | 38/63 [00:23<00:11, 2.23it/s]
Training 1/1 epoch (loss 1.6135):  62% | 39/63 [00:23<00:10, 2.24it/s]
Training 1/1 epoch (loss 1.6085):  63% | 40/63 [00:24<00:10, 2.15it/s]
Training 1/1 epoch (loss 1.5770):  65% | 41/63 [00:24<00:10, 2.17it/s]
Training 1/1 epoch (loss 1.5733):  67% | 42/63 [00:25<00:09, 2.22it/s]
Training 1/1 epoch (loss 1.6788):  68% | 43/63 [00:25<00:09, 2.22it/s]
Training 1/1 epoch (loss 1.5074):  70% | 44/63 [00:26<00:08, 2.24it/s]
Training 1/1 epoch (loss 1.6573): 70%|βββββββ | 44/63 [00:26<00:08, 2.24it/s]
Training 1/1 epoch (loss 1.6573): 71%|ββββββββ | 45/63 [00:26<00:08, 2.25it/s]
Training 1/1 epoch (loss 1.5843): 71%|ββββββββ | 45/63 [00:26<00:08, 2.25it/s]
Training 1/1 epoch (loss 1.5843): 73%|ββββββββ | 46/63 [00:26<00:07, 2.27it/s]
Training 1/1 epoch (loss 1.6471): 73%|ββββββββ | 46/63 [00:27<00:07, 2.27it/s]
Training 1/1 epoch (loss 1.6471): 75%|ββββββββ | 47/63 [00:27<00:06, 2.29it/s]
Training 1/1 epoch (loss 1.5904): 75%|ββββββββ | 47/63 [00:27<00:06, 2.29it/s]
Training 1/1 epoch (loss 1.5904): 76%|ββββββββ | 48/63 [00:27<00:06, 2.20it/s]
Training 1/1 epoch (loss 1.6856): 76%|ββββββββ | 48/63 [00:28<00:06, 2.20it/s]
Training 1/1 epoch (loss 1.6856): 78%|ββββββββ | 49/63 [00:28<00:06, 2.19it/s]
Training 1/1 epoch (loss 1.4872): 78%|ββββββββ | 49/63 [00:28<00:06, 2.19it/s]
Training 1/1 epoch (loss 1.4872): 79%|ββββββββ | 50/63 [00:28<00:05, 2.22it/s]
Training 1/1 epoch (loss 1.5536): 79%|ββββββββ | 50/63 [00:29<00:05, 2.22it/s]
Training 1/1 epoch (loss 1.5536): 81%|ββββββββ | 51/63 [00:29<00:05, 2.25it/s]
Training 1/1 epoch (loss 1.5252): 81%|ββββββββ | 51/63 [00:29<00:05, 2.25it/s]
Training 1/1 epoch (loss 1.5252): 83%|βββββββββ | 52/63 [00:29<00:04, 2.25it/s]
Training 1/1 epoch (loss 1.5549): 83%|βββββββββ | 52/63 [00:30<00:04, 2.25it/s]
Training 1/1 epoch (loss 1.5549): 84%|βββββββββ | 53/63 [00:30<00:04, 2.24it/s]
Training 1/1 epoch (loss 1.7051): 84%|βββββββββ | 53/63 [00:30<00:04, 2.24it/s]
Training 1/1 epoch (loss 1.7051): 86%|βββββββββ | 54/63 [00:30<00:04, 2.24it/s]
Training 1/1 epoch (loss 1.6551): 86%|βββββββββ | 54/63 [00:30<00:04, 2.24it/s]
Training 1/1 epoch (loss 1.6551): 87%|βββββββββ | 55/63 [00:30<00:03, 2.27it/s]
Training 1/1 epoch (loss 1.5699): 87%|βββββββββ | 55/63 [00:31<00:03, 2.27it/s]
Training 1/1 epoch (loss 1.5699): 89%|βββββββββ | 56/63 [00:31<00:03, 2.17it/s]
Training 1/1 epoch (loss 1.4918): 89%|βββββββββ | 56/63 [00:31<00:03, 2.17it/s]
Training 1/1 epoch (loss 1.4918): 90%|βββββββββ | 57/63 [00:31<00:02, 2.21it/s]
Training 1/1 epoch (loss 1.5779): 90%|βββββββββ | 57/63 [00:32<00:02, 2.21it/s]
Training 1/1 epoch (loss 1.5779): 92%|ββββββββββ| 58/63 [00:32<00:02, 2.23it/s]
Training 1/1 epoch (loss 1.6452): 92%|ββββββββββ| 58/63 [00:32<00:02, 2.23it/s]
Training 1/1 epoch (loss 1.6452): 94%|ββββββββββ| 59/63 [00:32<00:01, 2.17it/s]
Training 1/1 epoch (loss 1.6185): 94%|ββββββββββ| 59/63 [00:33<00:01, 2.17it/s]
Training 1/1 epoch (loss 1.6185): 95%|ββββββββββ| 60/63 [00:33<00:01, 2.21it/s]
Training 1/1 epoch (loss 1.5172): 95%|ββββββββββ| 60/63 [00:33<00:01, 2.21it/s]
Training 1/1 epoch (loss 1.5172): 97%|ββββββββββ| 61/63 [00:33<00:00, 2.23it/s]
Training 1/1 epoch (loss 1.6084): 97%|ββββββββββ| 61/63 [00:34<00:00, 2.23it/s]
Training 1/1 epoch (loss 1.6084): 98%|ββββββββββ| 62/63 [00:34<00:00, 2.25it/s]
Training 1/1 epoch (loss 1.5964): 98%|ββββββββββ| 62/63 [00:34<00:00, 2.25it/s]
Training 1/1 epoch (loss 1.5964): 100%|ββββββββββ| 63/63 [00:34<00:00, 2.24it/s]
Training 1/1 epoch (loss 1.5964): 100%|ββββββββββ| 63/63 [00:34<00:00, 1.82it/s] |
| | tokenizer config file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k-Q2-2k/tokenizer_config.json |
| | Special tokens file saved in /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k-Q2-2k/special_tokens_map.json |
| | wandb: |
| | wandb: |
| | wandb: Run history: |
| | wandb: train/epoch ▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇█████ |
| | wandb: train/loss ββββββββββββββββββββββββββββββββββββββββ |
| | wandb: train/lr ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| | wandb: train/step ▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇█████ |
| | wandb: |
| | wandb: Run summary: |
| | wandb: train/epoch 1 |
| | wandb: train/loss 1.59638 |
| | wandb: train/lr 1e-05 |
| | wandb: train/step 63 |
| | wandb: |
| | wandb: π View run qwen-7b-s3-Q1-40k-Q2-2k at: https://wandb.ai/xtom/Inverse_Alignment/runs/hrtj7fnx |
| | wandb: βοΈ View project at: https://wandb.ai/xtom/Inverse_Alignment |
| | wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s) |
| | wandb: Find logs at: /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-7B/Qwen1.5-7B-s3-Q1-40k-Q2-2k/wandb/run-20250528_203628-hrtj7fnx/logs |