A) README.md（LoRAアダプタ用：サンプル準拠）

# **qwen3-4b-structured-output-lora-phase2**

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, bitsandbytes).  
This repository contains LoRA adapter weights (and tokenizer files) only. The base model must be loaded separately.

## **Training Objective**

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).  
Loss is applied only to the final assistant output (completion-only loss), while the prompt portion is masked (labels = -100).  
Additionally, intermediate reasoning markers such as `<think>...</think>` and code fences were removed during data preparation to prioritize parseable outputs.

## **Training Configuration**

* Base model: Qwen/Qwen3-4B-Instruct-2507  
* Method: QLoRA (4-bit, bitsandbytes) + PEFT LoRA  
* Training: Phase1 (mixed) → Phase2 (hard-only resume)  
* Epochs: 1 epoch (Phase1) + 1 epoch (Phase2)  
* Learning rate: 5e-05 (Phase2)  
* Packing: disabled (completion-only loss)  
* Framework: transformers + trl + peft  
* LoRA: (see adapter_config.json in this repo)

## **Usage**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

Sources & License (IMPORTANT)

Training Data: daichira/structevalt-3k-mix-sft, u-10bei/structured_data_with_cot_dataset_512_v2 (and internal splits/mixes)
Dataset License: Please follow each dataset’s license and attribution requirements.
Compliance: Users must comply with both the datasets’ attribution requirements and the base model’s original terms of use.

＜日本語訳＞

qwen3-4b-structured-output-lora-phase2

このリポジトリは、Qwen/Qwen3-4B-Instruct-2507 をベースモデルとし、QLoRA (4-bit, bitsandbytes) を用いてファインチューニングされた LoRA アダプターを提供します。

【重要】本リポジトリには LoRA アダプターの重み（および tokenizer）のみが含まれています。ベースモデルは別途ロードする必要があります。

学習の目的

このアダプターは、構造化出力（JSON / YAML / XML / TOML / CSV）の精度向上を目的としてトレーニングされています。学習時の損失（Loss）は 最終アシスタント出力（completion）のみに適用され、プロンプト部分はマスク（labels=-100）されています。また、パース可能な出力を優先するため、データ整形時に <think>...</think> やコードフェンスを除去しています。

学習設定

ベースモデル: Qwen/Qwen3-4B-Instruct-2507
手法: QLoRA (4-bit, bitsandbytes) + PEFT LoRA
学習: Phase1（混合）→ Phase2（hard-onlyを継続学習）
エポック数: Phase1 1epoch + Phase2 1epoch
学習率: 5e-5（Phase2）
Packing: 無効（completion-only lossのため）
フレームワーク: transformers + trl + peft
LoRA パラメータ: 本リポジトリの adapter_config.json を参照

使い方

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

ソースおよびライセンス（重要）

学習データ: daichira/structevalt-3k-mix-sft, u-10bei/structured_data_with_cot_dataset_512_v2（および内部での分割/混合）
データセットライセンス: 各データセットのライセンスおよび帰属要件に従ってください。
遵守事項: 利用者は、データセットの帰属表記（クレジット）要件、およびベースモデルの元の利用規約の両方を遵守する必要があります。


---

# B) README.md（マージ済みモデル用：提出URL向け・サンプル準拠）

```md
# **qwen3-4b-structured-output-phase2-merged**

This repository provides a merged FP16 model built from Qwen/Qwen3-4B-Instruct-2507 and a Phase2 LoRA adapter.  
This repository contains the full merged model weights, so it can be loaded directly with `from_pretrained()`.

## **Training Objective**

This model is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).  
Loss is applied only to the final assistant output (completion-only loss), while the prompt portion is masked (labels = -100).  
Data preparation removes `<think>...</think>` and code fences to prioritize strict, parseable outputs.

## **Training Configuration**

* Base model: Qwen/Qwen3-4B-Instruct-2507  
* Method: QLoRA SFT (4-bit) → merged into FP16 weights for inference  
* Training: Phase1 (mixed) → Phase2 (hard-only resume)  
* Epochs: 1 epoch (Phase1) + 1 epoch (Phase2)  
* Learning rate: 5e-05 (Phase2)  
* Framework: transformers + trl + peft  
* Output: merged FP16 model (safe_serialization)

## **Usage**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

Sources & License (IMPORTANT)

Training Data: daichira/structevalt-3k-mix-sft, u-10bei/structured_data_with_cot_dataset_512_v2 (and internal splits/mixes)
Dataset License: Please follow each dataset’s license and attribution requirements.
Compliance: Users must comply with both the datasets’ attribution requirements and the base model’s original terms of use.

＜日本語訳＞

qwen3-4b-structured-output-phase2-merged

このリポジトリは、Qwen/Qwen3-4B-Instruct-2507 と Phase2 の LoRA アダプターから作成した マージ済み FP16 モデルを提供します。本リポジトリには マージ済みのフルモデル重みが含まれているため、from_pretrained() で直接ロードできます。

学習の目的

このモデルは、構造化出力（JSON / YAML / XML / TOML / CSV）の精度向上を目的としてトレーニングされています。学習時の損失（Loss）は 最終アシスタント出力（completion）のみに適用され、プロンプト部分はマスク（labels=-100）されています。また、厳密にパース可能な出力を優先するため、データ整形時に <think>...</think> やコードフェンスを除去しています。

学習設定

ベースモデル: Qwen/Qwen3-4B-Instruct-2507
手法: QLoRA SFT (4-bit) → 推論用に FP16 へマージ
学習: Phase1（混合）→ Phase2（hard-onlyを継続学習）
エポック数: Phase1 1epoch + Phase2 1epoch
学習率: 5e-5（Phase2）
フレームワーク: transformers + trl + peft
出力: マージ済み FP16（safe_serialization）

使い方

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

ソースおよびライセンス（重要）

学習データ: daichira/structevalt-3k-mix-sft, u-10bei/structured_data_with_cot_dataset_512_v2（および内部での分割/混合）
データセットライセンス: 各データセットのライセンスおよび帰属要件に従ってください。
遵守事項: 利用者は、データセットの帰属表記（クレジット）要件、およびベースモデルの元の利用規約の両方を遵守する必要があります。


---

## どっちを使うべき？
- **提出URL用（推奨）**：B（merged）  
- **再現・継続学習用**：A（LoRA）

---

必要なら、あなたの実際の学習ログに合わせて README の数値をさらに厳密化できます（例：Phase1/2 の lr、bs、ga、max_len、データ件数を明記）。ただ、今の版でもサンプルの粒度と整合しています。
::contentReference[oaicite:0]{index=0}

Downloads last month: 17

Safetensors

Model size

4B params

Tensor type

F16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for daisuke-hoshina/qwen3-4b-structured-output-phase2-merged

Base model

Qwen/Qwen3-4B-Instruct-2507

Finetuned

(1030)

this model