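Financial Research Report Metric Extraction LoRA (Qwen3-14B)

Extracts the core metrics that a passage of a financial research report analyzes in depth, and outputs them as structured JSON.

Usage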

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3-14B"
lora_id = "lmxxf/financial-report-lora-qwen3-14b"

# Load the base model with 8-bit bitsandbytes quantization, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, lora_id)
model.eval()  # inference mode: disables dropout

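# System prompt, kept in Chinese as published. In English: "You are an expert in
# extracting metrics from financial research reports. Given a section title and a
# paragraph, extract the core metrics that are analyzed in depth and output JSON.
# First analyze the paragraph's structure in `analysis`, then give the `metrics`
# list. The number of metrics is not fixed; decide it from the paragraph's content."
system = "你是一位金融研报指标提取专家。根据给定的章节标题和段落内容，提取被深度分析的核心指标，输出 JSON。先在 analysis 中分析段落结构，再给出 metrics 列表。指标数量不固定，根据段落实际内容决定。"
# User message. In English: "[Section title] Profitability analysis / [Paragraph]
# The company's gross margin rose 2.3 percentage points year on year to 35.8%,
# helped by a better product mix and lower raw-material costs."
user = "【章节标题】盈利能力分析\n【段落内容】公司毛利率同比提升2.3个百分点至35.8%，受益于产品结构优化和原材料成本下降。"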

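# Qwen ChatML prompt; it ends with the assistant header so generation starts at the reply.
prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{user}<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Low-temperature sampling keeps the JSON output close to deterministic.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))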
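The prompt layout and the JSON reply in the usage example can be wrapped in two small helpers. This is a minimal sketch: `build_prompt` and `parse_reply` are hypothetical names that do not come from the adapter's repository, and the stripping of a `<think>...</think>` block or other text around the JSON is an assumption about what the model may emit, not documented behaviour.

```python
import json
import re

# Hypothetical helpers; the names are illustrative and not from the original repo.


def build_prompt(system: str, title: str, paragraph: str) -> str:
    """Build the ChatML prompt in the same layout as the usage example."""
    user = f"【章节标题】{title}\n【段落内容】{paragraph}"
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def parse_reply(text: str) -> dict:
    """Extract the first JSON object from the decoded reply.

    Assumption: the reply may carry a <think> block or extra text around
    the JSON, so the think block is removed and surrounding text ignored.
    """
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value and ignores whatever follows it.
    obj, _ = json.JSONDecoder().raw_decode(text[start:])
    return obj
```

Passing the system prompt, section title, and paragraph text from the usage example to `build_prompt` yields the same prompt string that is built by hand there.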

Output format

{
  "analysis": "Reasoning about the paragraph's structure...",
  "metrics": [
    {"metric_name": "毛利率", "metric_type": "financial", "score": 0.93, "reason": "Why this metric counts as analyzed in depth"}
  ]
}
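A downstream consumer will usually want to check a parsed reply against this shape before using it. The sketch below is one hedged way to do that. `validate_result` is a hypothetical helper, and the assumption that `score` lies in [0, 1] is inferred from the single example value 0.93 rather than stated anywhere in this card.

```python
# Keys that every entry of "metrics" carries in the documented example.
REQUIRED_METRIC_KEYS = {"metric_name", "metric_type", "score", "reason"}


def validate_result(result: dict) -> list:
    """Check a parsed reply against the documented shape and return its metrics.

    Hypothetical helper. Assumes `score` is a number in [0, 1].
    """
    if not isinstance(result.get("analysis"), str):
        raise ValueError("'analysis' must be a string")
    metrics = result.get("metrics")
    if not isinstance(metrics, list):
        raise ValueError("'metrics' must be a list")
    for i, metric in enumerate(metrics):
        if not isinstance(metric, dict):
            raise ValueError(f"metrics[{i}] must be an object")
        missing = REQUIRED_METRIC_KEYS - metric.keys()
        if missing:
            raise ValueError(f"metrics[{i}] is missing {sorted(missing)}")
        score = metric["score"]
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:
            raise ValueError(f"metrics[{i}].score must be a number in [0, 1]")
    return metrics


example = {
    "analysis": "...",
    "metrics": [
        {"metric_name": "毛利率", "metric_type": "financial", "score": 0.93, "reason": "..."}
    ],
}
names = [m["metric_name"] for m in validate_result(example)]
print(names)
```

Because the number of metrics is not fixed, an empty `metrics` list passes validation.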

Training details

Parameter	Value
Base model	Qwen3-14B (8-bit QLoRA)
Training data	460 examples (4 sample categories)
LoRA r / alpha	16 / 32
LoRA target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params	64M / 14.8B (0.43%)
Epochs	5
Batch size	2 × 8 = 16
Learning rate	2e-4 (cosine schedule)
Mixed precision	bf16
Optimizer	paged_adamw_8bit
Hardware	NVIDIA DGX Spark (Blackwell)
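The trainable-parameter figure in the table can be cross-checked with a little arithmetic. A rank-r LoRA adapter on a linear layer of shape (d_in, d_out) adds r × (d_in + d_out) weights. The sketch below assumes the published Qwen3-14B architecture (40 layers, hidden size 5120, 40 query heads and 8 key/value heads of dimension 128, MLP width 17408). These dimensions do not appear in this card and should be confirmed against the base model's `config.json`.

```python
# Assumed Qwen3-14B dimensions (not stated in this card; check config.json).
LAYERS = 40
HIDDEN = 5120
Q_OUT = 40 * 128   # query heads x head dim
KV_OUT = 8 * 128   # key/value heads x head dim
MLP = 17408
RANK = 16          # LoRA r from the table

# (d_in, d_out) of each targeted projection in one transformer layer.
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the A (rank x d_in) and B (d_out x rank) matrices."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in shapes.values())
total = per_layer * LAYERS
print(f"{total:,}")                    # 64,225,280
print(f"{100 * total / 14.8e9:.2f}%")  # 0.43%
```

Under these assumptions the count comes to about 64M, which agrees with the table.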

Training code and dataset: GitHub

Framework versions

PEFT 0.18.1
Transformers 4.x
TRL 0.x
bitsandbytes 0.x
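To compare a local environment against the versions listed above, the installed package versions can be read with the standard library. This is a minimal sketch: `installed_versions` is a hypothetical helper, and the names in `PACKAGES` are assumed to be the PyPI distribution names of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the libraries listed above.
PACKAGES = ("peft", "transformers", "trl", "bitsandbytes")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```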
