Financial Report Metric Extraction LoRA (Qwen3-14B)

Extracts the core metrics that are analyzed in depth in a financial research report paragraph and outputs them as structured JSON.

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch, json

model_id = "Qwen/Qwen3-14B"
lora_id = "lmxxf/financial-report-lora-qwen3-14b"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, lora_id)
model.eval()

# System prompt (Chinese): "You are a financial research report metric extraction
# expert. Given a section title and paragraph, extract the core metrics that are
# analyzed in depth and output JSON: reason first in `analysis`, then list `metrics`.
# The number of metrics is not fixed; it depends on the paragraph's actual content."
system = "你是一位金融研报指标提取专家。根据给定的章节标题和段落内容,提取被深度分析的核心指标,输出 JSON。先在 analysis 中分析段落结构,再给出 metrics 列表。指标数量不固定,根据段落实际内容决定。"
# User input (Chinese): section title "Profitability Analysis" and a paragraph
# stating that gross margin rose 2.3 pp year-on-year to 35.8%.
user = "【章节标题】盈利能力分析\n【段落内容】公司毛利率同比提升2.3个百分点至35.8%,受益于产品结构优化和原材料成本下降。"

prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{user}<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Output format

{
  "analysis": "reasoning about how the paragraph is structured...",
  "metrics": [
    {"metric_name": "毛利率", "metric_type": "financial", "score": 0.93, "reason": "why this metric is analyzed in depth"}
  ]
}
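Since the adapter replies with a single JSON object, downstream code can parse it directly. A minimal sketch, assuming the model may or may not wrap its answer in a Markdown code fence (the `extract_metrics` helper is illustrative, not part of this repo):

```python
import json
import re

def extract_metrics(raw: str) -> dict:
    """Parse the model's JSON reply, stripping an optional ```json fence."""
    text = raw.strip()
    # Drop a Markdown code fence if the model wrapped its answer in one.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

reply = '{"analysis": "...", "metrics": [{"metric_name": "毛利率", "metric_type": "financial", "score": 0.93, "reason": "..."}]}'
data = extract_metrics(reply)
print([m["metric_name"] for m in data["metrics"]])  # ['毛利率']
```

Because the metric count varies by paragraph, iterate over `data["metrics"]` rather than assuming a fixed length.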

Training details

Parameter           Value
Base model          Qwen3-14B (8-bit QLoRA)
Training data       460 samples (4 sample types)
LoRA r / alpha      16 / 32
LoRA targets        q/k/v/o/gate/up/down_proj
Trainable params    64M of 14.8B (0.43%)
Epochs              5
Batch size          2 × 8 = 16
Learning rate       2e-4 (cosine)
Mixed precision     bf16
Optimizer           paged_adamw_8bit
Hardware            NVIDIA DGX Spark (Blackwell)
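The hyperparameters above can be expressed as standard `peft` / `transformers` configuration objects. A sketch under the assumption that "2 × 8" means a per-device batch of 2 with 8 gradient-accumulation steps; `lora_dropout` and `output_dir` are not stated in the table and are illustrative:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA setup matching the table: r=16, alpha=32, all attention and MLP projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption: dropout is not listed in the table
    bias="none",
    task_type="CAUSAL_LM",
)

# Optimizer / schedule settings from the table.
training_args = TrainingArguments(
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 2 × 8 = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    bf16=True,
    optim="paged_adamw_8bit",
    output_dir="outputs",  # illustrative path
)
```

With the base model loaded in 8-bit (as in the usage snippet), wrapping it via `peft.get_peft_model(model, lora_config)` reproduces the stated ~0.43% trainable-parameter ratio.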

Training code and dataset: GitHub

Framework versions

  • PEFT 0.18.1
  • Transformers 4.x
  • TRL 0.x
  • bitsandbytes 0.x
