---
license: apache-2.0
language:
- en
tags:
- ecommerce
- e-commerce
- retail
- marketplace
- shopping
- amazon
- ebay
- alibaba
- google
- rakuten
- bestbuy
- walmart
- flipkart
- wayfair
- shein
- target
- etsy
- shopify
- taobao
- asos
- carrefour
- costco
- overstock
- pretraining
- decoder
- language-modeling
- foundation-model
library_name: transformers
base_model:
- Qwen/Qwen3-Reranker-0.6B
pipeline_tag: text-ranking
datasets:
- thebajajra/Amazebay-Relevance
model-index:
- name: RexReranker-0.6B
  results:
  - task:
      type: text-ranking
      name: Reranking (query–product relevance)
    dataset:
      name: ERESS (E-commerce Relevance Evaluation Scoring Suite)
      type: thebajajra/eress
    metrics:
    - name: nDCG@5
      type: ndcg_at_5
      value: 0.9794
    - name: nDCG@10
      type: ndcg_at_10
      value: 0.9722
---


[![Models](https://img.shields.io/badge/🤗%20Hugging%20Face-Models-red)](https://huggingface.co/collections/thebajajra/rexreranker) [![Data](https://img.shields.io/badge/🤗%20Training%20Data-AmazebayR-yellow)](https://huggingface.co/datasets/thebajajra/Amazebay-Relevance) [![ERESS](https://img.shields.io/badge/🤗%20Evaluation%20Data-ERESS-blue)](https://huggingface.co/datasets/thebajajra/eress) [![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/bajajra/RexRerankers) [![Blog](https://img.shields.io/badge/Blog-Blog-green)](https://huggingface.co/blog/thebajajra/rexrerankers)

# RexReranker-0.6B

## Model Summary

**RexReranker-0.6B** is a state-of-the-art decoder-based **generative reranker** for e-commerce product discovery. Given a user query and a candidate product (title + optional description/attributes), it outputs a relevance score derived from the model’s **token-level probability of a binary judgment (“yes” vs “no”)**.

## Intended Use

**Primary use cases**

- Second-stage reranking for **product search** (high-recall retrieval → top-k rerank).
- Shopping/commerce assistants: selecting and ordering candidate products given natural-language constraints (size, compatibility, color, etc.).
- Offline evaluation / benchmarking of reranking approaches for product discovery.

## Model Details

### Model type

- **Text Ranking / Reranking** model
- **Decoder LM architecture** (`Qwen3ForCausalLM`)
- **Parameters:** ~0.6B

## Training Data

This model is trained for e-commerce relevance reranking using the project’s released relevance datasets:

- **Amazebay-Relevance**: **6.33M rows** of query–product pairs (train split ~6.29M, validation ~38k)

## Evaluation

Pareto comparison:
Performance on various query types:

![radar_chart](https://cdn-uploads.huggingface.co/production/uploads/6893dd21467f7d2f5f358a95/ILa1znKsPU79ARgNDXFoQ.png)

## How to Use

#### Using vLLM

```python
# Requires vllm>=0.8.5
import math

import torch
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.distributed.parallel_state import destroy_model_parallel
from vllm.inputs.data import TokensPrompt


def format_instruction(instruction, query, doc):
    return [
        {"role": "system", "content": "Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\"."},
        {"role": "user", "content": f"<Instruct>: {instruction}\n\n<Query>: {query}\n\n<Document>: {doc}"},
    ]


def process_inputs(pairs, instruction, max_length, suffix_tokens):
    messages = [format_instruction(instruction, query, doc) for query, doc in pairs]
    messages = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=False, enable_thinking=False
    )
    # Truncate each prompt, then append the fixed assistant suffix tokens
    messages = [ele[:max_length] + suffix_tokens for ele in messages]
    return [TokensPrompt(prompt_token_ids=ele) for ele in messages]


def compute_logits(model, messages, sampling_params, true_token, false_token):
    outputs = model.generate(messages, sampling_params, use_tqdm=False)
    scores = []
    for output in outputs:
        # Logprobs over the single generated judgment token
        final_logits = output.outputs[0].logprobs[-1]
        true_logit = final_logits[true_token].logprob if true_token in final_logits else -10
        false_logit = final_logits[false_token].logprob if false_token in final_logits else -10
        true_score = math.exp(true_logit)
        false_score = math.exp(false_logit)
        scores.append(true_score / (true_score + false_score))
    return scores


number_of_gpu = torch.cuda.device_count()
tokenizer = AutoTokenizer.from_pretrained('thebajajra/RexReranker-0.6B')
model = LLM(model='thebajajra/RexReranker-0.6B', tensor_parallel_size=number_of_gpu,
            max_model_len=10000, enable_prefix_caching=True, gpu_memory_utilization=0.8)
tokenizer.padding_side = "left"
tokenizer.pad_token = tokenizer.eos_token

suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
max_length = 8192
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)
true_token = tokenizer("yes", add_special_tokens=False).input_ids[0]
false_token = tokenizer("no", add_special_tokens=False).input_ids[0]
sampling_params = SamplingParams(
    temperature=0,
    max_tokens=1,
    logprobs=20,
    allowed_token_ids=[true_token, false_token],
)

task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    "visual fractions workbooks for children",
    "replacement motor mount for 2008 focus",
]
documents = [
    "Fractions and Decimals Workbook for Grades 4 to 5",
    "3pcs Set - Motor Mounts Kit Compatible with 08-11 Ford Focus 2.0L Auto Automatic and Manual Trans Transmission AT MT - Engine Mounts",
]
pairs = list(zip(queries, documents))
inputs = process_inputs(pairs, task, max_length - len(suffix_tokens), suffix_tokens)
scores = compute_logits(model, inputs, sampling_params, true_token, false_token)
print('scores', scores)
destroy_model_parallel()
```

#### Using HF Transformers

```python
# Requires transformers>=4.51.0
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    return "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(
        instruction=instruction, query=query, doc=doc
    )


def process_inputs(pairs):
    inputs = tokenizer(
        pairs, padding=False, truncation='longest_first',
        return_attention_mask=False,
        max_length=max_length - len(prefix_tokens) - len(suffix_tokens)
    )
    # Wrap each truncated prompt with the fixed system prefix and assistant suffix
    for i, ele in enumerate(inputs['input_ids']):
        inputs['input_ids'][i] = prefix_tokens + ele + suffix_tokens
    inputs = tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=max_length)
    for key in inputs:
        inputs[key] = inputs[key].to(model.device)
    return inputs


@torch.no_grad()
def compute_logits(inputs, **kwargs):
    # Logits at the final position decide the "yes"/"no" judgment
    batch_scores = model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    return batch_scores[:, 1].exp().tolist()


tokenizer = AutoTokenizer.from_pretrained("thebajajra/RexReranker-0.6B", padding_side='left')
model = AutoModelForCausalLM.from_pretrained("thebajajra/RexReranker-0.6B").eval()

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModelForCausalLM.from_pretrained("thebajajra/RexReranker-0.6B", torch_dtype=torch.float16, attn_implementation="flash_attention_2").cuda().eval()

token_false_id = tokenizer.convert_tokens_to_ids("no")
token_true_id = tokenizer.convert_tokens_to_ids("yes")
max_length = 8192

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
prefix_tokens = tokenizer.encode(prefix, add_special_tokens=False)
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)

task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    "visual fractions workbooks for children",
    "replacement motor mount for 2008 focus",
]
documents = [
    "Fractions and Decimals Workbook for Grades 4 to 5",
    "3pcs Set - Motor Mounts Kit Compatible with 08-11 Ford Focus 2.0L Auto Automatic and Manual Trans Transmission AT MT - Engine Mounts",
]
pairs = [format_instruction(task, query, doc) for query, doc in zip(queries, documents)]

# Tokenize the input texts
inputs = process_inputs(pairs)
scores = compute_logits(inputs)
print("scores: ", scores)
```
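
The snippets above score individual query–product pairs; in a second-stage reranking setup you would score one query against all retrieved candidates and sort by the returned probability. A minimal sketch of that final step (the `rerank` helper and the sample scores are illustrative, not part of the model's API):

```python
def rerank(candidates, scores, top_k=None):
    """Sort candidate products by descending relevance score."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked

# Hypothetical scores as produced by compute_logits for one query's candidates
candidates = ["product A", "product B", "product C"]
scores = [0.12, 0.97, 0.55]

for title, score in rerank(candidates, scores, top_k=2):
    print(f"{score:.2f}  {title}")
```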