Linux File Search Query Formatter Model

Model Overview

This model is a query formatter fine-tuned on the Linux File Search NLI dataset.
It maps natural-language file search queries to a structured, JSON-like representation of file attributes drawn from a fixed schema.

Key Features:

  • Converts NL queries → structured tag–value pairs
  • Supports all schema attributes from the Linux File Search NLI dataset:
    • File attributes (file_type, extension, size_kb, owner, group, permissions)
    • Temporal attributes (created_year, modified_year)
    • Semantic attributes (language, purpose, contains_text, is_executable, hidden)
    • Path scope and generic tags (path_scope, important, autogenerated, obsolete, archived)
  • Produces consistently structured, JSON-like output suitable for safe post-processing into find commands or other Linux search tools
  • Trained in bf16 precision for faster training and inference with lower memory usage
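
The attribute groups listed above can be written down as a small lookup table, which is handy for rejecting keys the schema does not define. This is a minimal sketch: the attribute names come from the list above, while `SCHEMA_GROUPS`, `ALLOWED_KEYS`, and `unknown_keys` are illustrative names that are not part of the model or the dataset.

```python
# Attribute names are taken from the schema list in this card; the group
# labels and helper names are illustrative assumptions.
SCHEMA_GROUPS = {
    "file": ["file_type", "extension", "size_kb", "owner", "group", "permissions"],
    "temporal": ["created_year", "modified_year"],
    "semantic": ["language", "purpose", "contains_text", "is_executable", "hidden"],
    "scope_and_tags": ["path_scope", "important", "autogenerated", "obsolete", "archived"],
}

# Flat set of every attribute name the schema defines.
ALLOWED_KEYS = frozenset(key for keys in SCHEMA_GROUPS.values() for key in keys)


def unknown_keys(record: dict) -> set:
    """Return the keys in `record` that the schema does not define."""
    return set(record) - ALLOWED_KEYS
```

A non-empty result from `unknown_keys` is a reasonable signal to discard or re-request the model output before any further processing.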

Intended Use

Recommended:

  • Formatting natural language queries into structured representations
  • Query-to-Structure pipelines for semantic file search
  • Integration with safe Linux CLI search tools (find, grep, fd)
  • Training downstream Q2I or NLI models

Not Recommended:

  • Direct command execution without validation
  • General-purpose conversation
  • Use outside Linux file systems without adaptation
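
As an illustration of the recommended integration with find, the sketch below turns a few schema attributes into an argument list rather than a shell string. It assumes the model output has already been parsed into a Python dict with plain string values and that `path_scope` holds a directory path; neither is confirmed by this card. `build_find_argv` is a hypothetical helper, and `-executable` is a GNU find test.

```python
def build_find_argv(record: dict) -> list:
    """Translate a few schema attributes into an argv list for `find`."""
    scope = str(record.get("path_scope", "."))
    # A leading "-" would be read by find as an option, so anchor it.
    if scope.startswith("-"):
        scope = "./" + scope
    argv = ["find", scope, "-type", "f"]
    if "extension" in record:
        # Accept both "log" and ".log" as the extension value.
        argv += ["-name", "*." + str(record["extension"]).lstrip(".")]
    if "owner" in record:
        argv += ["-user", str(record["owner"])]
    if record.get("is_executable") is True:
        argv += ["-executable"]  # GNU find only
    return argv
```

Passing the resulting list to `subprocess.run` without `shell=True` keeps each value a single argument, so no shell interprets the model output.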

Model Architecture

  • Type: Decoder-only transformer (Gemma 3 270M fine-tune), used for sequence-to-sequence style generation
  • Input: Natural language query
  • Output: JSON-like structured representation (tag:value pairs)
  • Precision: bf16
  • Training Dataset: Linux File Search NLI (~3500 synthetic examples)
  • Training Objective: Map NL queries → structured schema attributes
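
A minimal loading sketch with the Hugging Face `transformers` library is shown below. It assumes a recent `transformers` release with Gemma 3 support and treats the raw query as the prompt, because this card does not document a prompt template; the expected input format may differ. `format_query` is a hypothetical helper name.

```python
MODEL_ID = "software-si/gemma3-270m-query-formatter-linux-file-search-bf16"


def format_query(query: str, max_new_tokens: int = 128) -> str:
    """Generate the structured representation for one query (greedy decoding)."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    # Assumption: the raw query is the whole prompt (no template documented).
    inputs = tokenizer(query, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Greedy decoding (`do_sample=False`) is chosen here so that the same query yields the same output on repeated calls. The function reloads the model on every call for brevity; a real pipeline would load it once.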

Limitations

  • English-only queries
  • Linux-centric file system abstraction
  • Temporal reasoning limited to years
  • Logical operators may require post-processing
  • Does not execute commands
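
Because temporal attributes are limited to years, a downstream tool has to expand a year into a date range itself. The sketch below shows one way to do that for `modified_year` using GNU find's `-newermt` test. `modified_year_args` is a hypothetical helper, and `created_year` is left out because file creation time is not exposed consistently across filesystems and find versions.

```python
def modified_year_args(year: int) -> list:
    """Return GNU find arguments matching files modified during `year`."""
    start = f"{year:04d}-01-01"
    end = f"{year + 1:04d}-01-01"
    # Newer than the start of the year, but not newer than the start of the next.
    return ["-newermt", start, "!", "-newermt", end]
```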

Safety Considerations

  • Outputs are structured representations, not shell commands
  • Any conversion to executable commands should be validated and sandboxed
  • Prevent execution of arbitrary system commands from model output
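
The validation step mentioned above might look like the following sketch, which checks individual values before they reach any command builder. The character set, the 64-character limit, and the helper names `is_safe_token` and `is_within` are assumptions chosen for illustration, not rules defined by the model or the dataset.

```python
import re
from pathlib import Path

# Conservative allow-list: letters, digits, "_", ".", "-", no leading dash.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_.][A-Za-z0-9_.-]*")


def is_safe_token(value) -> bool:
    """Accept only short strings built from the allow-listed characters."""
    return (
        isinstance(value, str)
        and len(value) <= 64
        and SAFE_TOKEN.fullmatch(value) is not None
    )


def is_within(base: str, candidate: str) -> bool:
    """Check that `candidate` resolves to `base` or a path beneath it."""
    base_path = Path(base).resolve()
    target = (base_path / candidate).resolve()
    return target == base_path or base_path in target.parents
```

For example, `is_safe_token` could guard values such as `extension` and `owner`, while `is_within` could confine `path_scope` to a directory the caller is willing to expose.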

Model size: 0.3B params (Safetensors, tensor type BF16)

Model repository: software-si/gemma3-270m-query-formatter-linux-file-search-bf16 (fine-tuned model)