Linux File Search Query Formatter Model
Model Overview
This model is a Query Formatter trained on the Linux File Search NLI dataset.
It maps natural language file search queries into a structured JSON-like representation of file attributes based on a fixed schema.
Key Features:
- Converts NL queries → structured tag–value pairs
- Supports all schema attributes from the Linux File Search NLI dataset:
  - File attributes (file_type, extension, size_kb, owner, group, permissions)
  - Temporal attributes (created_year, modified_year)
  - Semantic attributes (language, purpose, contains_text, is_executable, hidden)
  - Path scope and generic tags (path_scope, important, autogenerated, obsolete, archived)
- Outputs deterministic JSON suitable for safe post-processing into find commands or other Linux search engines
- Trained in bf16 format for faster training/inference with lower memory usage
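As a rough illustration of what a consumer of the tag-value output might do first, the sketch below parses a model response as JSON and rejects any attribute that is not in the schema listed above. The helper name parse_model_output and the example query and output are hypothetical; the actual value formats the model emits are not documented here.

```python
import json

# Attribute names copied from the schema list above.
SCHEMA_KEYS = {
    "file_type", "extension", "size_kb", "owner", "group", "permissions",
    "created_year", "modified_year",
    "language", "purpose", "contains_text", "is_executable", "hidden",
    "path_scope", "important", "autogenerated", "obsolete", "archived",
}


def parse_model_output(text: str) -> dict:
    """Parse model output as JSON and accept only known schema attributes."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of tag-value pairs")
    unknown = set(data) - SCHEMA_KEYS
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return data


# Hypothetical output for a query such as
# "python files owned by alice modified in 2023" (assumed, not from the dataset).
example = '{"extension": "py", "owner": "alice", "modified_year": 2023}'
parsed = parse_model_output(example)
```

Rejecting unknown keys early keeps later stages from having to handle attributes the schema never defined.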
Intended Use
Recommended:
- Formatting natural language queries into structured representations
- Query-to-Structure pipelines for semantic file search
- Integration with safe Linux CLI search tools (find, grep, fd)
- Training downstream Q2I or NLI models
Not Recommended:
- Direct command execution without validation
- General-purpose conversation
- Use outside Linux file systems without adaptation
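One way to integrate with find while keeping a review step before execution is to translate the attributes into an argument list rather than a shell string. The sketch below covers only a few attributes and is not the model's own tooling: build_find_args is a hypothetical helper, and the file_type vocabulary ("file", "directory") is an assumption about the output values.

```python
def build_find_args(attrs: dict) -> list[str]:
    """Translate a subset of attributes into an argv list for find."""
    # path_scope is used as the search root; default to the current directory.
    args = ["find", str(attrs.get("path_scope", "."))]

    # Assumed value vocabulary for file_type; unknown values are ignored.
    type_flags = {"file": "f", "directory": "d"}
    if attrs.get("file_type") in type_flags:
        args += ["-type", type_flags[attrs["file_type"]]]

    if "extension" in attrs:
        ext = str(attrs["extension"]).lstrip(".")
        args += ["-name", f"*.{ext}"]
    if "owner" in attrs:
        args += ["-user", str(attrs["owner"])]
    if "group" in attrs:
        args += ["-group", str(attrs["group"])]
    if "permissions" in attrs:
        args += ["-perm", str(attrs["permissions"])]
    return args


argv = build_find_args(
    {"path_scope": "/home", "file_type": "file", "extension": "py", "owner": "alice"}
)
```

Because the result is a list, it can be inspected, logged, or validated before anything is run, and it never passes through a shell parser.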
Model Architecture
- Type: Decoder-only (seq2seq transformer)
- Input: Natural language query
- Output: JSON-like structured representation (tag:value pairs)
- Precision: bf16
- Training Dataset: Linux File Search NLI (~3500 synthetic examples)
- Training Objective: Map NL queries → structured schema attributes
Limitations
- English-only queries
- Linux-centric file system abstraction
- Temporal reasoning limited to years
- Logical operators may require post-processing
- Does not execute commands
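Because temporal attributes are year-granular, a downstream tool has to expand a year into a date range itself. A minimal sketch, assuming modified_year is an integer and that the target is a find implementation supporting -newermt (GNU find does); year_to_find_range is a hypothetical helper name.

```python
def year_to_find_range(year: int) -> list[str]:
    """Expand a year into find predicates selecting files modified in that year."""
    if isinstance(year, bool) or not isinstance(year, int):
        raise ValueError("year must be an integer")
    if not 1970 <= year <= 9998:
        raise ValueError(f"year out of supported range: {year}")
    # Newer than Jan 1 of the year, and not newer than Jan 1 of the next year.
    return [
        "-newermt", f"{year}-01-01",
        "!", "-newermt", f"{year + 1}-01-01",
    ]


predicates = year_to_find_range(2023)
```

The created_year attribute is left out here because file birth time is not available on every file system or find build.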
Safety Considerations
- Outputs are structured representations, not shell commands
- Any conversion to executable commands should be validated and sandboxed
- Prevent execution of arbitrary system commands from model output
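The validation step could look something like the following sketch, which checks attribute values against conservative allowlist patterns before they are used anywhere near a command. The patterns, the assumption that permissions are octal modes, and the helper name validate_values are all illustrative choices, not part of the model or dataset.

```python
import re

# Conservative allowlists (assumptions, tighten or relax for your environment).
SAFE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]*")
OCTAL_MODE = re.compile(r"[0-7]{3,4}")

VALUE_CHECKS = {
    "owner": SAFE_NAME,
    "group": SAFE_NAME,
    "extension": SAFE_NAME,
    "permissions": OCTAL_MODE,  # assumes octal modes such as 644 or 0755
}


def validate_values(attrs: dict) -> dict:
    """Raise ValueError if any checked attribute value looks unsafe."""
    for key, pattern in VALUE_CHECKS.items():
        if key in attrs and not pattern.fullmatch(str(attrs[key])):
            raise ValueError(f"rejected value for {key!r}: {attrs[key]!r}")
    scope = str(attrs.get("path_scope", "."))
    # A leading dash could be read as an option; NUL is never valid in a path.
    if scope.startswith("-") or "\x00" in scope:
        raise ValueError(f"rejected path_scope: {scope!r}")
    return attrs


checked = validate_values({"owner": "alice", "permissions": "644", "path_scope": "/srv"})
```

Validation of this kind complements, rather than replaces, running the final command in a sandbox with restricted privileges.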