Models (74)

Active filters: jailbreak-detection

rogue-security/prompt-injection-jailbreak-sentinel-v2

Text Classification • 0.6B • Updated Mar 11 • 16.8k • 33

madhurjindal/Jailbreak-Detector-2-XL

Text Generation • Updated Jul 20, 2025 • 414 • 6

llm-semantic-router/toolcall-verifier

Token Classification • 0.1B • Updated Dec 18, 2025 • 38 • 2

Necent/distilbert-base-uncased-detected-jailbreak

Text Classification • 67M • Updated May 29, 2025 • 41

madhurjindal/Jailbreak-Detector

Text Classification • 65.8M • Updated May 30, 2025 • 2.34k

madhurjindal/Jailbreak-Detector-Large

Text Classification • 0.3B • Updated May 30, 2025 • 170 • 3

GuardrailsAI/prompt-saturation-attack-detector

Text Classification • 4.39M • Updated Nov 14, 2024 • 47.8k • 2

qualifire/prompt-injection-sentinel

Text Classification • 0.4B • Updated Sep 22, 2025 • 2.21k • 15

gincioks/cerberus-bert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 3

gincioks/cerberus-distilbert-base-un-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 5

gincioks/cerberus-deberta-v3-small-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 2

gincioks/cerberus-proventra-mdeberta-v3-base-v1.0-onnx

Text Classification • Updated Jun 15, 2025 • 4

pmking27/jailbreak-detection

Text Classification • 0.3B • Updated Jun 19, 2025 • 78

intelliway/deberta-v3-base-prompt-injection-v2-mapa

Text Classification • 0.2B • Updated Jul 3, 2025 • 3

qualifire/prompt-injection-jailbreak-sentinel-v2-GGUF

0.6B • Updated Sep 28, 2025 • 22 • 1

ahmedmajid92/iraqi-guard-model

Text Classification • 0.3B • Updated Oct 9, 2025 • 7 • 1

rootfs/tool-call-verifier

Token Classification • 0.1B • Updated Dec 14, 2025 • 5

rootfs/function-call-sentinel

Text Classification • 0.1B • Updated Dec 14, 2025 • 6

vincentoh/jailbreak-detector-v5

Text Classification • Updated Dec 18, 2025

thirtyninetythree/deberta-prompt-guard

Text Classification • 0.2B • Updated Dec 22, 2025 • 3

llm-semantic-router/toolcall-sentinel

Text Classification • 0.1B • Updated Dec 18, 2025 • 13 • 1

llm-semantic-router/mmbert-jailbreak-detector-lora

Text Classification • Updated Jan 21 • 4

llm-semantic-router/mmbert-jailbreak-detector-merged

Text Classification • 0.3B • Updated Jan 21 • 106

abdulmunimjemal/Sentinel-Rail-A-Prompt-Attack-Guard

Text Classification • Updated Jan 21 • 1

llm-semantic-router/mmbert-safety-classifier-level1

Text Classification • Updated Jan 21 • 2

llm-semantic-router/mlcommons-safety-classifier-level1-binary

Text Classification • Updated Jan 22 • 62

ynyg/Unified_Prompt_Guard

0.3B • Updated Jan 28 • 26

llm-semantic-router/mmbert32k-jailbreak-detector-lora

Text Classification • Updated Feb 1 • 21

llm-semantic-router/mmbert32k-jailbreak-detector-merged

Text Classification • 0.3B • Updated Mar 6 • 2.66k

satyamg1620/mmbert32k-jailbreak-detector-healthcare-merged

Text Classification • 0.3B • Updated Feb 15 • 3