Nikeytas/videomae-crime-detector-ultra-v1

@@ -1,199 +1,317 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: mit
+base_model: MCG-NJU/videomae-base
+tags:
+- video-classification
+- crime-detection
+- violence-detection
+- videomae
+- computer-vision
+- security
+- surveillance
+- generated_from_trainer
+language:
+- en
+datasets:
+- jinmang2/ucf_crime
+metrics:
+- accuracy
+- precision
+- recall
+- f1
+pipeline_tag: video-classification
+model-index:
+- name: videomae-crime-detector-ultra-v1
+  results:
+  - task:
+      name: Violence Detection
+      type: video-classification
+    dataset:
+      name: UCF Crime Dataset (Subset)
+      type: jinmang2/ucf_crime
+      args: violence_detection
+    metrics:
+    - name: Accuracy
+      type: accuracy
+      value: 0.7188
+    - name: Precision
+      type: precision
+      value: 0.7207
+    - name: Recall
+      type: recall
+      value: 0.7188
+    - name: F1
+      type: f1
+      value: 0.7190
 ---
+# Nikeytas/Videomae Crime Detector Ultra V1
+This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base), trained on a subset of the UCF Crime dataset with **event-based binary classification** (violent vs. non-violent). It achieves the following results on the validation set:
+- **Loss**: 1.4159
+- **Accuracy**: 0.7188
+- **Precision**: 0.7207
+- **Recall**: 0.7188
+- **F1 Score**: 0.7190
+## 🎯 Model Overview
+This VideoMAE model has been fine-tuned for **binary violence detection** in video content. The model classifies videos into two categories:
+- **Violent Crime** (1): Videos containing violent criminal activities
+- **Non-Violent Incident** (0): Videos with non-violent or normal activities
+The model is based on the **VideoMAE architecture** and has been specifically trained on a curated subset of the UCF Crime dataset with event-based categorization for realistic crime detection scenarios.
+## 📊 Dataset & Training
+### Dataset Composition
+**Total Videos**: 600
+- **Violent Crime Videos**: 300
+- **Non-Violent Incident Videos**: 300
+**Class Balance**: 50.0% violent crimes
+**Event Distribution**:
+- **Abuse**: 28 videos
+- **Arrest**: 18 videos
+- **Arson**: 16 videos
+- **Assault**: 62 videos
+- **Burglary**: 120 videos
+- **Explosion**: 54 videos
+- **Fighting**: 48 videos
+- **RoadAccidents**: 58 videos
+- **Robbery**: 184 videos
+- **Shoplifting**: 36 videos
+- **Stealing**: 46 videos
+- **Vandalism**: 72 videos
+**Data Splits**:
+- **Training**: 384 videos
+- **Validation**: 96 videos
+- **Test**: 120 videos
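The card does not say how the 600 videos were divided, but the counts match two successive 80/20 splits: 600 into 480 + 120 (test), then 480 into 384 + 96 (validation). The following is a minimal sketch under that assumption, with per-class (stratified) splitting also assumed; `stratified_split`, the seed, and the synthetic file names are illustrative and not taken from the training code.

```python
import random

def stratified_split(items, labels, test_fraction, seed=0):
    """Split items into (rest, held_out), keeping each label's share equal in both parts."""
    rng = random.Random(seed)
    by_label = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    rest, held_out = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_held = round(len(group) * test_fraction)
        held_out += [(x, label) for x in group[:n_held]]
        rest += [(x, label) for x in group[n_held:]]
    return rest, held_out

# Synthetic stand-in for the 300 violent / 300 non-violent videos.
videos = [f"video_{i:03d}.mp4" for i in range(600)]
labels = [1] * 300 + [0] * 300

# First hold out 20% for testing, then 20% of the remainder for validation.
train_val, test = stratified_split(videos, labels, test_fraction=0.2)
train, val = stratified_split(
    [v for v, _ in train_val], [y for _, y in train_val], test_fraction=0.2
)
print(len(train), len(val), len(test))  # 384 96 120
```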
+## 🎯 Performance
+### Performance Metrics
+**Validation Performance**:
+- **eval_loss**: 1.4159
+- **eval_accuracy**: 0.7188
+- **eval_precision**: 0.7207
+- **eval_recall**: 0.7188
+- **eval_f1**: 0.7190
+- **eval_runtime**: 11.1870
+- **eval_samples_per_second**: 8.5810
+- **eval_steps_per_second**: 4.2910
+- **epoch**: 15.0000
+**Test Performance**:
+- **eval_loss**: 1.7586
+- **eval_accuracy**: 0.6833
+- **eval_precision**: 0.6963
+- **eval_recall**: 0.6833
+- **eval_f1**: 0.6802
+- **eval_runtime**: 13.9918
+- **eval_samples_per_second**: 8.5760
+- **eval_steps_per_second**: 4.2880
+- **epoch**: 15.0000
+**Training Information**:
+- **Training Time**: 69.5 minutes
+- **Best Accuracy Achieved**: 0.7188
+- **Model Architecture**: VideoMAE Base (fine-tuned)
+- **Fine-tuning Approach**: Event-based binary classification
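In both evaluations above, recall equals accuracy (0.7188 and 0.6833). That is what support-weighted averaging produces, because weighted recall reduces to correct predictions divided by total samples. The card does not state the averaging mode, so the sketch below assumes weighted averaging. The confusion counts are invented for illustration (chosen to give 69 correct out of 96, i.e. 0.71875) and are not the model's actual confusion matrix.

```python
def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all classes."""
    classes = sorted(set(y_true))
    total = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    precision = recall = f1 = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        support = sum(t == c for t in y_true)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support if support else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support / total  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a 96-video validation set:
# 48 violent (35 correct, 13 missed), 48 non-violent (34 correct, 14 false alarms).
y_true = [1] * 48 + [0] * 48
y_pred = [1] * 35 + [0] * 13 + [0] * 34 + [1] * 14
m = weighted_metrics(y_true, y_pred)
print({k: round(v, 4) for k, v in m.items()})
```

Under weighted averaging, `m["recall"]` and `m["accuracy"]` coincide for any prediction pattern, so recall adds no information beyond accuracy; per-class recall for the violent class would be the more useful figure for this task.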
+## 🚀 Training Procedure
+### Training Hyperparameters
+The following hyperparameters were used during training:
+- **Learning Rate**: 5e-05
+- **Train Batch Size**: 2
+- **Eval Batch Size**: 2
+- **Optimizer**: AdamW with betas=(0.9,0.999) and epsilon=1e-08
+- **LR Scheduler Type**: Linear
+- **Training Epochs**: 15
+- **Weight Decay**: 0.01
+### Training Results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|---------------|-------|------|-----------------|----------|
+| 0.71875 | 15.00 | N/A | 1.4159 | 0.7188 |
+### Framework Versions
+- **Transformers**: 4.30.2+
+- **PyTorch**: 2.0.1+
+- **Datasets**: Latest
+- **Device**: Apple Silicon MPS / CUDA / CPU (Auto-detected)
+## 🚀 Quick Start
+### Installation
+```bash
+pip install transformers torch torchvision opencv-python pillow
+```
+### Basic Usage
+```python
+import torch
+from transformers import AutoModelForVideoClassification, AutoProcessor
+import cv2
+import numpy as np
+# Load model and processor
+model = AutoModelForVideoClassification.from_pretrained("Nikeytas/videomae-crime-detector-ultra-v1")
+processor = AutoProcessor.from_pretrained("Nikeytas/videomae-crime-detector-ultra-v1")
+# Process video
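+def classify_video(video_path, num_frames=16):
+    """Classify one video; returns (label, confidence)."""
+    # Sample num_frames evenly spaced frames from the video
+    cap = cv2.VideoCapture(video_path)
+    if not cap.isOpened():
+        raise ValueError(f"Could not open video: {video_path}")
+    frames = []
+    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+    indices = np.linspace(0, max(total_frames - 1, 0), num_frames, dtype=int)
+    for idx in indices:
+        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
+        ret, frame = cap.read()
+        if ret:
+            # OpenCV decodes to BGR; the processor expects RGB
+            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
+    cap.release()
+    if not frames:
+        raise ValueError(f"No frames could be read from: {video_path}")
+    # The model expects exactly num_frames frames; repeat the last one if some reads failed
+    while len(frames) < num_frames:
+        frames.append(frames[-1])
+    # Preprocess and run inference
+    inputs = processor(frames, return_tensors="pt")
+    with torch.no_grad():
+        outputs = model(**inputs)
+    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    predicted_class = torch.argmax(probabilities, dim=-1).item()
+    confidence = probabilities[0][predicted_class].item()
+    label = "Violent Crime" if predicted_class == 1 else "Non-Violent"
+    return label, confidence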
+# Example usage
+video_path = "path/to/your/video.mp4"
+prediction, confidence = classify_video(video_path)
+print(f"Prediction: {prediction} (Confidence: {confidence:.3f})")
+```
+### Batch Processing
+```python
+from pathlib import Path
+# Requires classify_video() from the Basic Usage example
+def process_video_directory(video_dir, output_file="results.txt"):
+    results = []
+    for video_file in Path(video_dir).glob("*.mp4"):
+        try:
+            prediction, confidence = classify_video(str(video_file))
+            results.append({
+                "file": video_file.name,
+                "prediction": prediction,
+                "confidence": confidence
+            })
+            print(f"✅ {video_file.name}: {prediction} ({confidence:.3f})")
+        except Exception as e:
+            print(f"❌ Error processing {video_file.name}: {e}")
+    # Save results
+    with open(output_file, "w") as f:
+        for result in results:
+            f.write(f"{result['file']}: {result['prediction']} ({result['confidence']:.3f})\n")
+    return results
+# Process all videos in a directory
+results = process_video_directory("./videos/")
+```
+## 📈 Technical Specifications
+- **Base Model**: MCG-NJU/videomae-base
+- **Architecture**: Vision Transformer (ViT) adapted for video
+- **Input Resolution**: 224x224 pixels per frame
+- **Temporal Resolution**: 16 frames per video clip
+- **Output Classes**: 2 (Binary classification)
+- **Training Framework**: HuggingFace Transformers
+- **Optimization**: AdamW optimizer with learning rate 5e-5
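The input shape follows directly from these values: one clip is a tensor of shape (16, 3, 224, 224). The token count in the sketch below additionally assumes the patch size (16) and tubelet size (2) of the `MCG-NJU/videomae-base` configuration, which this card does not list.

```python
NUM_FRAMES = 16
CHANNELS = 3
IMAGE_SIZE = 224
PATCH_SIZE = 16    # assumption: value from the videomae-base config
TUBELET_SIZE = 2   # assumption: value from the videomae-base config

# Shape of one preprocessed clip, before the batch dimension is added.
clip_shape = (NUM_FRAMES, CHANNELS, IMAGE_SIZE, IMAGE_SIZE)

# Each token covers a 16x16 patch across 2 consecutive frames.
patches_per_frame = (IMAGE_SIZE // PATCH_SIZE) ** 2
num_tokens = (NUM_FRAMES // TUBELET_SIZE) * patches_per_frame

print(clip_shape)         # (16, 3, 224, 224)
print(patches_per_frame)  # 196
print(num_tokens)         # 1568
```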
+## ⚠️ Limitations
+1. **Dataset Scope**: Trained on a 600-video subset of the UCF Crime dataset, so it may not generalize to all types of violence
+2. **Temporal Context**: Uses 16-frame clips which may miss context in longer sequences
+3. **Environmental Bias**: Performance may vary with different lighting, camera angles, and video quality
+4. **False Positives**: May misclassify intense but non-violent activities (sports, action movies)
+5. **Real-time Performance**: Throughput depends on hardware; evaluation ran at about 8.6 clips per second on hardware the card does not specify
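One common way to work around the 16-frame limit is to score several windows across a long video and combine the clip-level probabilities. The sketch below is illustrative only: `window_starts`, `aggregate`, the stride of 8, and mean/max aggregation are assumptions, not how this model was trained or evaluated. Each window is a run of 16 consecutive frames that would be passed to the model separately.

```python
def window_starts(total_frames, window=16, stride=8):
    """Start indices of overlapping windows covering the video; the last window is aligned to the end."""
    if total_frames <= window:
        return [0]
    starts = list(range(0, total_frames - window + 1, stride))
    last = total_frames - window
    if starts[-1] != last:
        starts.append(last)
    return starts

def aggregate(clip_probs):
    """Return the mean violent-class probability and the peak window score."""
    mean_p = sum(clip_probs) / len(clip_probs)
    return mean_p, max(clip_probs)

starts = window_starts(total_frames=50)
print(starts)  # [0, 8, 16, 24, 32, 34]

# Hypothetical per-window probabilities for the violent class.
mean_p, peak_p = aggregate([0.2, 0.9, 0.4])
print(mean_p, peak_p)
```

The mean smooths over isolated spikes, while the peak flags short violent segments inside an otherwise calm video; which one to act on depends on the tolerance for false alarms.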
+## 🔒 Ethical Considerations
+### Intended Use
+- **Primary**: Research and development in video analysis
+- **Secondary**: Security system enhancement with human oversight
+- **Educational**: Computer vision and AI safety research
+### Prohibited Uses
+- **Surveillance without consent**: Do not use for unauthorized monitoring
+- **Discriminatory profiling**: Avoid bias against specific groups or communities
+- **Automated punishment**: Never use for automated legal or disciplinary actions
+- **Privacy violation**: Respect privacy laws and individual rights
+### Bias and Fairness
+- Model trained on specific dataset that may not represent all populations
+- Regular evaluation needed for bias detection and mitigation
+- Human oversight required for critical applications
+- Consider demographic representation in deployment scenarios
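The human-oversight requirement can be made concrete by never acting on a prediction directly. In the minimal sketch below, the `triage` helper, the action names, and the 0.9 threshold are all hypothetical; with validation accuracy around 0.72, any threshold would need to be calibrated on held-out data before use.

```python
def triage(label, confidence, review_threshold=0.9):
    """Map a model prediction to a follow-up action; no outcome is automatic enforcement."""
    if label == "Violent Crime":
        # Every violent prediction goes to a person, whatever the confidence.
        return "human_review"
    if confidence < review_threshold:
        # Low-confidence non-violent predictions are also double-checked.
        return "human_review"
    return "log_only"

print(triage("Violent Crime", 0.97))  # human_review
print(triage("Non-Violent", 0.62))    # human_review
print(triage("Non-Violent", 0.95))    # log_only
```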
+## 📝 Model Card Information
+- **Developed by**: Research Team
+- **Model Type**: Video Classification (Binary)
+- **Training Data**: UCF Crime Dataset (Subset)
+- **Training Date**: 2025-06-02 11:52:12 UTC
+- **Evaluation Metrics**: Accuracy, Precision, Recall, F1-Score
+- **Intended Users**: Researchers, Security Professionals, Developers
+## 📚 Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{Nikeytas_videomae_crime_detector_ultra_v1,
+    title={VideoMAE Fine-tuned for Crime Detection},
+    author={Research Team},
+    year={2025},
+    publisher={Hugging Face},
+    url={https://huggingface.co/Nikeytas/videomae-crime-detector-ultra-v1}
+}
+```
+## 🤝 Contributing
+We welcome contributions to improve the model! Please:
+1. Report issues with specific examples
+2. Suggest improvements for bias reduction
+3. Share evaluation results on new datasets
+4. Contribute to documentation and examples
+## 📞 Contact
+For questions, issues, or collaboration opportunities, please open an issue in the model repository or contact the development team.
+---
+*Last updated: 2025-06-02 11:52:12 UTC*
+*Model version: 1.0*
+*Framework: HuggingFace Transformers*