Spaces: djhui5710/reachy_mini_home_assistant


Desmond-Dong committed on Jan 7

Commit c92d558

1 Parent(s): e1fb914

Revert to 4051ebd: gesture recognition has dependency conflicts, so it is removed for now


Files changed (7)

PROJECT_PLAN.md +33 -117
index.html +0 -15
pyproject.toml +1 -10
reachy_mini_ha_voice/__init__.py +1 -1
reachy_mini_ha_voice/camera_server.py +9 -91
reachy_mini_ha_voice/entity_registry.py +0 -46
reachy_mini_ha_voice/head_tracker.py +26 -36

PROJECT_PLAN.md

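@@ -614,7 +614,7 @@ VAD_DB_OFF = -45  # stop-detection threshold
 **Technical implementation**:
 - `tap_detector.py` - IMU sudden-acceleration detection
 - `satellite.py:_tap_conversation_mode` - continuous conversation mode flag
-- Threshold: 0.5g (configurable; default is the most sensitive)
 - Cooldown: 1.0s (prevents repeated triggers)
 - Wireless version only
@@ -644,87 +644,7 @@ def _tts_finished(self):
 | Tilted / fallen over | Play help-request motion + speech "I fell over, help me" | ❌ Not implemented |
 | Idle for a long time | Enter sleep animation | ❌ Not implemented |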
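-### Phase 21 - Gesture Recognition ✅ **Completed**
-**Goal**: Use MediaPipe Hands to detect gestures for non-voice interaction.
-**Technical approach**:
-- Use MediaPipe Hands (runs fully locally, no cloud dependency)
-- Runs alongside YOLO face detection (gestures are processed every other frame to save CPU)
-- A gesture must be held for 0.5 s to trigger, followed by a 1.5 s cooldown
-**Implemented gestures (11)**:
-| Gesture | Value | Meaning | Detection logic |
-|------|--------|------|---------|
-| 👍 | `thumbs_up` | Confirm / like | Thumb up, other fingers curled |
-| 👎 | `thumbs_down` | Reject / dislike | Thumb down, other fingers curled |
-| ✋ | `open_palm` | Stop | All fingers extended |
-| ✊ | `fist` | Pause / hold | All fingers curled |
-| ✌️ | `peace` | Victory / peace | Index and middle fingers extended, others curled |
-| 👌 | `ok` | OK | Thumb and index finger form a circle, others extended |
-| ☝️ | `pointing_up` | Attention / one | Only the index finger extended |
-| 🤘 | `rock` | Rock on | Index and little fingers extended, middle and ring fingers curled |
-| 🤙 | `call` | Call me | Thumb and little finger extended, others curled |
-| 3️⃣ | `three` | Three | Index, middle, and ring fingers extended |
-| 4️⃣ | `four` | Four | All fingers except the thumb extended |
-**Home Assistant entities**:
-| ESPHome entity type | Name | Description |
-|-----------------|------|------|
-| `Text Sensor` | `detected_gesture` | Currently detected gesture (English value) |
-| `Switch` | `gesture_detection_enabled` | Gesture detection on/off switch |
-**Code locations**:
-- `gesture_detector.py` - MediaPipe gesture detector
-- `camera_server.py` - integrates gesture detection into the camera processing loop
-- `entity_registry.py` - Home Assistant entity registration
-**Technical details**: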
-```python
-# gesture_detector.py - gesture classification
-class Gesture(Enum):
-    NONE = "none"
-    THUMBS_UP = "thumbs_up"
-    THUMBS_DOWN = "thumbs_down"
-    OPEN_PALM = "open_palm"
-    FIST = "fist"
-    PEACE = "peace"
-    OK = "ok"
-    POINTING_UP = "pointing_up"
-    ROCK = "rock"
-    CALL = "call"
-    THREE = "three"
-    FOUR = "four"
-# Gesture detection parameters
-min_detection_confidence = 0.7
-min_tracking_confidence = 0.5
-gesture_hold_threshold = 0.5  # trigger after holding for 0.5 s
-gesture_cooldown = 1.5  # 1.5 s cooldown after a trigger
-gesture_clear_delay = 2.0  # clear 2 s after the gesture disappears
-```
-**Callback support**:
-```python
-# A callback can be set for each gesture
-camera_server.set_gesture_callbacks(
-    on_thumbs_up=lambda: print("confirm"),
-    on_thumbs_down=lambda: print("reject"),
-    on_open_palm=lambda: print("stop"),
-    on_fist=lambda: print("pause"),
-    on_peace=lambda: print("peace"),
-    on_ok=lambda: print("OK"),
-    on_pointing_up=lambda: print("attention"),
-    on_rock=lambda: print("Rock!"),
-    on_call=lambda: print("call"),
-    on_three=lambda: print("three"),
-    on_four=lambda: print("four"),
-)
-```
-### Phase 22 - Home Assistant Scene Integration (not implemented) ❌
 **Goal**: Trigger robot actions from Home Assistant scenes/automations.
@@ -753,20 +673,16 @@ camera_server.set_gesture_callbacks(
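 - **Audio processing** - AGC, noise suppression, echo cancellation
 - **Camera stream** - MJPEG live preview
-#### Extended features (Phase 13-21)
-- **Phase 13** - Sendspin multi-room audio support ✅
-- **Phase 15** - YOLO face tracking ✅
-- **Phase 20** - Tap-to-wake ✅
-- **Phase 21** - Gesture recognition (11 gestures, mediapipe installed automatically) ✅
-#### Partially implemented features
 - **Phase 14** - Emotion action API infrastructure (manual triggering works)
 - **Phase 19** - Gravity compensation mode switching (teaching flow not implemented)
 ### ❌ Unimplemented features
 #### High priority
 - **Phase 14** - Automatic emotion action feedback (needs to be tied to voice assistant events)
 #### Medium priority
 - **Phase 16** - Cartoon-style motion mode (needs dynamic interpolation switching)
@@ -775,8 +691,8 @@ camera_server.set_gesture_callbacks(
 #### Low priority
 - **Phase 19** - Teaching mode record/playback
-- **Phase 20** - IMU environment awareness response (shake/tilt detection)
-- **Phase 22** - Home Assistant scene integration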
 ---
@@ -786,40 +702,42 @@ camera_server.set_gesture_callbacks(
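 - ✅ **Phase 1-12**: Basic ESPHome entities (45+)
 - ✅ Core voice assistant functionality
 - ✅ Basic motion feedback (nod, head shake, gaze)
-- ✅ **Phase 13**: Sendspin multi-room audio
-- ✅ **Phase 15**: YOLO face tracking
-- ✅ **Phase 21**: Gesture recognition (11 gestures)
 ### High priority (partially implemented 🟡)
-- 🟡 **Phase 14**: Emotion action feedback system
   - ✅ Emotion Selector entity and API infrastructure
   - ❌ Automatically trigger emotion actions from voice assistant responses
   - ❌ Intent recognition and emotion matching
   - ❌ Dance move library integration
 ### Medium priority (partially implemented 🟡)
-- 🟡 **Phase 16**: Cartoon-style motion mode
-  - ✅ 10Hz unified control loop architecture (tuned to prevent daemon crashes)
   - ✅ Pose change detection + state query caching (reduces daemon load)
   - ✅ Smooth interpolated motion + breathing animation
   - ❌ Dynamic interpolation technique switching (CARTOON, etc.)
-- 🟡 **Phase 17**: Antenna sync while speaking
   - ✅ Speech-driven head sway (SpeechSwayGenerator)
   - ❌ Antennas sway with the audio rhythm
 ### Medium priority (not implemented ❌)
-- ❌ **Phase 18**: Visual gaze interaction - eye contact
 ### Low priority (partially implemented 🟡)
-- 🟡 **Phase 19**: Gravity compensation interaction mode
   - ✅ Gravity compensation mode switching
   - ❌ Teaching-style interaction (record/playback)
-- 🟡 **Phase 20**: Environment awareness response
-  - ✅ Tap-to-wake (IMU acceleration detection)
-  - ❌ Shake/tilt detection
 ### Low priority (not implemented ❌)
-- ❌ **Phase 22**: Home Assistant scene integration - smart home integration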
 ---
@@ -827,19 +745,17 @@ camera_server.set_gesture_callbacks(
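 | Phase | Status | Completion | Notes |
 |------|------|--------|------|
-| Phase 1-12 | ✅ Complete | 100% | 45 ESPHome entities implemented (Phase 11 LED disabled) |
-| Phase 13 | ✅ Complete | 100% | Sendspin multi-room audio support |
-| Phase 14 | 🟡 Partial | 30% | API infrastructure ready; automatic triggering missing |
-| Phase 15 | ✅ Complete | 100% | YOLO face tracking |
-| Phase 16 | 🟡 Partial | 70% | 10Hz control loop + pose change detection + state caching + breathing animation implemented |
-| Phase 17 | 🟡 Partial | 50% | Speech-driven head sway implemented |
-| Phase 18 | ❌ Incomplete | 10% | Camera implemented; eye contact missing |
-| Phase 19 | 🟡 Partial | 40% | Mode switching implemented; teaching flow missing |
-| Phase 20 | 🟡 Partial | 50% | Tap-to-wake implemented; shake/tilt detection missing |
-| Phase 21 | ✅ Complete | 100% | Gesture recognition (11 gestures) |
-| Phase 22 | ❌ Incomplete | 0% | Not implemented at all |
-**Overall completion**: **Phase 1-12: 100%** | **Phase 13-22: ~60%**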
 ---
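Reviewer note on the removed hold/cooldown behaviour: `gesture_detector.py` itself is not part of this diff, so the following is only a minimal sketch of how the documented parameters (`gesture_hold_threshold = 0.5`, `gesture_cooldown = 1.5`) could interact. The class name `GestureDebouncer` and its `update` method are hypothetical, and the 2.0 s clear delay is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GestureDebouncer:
    """Hypothetical helper: report a gesture only after it has been held."""

    hold_threshold: float = 0.5  # seconds a gesture must persist before firing
    cooldown: float = 1.5        # seconds to wait after a trigger

    _candidate: Optional[str] = None
    _candidate_since: float = 0.0
    _last_trigger: float = float("-inf")

    def update(self, gesture: Optional[str], now: float) -> Optional[str]:
        """Feed one per-frame classification; return the gesture when it fires."""
        if gesture is None or gesture == "none":
            self._candidate = None
            return None
        if gesture != self._candidate:
            # A new candidate restarts the hold timer.
            self._candidate = gesture
            self._candidate_since = now
            return None
        held = now - self._candidate_since
        cooled = now - self._last_trigger >= self.cooldown
        if held >= self.hold_threshold and cooled:
            self._last_trigger = now
            self._candidate_since = now  # require a fresh hold before re-firing
            return gesture
        return None
```

Passing the timestamp in explicitly (rather than calling `time.time()` inside) keeps the logic easy to unit-test if the feature is reintroduced once the dependency conflict is resolved.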

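 **Technical implementation**:
 - `tap_detector.py` - IMU sudden-acceleration detection
 - `satellite.py:_tap_conversation_mode` - continuous conversation mode flag
+- Threshold: 2.0g (configurable)
 - Cooldown: 1.0s (prevents repeated triggers)
 - Wireless version only
 | Tilted / fallen over | Play help-request motion + speech "I fell over, help me" | ❌ Not implemented |
 | Idle for a long time | Enter sleep animation | ❌ Not implemented |
+### Phase 21 - Home Assistant Scene Integration (not implemented) ❌
 **Goal**: Trigger robot actions from Home Assistant scenes/automations.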
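 - **Audio processing** - AGC, noise suppression, echo cancellation
 - **Camera stream** - MJPEG live preview
+#### Partially implemented features (Phase 14-21)
 - **Phase 14** - Emotion action API infrastructure (manual triggering works)
 - **Phase 19** - Gravity compensation mode switching (teaching flow not implemented)
 ### ❌ Unimplemented features
 #### High priority
+- ~~**Phase 13** - Sendspin audio playback support~~ ✅ **Completed**
 - **Phase 14** - Automatic emotion action feedback (needs to be tied to voice assistant events)
+- **Phase 15** - Continuous sound source tracking (currently turns only on wake)
 #### Medium priority
 - **Phase 16** - Cartoon-style motion mode (needs dynamic interpolation switching)
 #### Low priority
 - **Phase 19** - Teaching mode record/playback
+- **Phase 20** - IMU environment awareness response
+- **Phase 21** - Home Assistant scene integration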
 ---
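 - ✅ **Phase 1-12**: Basic ESPHome entities (45+)
 - ✅ Core voice assistant functionality
 - ✅ Basic motion feedback (nod, head shake, gaze)
 ### High priority (partially implemented 🟡)
+- 🟡 **Phase 13**: Emotion action feedback system
   - ✅ Emotion Selector entity and API infrastructure
   - ❌ Automatically trigger emotion actions from voice assistant responses
   - ❌ Intent recognition and emotion matching
   - ❌ Dance move library integration
+### High priority (not implemented ❌)
+- ❌ **Phase 14**: Smart sound source tracking enhancements
+  - ✅ Turn toward the sound source on wake
+  - ❌ Continuous sound source tracking
+  - ❌ Multi-speaker conversation switching
+  - ❌ Sound source visualization
 ### Medium priority (partially implemented 🟡)
+- 🟡 **Phase 15**: Cartoon-style motion mode
+  - ✅ 20Hz unified control loop architecture (tuned to prevent daemon crashes)
   - ✅ Pose change detection + state query caching (reduces daemon load)
   - ✅ Smooth interpolated motion + breathing animation
   - ❌ Dynamic interpolation technique switching (CARTOON, etc.)
+- 🟡 **Phase 16**: Antenna sync while speaking
   - ✅ Speech-driven head sway (SpeechSwayGenerator)
   - ❌ Antennas sway with the audio rhythm
 ### Medium priority (not implemented ❌)
+- ❌ **Phase 17**: Visual gaze interaction - eye contact
 ### Low priority (partially implemented 🟡)
+- 🟡 **Phase 18**: Gravity compensation interaction mode
   - ✅ Gravity compensation mode switching
   - ❌ Teaching-style interaction (record/playback)
 ### Low priority (not implemented ❌)
+- ❌ **Phase 19**: Environment awareness response - IMU-triggered actions
+- ❌ **Phase 20**: Home Assistant scene integration - smart home integration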
 ---
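 | Phase | Status | Completion | Notes |
 |------|------|--------|------|
+| Phase 1-12 | ✅ Complete | 100% | 40 ESPHome entities implemented (Phase 11 LED disabled) |
+| Phase 13 | 🟡 Partial | 30% | API infrastructure ready; automatic triggering missing |
+| Phase 14 | ❌ Incomplete | 20% | Only turn-on-wake is implemented |
+| Phase 15 | 🟡 Partial | 70% | 20Hz control loop + pose change detection + state caching + breathing animation implemented |
+| Phase 16 | 🟡 Partial | 50% | Speech-driven head sway implemented |
+| Phase 17 | ❌ Incomplete | 10% | Camera implemented; face detection missing |
+| Phase 18 | 🟡 Partial | 40% | Mode switching implemented; teaching flow missing |
+| Phase 19 | ❌ Incomplete | 10% | IMU data exposed; trigger logic missing |
+| Phase 20 | ❌ Incomplete | 0% | Not implemented at all |
+**Overall completion**: **Phase 1-12: 100%** | **Phase 13-20: ~35%**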
 ---
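Reviewer note on the restored tap threshold: `tap_detector.py` is not shown in this commit, so the sketch below only illustrates one plausible reading of "IMU sudden-acceleration detection" with the restored 2.0g threshold and 1.0s cooldown. The class `TapDetector`, its `update` method, and the choice of comparing consecutive samples are assumptions made for illustration.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


class TapDetector:
    """Hypothetical sketch: flag a tap when acceleration jumps between samples."""

    def __init__(self, threshold_g: float = 2.0, cooldown_s: float = 1.0) -> None:
        self.threshold_g = threshold_g
        self.cooldown_s = cooldown_s
        self._prev: Optional[Vec3] = None
        self._last_tap = float("-inf")

    def update(self, accel_g: Vec3, now: float) -> bool:
        """Feed one accelerometer sample (in g); return True when a tap fires."""
        prev, self._prev = self._prev, accel_g
        if prev is None:
            return False  # need two samples to measure a change
        delta = math.dist(accel_g, prev)  # magnitude of the change between samples
        if delta >= self.threshold_g and now - self._last_tap >= self.cooldown_s:
            self._last_tap = now
            return True
        return False
```

With the earlier 0.5g default, ordinary handling or motor vibration would be more likely to cross the threshold; 2.0g trades some sensitivity for fewer accidental triggers.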

index.html

@@ -80,10 +80,6 @@
 						<h3>😊 Facial Expressions</h3>
 						<p>Automatic emotional feedback with head movements and antenna animations while listening and responding.</p>
 					</div>
-					<div class="info-box">
-						<h3>✋ Gesture Detection</h3>
-						<p>MediaPipe-based hand gesture recognition with 11 gestures for non-verbal interaction.</p>
-					</div>
 					<div class="info-box">
 						<h3>📹 Camera Streaming</h3>
 						<p>MJPEG video stream available in Home Assistant as a Generic Camera for real-time monitoring.</p>
@@ -101,15 +97,6 @@
 				<h2>Changelog</h2>
 				<div class="how-to-use changelog-container">
 					<div class="changelog-scroll">
-						<div class="changelog-entry">
-							<span class="version">v0.6.0</span>
-							<span class="date">2026-01-07</span>
-							<ul>
-								<li>NEW: MediaPipe gesture detection with 11 gestures</li>
-								<li>Gestures: thumbs_up, thumbs_down, open_palm, fist, peace, ok, pointing_up, rock, call, three, four</li>
-								<li>Home Assistant entities for gesture detection (detected_gesture, gesture_detection_enabled)</li>
-							</ul>
-						</div>
 						<div class="changelog-entry">
 							<span class="version">v0.5.0</span>
 							<span class="date">2026-01-07</span>
@@ -122,8 +109,6 @@
 								<li>Noise suppression default reduced to 15%</li>
 								<li>Tap-to-wake default threshold reduced to 0.5g (most sensitive)</li>
 								<li>Fix: Replace non-existent clear_output_buffer with stop_playing</li>
-								<li>NEW: Gesture detection (11 gestures) via MediaPipe</li>
-								<li>NEW: Home Assistant entities for gesture detection</li>
 							</ul>
 						</div>
 						<div class="changelog-entry">

 						<h3>😊 Facial Expressions</h3>
 						<p>Automatic emotional feedback with head movements and antenna animations while listening and responding.</p>
 					</div>
 					<div class="info-box">
 						<h3>📹 Camera Streaming</h3>
 						<p>MJPEG video stream available in Home Assistant as a Generic Camera for real-time monitoring.</p>
 				<h2>Changelog</h2>
 				<div class="how-to-use changelog-container">
 					<div class="changelog-scroll">
 						<div class="changelog-entry">
 							<span class="version">v0.5.0</span>
 							<span class="date">2026-01-07</span>
 								<li>Noise suppression default reduced to 15%</li>
 								<li>Tap-to-wake default threshold reduced to 0.5g (most sensitive)</li>
 								<li>Fix: Replace non-existent clear_output_buffer with stop_playing</li>
 							</ul>
 						</div>
 						<div class="changelog-entry">

pyproject.toml

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "reachy_mini_ha_voice"
-version = "0.6.0"
 description = "Home Assistant Voice Assistant for Reachy Mini"
 readme = "README.md"
 requires-python = ">=3.10"
@@ -40,16 +40,7 @@ dependencies = [
     # Sendspin synchronized audio (optional, for multi-room playback)
     "aiosendspin>=2.0.1",
-    # Gesture detection dependencies (mediapipe installed separately for ARM64)
-    "flatbuffers>=2.0",
-    "absl-py",
-    "attrs>=19.1.0",
 ]
-[project.optional-dependencies]
-# For x86_64 systems, install with: pip install reachy_mini_ha_voice[gesture]
-gesture = ["mediapipe>=0.10.31"]
 keywords = ["reachy-mini-app", "reachy-mini", "home-assistant", "voice-assistant"]
 [project.entry-points."reachy_mini_apps"]

 [project]
 name = "reachy_mini_ha_voice"
+version = "0.5.0"
 description = "Home Assistant Voice Assistant for Reachy Mini"
 readme = "README.md"
 requires-python = ">=3.10"
     # Sendspin synchronized audio (optional, for multi-room playback)
     "aiosendspin>=2.0.1",
 ]
 keywords = ["reachy-mini-app", "reachy-mini", "home-assistant", "voice-assistant"]
 [project.entry-points."reachy_mini_apps"]
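Reviewer note on the dropped `gesture` extra: if the feature returns, one way to keep MediaPipe truly optional is to probe for it before importing anything. This is a minimal sketch using only the standard library; the helper name `optional_dependency_available` is hypothetical and not part of this package.

```python
import importlib.util


def optional_dependency_available(module_name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for broken parent packages or modules with no __spec__.
        return False


if __name__ == "__main__":
    if optional_dependency_available("mediapipe"):
        print("mediapipe found: gesture detection could be enabled")
    else:
        print("mediapipe missing: gesture detection stays disabled")
```

A probe like this only tells you the package is installed; version conflicts between its transitive dependencies (the reason given for this revert) would still surface at import time.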

reachy_mini_ha_voice/__init__.py

@@ -11,7 +11,7 @@ Key features:
 - Reachy Mini motion control integration
 """
-__version__ = "0.6.0"
 __author__ = "Desmond Dong"
 # Don't import main module here to avoid runpy warning

 - Reachy Mini motion control integration
 """
+__version__ = "0.5.0"
 __author__ = "Desmond Dong"
 # Don't import main module here to avoid runpy warning

reachy_mini_ha_voice/camera_server.py

@@ -1,9 +1,9 @@
 """
-MJPEG Camera Server for Reachy Mini with Face Tracking and Gesture Detection.
 This module provides an HTTP server that streams camera frames from Reachy Mini
 as MJPEG, which can be integrated with Home Assistant via Generic Camera.
-Also provides face tracking for head movement control and gesture detection.
 Reference: reachy_mini_conversation_app/src/reachy_mini_conversation_app/camera_worker.py
 """
@@ -12,7 +12,7 @@ import asyncio
 import logging
 import threading
 import time
-from typing import Optional, Tuple, List, Callable, TYPE_CHECKING
 import cv2
 import numpy as np
@@ -36,15 +36,14 @@ MJPEG_BOUNDARY = "frame"
 class MJPEGCameraServer:
     """
-    MJPEG streaming server for Reachy Mini camera with face tracking and gesture detection.
     Provides HTTP endpoints:
     - /stream - MJPEG video stream
     - /snapshot - Single JPEG image
     - / - Simple status page
-    Also provides face tracking offsets for head movement control
-    and gesture detection for interaction (thumbs up, open palm/stop).
     """
     def __init__(
@@ -55,7 +54,6 @@ class MJPEGCameraServer:
         fps: int = 15,  # 15fps for smooth face tracking
         quality: int = 80,
         enable_face_tracking: bool = True,
-        enable_gesture_detection: bool = True,
     ):
         """
         Initialize the MJPEG camera server.
@@ -67,7 +65,6 @@ class MJPEGCameraServer:
             fps: Target frames per second for the stream
             quality: JPEG quality (1-100)
             enable_face_tracking: Enable face tracking for head movement
-            enable_gesture_detection: Enable gesture detection (thumbs up, stop)
         """
         self.reachy_mini = reachy_mini
         self.host = host
@@ -75,7 +72,6 @@ class MJPEGCameraServer:
         self.fps = fps
         self.quality = quality
         self.enable_face_tracking = enable_face_tracking
-        self.enable_gesture_detection = enable_gesture_detection
         self._server: Optional[asyncio.Server] = None
         self._running = False
@@ -102,10 +98,6 @@ class MJPEGCameraServer:
         # Offset scaling (same as conversation_app)
         self._offset_scale = 0.6
-        # Gesture detection state
-        self._gesture_detector = None
-        self._gesture_detection_enabled = True
     async def start(self) -> None:
         """Start the MJPEG camera server."""
@@ -129,25 +121,6 @@ class MJPEGCameraServer:
                 self._head_tracker = None
         else:
             _LOGGER.info("Face tracking disabled by configuration")
-        # Initialize gesture detector if enabled
-        if self.enable_gesture_detection:
-            try:
-                from .gesture_detector import GestureDetector
-                self._gesture_detector = GestureDetector()
-                if self._gesture_detector.is_available:
-                    _LOGGER.info("Gesture detection enabled with MediaPipe Hands")
-                else:
-                    _LOGGER.warning("Gesture detection not available (MediaPipe not installed)")
-                    self._gesture_detector = None
-            except ImportError as e:
-                _LOGGER.warning("Failed to import gesture detector: %s", e)
-                self._gesture_detector = None
-            except Exception as e:
-                _LOGGER.warning("Failed to initialize gesture detector: %s", e)
-                self._gesture_detector = None
-        else:
-            _LOGGER.info("Gesture detection disabled by configuration")
         # Start frame capture thread
         self._capture_thread = threading.Thread(
@@ -184,9 +157,8 @@ class MJPEGCameraServer:
         _LOGGER.info("MJPEG Camera server stopped")
     def _capture_frames(self) -> None:
-        """Background thread to capture frames from Reachy Mini and do face tracking + gesture detection."""
-        _LOGGER.info("Starting camera capture thread (face_tracking=%s, gesture_detection=%s)",
-                    self._face_tracking_enabled, self._gesture_detection_enabled)
         frame_count = 0
         last_log_time = time.time()
@@ -215,16 +187,11 @@ class MJPEGCameraServer:
                     # Handle smooth interpolation when face lost
                     self._process_face_lost_interpolation(current_time)
-                    # Do gesture detection if enabled (every other frame to save CPU)
-                    if self._gesture_detection_enabled and self._gesture_detector is not None:
-                        if frame_count % 2 == 0:  # Process every other frame
-                            self._gesture_detector.process_frame(frame)
                     # Log stats every 10 seconds
                     if current_time - last_log_time >= 10.0:
                         fps = frame_count / (current_time - last_log_time)
-                        _LOGGER.debug("Camera: %.1f fps, face_tracking=%s, gesture_detection=%s",
-                                     fps, self._face_tracking_enabled, self._gesture_detection_enabled)
                         frame_count = 0
                         last_log_time = current_time
@@ -412,55 +379,6 @@ class MJPEGCameraServer:
             self._interpolation_start_time = None
         _LOGGER.info("Face tracking %s", "enabled" if enabled else "disabled")
-    # =========================================================================
-    # Public API for gesture detection
-    # =========================================================================
-    def get_current_gesture(self) -> str:
-        """Get current detected gesture as string.
-        Returns:
-            Gesture name: "none", "thumbs_up", "open_palm"
-        """
-        if self._gesture_detector is None:
-            return "none"
-        return self._gesture_detector.current_gesture.value
-    def set_gesture_detection_enabled(self, enabled: bool) -> None:
-        """Enable or disable gesture detection."""
-        self._gesture_detection_enabled = enabled
-        _LOGGER.info("Gesture detection %s", "enabled" if enabled else "disabled")
-    def set_gesture_callbacks(
-        self,
-        on_thumbs_up: Optional[Callable[[], None]] = None,
-        on_thumbs_down: Optional[Callable[[], None]] = None,
-        on_open_palm: Optional[Callable[[], None]] = None,
-        on_fist: Optional[Callable[[], None]] = None,
-        on_peace: Optional[Callable[[], None]] = None,
-        on_ok: Optional[Callable[[], None]] = None,
-        on_pointing_up: Optional[Callable[[], None]] = None,
-        on_rock: Optional[Callable[[], None]] = None,
-        on_call: Optional[Callable[[], None]] = None,
-        on_three: Optional[Callable[[], None]] = None,
-        on_four: Optional[Callable[[], None]] = None,
-    ) -> None:
-        """Set gesture detection callbacks."""
-        if self._gesture_detector is not None:
-            self._gesture_detector.set_callbacks(
-                on_thumbs_up=on_thumbs_up,
-                on_thumbs_down=on_thumbs_down,
-                on_open_palm=on_open_palm,
-                on_fist=on_fist,
-                on_peace=on_peace,
-                on_ok=on_ok,
-                on_pointing_up=on_pointing_up,
-                on_rock=on_rock,
-                on_call=on_call,
-                on_three=on_three,
-                on_four=on_four,
-            )
     def _get_camera_frame(self) -> Optional[np.ndarray]:
         """Get a frame from Reachy Mini's camera."""
         if self.reachy_mini is None:

 """
+MJPEG Camera Server for Reachy Mini with Face Tracking.
 This module provides an HTTP server that streams camera frames from Reachy Mini
 as MJPEG, which can be integrated with Home Assistant via Generic Camera.
+Also provides face tracking for head movement control.
 Reference: reachy_mini_conversation_app/src/reachy_mini_conversation_app/camera_worker.py
 """
 import logging
 import threading
 import time
+from typing import Optional, Tuple, List, TYPE_CHECKING
 import cv2
 import numpy as np
 class MJPEGCameraServer:
     """
+    MJPEG streaming server for Reachy Mini camera with face tracking.
     Provides HTTP endpoints:
     - /stream - MJPEG video stream
     - /snapshot - Single JPEG image
     - / - Simple status page
+    Also provides face tracking offsets for head movement control.
     """
     def __init__(
         fps: int = 15,  # 15fps for smooth face tracking
         quality: int = 80,
         enable_face_tracking: bool = True,
     ):
         """
         Initialize the MJPEG camera server.
             fps: Target frames per second for the stream
             quality: JPEG quality (1-100)
             enable_face_tracking: Enable face tracking for head movement
         """
         self.reachy_mini = reachy_mini
         self.host = host
         self.fps = fps
         self.quality = quality
         self.enable_face_tracking = enable_face_tracking
         self._server: Optional[asyncio.Server] = None
         self._running = False
         # Offset scaling (same as conversation_app)
         self._offset_scale = 0.6
     async def start(self) -> None:
         """Start the MJPEG camera server."""
                 self._head_tracker = None
         else:
             _LOGGER.info("Face tracking disabled by configuration")
         # Start frame capture thread
         self._capture_thread = threading.Thread(
         _LOGGER.info("MJPEG Camera server stopped")
     def _capture_frames(self) -> None:
+        """Background thread to capture frames from Reachy Mini and do face tracking."""
+        _LOGGER.info("Starting camera capture thread (face_tracking=%s)", self._face_tracking_enabled)
         frame_count = 0
         last_log_time = time.time()
                     # Handle smooth interpolation when face lost
                     self._process_face_lost_interpolation(current_time)
                     # Log stats every 10 seconds
                     if current_time - last_log_time >= 10.0:
                         fps = frame_count / (current_time - last_log_time)
+                        _LOGGER.debug("Camera: %.1f fps, face_tracking=%s, head_tracker=%s",
+                                     fps, self._face_tracking_enabled, self._head_tracker is not None)
                         frame_count = 0
                         last_log_time = current_time
             self._interpolation_start_time = None
         _LOGGER.info("Face tracking %s", "enabled" if enabled else "disabled")
     def _get_camera_frame(self) -> Optional[np.ndarray]:
         """Get a frame from Reachy Mini's camera."""
         if self.reachy_mini is None:
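Reviewer note on the `/stream` endpoint: the streaming code is outside the changed hunks, so this is only a sketch of the usual `multipart/x-mixed-replace` framing for MJPEG, using the `MJPEG_BOUNDARY = "frame"` constant visible in the hunk header. The function name `mjpeg_chunk` is hypothetical.

```python
MJPEG_BOUNDARY = "frame"


def mjpeg_chunk(jpeg_bytes: bytes, boundary: str = MJPEG_BOUNDARY) -> bytes:
    """Wrap one encoded JPEG frame as a part of a multipart MJPEG response."""
    header = (
        f"--{boundary}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg_bytes)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + jpeg_bytes + b"\r\n"
```

The HTTP response itself would be sent once with `Content-Type: multipart/x-mixed-replace; boundary=frame`, after which each captured frame is written as one such chunk.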

reachy_mini_ha_voice/entity_registry.py

@@ -83,9 +83,6 @@ ENTITY_KEYS: Dict[str, int] = {
     # Phase 13: Sendspin - auto-enabled via mDNS, no user entities needed
     # Phase 20: Tap detection
     "tap_sensitivity": 1400,
-    # Phase 21: Gesture detection
-    "detected_gesture": 1500,
-    "gesture_detection_enabled": 1501,
 }
@@ -157,7 +154,6 @@ class EntityRegistry:
         # Phase 13 (Sendspin) - auto-enabled via mDNS discovery, no user entities
         # Phase 14 (head_joints, passive_joints) removed - not needed
         self._setup_phase20_entities(entities)
-        self._setup_phase21_entities(entities)
         _LOGGER.info("All entities registered: %d total", len(entities))
@@ -773,48 +769,6 @@ class EntityRegistry:
         _LOGGER.debug("Phase 20 entities registered: tap_sensitivity")
-    def _setup_phase21_entities(self, entities: List) -> None:
-        """Setup Phase 21 entities: Gesture detection."""
-        if self.camera_server is None:
-            _LOGGER.debug("Phase 21 skipped: no camera server")
-            return
-        def get_detected_gesture() -> str:
-            """Get current detected gesture."""
-            return self.camera_server.get_current_gesture()
-        def get_gesture_detection_enabled() -> bool:
-            """Get gesture detection enabled state."""
-            return self.camera_server._gesture_detection_enabled
-        def set_gesture_detection_enabled(value: bool) -> None:
-            """Set gesture detection enabled state."""
-            self.camera_server.set_gesture_detection_enabled(value)
-        # Text sensor for detected gesture
-        entities.append(TextSensorEntity(
-            server=self.server,
-            key=get_entity_key("detected_gesture"),
-            name="Detected Gesture",
-            object_id="detected_gesture",
-            icon="mdi:hand-wave",
-            value_getter=get_detected_gesture,
-        ))
-        # Switch to enable/disable gesture detection
-        entities.append(SwitchEntity(
-            server=self.server,
-            key=get_entity_key("gesture_detection_enabled"),
-            name="Gesture Detection",
-            object_id="gesture_detection_enabled",
-            icon="mdi:gesture",
-            entity_category=1,  # config
-            value_getter=get_gesture_detection_enabled,
-            value_setter=set_gesture_detection_enabled,
-        ))
-        _LOGGER.debug("Phase 21 entities registered: detected_gesture, gesture_detection_enabled")
     def find_entity_references(self, entities: List) -> None:
         """Find and store references to special entities from existing list.

     # Phase 13: Sendspin - auto-enabled via mDNS, no user entities needed
     # Phase 20: Tap detection
     "tap_sensitivity": 1400,
 }
         # Phase 13 (Sendspin) - auto-enabled via mDNS discovery, no user entities
         # Phase 14 (head_joints, passive_joints) removed - not needed
         self._setup_phase20_entities(entities)
         _LOGGER.info("All entities registered: %d total", len(entities))
         _LOGGER.debug("Phase 20 entities registered: tap_sensitivity")
     def find_entity_references(self, entities: List) -> None:
         """Find and store references to special entities from existing list.

reachy_mini_ha_voice/head_tracker.py

@@ -1,12 +1,12 @@
 """Lightweight head tracker using YOLO for face detection.
-Model is downloaded from HuggingFace on first use and cached locally.
 """
 from __future__ import annotations
 import logging
-import time
-from pathlib import Path
 from typing import Tuple, Optional
 import numpy as np
@@ -15,38 +15,43 @@ from numpy.typing import NDArray
 logger = logging.getLogger(__name__)
-# Model config
-_MODEL_REPO = "AdamCodd/YOLOv11n-face-detection"
-_MODEL_FILENAME = "model.pt"
-_MAX_RETRIES = 3
-_RETRY_DELAY = 5  # seconds
 class HeadTracker:
-    """Lightweight head tracker using YOLO for face detection."""
     def __init__(
         self,
         confidence_threshold: float = 0.3,
         device: str = "cpu",
     ) -> None:
         """Initialize YOLO-based head tracker.
         Args:
             confidence_threshold: Minimum confidence for face detection
             device: Device to run inference on ('cpu' or 'cuda')
         """
         self.confidence_threshold = confidence_threshold
         self.model = None
         self._device = device
         self._detections_class = None
         self._model_load_attempted = False
         self._model_load_error: Optional[str] = None
         self._load_model()
     def _load_model(self) -> None:
-        """Load YOLO model with retry logic."""
         if self._model_load_attempted:
             return
@@ -59,34 +64,19 @@ class HeadTracker:
             self._detections_class = Detections
-            # Download with retries
-            model_path = None
-            last_error = None
-            for attempt in range(_MAX_RETRIES):
-                try:
-                    model_path = hf_hub_download(
-                        repo_id=_MODEL_REPO,
-                        filename=_MODEL_FILENAME,
-                    )
-                    break
-                except Exception as e:
-                    last_error = e
-                    if attempt < _MAX_RETRIES - 1:
-                        logger.warning(
-                            "Model download failed (attempt %d/%d): %s. Retrying in %ds...",
-                            attempt + 1, _MAX_RETRIES, e, _RETRY_DELAY
-                        )
-                        time.sleep(_RETRY_DELAY)
-            if model_path is None:
-                raise last_error
             self.model = YOLO(model_path).to(self._device)
-            logger.info("YOLO face detection model loaded")
         except ImportError as e:
             self._model_load_error = f"Missing dependencies: {e}"
-            logger.warning("Face tracking disabled - missing dependencies: %s", e)
             self.model = None
         except Exception as e:
             self._model_load_error = str(e)

 """Lightweight head tracker using YOLO for face detection.
+Ported from reachy_mini_conversation_app for voice assistant integration.
+Model is loaded at initialization time (not lazy) to ensure face tracking
+is ready immediately when the camera server starts.
 """
 from __future__ import annotations
 import logging
 from typing import Tuple, Optional
 import numpy as np
 logger = logging.getLogger(__name__)
 class HeadTracker:
+    """Lightweight head tracker using YOLO for face detection.
+    Model is loaded at initialization time to ensure face tracking
+    is ready immediately (matching conversation_app behavior).
+    """
     def __init__(
         self,
+        model_repo: str = "AdamCodd/YOLOv11n-face-detection",
+        model_filename: str = "model.pt",
         confidence_threshold: float = 0.3,
         device: str = "cpu",
     ) -> None:
         """Initialize YOLO-based head tracker.
         Args:
+            model_repo: HuggingFace model repository
+            model_filename: Model file name
             confidence_threshold: Minimum confidence for face detection
             device: Device to run inference on ('cpu' or 'cuda')
         """
         self.confidence_threshold = confidence_threshold
         self.model = None
+        self._model_repo = model_repo
+        self._model_filename = model_filename
         self._device = device
         self._detections_class = None
         self._model_load_attempted = False
         self._model_load_error: Optional[str] = None
+        # Load model immediately at init (not lazy)
         self._load_model()
     def _load_model(self) -> None:
+        """Load YOLO model at initialization time."""
         if self._model_load_attempted:
             return
             self._detections_class = Detections
+            model_path = hf_hub_download(
+                repo_id=self._model_repo,
+                filename=self._model_filename
+            )
             self.model = YOLO(model_path).to(self._device)
+            logger.info("YOLO face detection model loaded from %s", self._model_repo)
         except ImportError as e:
             self._model_load_error = f"Missing dependencies: {e}"
+            logger.warning(
+                "Face tracking disabled - missing dependencies: %s. "
+                "Install with: pip install ultralytics supervision huggingface_hub",
+                e
+            )
             self.model = None
         except Exception as e:
             self._model_load_error = str(e)
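
Reviewer note on the removed download retries: this revert drops the `_MAX_RETRIES` / `_RETRY_DELAY` loop around `hf_hub_download`, so a transient network failure now disables face tracking on the first error. If retries are wanted again, a small generic helper could carry the same behaviour (3 attempts, 5 s apart). The name `call_with_retries` and its parameters are hypothetical; the `sleep` argument exists only so the delay can be stubbed in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception; re-raise the last error if all fail."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # keep the original traceback of the final failure
            sleep(retry_delay)
    raise AssertionError("unreachable")
```

In `_load_model` this could wrap the download as `call_with_retries(lambda: hf_hub_download(repo_id=..., filename=...))`, keeping the existing `except Exception` branch as the final fallback.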