AI & ML interests
Enterprise AI and ML, Foundation Models, Responsible AI
Papers
Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents
Articles
A collection of time series models trained by IBM
- ibm-research/ttm-research-r2
  Time Series Forecasting • 855k • Updated • 23.9k • 6
- ibm-research/ttm-r3
  Time Series Forecasting • 1.41M • Updated • 110k • 5
- ibm-research/flowstate
  Time Series Forecasting • 9.07M • Updated • 31.3k • 11
- ibm-research/patchtst-fm-r1
  Time Series Forecasting • 0.3B • Updated • 6.9k • 9
An evaluation suite for benchmarking retrieval models on Table+Text retrieval datasets.
This category highlights the collective efforts of the AI Automation team in advancing Industry 4.0 applications and exploring innovations beyond it.
- AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
  Paper • 2506.03828 • Published • 20
- FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes
  Paper • 2506.03278 • Published • 7
- ibm-research/AssetOpsBench
  Viewer • Updated • 467 • 1.2k • 29
- AssetOpsBench
  Evaluating Autonomous AI Agents for Industry 4.0 Tasks
REAL-MM-RAG-Bench is a benchmark designed to evaluate multi-modal retrieval models under realistic and challenging conditions.
Welcome to IBM’s multi-modal foundation model for materials, FM4M, designed to support and advance research in materials science and chemistry.
Enterprise agent ecosystem featuring AssetOpsBench (industrial), ITBench (SRE, FinOps, CISO), and CUGA to accelerate AI automation.
Datasets and models of the Otter-Knowledge project
GGUF-formatted versions of IBM Granite 3.2 models. Licensed under the Apache 2.0 license.
- ibm-research/granite-3.2-2b-instruct-GGUF
  Text Generation • 3B • Updated • 435 • 12
- ibm-research/granite-3.2-8b-instruct-GGUF
  Text Generation • 8B • Updated • 419 • 9
- ibm-research/granite-vision-3.2-2b-GGUF
  3B • Updated • 358 • 12
- ibm-research/granite-guardian-3.2-3b-a800m-GGUF
  Text Generation • 3B • Updated • 290 • 3