Recent Activity

cedricbonhomme 
posted an update 10 days ago
With VLAgentIc, you can now use your local Qwen installation via Ollama and leverage the models CIRCL/vulnerability-severity-classification-roberta-base and CIRCL/cwe-parent-vulnerability-classification-roberta-base.

The project is available here:
https://github.com/vulnerability-lookup/VLAgentIc

The VLAI Severity and CWE classifiers are available on Hugging Face:
- CIRCL/vulnerability-severity-classification-roberta-base
- CIRCL/cwe-parent-vulnerability-classification-roberta-base
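
As a minimal sketch of calling one of these classifiers directly, outside VLAgentIc, the snippet below assumes the models load with the standard `text-classification` pipeline from the `transformers` library. The helper names `load_classifier` and `classify` are illustrative and not part of VLAgentIc, and the label names returned depend on the model's own configuration.

```python
from typing import Callable, Iterable, List, Tuple

# Model ID taken from the post; swap in the CWE classifier ID the same way.
SEVERITY_MODEL = "CIRCL/vulnerability-severity-classification-roberta-base"


def load_classifier(model_id: str = SEVERITY_MODEL):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that classify() below has no hard dependency on transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(
    descriptions: Iterable[str], classifier: Callable
) -> List[Tuple[str, str, float]]:
    """Return (description, predicted label, score) for each description."""
    texts = list(descriptions)
    results = classifier(texts)  # one {"label": ..., "score": ...} dict per input
    return [(t, r["label"], float(r["score"])) for t, r in zip(texts, results)]
```

Calling `classify(["A buffer overflow allows remote code execution."], load_classifier())` would run the model locally once the weights are downloaded.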

The concept of AI agents (combining models, tools, and orchestration) has become fairly standardized over the last year, but VLAgentIc brings something unique:

- Agents communicate over XMPP, enabling concurrent tasks and asynchronous messaging thanks to the SPADE framework.
- Built-in presence and discovery streamline interactions between components.
- Flexible behaviours make it straightforward to orchestrate AI-assisted security workflows and to add new connections later.
- Last but not least, the VLAI Severity and VLAI CWE classifiers are now wrapped as LLM Tools and run entirely locally.

New, more comprehensive agent tools will soon be available, leveraging the Vulnerability-Lookup API and supporting the GCVE project.

The Human-in-the-Loop agent tool will be designed to notify you and request authorization whenever a query to an external service is about to be made—ensuring that, by default, all reasoning and processing stay local on your computer.
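
The gating idea can be sketched in a few lines. This is only an illustration of the described behaviour, not VLAgentIc's implementation: the names `guarded_query`, `is_external`, and `LOCAL_HOSTS` are hypothetical, and the `approve` callback stands in for whatever notification channel the real tool will use.

```python
from typing import Callable
from urllib.parse import urlparse

# Hosts treated as "local"; anything else requires explicit approval (assumption).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_external(url: str) -> bool:
    """Return True when the URL does not point at the local machine."""
    host = urlparse(url).hostname
    # An unparseable host is treated as external, which is the safe default.
    return host not in LOCAL_HOSTS


def guarded_query(
    url: str,
    fetch: Callable[[str], str],
    approve: Callable[[str], bool],
) -> str:
    """Run fetch(url), asking the human first if the target is external."""
    if is_external(url) and not approve(url):
        raise PermissionError(f"query to {url} was not authorized")
    return fetch(url)
```

In an interactive setting, `approve` could be as simple as `lambda url: input(f"Allow request to {url}? [y/N] ").lower() == "y"`.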

VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification (2507.03607)
ronantakizawa 
posted an update 14 days ago
Introducing the HuggingFace Top Trending Papers dataset: a dataset that compiles the most trending papers on HuggingFace Daily Papers in 2025.

This dataset captures which AI/ML research papers gained the most community attention this year!

#huggingface #papers #dataset

ronantakizawa/huggingface-top-papers
ronantakizawa 
posted an update 20 days ago
victor 
posted an update 26 days ago
ronantakizawa 
posted an update 27 days ago
Introducing the github-top-developers dataset: a comprehensive dataset of the top 8000 developers on GitHub (2020-2025). This dataset captures how GitHub's trending developers and their repositories have evolved over time, along with the projects they work on.

#github #developers

ronantakizawa/github-top-developers
ronantakizawa 
posted an update about 1 month ago
Introducing the trending-stocks-yahoo-finance dataset: a compilation of the most trending stocks on Yahoo Finance from July 2024 to October 2025.

This dataset captures each trending stock's max price, max market cap, best rank on Yahoo Finance, PE ratio, and trading volume.

#stocks #investing #trading

ronantakizawa/trending-stocks-yahoo-finance
ronantakizawa 
posted an update about 1 month ago
Introducing the github-top-projects dataset: A comprehensive dataset of 423,098 GitHub trending repository entries spanning 12+ years (August 2013 - November 2025).

This dataset captures the evolution of GitHub's trending repositories over time, providing insights into software development trends across programming languages and domains, popular open-source projects and their trending patterns, and community interests and shifts in developer focus over 12 years.

ronantakizawa/github-top-projects

#github #softwareengineering
ronantakizawa 
posted an update about 1 month ago
Introducing the twitter-trending-hashtags dataset, a compilation of 12,000+ unique trending hashtags on Twitter / X from 2020 to 2025. This dataset captures viral and cultural moments on Twitter / X and is perfect for researchers studying viral content patterns on social media.

ronantakizawa/twitter-trending-hashtags

#twitter #trends #socialmedia
ronantakizawa 
posted an update about 1 month ago
Introducing the tiktok-trending-hashtags dataset: a compilation of 1,830 unique trending hashtags on TikTok from 2022 to 2025. This dataset captures one-time and seasonal viral moments on TikTok and is perfect for researchers, marketers, and content creators studying viral content patterns on social media.

ronantakizawa/tiktok-trending-hashtags

#tiktok #trends #social-media
ronantakizawa 
posted an update about 2 months ago
Reached 2500+ total downloads across my models and datasets! 🎉

Follow me for more @ronantakizawa
ronantakizawa 
posted an update about 2 months ago
Introducing the india-trending-words dataset: a compilation of 900 trending Google searches from 2006-2024 based on https://trends.withgoogle.com. This dataset captures search trends in 80 categories, and is perfect for analyzing cultural shifts and predicting future trends in India.

#india #indiadataset #googlesearches

ronantakizawa/india-trending-words
ronantakizawa 
posted an update about 2 months ago
Introducing the japanese-trending-words dataset: a dataset consisting of 593 words from Japan’s annual trending word rankings (流行語大賞) from 2006-2025. This dataset provides the top 30 words from each year and their meanings in Japanese and English. This resource is useful for NLP tasks that require an understanding of recent Japanese culture and history.

ronantakizawa/japanese-trending-words

#japanese #japanesedataset #trending


ronantakizawa 
posted an update about 2 months ago
Introducing the google-trending-words dataset: a compilation of 2784 trending Google searches from 2001-2024 based on https://trends.withgoogle.com. This dataset captures search trends in 93 categories, and is perfect for analyzing cultural shifts, predicting future trends, and understanding how global events shape online behavior.

#trends #google #googlesearches

ronantakizawa/trending-words-google
ronantakizawa 
posted an update about 2 months ago
Introducing the Japanese Character Difficulty Dataset: a collection of 3,003 Japanese characters (Kanji) labeled with official educational difficulty grades. It includes elementary (grades 1–6), secondary (grade 8), and advanced (grade 9) characters, making it useful for language learning, text difficulty analysis, and educational tool development 🎉

ronantakizawa/japanese-character-difficulty

#japanese #kanji #japanesedataset
ronantakizawa 
posted an update about 2 months ago
I built a demo that shows how to implement Cache-Augmented Generation (CAG) in an LLM and compares its performance gains against RAG (111 stars, 20 forks).

https://github.com/ronantakizawa/cacheaugmentedgeneration

CAG preloads document content into an LLM’s context as a precomputed key-value (KV) cache. This caching eliminates the need for real-time retrieval during inference, reducing token usage by up to 76% while maintaining answer quality.

CAG is particularly effective for constrained knowledge bases like internal documentation, FAQs, and customer support systems, where all relevant information can fit within the model's extended context window.
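
As a rough way to see where the token savings come from, here is a back-of-the-envelope cost model. It is not the measurement method used in the demo. It assumes RAG re-sends retrieved passages with every question, while CAG processes the documents once to build the KV cache and afterwards only processes each new question. The `Workload` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    doc_tokens: int        # tokens in the whole knowledge base (cached once by CAG)
    retrieved_tokens: int  # tokens of retrieved passages RAG adds to each prompt
    question_tokens: int   # tokens in a typical question
    num_queries: int       # number of questions asked against the same documents


def rag_prompt_tokens(w: Workload) -> int:
    """RAG: every query carries its retrieved passages plus the question."""
    return w.num_queries * (w.retrieved_tokens + w.question_tokens)


def cag_prompt_tokens(w: Workload) -> int:
    """CAG: documents are processed once, then only questions are new input."""
    return w.doc_tokens + w.num_queries * w.question_tokens


def token_savings(w: Workload) -> float:
    """Fraction of newly processed prompt tokens saved by CAG relative to RAG."""
    return 1 - cag_prompt_tokens(w) / rag_prompt_tokens(w)


# Hypothetical workload: a 4,000-token FAQ queried 20 times.
example = Workload(doc_tokens=4000, retrieved_tokens=1500,
                   question_tokens=50, num_queries=20)
print(f"RAG: {rag_prompt_tokens(example)} tokens, "
      f"CAG: {cag_prompt_tokens(example)} tokens, "
      f"savings: {token_savings(example):.1%}")
```

Under this model the savings grow with the number of queries and shrink when the knowledge base is much larger than what RAG would retrieve, which matches the point that CAG suits small, constrained knowledge bases.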

#rag #retrievalaugmentedgeneration
ronantakizawa 
posted an update 2 months ago
Reached 1000+ total downloads across my models and datasets! 🎉

Follow me for more @ronantakizawa
ronantakizawa 
posted an update 2 months ago
Introducing the Japanese honorifics dataset: a dataset with 137 sentences covering the three main keigo forms: 尊敬語 (Sonkeigo), 謙譲語 (Kenjōgo), and 丁寧語 (Teineigo). Each entry includes the base form, all three honorific transformations, and English translations for essential phrases in Japanese. This dataset is perfect for training and evaluating the Japanese skill level of LLMs.

#japanese #japanesedataset

ronantakizawa/japanese-honorifics
ronantakizawa 
posted an update 2 months ago
Introducing JFLEG-JA, a new Japanese language error correction benchmark with 1,335 sentences, each paired with 4 high-quality human corrections 🎉

Inspired by the English JFLEG dataset, this dataset covers diverse error types, including particle mistakes, kanji mix-ups, and incorrect contextual use of verbs, adjectives, and literary techniques.

You can use this for evaluating LLMs, few-shot learning, error analysis, or fine-tuning correction systems.
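
For a quick first pass at evaluation, one simple option is to check whether a model's correction exactly matches any of the human references. This is a deliberately crude sketch: the English JFLEG benchmark is usually scored with GLEU, which gives partial credit, and the function names below are illustrative rather than part of the dataset.

```python
from typing import List, Sequence


def normalize(sentence: str) -> str:
    """Remove all whitespace, since spacing is not meaningful in Japanese text."""
    return "".join(sentence.split())


def matches_any_reference(hypothesis: str, references: Sequence[str]) -> bool:
    """True if the hypothesis equals at least one reference after normalization."""
    return normalize(hypothesis) in {normalize(r) for r in references}


def reference_match_rate(
    hypotheses: Sequence[str], references: Sequence[Sequence[str]]
) -> float:
    """Share of hypotheses that exactly match one of their references."""
    if len(hypotheses) != len(references):
        raise ValueError("need one list of references per hypothesis")
    if not hypotheses:
        return 0.0
    hits = sum(matches_any_reference(h, refs)
               for h, refs in zip(hypotheses, references))
    return hits / len(hypotheses)
```

Exact match understates quality when a correction is valid but differs from all four references, so it works best as a sanity check alongside a softer metric.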

ronantakizawa/jfleg-japanese

#japanese #evals #benchmark
ronantakizawa 
posted an update 2 months ago
Introducing the Medical-o1-Reasoning-SFT-Japanese dataset 🎉

This Japanese dataset consists of questions, reasoning, and answers for complex medical topics.

#japanese #medical #dataset


ronantakizawa/Medical-o1-Reasoning-SFT-Japanese
ronantakizawa 
posted an update 3 months ago
Introducing the Finance-Instruct-500k-Japanese dataset 🎉

This is a Japanese-translated version of the @Josephgflowers Finance-Instruct-500k dataset, which includes complex questions and answers related to finance and economics.

#datasets #finance #finance-instruct #japanese

ronantakizawa/Finance-Instruct-500k-Japanese