You've got us hooked now, can't wait for the release of the 4.2 version. Could you please provide an ETA or an approximation of when it might release?

#17
by Why-T - opened

Especially after what we've noticed from this model so far: it's very promising yet frustrating for users, since the overthinking issue is a real pain for the experience. On some long recursive logic problems it spends over 30 minutes thinking and still sometimes gets the answer wrong, even after wasting tens of thousands of tokens on reasoning.

Nanbeige LLM Lab org

Thanks, really appreciate it!

Nanbeige4.2 is actively in progress. We don’t have a confirmed ETA yet, but we’re pushing hard to release it as soon as possible.

The next version will introduce structural innovations and broader capability upgrades across multiple dimensions. We're excited to explore how far a small model can go under tight VRAM constraints. In addition, an instruct version is also coming.

Stay tuned — we’re working on it.

Well, I'm all in, rooting for you guys. This is genuinely what the AI market and R&D need to focus on, because scaling will hit a ceiling at some point! Wishing you all the best, keep it up! :)

Yeah, we're all just waiting for the option of low, medium, and high reasoning levels... and if the problem is just too hard and the model can't get it, we're stuck leaving it there for 20 minutes eating tokens forever. But for something like "how many r's are in the word strawberry", one round of CoT is probably enough. BTW, the model answers it correctly within the first 100 tokens and then spends around 4k more tokens when it already has the answer.
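The reasoning-budget idea above can be sketched in a few lines. This is a minimal, hypothetical illustration, not Nanbeige's actual API: `generate_step` stands in for any token-by-token decoding callback, and the `</think>` marker is an assumption about how the reasoning block is delimited.

```python
def run_with_budget(generate_step, budget_tokens, stop_marker="</think>"):
    """Drive a token-by-token generator, cutting off the thinking phase
    once `budget_tokens` have been spent or the model closes its
    reasoning block on its own by emitting `stop_marker`.

    `generate_step` is a hypothetical zero-argument callback that returns
    the next decoded token as a string.
    """
    out = []
    for _ in range(budget_tokens):
        out.append(generate_step())
        # Stop early if the model has finished reasoning by itself.
        if "".join(out).endswith(stop_marker):
            break
    return "".join(out)
```

A low/medium/high setting would then just map to different `budget_tokens` values, so a question like the strawberry one could be answered in one short round instead of burning 4k tokens.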

Considering this model's coding rank compared to other models, even 30B ones, I think you have achieved about 80% of your goal of building a lightweight model with strong reasoning capabilities, but a fix or improvement is still needed for the overthinking problem.
Overthinking itself is not the problem; I actually think it is a strength of the model. The problem is that the model does not know which questions deserve less thought and which deserve more.
My personal opinion, which may not be right: building a 3B model with deep-thinking capabilities is completely feasible, but it would be better to split it into multiple specialized models rather than one general-purpose model.
For example, a version for coding, reasoning, and mathematics
A version for general tasks, function calling, and chatbots
Etc...

I really admire you.

I hope one day you will release a fully specialized coder version in this 3B to 7B range. The speed and accuracy are really admirable.

What I really want is vision support, THEN I will have a reason to use Nanbeige 😉
