Training data, inference scripts (tool calling, web search), and training scripts
Will you also publicly release:
- the training data
- the inference scripts you use (tool calling and web search)
- the scripts you used for training the model
Also additional questions:
- Did you ever consider training it natively in NVFP4 or MXFP4 instead of fp16?
- I ran a benchmark using fp32 for the KV cache and fp16 for the weights, and the scores improved slightly, by about 2 percentage points.
- Are you planning to train an MoE based on this model, since it performs extremely well? Something like 40B4A.
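On the KV-cache precision point above: a toy, stdlib-only sketch (not the model's actual kernels) of why keeping the KV cache in higher precision can matter. It simulates an fp16 cache by rounding values and the accumulator to IEEE-754 half precision via `struct`'s `'e'` format, and compares a dot product (the core of attention scoring) against the full-precision result. The vector values here are arbitrary illustrations.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to IEEE-754 half precision and back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def dot(a, b, keep_fp32=False):
    """Dot product. If keep_fp32 is False, round both inputs and the
    running accumulator to fp16 at each step, simulating an fp16 KV cache."""
    acc = 0.0
    for x, y in zip(a, b):
        if not keep_fp32:
            x, y = to_fp16(x), to_fp16(y)
        acc += x * y
        if not keep_fp32:
            acc = to_fp16(acc)
    return acc

# Arbitrary example key/query vectors of length 64
keys = [0.1 * i for i in range(64)]
query = [0.01 * i for i in range(64)]

exact = dot(keys, query, keep_fp32=True)
fp16 = dot(keys, query, keep_fp32=False)
print(abs(exact - fp16))  # small but nonzero rounding error
```

The accumulated rounding error stays tiny per step but grows with sequence length, which is one plausible reason a higher-precision KV cache nudges benchmark scores.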
This is a very good release, thank you a lot; it works very well with ShinkaEvolve for improving algorithms.
To cut down the overthinking, it might be possible to penalize the model when the solution has already appeared more than two times in the thinking process (just a thought on what might help).
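The penalty idea above could be sketched roughly like this. This is purely illustrative, not anything from the actual training pipeline: the helper names, the substring-matching approach, and the penalty value are all made up for the example.

```python
def count_solution_occurrences(thinking: str, solution: str) -> int:
    """Count non-overlapping occurrences of a candidate solution
    string inside a thinking trace (hypothetical helper)."""
    return thinking.count(solution)

def overthinking_penalty(thinking: str, solution: str,
                         max_repeats: int = 2,
                         penalty: float = 0.1) -> float:
    """Return a reward penalty proportional to how many times the final
    solution re-appeared in the thinking trace beyond `max_repeats`."""
    n = count_solution_occurrences(thinking, solution)
    extra = max(0, n - max_repeats)
    return penalty * extra

trace = "try x=42 ... so x=42 ... check again, x=42 ... final"
print(overthinking_penalty(trace, "x=42"))  # prints 0.1 (3 occurrences, 1 over the limit)
```

In a real RL setup the matching would have to be fuzzier than exact substring search (the model rarely restates a solution verbatim), but the shape of the reward term would be similar.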
Thanks for sharing this discussion! 👋
I’ve been exploring Nanbeige4.1-3B as well, and this thread has been really helpful. Just wanted to add that I’ve noticed improvements in generation quality with longer context inputs compared to previous versions. I’m curious if others have tested performance on domain-specific prompts (e.g., technical or scientific text), and how well the model maintains coherence over extended responses.
Would love to hear about other users’ experiences or tips for optimizing prompts! 😊
Thanks a lot for the thoughtful feedback and support.
Low-precision training, MoE scaling, and efficient thinking without sacrificing performance are all part of our ongoing research.
Stay tuned for our future releases.
With an RTX 2060 8 GB, 16 GB RAM, and an i5-8600,
I tested the model with 5 very complex Lua programming questions.
It consumes a lot of tokens
and thinks for maybe 8-10 minutes, but it answers the coding questions about 96% correctly.
I think the architecture of this model is a revolution
The graphics card's temperature goes up to 73 °C under this load, but with MSI Afterburner you can lower the temperature limit so the card thinks at a lower frequency; the temperature then stays at 50-60 °C and your fans are under less stress.
A lower temperature limit does not mean slower thinking as such; it just limits the card's clock frequency a little to keep the temperature in check.
Of course, this is a personal suggestion; 73 °C is not a problem in itself, and this is just my precaution to extend the life of the fans and the graphics card.
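MSI Afterburner is a Windows GUI tool; for anyone on Linux, a roughly similar effect can be had by capping the board power with `nvidia-smi`. The wattage below is a hypothetical example, not a recommendation for the RTX 2060 specifically; check your card's supported range first.

```shell
# Show the current, default, and max enforceable power limits
nvidia-smi -q -d POWER

# Cap board power (example value only; requires root)
sudo nvidia-smi -pl 125
```

A lower power cap makes the card clock down under sustained load, which trades a little speed for lower temperatures, much like Afterburner's temperature limit.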
I really thank the team that created this model
I hope they release new updates soon