What is the LLM backbone?

#3
by niktheod - opened

Hi, what is the LLM backbone for Ristretto? In transformers, the LLM is Qwen2ForCausalLM; however, its size is around 3.4B parameters, and there isn't a model of this size in the Qwen2 family. Can you clarify which backbone was used?
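
For context, a rough way to reproduce this observation (the repo id `LiAutoAD/Ristretto-3B` and the `language_model` attribute are assumptions about the checkpoint layout, not confirmed here):

```python
# Sketch: inspect Ristretto's LLM submodule and count its parameters.
# Repo id and attribute name are assumptions; adjust to the actual checkpoint.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "LiAutoAD/Ristretto-3B",        # assumed repo id
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
llm = model.language_model           # assumed attribute holding the Qwen2 LLM
print(type(llm).__name__)            # e.g. Qwen2ForCausalLM
total = sum(p.numel() for p in llm.parameters())
print(f"{total / 1e9:.2f}B parameters")   # ~3.4B, as noted above
```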

LiAutoAD org

As shown in the Model Card, our LLM uses the Qwen2.5-3B architecture. Since we set tie_word_embeddings to False, the LM head weights are not shared with the embedding layer, which adds roughly vocab_size × hidden_size ≈ 151,936 × 2,048 ≈ 311M parameters on top of Qwen2.5-3B's ~3.1B, giving the ~3.4B total you observed.
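
For reference, this accounting can be checked against the upstream Qwen2.5-3B config (the values in the comments come from that config; the arithmetic below is illustrative, not part of the Ristretto code):

```python
# Check why untying the word embeddings adds ~300M parameters to Qwen2.5-3B.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-3B")
print(cfg.tie_word_embeddings)      # True in the original Qwen2.5-3B checkpoint

# With tie_word_embeddings=False, the LM head gets its own weight matrix of
# shape (vocab_size, hidden_size) instead of reusing the input embedding.
extra = cfg.vocab_size * cfg.hidden_size    # 151_936 * 2_048
print(f"untied LM head adds ~{extra / 1e9:.2f}B params")   # ~0.31B
# ~3.09B (tied Qwen2.5-3B) + ~0.31B ≈ 3.4B, matching the size seen in transformers.
```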
