lbourdois committed · Commit 30020cf · verified · Parent: 42d18fe

Improve language tag


Hi! Since the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that the README announces 29 languages, but only 13 are explicitly listed, so I was only able to add those 13.

Files changed (1)

1. README.md +269 -258

README.md (after the change; the previous frontmatter listed only `zh` and `en` under `language:`):
---
license: mit
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- Context
- Qwen2.5-1.5B
---

# Qwen2.5-1.5B-Instruct-CTX-Int8

This version of Qwen2.5-1.5B-Instruct-CTX-Int8 has been converted to run on the Axera NPU using **w8a16** quantization.

Compatible with Pulsar2 version 4.0 (not yet released).

## Features

- Supports longer contexts; in this sample, 2.5k tokens
- Supports multi-turn context dialogue
- Supports caching the system prompt as a KV cache

## Conversion tool links

If you are interested in model conversion, you can try exporting an axmodel from the original repo: https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4

[Pulsar2: How to convert an LLM from Hugging Face to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)

[AXera NPU AXEngine LLM Runtime](https://github.com/ZHEQIUSHUI/ax-llm/tree/prefill_kvcaches_context)

[AXera NPU AXCL LLM Runtime](https://github.com/ZHEQIUSHUI/ax-llm/tree/axcl-context-kvcache)

## Supported Platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - *TBD*

|Chips|w8a16|w4a16| DDR | Flash |
|--|--|--|--|--|
|AX650| 11 tokens/sec | *TBD* | 2.3 GB | 2.3 GB |

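The 2.3 GB figure in the table is consistent with a rough estimate of the w8a16 weight footprint. A back-of-envelope sketch; the total parameter count and storage layout below are assumptions, not taken from this repo:

```python
# Rough estimate of the weight footprint under w8a16.
# Assumptions: ~1.54B total parameters (Qwen2.5-1.5B class model), the
# 151936 x 1536 embedding table kept in bfloat16 (per the
# model.embed_tokens.weight.bfloat16.bin file used below), and all
# remaining weights stored as 8-bit.

VOCAB, HIDDEN = 151_936, 1_536
TOTAL_PARAMS = 1.54e9  # assumed, not from the repo

embed_bytes = VOCAB * HIDDEN * 2               # bf16: 2 bytes/param
other_bytes = (TOTAL_PARAMS - VOCAB * HIDDEN)  # int8: 1 byte/param

total_gb = (embed_bytes + other_bytes) / 1e9
print(f"embedding: {embed_bytes / 1e9:.2f} GB, weights total: {total_gb:.2f} GB")
```

Weights alone come to roughly 1.8 GB under these assumptions; the remaining headroom up to 2.3 GB plausibly covers the KV cache and runtime buffers.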
## How to use

Download all files from this repository to the device.

```
root@ax650:/mnt/qtang/llm-test/Qwen2.5-1.5B-Instruct-CTX-Int8# tree -L 1
.
├── kvcache
├── main
├── main_axcl_aarch64
├── main_axcl_x86
├── post_config.json
├── qwen2.5-1.5b-ctx-ax650
├── qwen2.5_tokenizer
├── qwen2.5_tokenizer_uid.py
├── run_qwen2.5_1.5b_ctx_ax650.sh
├── run_qwen2.5_1.5b_ctx_axcl_aarch64.sh
└── run_qwen2.5_1.5b_ctx_axcl_x86.sh
```

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/Qwen2.5-1.5B-Instruct-CTX-Int8# python qwen2.5_tokenizer_uid.py
Server running at http://0.0.0.0:12345
```
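Before launching the runtime, it can be useful to confirm the tokenizer service is actually listening on port 12345. A minimal sketch; the `is_port_open` helper is hypothetical, not part of this repo:

```python
# Hypothetical helper: check that the tokenizer service started above is
# accepting TCP connections before launching the main runtime.
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. is_port_open("127.0.0.1", 12345) before running the run_*.sh script
```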

#### System prompt cache

- The system prompt can be preset via the `--system_prompt` option
- The system prompt can be cached as a KV cache in the folder given by `--kvcache_path`, so it can be loaded quickly on the next run
- This folder must be created manually before running, for example `mkdir kvcache`

```
(base) axera@raspberrypi:~/samples/qwen2.5-1.5b-ctx $ cat run_qwen2.5_1.5b_ctx_axcl_aarch64.sh
./main_axcl_aarch64 \
--system_prompt "你的名字叫小智(allen),你是一个人畜无害的AI助手。深圳市今天(4月1日)阴天,愚人节,气温在14°C至19°C之间,微风。" \
--kvcache_path "./kvcache" \
--template_filename_axmodel "qwen2.5-1.5b-ctx-ax650/qwen2_p128_l%d_together.axmodel" \
--axmodel_num 28 \
--tokenizer_type 2 \
--url_tokenizer_model "http://127.0.0.1:12345" \
--filename_post_axmodel "qwen2.5-1.5b-ctx-ax650/qwen2_post.axmodel" \
--filename_tokens_embed "qwen2.5-1.5b-ctx-ax650/model.embed_tokens.weight.bfloat16.bin" \
--tokens_embed_num 151936 \
--tokens_embed_size 1536 \
--use_mmap_load_embed 1 \
--live_print 1 \
--devices 0
```
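In the script above, `--template_filename_axmodel` contains a `%d` placeholder that the runtime fills in for each of the `--axmodel_num` (28) per-layer models, presumably with layer indices 0 through 27. A sketch of the filenames this would expand to:

```python
# Expand the per-layer axmodel filename template the same way the runtime
# presumably does: one file per transformer layer, indexed 0..axmodel_num-1.
template = "qwen2.5-1.5b-ctx-ax650/qwen2_p128_l%d_together.axmodel"
axmodel_num = 28  # value passed via --axmodel_num above

layer_files = [template % i for i in range(axmodel_num)]
print(layer_files[0])    # first per-layer model file
print(len(layer_files))  # 28
```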

#### Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650N DEMO Board

Open another terminal and run `run_qwen2.5_1.5b_ctx_ax650.sh`.

```
root@ax650:/mnt/qtang/llm-test/Qwen2.5-1.5B-Instruct-CTX-Int8# mkdir -p kvcache
root@ax650:/mnt/qtang/llm-test/Qwen2.5-1.5B-Instruct-CTX-Int8# ./run_qwen2.5_1.5b_ctx_ax650.sh
[I][ Init][ 107]: LLM init start
[I][ Init][ 34]: connect http://127.0.0.1:12345 ok
bos_id: -1, eos_id: 151645
3% | ██ | 1 / 31 [0.21s<6.39s, 4.85 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ | 31 / 31 [5.04s<5.04s, 6.15 count/s] init post axmodel ok,remain_cmm(9656 MB)
[I][ Init][ 185]: max_token_len : 2559
[I][ Init][ 190]: kv_cache_size : 256, kv_cache_num: 2559
[I][ Init][ 198]: prefill_token_num : 128
[I][ Init][ 202]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 202]: grp: 2, prefill_max_token_num : 512
[I][ Init][ 202]: grp: 3, prefill_max_token_num : 1024
[I][ Init][ 202]: grp: 4, prefill_max_token_num : 1536
[I][ Init][ 202]: grp: 5, prefill_max_token_num : 2048
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}

[I][ Init][ 213]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
[E][ load_kvcache][ 101]: k_cache ./kvcache/k_cache_0.bin or v_cache ./kvcache/v_cache_0.bin not exist
[W][ main][ 217]: load kvcache from path: ./kvcache failed,generate kvcache
100% | ████████████████████████████████ | 53 / 53 [4.12s<4.12s, 12.85 token/s]
[I][ GetKVCache][ 325]: precompute_len:53
[I][ main][ 224]: generate kvcache to path: ./kvcache
[I][ main][ 226]: precompute_len: 53
[I][ main][ 227]: system_prompt: 你的名字叫小智(allen),你是一个人畜无害的AI助手。深圳市今天(4月1日)阴天,愚人节,气温在14°C至19°C之间,微风。
prompt >> who are you?
[I][ SetKVCache][ 354]: prefill_grpid:2 kv_cache_num:512 precompute_len:53 input_num_token:12
[I][ Run][ 527]: input_embed_num(12)
[I][ Run][ 642]: ttft: 537.06 ms
我是Allen,一个能够回答问题、提供信息和执行任务的虚拟助手。我可以帮助你解决各种问题、做计划、玩游戏、甚至是进行一些娱乐活动。请问有什么我能帮助你的吗?

[N][ Run][ 756]: hit eos,avg 11.09 token/s

[I][ GetKVCache][ 325]: precompute_len:108
prompt >> 今天是几号,天气怎么样
[I][ SetKVCache][ 354]: prefill_grpid:2 kv_cache_num:512 precompute_len:108 input_num_token:15
[I][ Run][ 527]: input_embed_num(15)
[I][ Run][ 642]: ttft: 536.81 ms
今天是4月1日,愚人节。根据您所描述的深圳天气情况,气温在14°C至19°C之间,气温较低,建议穿着适当。希望您今天愉快!

[N][ Run][ 756]: hit eos,avg 11.17 token/s

[I][ GetKVCache][ 325]: precompute_len:166
```
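In the log above, the first run fails to load `./kvcache` (the `[E]`/`[W]` lines) and regenerates it; later runs can reuse the cached files. A hypothetical helper (not part of the repo) that checks whether a cache looks complete, assuming one `k_cache_%d.bin`/`v_cache_%d.bin` pair per layer as the logged filenames suggest:

```python
# Hypothetical helper: decide whether a previously generated system-prompt
# KV cache is present under --kvcache_path, assuming one k/v file pair per
# transformer layer (indexed like the per-layer axmodel files).
from pathlib import Path

def kvcache_ready(path: str, num_layers: int = 28) -> bool:
    """True if every layer's k_cache_<i>.bin and v_cache_<i>.bin exist."""
    d = Path(path)
    return all(
        (d / f"k_cache_{i}.bin").exists() and (d / f"v_cache_{i}.bin").exists()
        for i in range(num_layers)
    )
```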

#### Inference with M.2 Accelerator card

[What is the M.2 Accelerator card?](https://axcl-pi5-examples-cn.readthedocs.io/zh-cn/latest/index.html) This demo runs on a Raspberry Pi 5.

```
(base) axera@raspberrypi:~/samples/Qwen2.5-1.5B-Instruct-CTX-Int8 $ mkdir -p kvcache
(base) axera@raspberrypi:~/samples/Qwen2.5-1.5B-Instruct-CTX-Int8 $ ./run_qwen2.5_1.5b_ctx_axcl_aarch64.sh
[I][ Init][ 134]: LLM init start
[I][ Init][ 41]: connect http://127.0.0.1:12345 ok
bos_id: -1, eos_id: 151645
3% | ██ | 1 / 31 [0.46s<14.11s, 2.20 count/s] tokenizer init ok
[I][ Init][ 45]: LLaMaEmbedSelector use mmap
6% | ███ | 2 / 31 [0.46s<7.05s, 4.40 count/s] embed_selector init ok
[I][ run][ 30]: AXCLWorker start with devid 0
100% | ████████████████████████████████ | 31 / 31 [29.18s<29.18s, 1.06 count/s] init post axmodel ok,remain_cmm(-1 MB)
[I][ Init][ 235]: max_token_len : 2559
[I][ Init][ 238]: kv_cache_size : 256, kv_cache_num: 2559
[I][ Init][ 246]: prefill_token_num : 128
[I][ Init][ 250]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 250]: grp: 2, prefill_max_token_num : 512
[I][ Init][ 250]: grp: 3, prefill_max_token_num : 1024
[I][ Init][ 250]: grp: 4, prefill_max_token_num : 1536
[I][ Init][ 250]: grp: 5, prefill_max_token_num : 2048
________________________
| ID| remain cmm(MB)|
========================
| 0| -1|
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}

[I][ Init][ 275]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
[E][ load_kvcache][ 100]: k_cache ./kvcache/k_cache_0.bin or v_cache ./kvcache/v_cache_0.bin not exist
[W][ main][ 223]: load kvcache from path: ./kvcache failed,generate kvcache
100% | ████████████████████████████████ | 53 / 53 [5.06s<5.06s, 10.47 token/s]
[I][ GetKVCache][ 419]: precompute_len:53
[I][ main][ 230]: generate kvcache to path: ./kvcache
[I][ main][ 232]: precompute_len: 53
[I][ main][ 233]: system_prompt: 你的名字叫小智(allen),你是一个人畜无害的AI助手。深圳市今天(4月1日)阴天,愚人节,气温在14°C至19°C之间,微风。
prompt >> 你是谁
[I][ SetKVCache][ 448]: prefill_grpid:2 kv_cache_num:512 precompute_len:53 input_num_token:10
[I][ Run][ 722]: input token num : 10
[I][ Run][ 823]: ttft: 548.23 ms
我是深圳市气象局发布的天气预报,我叫小智,是为了解答大家关于天气的问题而设计的。如果你对天气有疑问,欢迎随时询问!

[N][ Run][ 975]: hit eos,avg 9.04 token/s

[I][ GetKVCache][ 419]: precompute_len:98
prompt >> 你能干什么
[I][ SetKVCache][ 448]: prefill_grpid:2 kv_cache_num:512 precompute_len:98 input_num_token:10
[I][ Run][ 722]: input token num : 10
[I][ Run][ 823]: ttft: 548.07 ms
我能回答关于天气、生活、科技、文化、娱乐、历史等方面的很多问题。如果你有任何想知道的内容,都可以问我哦!

[N][ Run][ 975]: hit eos,avg 9.03 token/s

[I][ GetKVCache][ 419]: precompute_len:135
prompt >> q
[I][ run][ 80]: AXCLWorker exit with devid 0


>> q

(base) axera@raspberrypi:~ $ axcl-smi
+------------------------------------------------------------------------------------------------+
| AXCL-SMI V2.25.0_20250117163029 Driver V2.25.0_20250117163029 |
+-----------------------------------------+--------------+---------------------------------------+
| Card Name Firmware | Bus-Id | Memory-Usage |
| Fan Temp Pwr:Usage/Cap | CPU NPU | CMM-Usage |
|=========================================+==============+=======================================|
| 0 AX650N V2.25.0 | 0000:01:00.0 | 188 MiB / 945 MiB |
| -- 37C -- / -- | 1% 0% | 2335 MiB / 7040 MiB |
+-----------------------------------------+--------------+---------------------------------------+

+------------------------------------------------------------------------------------------------+
| Processes: |
| Card PID Process Name NPU Memory Usage |
|================================================================================================|
| 0 147835 /home/axera/samples/qwen2.5-1.5b-ctx/main_axcl_aarch64 1990172 KiB |
+------------------------------------------------------------------------------------------------+
(base) axera@raspberrypi:~ $
```
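The throughput figures in the logs are simply tokens divided by wall-clock time; for example, the M.2 prefill line above reports 53 tokens in 5.06 s:

```python
# Reproduce the throughput arithmetic from the M.2 prefill progress line:
# 53 tokens processed in 5.06 seconds of wall-clock time.
tokens, seconds = 53, 5.06
print(f"{tokens / seconds:.2f} token/s")  # prints 10.47 token/s, matching the log
```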