1kxia committed
Commit 23ad6cc · verified · 1 Parent(s): 6820eba

Upload README.md with huggingface_hub

Files changed (1):
  README.md +1 -16
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - quantized
 - embedding
 - nvidia-modelopt
-- nanojet
+
 pipeline_tag: feature-extraction
 ---
 
@@ -78,21 +78,6 @@ All configurations achieve >0.99 cosine similarity with the BF16 baseline.
 └── generation_config.json # Generation config
 ```
 
-## Usage
-
-This model is designed to be used with the [NanoJet](https://github.com/ai-microsoft/NanoJet_Kernels) inference engine:
-
-```python
-from test.infrastructure.model_utils import load_nanojet_model
-
-model = load_nanojet_model(
-    "1kxia/gemma-3-270m-modelopt-fp8",
-    batch=8,
-    seq_len=4096,
-    quantization="fp8"
-)
-```
-
 ## Intended Use
 
 This model is intended for efficient FP8 inference on NVIDIA GPUs with FP8 support (Hopper architecture and above).