How to use echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1")
model = AutoModelForSequenceClassification.from_pretrained("echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1")
Model Description: This model is a DistilBERT model fine-tuned on SST-2, then dynamically quantized and pruned with a magnitude pruning strategy to reach a sparsity of 10%. The optimization was done with optimum-intel, which uses Intel® Neural Compressor.
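To make the two techniques named in the description concrete, here is a minimal pure-Python sketch of magnitude pruning at 10% sparsity followed by symmetric int8 quantization of a weight vector. It is only an illustration of the general ideas: the function names `magnitude_prune` and `quantize_int8` and the toy weights are hypothetical, and this is not how Intel® Neural Compressor implements either step.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


# Toy weight vector (made up for illustration).
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6, -0.4, 0.9, -0.7, 0.2]

pruned = magnitude_prune(weights, sparsity=0.1)
sparsity = pruned.count(0.0) / len(pruned)

quantized, scale = quantize_int8(pruned)
dequantized = [q * scale for q in quantized]
```

With ten weights and a target sparsity of 0.1, exactly one weight (the one with the smallest magnitude) is set to zero. In dynamic quantization, weights are converted to int8 ahead of time in a similar spirit, while activation scales are computed at inference time.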
Loading the quantized model requires installing Optimum with the neural-compressor extra:
pip install "optimum[neural-compressor]"
To load the quantized model and run inference with a Transformers pipeline:
from transformers import AutoTokenizer, pipeline
from optimum.intel import INCModelForSequenceClassification
model_id = "echarlaix/distilbert-sst2-inc-dynamic-quantization-magnitude-pruning-0.1"
model = INCModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
cls_pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
text = "He's a dreadful magician."
outputs = cls_pipe(text)