Using Optimum Intel

Using the Intel Neural Compressor to perform Quantization Aware Training
quantization
Published: September 26, 2022

Now it’s time to evaluate the Optimum Intel section of the Optimum documentation. This section covers two different tools: the Intel Neural Compressor and OpenVINO.

Intel Neural Compressor provides tools for quantization, pruning and knowledge distillation. It would be interesting to see how adding pruning into the mix affects model accuracy. I also think that some kind of quantization aware training would be appropriate to get the best quantized model, and knowledge distillation could play a part in that.

OpenVINO appears to be another optimization approach, one that provides high performance inference on CPU, GPU and other platforms. It may be possible to integrate it with the other techniques to provide a further boost.

Intel Neural Compressor

The Neural Compressor is the main feature of the quickstart documentation and has further API documentation, while OpenVINO does not. This leads me to believe that the Neural Compressor is the more mature of the two technologies, so I should spend more time on it.

As part of spending time on it I have reviewed and executed the quickstart documentation and once again found a few problems with it (documentation ticket and trainer ticket). The documentation has already been fixed, and there seems to be movement on the trainer ticket. This is great; Hugging Face have been very responsive.

The code here is split into three sections:

  • Initializing and loading the dataset, metrics and model
  • Training the model on SST2
  • Quantizing and Pruning the model

After all this we can then evaluate the model to see how well it performs.

Loading the Model, Dataset and Metrics

This is quite standard Hugging Face code, taken from the various examples. I have tried to keep this brief.

Code
from pathlib import Path
from typing import Any, Dict
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EvalPrediction,
)
import evaluate

DATA_FOLDER = Path("/data/blog/2022-09-26-optimum-intel-quantization-aware-training")
DATA_FOLDER.mkdir(exist_ok=True, parents=True)

### MODEL ###

MODEL_NAME = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

### DATASET ###

def tokenize(row: Dict[str, Any]) -> Dict[str, Any]:
    return tokenizer(row["sentence"], return_attention_mask=False, truncation=True)

train_dataset = load_dataset("sst2", split="train")
train_dataset = train_dataset.map(tokenize)
train_dataset = train_dataset.remove_columns(["idx", "sentence"])
eval_dataset = load_dataset("sst2", split="validation")
eval_dataset = eval_dataset.map(tokenize)
eval_dataset = eval_dataset.remove_columns(["idx", "sentence"])


### METRICS ###

accuracy = evaluate.load("accuracy")
def compute_metrics(results: EvalPrediction) -> Dict[str, Any]:
    return accuracy.compute(
        predictions=results.predictions.argmax(axis=1),
        references=results.label_ids,
    )
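
As a quick sanity check of the metric wiring, compute_metrics can be called directly with a hand-built EvalPrediction. This is just an illustrative sketch with made-up logits, not part of the executed notebook:

import numpy as np

# two fake rows of logits and their gold labels, purely for illustration
fake_prediction = EvalPrediction(
    predictions=np.array([[0.1, 0.9], [0.8, 0.2]]),  # argmax -> [1, 0]
    label_ids=np.array([1, 1]),
)
print(compute_metrics(fake_prediction))  # {'accuracy': 0.5}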

Intel Neural Compressor Training

After loading all of this we can perform the initial training run. I hope that this training makes the model more amenable to quantization. Since the IncTrainer is just a drop-in replacement for the regular Trainer, I am not sure to what degree it changes the training loop.

Code
from transformers import TrainingArguments, DataCollatorWithPadding
from optimum.intel.neural_compressor import IncTrainer

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Initialize our IncTrainer
training_arguments = TrainingArguments(
    output_dir=DATA_FOLDER / "runs",
    report_to=[],
    num_train_epochs=3.0,
    evaluation_strategy="steps",
    logging_steps=500,
    eval_steps=500,
    save_steps=500,
    optim="adamw_torch",

    # cannot load best model, see https://github.com/huggingface/optimum/issues/400
    # load_best_model_at_end=True,

    # best model metrics affect which checkpoints are saved
    # the most recent checkpoint will always be saved,
    # so saving 2 checkpoints guarantees that the best model so far will be saved
    save_total_limit=2,
    metric_for_best_model="accuracy",
    greater_is_better=True,
)
trainer = IncTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
[25257/25257 12:16, Epoch 3/3]
Step Training Loss Validation Loss Accuracy
500 0.416900 0.423411 0.844037
1000 0.357000 0.370419 0.872706
1500 0.326000 0.479002 0.861239
2000 0.330500 0.369814 0.860092
2500 0.299400 0.337450 0.875000
3000 0.295000 0.433883 0.875000
3500 0.289700 0.367563 0.885321
4000 0.263700 0.316120 0.887615
4500 0.253700 0.338836 0.896789
5000 0.260600 0.330739 0.877294
5500 0.255300 0.368549 0.884174
6000 0.241700 0.367694 0.895642
6500 0.246400 0.315198 0.893349
7000 0.236600 0.346849 0.887615
7500 0.231800 0.383121 0.891055
8000 0.235800 0.322974 0.902523
8500 0.223000 0.455890 0.897936
9000 0.149800 0.422777 0.895642
9500 0.169200 0.450380 0.881881
10000 0.153100 0.513537 0.877294
10500 0.141600 0.477130 0.894495
11000 0.152100 0.487463 0.889908
11500 0.152900 0.464643 0.888761
12000 0.151200 0.531085 0.883028
12500 0.154700 0.515652 0.888761
13000 0.141800 0.420183 0.901376
13500 0.162300 0.391321 0.892202
14000 0.158800 0.410795 0.887615
14500 0.156400 0.451805 0.889908
15000 0.173600 0.448858 0.878440
15500 0.148800 0.534395 0.881881
16000 0.142500 0.510981 0.887615
16500 0.143400 0.478686 0.888761
17000 0.115100 0.573710 0.885321
17500 0.079900 0.604235 0.889908
18000 0.097900 0.517512 0.889908
18500 0.073100 0.659710 0.889908
19000 0.096700 0.616234 0.888761
19500 0.090000 0.545781 0.893349
20000 0.075800 0.571489 0.895642
20500 0.089200 0.510331 0.897936
21000 0.088800 0.512357 0.892202
21500 0.080200 0.595959 0.895642
22000 0.093000 0.524824 0.891055
22500 0.095100 0.506695 0.894495
23000 0.084600 0.515244 0.894495
23500 0.088000 0.504729 0.894495
24000 0.073200 0.543100 0.894495
24500 0.086000 0.532118 0.893349
25000 0.087300 0.539065 0.894495

TrainOutput(global_step=25257, training_loss=0.17336956612754284, metrics={'train_runtime': 736.7197, 'train_samples_per_second': 274.252, 'train_steps_per_second': 34.283, 'total_flos': 1554849545444976.0, 'train_loss': 0.17336956612754284, 'epoch': 3.0})
Code
! ls {DATA_FOLDER}/runs
checkpoint-25000  checkpoint-8000
Code
! cp -r {DATA_FOLDER}/runs/checkpoint-8000 {DATA_FOLDER}/best-model
Code
from typing import TypedDict
from datasets import load_dataset
import pandas as pd

class InputRow(TypedDict):
    label: int
    title: str
    content: str

class TransformedRow(TypedDict):
    label: int
    text: str

def combine(row: InputRow) -> TransformedRow:
    # join the title and content, adding a full stop if the title
    # does not already end with sentence punctuation
    title = row["title"].strip()
    if title and title[-1] not in {".", "!", "?"}:
        title += "."
    content = row["content"].strip()
    if content:
        content = " " + content
    return {
        "label": row["label"],
        "text": title + content
    }

data = load_dataset("amazon_polarity", split="test[:3000]")
data = data.map(combine)
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.
  warnings.warn(message, FutureWarning)
2022-09-29 00:07:42 [WARNING] Found cached dataset amazon_polarity (/home/matthew/.cache/huggingface/datasets/amazon_polarity/amazon_polarity/3.0.0/a27b32b7e7b88eb274a8fa8ba0f654f1fe998a87c22547557317793b5d2772dc)
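
As a quick check of what combine produces: a title without trailing punctuation gets a full stop appended before the (non-empty) content is joined on. A small illustrative sketch with a made-up row:

row = InputRow(label=1, title="Great product", content="Works exactly as described")
print(combine(row))
# {'label': 1, 'text': 'Great product. Works exactly as described'}
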
Code
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification, AutoTokenizer
)
import pandas as pd

task_evaluator = evaluate.evaluator("text-classification")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(DATA_FOLDER / "best-model")

pd.DataFrame([
    task_evaluator.compute(
        model_or_pipeline=model,
        tokenizer=tokenizer,
        data=data,
        metric=accuracy,
        input_column="text",
        label_column="label",
        label_mapping={"LABEL_1": 1, "LABEL_0": 0},
        device=-1, # CPU
    )
])
accuracy total_time_in_seconds samples_per_second latency_in_seconds
0 0.87 58.246611 51.505142 0.019416

The checkpoint-8000 folder contains the best model that was trained during that run.
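
Rather than reading the best step off the table and copying the folder by hand, the best checkpoint could also be recovered from the trainer_state.json that each checkpoint directory contains. This is a rough sketch, assuming the log_history entries carry an eval_accuracy value as they do above:

import json

def best_checkpoint(runs_dir: Path) -> Path:
    """Return the checkpoint whose own evaluation logged the highest accuracy."""
    best_path, best_accuracy = None, float("-inf")
    for state_file in runs_dir.glob("checkpoint-*/trainer_state.json"):
        state = json.loads(state_file.read_text())
        # only consider the evaluation logged at this checkpoint's own step
        accuracy = max(
            (entry["eval_accuracy"]
             for entry in state["log_history"]
             if entry.get("step") == state["global_step"] and "eval_accuracy" in entry),
            default=float("-inf"),
        )
        if accuracy > best_accuracy:
            best_path, best_accuracy = state_file.parent, accuracy
    return best_path

# best_checkpoint(DATA_FOLDER / "runs")  # would point at checkpoint-8000 here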

Pruning and Quantization

Code
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification, AutoTokenizer, pipeline
)
from evaluate import evaluator
from optimum.intel.neural_compressor import (
    IncOptimizer, IncQuantizationConfig, IncQuantizer
)
import torch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(DATA_FOLDER / "best-model")

def eval_func(model: AutoModelForSequenceClassification) -> float:
    metrics = task_evaluator.compute(
        model_or_pipeline=model,
        tokenizer=tokenizer,
        data=data,
        input_column="text",
        label_mapping={"LABEL_1": 1, "LABEL_0": 0},
        metric="accuracy",
        device=-1, # CPU
    )
    return metrics["accuracy"]

# Load the quantization configuration detailing the quantization we wish to apply
config_path = "echarlaix/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
quantization_config = IncQuantizationConfig.from_pretrained(
    config_path,
    config_file_name="quantization.yml",
)

# Instantiate our IncQuantizer using the desired configuration and the evaluation function used
# for the INC accuracy-driven tuning strategy
quantizer = IncQuantizer(quantization_config, eval_func=eval_func)
optimizer = IncOptimizer(model, quantizer=quantizer)

# Apply dynamic quantization
quantized_model = optimizer.fit()

# Save the resulting model and its corresponding configuration in the given directory
optimizer.save_pretrained(DATA_FOLDER / "quantized-model")
2022-09-27 17:48:05 [INFO] Start sequential pipeline execution.
2022-09-27 17:48:05 [INFO] The 0th step being executing is QUANTIZATION.
2022-09-27 17:48:05 [INFO] Pass query framework capability elapsed time: 1.66 ms
2022-09-27 17:48:05 [INFO] Get FP32 model baseline.
2022-09-27 17:49:00 [INFO] Save tuning history to /home/matthew/Programming/Blog/blog/posts/2022/09/26/nc_workspace/2022-09-27_16-36-54/./history.snapshot.
2022-09-27 17:49:00 [INFO] FP32 baseline is: [Accuracy: 0.8700, Duration (seconds): 55.7711]
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/qconfig.py:92: UserWarning: QConfigDynamic is going to be deprecated in PyTorch 1.12, please use QConfig instead
  warnings.warn("QConfigDynamic is going to be deprecated in PyTorch 1.12, please use QConfig instead")
2022-09-27 17:49:01 [INFO] |******Mixed Precision Statistics*****|
2022-09-27 17:49:01 [INFO] +----------------+----------+---------+
2022-09-27 17:49:01 [INFO] |    Op Type     |  Total   |   INT8  |
2022-09-27 17:49:01 [INFO] +----------------+----------+---------+
2022-09-27 17:49:01 [INFO] |   Embedding    |    2     |    2    |
2022-09-27 17:49:01 [INFO] |     Linear     |    38    |    38   |
2022-09-27 17:49:01 [INFO] +----------------+----------+---------+
2022-09-27 17:49:01 [INFO] Pass quantize model elapsed time: 536.44 ms
2022-09-27 17:49:40 [INFO] Tune 1 result is: [Accuracy (int8|fp32): 0.8613|0.8700, Duration (seconds) (int8|fp32): 38.7483|55.7711], Best tune result is: [Accuracy: 0.8613, Duration (seconds): 38.7483]
2022-09-27 17:49:40 [INFO] |**********************Tune Result Statistics**********************|
2022-09-27 17:49:40 [INFO] +--------------------+----------+---------------+------------------+
2022-09-27 17:49:40 [INFO] |     Info Type      | Baseline | Tune 1 result | Best tune result |
2022-09-27 17:49:40 [INFO] +--------------------+----------+---------------+------------------+
2022-09-27 17:49:40 [INFO] |      Accuracy      | 0.8700   |    0.8613     |     0.8613       |
2022-09-27 17:49:40 [INFO] | Duration (seconds) | 55.7711  |    38.7483    |     38.7483      |
2022-09-27 17:49:40 [INFO] +--------------------+----------+---------------+------------------+
2022-09-27 17:49:40 [INFO] Save tuning history to /home/matthew/Programming/Blog/blog/posts/2022/09/26/nc_workspace/2022-09-27_16-36-54/./history.snapshot.
2022-09-27 17:49:40 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2022-09-27 17:49:40 [INFO] Save deploy yaml to /home/matthew/Programming/Blog/blog/posts/2022/09/26/nc_workspace/2022-09-27_16-36-54/deploy.yaml
2022-09-27 17:49:40 [INFO] Model weights saved to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/quantized-model

Let’s see if this has improved the speed at all.

Code
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification

model = IncQuantizedModelForSequenceClassification.from_pretrained(
    DATA_FOLDER / "quantized-model"
)

pd.DataFrame([
    task_evaluator.compute(
        model_or_pipeline=model,
        tokenizer=tokenizer,
        data=data,
        metric=accuracy,
        input_column="text",
        label_column="label",
        label_mapping={"LABEL_1": 1, "LABEL_0": 0},
        device=-1, # CPU
    )
])
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/qconfig.py:92: UserWarning: QConfigDynamic is going to be deprecated in PyTorch 1.12, please use QConfig instead
  warnings.warn("QConfigDynamic is going to be deprecated in PyTorch 1.12, please use QConfig instead")
accuracy total_time_in_seconds samples_per_second latency_in_seconds
0 0.861333 36.69454 81.756032 0.012232

There is a notable improvement here, from 51 samples per second to 82, roughly a 1.6x speedup. This doesn’t compete with the ONNX quantization improvements available (see previous posts). Given that this appears to take a similar approach to the ONNX quantization, maybe the two methods can be combined.

I’m also at a loss to find any actual pruning here, and I wonder how I would check for that.
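
One way to check would be to measure weight sparsity directly: pruning should show up as a large fraction of exactly-zero weights in the Linear layers. A rough sketch for an fp32 checkpoint (the quantized model packs its weights, so it would need different handling); the full example below also reports this through optimizer.get_sparsity():

import torch

def linear_sparsity(model: torch.nn.Module) -> float:
    """Fraction of exactly-zero weights across all torch.nn.Linear layers."""
    zeros, total = 0, 0
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            weight = module.weight.detach()
            zeros += int((weight == 0).sum())
            total += weight.numel()
    return zeros / total if total else 0.0

# e.g. linear_sparsity(AutoModelForSequenceClassification.from_pretrained(DATA_FOLDER / "best-model"))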

Quantization Aware Training

I’ve found that the examples on GitHub explicitly mention this capability. Reviewing the code should make it clear how to activate it; it may even be an inherent property of the IncTrainer.

The README already indicates that there are explicit flags for the interesting parts:

python run_glue.py \
    --model_name_or_path distilbert-base-uncased \
    --task_name sst2 \
    --apply_distillation \
    --teacher_model_name_or_path distilbert-base-uncased-finetuned-sst-2-english \
    --apply_quantization \
    --quantization_approach aware_training \ # HERE
    --do_train \
    --do_eval \
    --verify_loading \
    --output_dir /tmp/sst2_output
...
python run_glue.py \
    --model_name_or_path distilbert-base-uncased-finetuned-sst-2-english \
    --task_name sst2 \
    --apply_quantization \
    --quantization_approach dynamic \
    --apply_pruning \       # HERE
    --target_sparsity 0.1 \ # HERE
    --do_train \
    --do_eval \
    --verify_loading \
    --output_dir /tmp/sst2_output

This suggests that quantization aware training is incompatible with dynamic quantization. I am going to review and implement the GLUE example here (still using the Stanford Sentiment Treebank (Socher et al. 2013) and Amazon Polarity (Zhang, Zhao, and LeCun 2015) so I can compare results).

Socher, Richard, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. “Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–42. Seattle, Washington, USA: Association for Computational Linguistics. https://aclanthology.org/D13-1170.
Zhang, Xiang, Junbo Zhao, and Yann LeCun. 2015. “Character-Level Convolutional Networks for Text Classification.” arXiv. https://doi.org/10.48550/ARXIV.1509.01626.

The first thing to do will be to review the code to see if I can spot where these flags are used and what changes they introduce.

Here is the quantization_approach code:

# Set quantization approach if specified
if optim_args.quantization_approach is not None:
    supported_approach = {"static", "dynamic", "aware_training"}
    if optim_args.quantization_approach not in supported_approach:
        raise ValueError(
            "Unknown quantization approach. Supported approach are " + ", ".join(supported_approach)
        )
    quant_approach = getattr(IncQuantizationMode, optim_args.quantization_approach.upper()).value
    q8_config.set_config("quantization.approach", quant_approach)

    quant_approach = IncQuantizationMode(q8_config.get_config("quantization.approach"))
    # torch FX used for post-training quantization and quantization aware training
    # dynamic quantization will be added when torch FX is more mature
    if quant_approach != IncQuantizationMode.DYNAMIC:
        if not training_args.do_train:
            raise ValueError("do_train must be set to True for static and aware training quantization.")

        q8_config.set_config("model.framework", "pytorch_fx")

The q8_config is defined as:

q8_config = IncQuantizationConfig.from_pretrained(
    optim_args.quantization_config if optim_args.quantization_config is not None else default_config,
    config_file_name="quantization.yml",
    cache_dir=model_args.cache_dir,
)

The classes come from:

from optimum.intel.neural_compressor import (
    IncDistillationConfig,
    IncDistiller,
    IncOptimizer,
    IncPruner,
    IncPruningConfig,
    IncQuantizationConfig,
    IncQuantizationMode,
    IncQuantizer,
    IncTrainer,
)

The IncQuantizationMode is an enum:

class IncQuantizationMode(Enum):

    DYNAMIC = "post_training_dynamic_quant"
    STATIC = "post_training_static_quant"
    AWARE_TRAINING = "quant_aware_training"
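
So the --quantization_approach flag is simply upper-cased into this enum, and the enum’s value is what ends up in the quantization config, along the lines of this small sketch:

# mirrors the snippet above: "aware_training" -> AWARE_TRAINING -> "quant_aware_training"
quant_approach = getattr(IncQuantizationMode, "aware_training".upper()).value
print(quant_approach)  # quant_aware_training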

The code is quite sprawling, but it’s easy enough to read. Using this code as-is on arbitrary datasets seems reasonable.

I’m going to heavily crop the code and run it, below.

Code
import logging
import os
import random
import sys
from dataclasses import dataclass, field
from typing import Optional
from pathlib import Path

import datasets
import numpy as np
import transformers
from datasets import load_dataset, load_metric
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    EvalPrediction,
    HfArgumentParser,
    PretrainedConfig,
    TrainingArguments,
    default_data_collator,
    set_seed,
)
from transformers.trainer_utils import get_last_checkpoint
from transformers.utils import check_min_version
from transformers.utils.versions import require_version

from optimum.intel.neural_compressor import (
    IncDistillationConfig,
    IncDistiller,
    IncOptimizer,
    IncPruner,
    IncPruningConfig,
    IncQuantizationConfig,
    IncQuantizationMode,
    IncQuantizer,
    IncTrainer,
)
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification

DATA_FOLDER = Path("/data/blog/2022-09-26-optimum-intel-quantization-aware-training")
DATA_FOLDER.mkdir(exist_ok=True, parents=True)

logger = logging.getLogger(__name__)

def train(model_name: str = "distilbert-base-uncased", epochs: float = 3.0, teacher_model_name: Optional[str] = None):
    train_dataset = load_dataset("sst2", split="train")
    train_dataset = train_dataset.rename_column("sentence", "text")
    train_dataset = train_dataset.remove_columns(["idx"])
    for index in random.sample(range(len(train_dataset)), 3):
        print(f"Sample {index} of the training set: {train_dataset[index]}.")

    def combine(row):
        # join the title and content, adding a full stop if the title
        # does not already end with sentence punctuation
        title = row["title"].strip()
        if title and title[-1] not in {".", "!", "?"}:
            title += "."
        content = row["content"].strip()
        if content:
            content = " " + content
        return {
            "label": row["label"],
            "text": title + content
        }
    eval_dataset = load_dataset("amazon_polarity", split="test[:3000]")
    eval_dataset = eval_dataset.map(combine)
    eval_dataset = eval_dataset.remove_columns(["title", "content"])

    num_labels = 2
    label_list = [0, 1]
    label_to_id = dict(enumerate(label_list))


    # Load pretrained model and tokenizer
    #
    # In distributed training, the .from_pretrained methods guarantee that only one local process can concurrently
    # download model & vocab.
    config = AutoConfig.from_pretrained(
        model_name,
        num_labels=num_labels,
        finetuning_task=None,
        cache_dir=None,
        revision="main",
        use_auth_token=None,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        cache_dir=None,
        use_fast=True,
        revision="main",
        use_auth_token=None,
    )
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        from_tf=False,
        config=config,
        cache_dir=None,
        revision="main",
        use_auth_token=None,
    )
    model.config.label2id = label_to_id
    model.config.id2label = {id: label for label, id in config.label2id.items()}
    
    # batchwise padding
    padding = False
    max_seq_length = tokenizer.model_max_length

    def preprocess_function(examples):
        result = tokenizer(
            examples["text"],
            padding=padding,
            max_length=max_seq_length,
            truncation=True,
        )
        # Map labels to IDs (not necessary for GLUE tasks)
        if label_to_id is not None and "label" in examples:
            result["label"] = [(label_to_id[l] if l != -1 else -1) for l in examples["label"]]
        return result
    train_dataset = train_dataset.map(
        preprocess_function, batched=True
    )
    eval_dataset = eval_dataset.map(
        preprocess_function, batched=True
    )

    metric = load_metric("accuracy")
    metric_name = "eval_accuracy"

    # You can define your custom compute_metrics function. It takes an `EvalPrediction` object (a namedtuple with a
    # predictions and label_ids field) and has to return a dictionary string to float.
    def compute_metrics(p: EvalPrediction):
        preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
        preds = np.argmax(preds, axis=1)
        return {"accuracy": (preds == p.label_ids).astype(np.float32).mean().item()}

    data_collator = DataCollatorWithPadding(
        tokenizer=tokenizer,
        padding=True,
        return_tensors="pt",
    )
    
    training_args = TrainingArguments(
        output_dir=DATA_FOLDER / "runs",
        report_to=[],
        num_train_epochs=epochs,
        evaluation_strategy="steps",
        logging_steps=500,
        eval_steps=500,
        save_steps=500,
        optim="adamw_torch",

        # cannot load best model, see https://github.com/huggingface/optimum/issues/400
        # load_best_model_at_end=True,

        # best model metrics affect which checkpoints are saved
        # the most recent checkpoint will always be saved,
        # so saving 2 checkpoints guarantees that the best model so far will be saved
        save_total_limit=2,
        metric_for_best_model="accuracy",
        greater_is_better=True,
        no_cuda=True,
    )
        
    # Initialize our Trainer
    trainer = IncTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
    resume_from_checkpoint = training_args.resume_from_checkpoint
    last_checkpoint = None

    def take_eval_steps(model, trainer, metric_name, save_metrics=False):
        trainer.model = model
        metrics = trainer.evaluate()
        if save_metrics:
            trainer.save_metrics("eval", metrics)
        print("{}: {}".format(metric_name, metrics.get(metric_name)))
        print("Throughput: {} samples/sec".format(metrics.get("eval_samples_per_second")))
        return metrics[metric_name]

    def eval_func(model):
        return take_eval_steps(model, trainer, metric_name)

    def take_train_steps(model, trainer, resume_from_checkpoint, last_checkpoint):
        trainer.model_wrapped = model
        trainer.model = model
        checkpoint = None
        if resume_from_checkpoint is not None:
            checkpoint = resume_from_checkpoint
        elif last_checkpoint is not None:
            checkpoint = last_checkpoint
        # `agent` comes from optimizer.get_agent() further down; train_func is
        # only called once the IncOptimizer has been created
        train_result = trainer.train(agent, resume_from_checkpoint=checkpoint)
        metrics = train_result.metrics
        trainer.save_model()  # Saves the tokenizer too for easy upload
        trainer.log_metrics("train", metrics)
        trainer.save_metrics("train", metrics)
        trainer.save_state()
        return trainer.model

    def train_func(model):
        return take_train_steps(model, trainer, resume_from_checkpoint, last_checkpoint)
    
    def make_quantizer():
        q8_config = IncQuantizationConfig.from_pretrained(
            ".",
            config_file_name="quantization.yml",
            cache_dir=None,
        )

        # Set metric tolerance if specified
        # if optim_args.tolerance_criterion is not None:
        #     q8_config.set_tolerance(optim_args.tolerance_criterion)

        # Set quantization approach if specified
        # if optim_args.quantization_approach is not None:
        #     supported_approach = {"static", "dynamic", "aware_training"}
        #     if optim_args.quantization_approach not in supported_approach:
        #         raise ValueError(
        #             "Unknown quantization approach. Supported approach are " + ", ".join(supported_approach)
        #         )
        #     quant_approach = getattr(IncQuantizationMode, optim_args.quantization_approach.upper()).value
        #     q8_config.set_config("quantization.approach", quant_approach)
        q8_config.set_config("quantization.approach", IncQuantizationMode.AWARE_TRAINING.value)

        quant_approach = IncQuantizationMode(q8_config.get_config("quantization.approach"))
        # torch FX used for post-training quantization and quantization aware training
        # dynamic quantization will be added when torch FX is more mature
        if quant_approach != IncQuantizationMode.DYNAMIC:
            q8_config.set_config("model.framework", "pytorch_fx")

        calib_dataloader = trainer.get_train_dataloader() if quant_approach == IncQuantizationMode.STATIC else None
        quantizer = IncQuantizer(
            q8_config, eval_func=eval_func, train_func=train_func, calib_dataloader=calib_dataloader
        )
        return quantizer
    
    def make_pruner():
        pruning_config = IncPruningConfig.from_pretrained(
            ".",
            config_file_name="prune.yml",
            cache_dir=None,
        )

        # Set targeted sparsity if specified
        # if optim_args.target_sparsity is not None:
        #     pruning_config.set_config(
        #         "pruning.approach.weight_compression.target_sparsity", optim_args.target_sparsity
        #     )

        pruning_start_epoch = pruning_config.get_config("pruning.approach.weight_compression.start_epoch")
        pruning_end_epoch = pruning_config.get_config("pruning.approach.weight_compression.end_epoch")

        if pruning_start_epoch > training_args.num_train_epochs - 1:
            logger.warning(
                f"Pruning start epoch {pruning_start_epoch} is higher than the total number of training epochs "
                f"{training_args.num_train_epochs}. No pruning will be applied."
            )

        if pruning_end_epoch > training_args.num_train_epochs - 1:
            logger.warning(
                f"Pruning end epoch {pruning_end_epoch} is higher than the total number of training epochs "
                f"{training_args.num_train_epochs}. The target sparsity will not be reached."
            )

        # Create the pruning object used in the IncTrainer training loop
        pruner = IncPruner(pruning_config, eval_func=eval_func, train_func=train_func)
        return pruner
    
    def make_distiller():
        # the original script also checked the --do_train flag here; in this
        # cropped version training always happens, so only the teacher is required
        if teacher_model_name is None:
            raise ValueError("A teacher model is needed to apply distillation.")

        teacher_config = AutoConfig.from_pretrained(
            teacher_model_name,
            num_labels=num_labels,
            finetuning_task=None,
        )
        teacher_tokenizer = AutoTokenizer.from_pretrained(
            teacher_model_name,
            use_fast=True,
        )
        teacher_model = AutoModelForSequenceClassification.from_pretrained(
            teacher_model_name,
            from_tf=False,
            config=teacher_config,
        )

        teacher_model.to(training_args.device)

        if teacher_tokenizer.vocab != tokenizer.vocab:
            raise ValueError("Teacher model and student model should have same tokenizer.")

        distillation_config = IncDistillationConfig.from_pretrained(
            ".",
            config_file_name="distillation.yml",
            cache_dir=None,
        )

        # Create the distillation object used in the IncTrainer training loop
        distiller = IncDistiller(
            teacher_model=teacher_model, config=distillation_config, eval_func=eval_func, train_func=train_func
        )
        return distiller

    quantizer = make_quantizer()
    pruner = make_pruner()
    distiller = make_distiller() if teacher_model_name else None

    result_baseline_model = take_eval_steps(model, trainer, metric_name)

    optimizer = IncOptimizer(
        model,
        quantizer=quantizer,
        pruner=pruner,
        distiller=distiller,
        one_shot_optimization=True,
        eval_func=eval_func,
        train_func=train_func,
    )

    agent = optimizer.get_agent()
    optimized_model = optimizer.fit()
    result_optimized_model = take_eval_steps(optimized_model, trainer, metric_name, save_metrics=True)

    # Save the resulting model and its corresponding configuration in the given directory
    optimizer.save_pretrained(DATA_FOLDER / "example-model")
    # Compute the model's sparsity
    sparsity = optimizer.get_sparsity()

    print(
        f"Optimized model with {metric_name} of {result_optimized_model} and sparsity of {round(sparsity, 2)}% "
        f"saved to: {training_args.output_dir}. Original model had an {metric_name} of {result_baseline_model}."
    )

    # Load the model obtained after Intel Neural Compressor quantization
    q8_config = IncQuantizationConfig.from_pretrained(
        ".",
        config_file_name="quantization.yml",
        cache_dir=None,
    )
    loaded_model = IncQuantizedModelForSequenceClassification.from_pretrained(DATA_FOLDER / "example-model", inc_config=q8_config)
    loaded_model.eval()
    result_loaded_model = take_eval_steps(loaded_model, trainer, metric_name)

    if result_loaded_model != result_optimized_model:
        print("ERROR: The quantized model was not successfully loaded.")
    else:
        print("The quantized model was successfully loaded.")
Code
train()
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.
  warnings.warn(message, FutureWarning)
2022-09-28 17:04:53 [WARNING] Using custom data configuration default
2022-09-28 17:04:53 [WARNING] Found cached dataset sst2 (/home/matthew/.cache/huggingface/datasets/sst2/default/2.0.0/9896208a8d85db057ac50c72282bcb8fe755accc671a57dd8059d4e130961ed5)
Sample 41436 of the training set: {'text': 'most romantic comedies ', 'label': 1}.
Sample 44667 of the training set: {'text': "promises is one film that 's truly deserving of its oscar nomination . ", 'label': 1}.
Sample 34844 of the training set: {'text': 'himself funny ', 'label': 1}.
2022-09-28 17:04:56 [WARNING] Found cached dataset amazon_polarity (/home/matthew/.cache/huggingface/datasets/amazon_polarity/amazon_polarity/3.0.0/a27b32b7e7b88eb274a8fa8ba0f654f1fe998a87c22547557317793b5d2772dc)
loading configuration file config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/config.json
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.22.2",
  "vocab_size": 30522
}

loading configuration file config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/config.json
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.22.2",
  "vocab_size": 30522
}

loading file vocab.txt from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/vocab.txt
loading file tokenizer.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/tokenizer_config.json
loading configuration file config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/config.json
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.22.2",
  "vocab_size": 30522
}

loading weights file pytorch_model.bin from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/pytorch_model.bin
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PyTorch: setting up devices
2022-09-28 17:04:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[375/375 08:35]
2022-09-28 17:06:14 [WARNING] Find different value 100 and 4 on key max_trials. Use first key-value (max_trials: 100) pair as default
2022-09-28 17:06:14 [WARNING] Find different value False and {'diagnosis_after_tuning': False, 'op_list': [], 'iteration_list': [1], 'inspect_type': 'activation', 'save_to_disk': True, 'save_path': './nc_workspace/inspect_saved/'} on key diagnosis. Use first key-value (diagnosis: False) pair as default
2022-09-28 17:06:14 [WARNING] Find different value 1978 and 9527 on key random_seed. Use first key-value (random_seed: 1978) pair as default
2022-09-28 17:06:14 [WARNING] Find different value 0.01 and 0.05 on key relative. Use first key-value (relative: 0.01) pair as default
2022-09-28 17:06:14 [INFO] Start sequential pipeline execution.
2022-09-28 17:06:14 [INFO] The 0th step being executing is COMBINATION OF PRUNING,QUANTIZATION.
2022-09-28 17:06:14 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
eval_accuracy: 0.5166666507720947
Throughput: 40.304 samples/sec
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/observer.py:176: UserWarning: Please use quant_min and quant_max to specify the range for observers.                     reduce_range will be deprecated in a future release of PyTorch.
  warnings.warn(
2022-09-28 17:06:15 [INFO] The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
2022-09-28 17:06:15 [INFO] ***** Running training *****
2022-09-28 17:06:15 [INFO]   Num examples = 67349
2022-09-28 17:06:15 [INFO]   Num Epochs = 3
2022-09-28 17:06:15 [INFO]   Instantaneous batch size per device = 8
2022-09-28 17:06:15 [INFO]   Total train batch size (w. parallel, distributed & accumulation) = 8
2022-09-28 17:06:15 [INFO]   Gradient Accumulation steps = 1
2022-09-28 17:06:15 [INFO]   Total optimization steps = 25257
[25257/25257 6:16:30, Epoch 3/3]
Step Training Loss Validation Loss Accuracy
500 0.580000 0.661112 0.688333
1000 0.419600 0.432076 0.810333
1500 0.394600 0.395575 0.829000
2000 0.361000 0.454810 0.811000
2500 0.355000 0.426880 0.822333
3000 0.340200 0.375353 0.849333
3500 0.322000 0.402735 0.858333
4000 0.342900 0.362576 0.850333
4500 0.307400 0.468966 0.849667
5000 0.292200 0.451365 0.852333
5500 0.308300 0.474243 0.839000
6000 0.300900 0.611464 0.830667
6500 0.292600 0.521944 0.845000
7000 0.311800 0.504542 0.850667
7500 0.287000 0.608976 0.838333
8000 0.285800 0.558398 0.848667
8500 0.270900 0.739248 0.845000
9000 0.223800 0.541361 0.858000
9500 0.250000 0.545745 0.854333
10000 0.235000 0.665810 0.855333
10500 0.220800 0.698272 0.839333
11000 0.220000 0.811272 0.839333
11500 0.225500 0.704798 0.848000
12000 0.225200 0.652409 0.855000
12500 0.196900 0.730247 0.840333
13000 0.210900 0.688768 0.840667
13500 0.200400 0.693728 0.851667
14000 0.216000 0.620037 0.851000
14500 0.204000 0.568831 0.852000
15000 0.218800 0.593239 0.845000
15500 0.203500 0.678790 0.843333
16000 0.189900 0.732859 0.842667
16500 0.217800 0.570437 0.850000
17000 0.189400 0.723809 0.861333
17500 0.148800 0.735274 0.843667
18000 0.156100 0.693834 0.845000
18500 0.131600 0.733690 0.844667
19000 0.156200 0.776869 0.840000
19500 0.157700 0.761233 0.832333
20000 0.169800 0.693654 0.845000
20500 0.169200 0.704324 0.843333
21000 0.140800 0.768836 0.839667
21500 0.164700 0.791290 0.830000
22000 0.135400 0.723575 0.848667
22500 0.146100 0.790115 0.836000
23000 0.133600 0.739328 0.847000
23500 0.157800 0.732798 0.847667
24000 0.141700 0.726812 0.847333
24500 0.164400 0.736584 0.845667
25000 0.152700 0.716430 0.851333

2022-09-28 17:11:11 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 17:13:35 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/config.json
2022-09-28 17:13:35 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000] due to args.save_total_limit
2022-09-28 17:18:20 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 17:21:08 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/config.json
2022-09-28 17:21:08 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000] due to args.save_total_limit
2022-09-28 17:25:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 17:28:23 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/config.json
2022-09-28 17:28:23 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500] due to args.save_total_limit
2022-09-28 17:33:08 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 17:35:32 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/config.json
2022-09-28 17:35:32 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000] due to args.save_total_limit
2022-09-28 17:40:16 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 17:42:40 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/config.json
2022-09-28 17:42:40 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000] due to args.save_total_limit
2022-09-28 17:47:24 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 17:49:48 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/config.json
2022-09-28 17:49:49 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500] due to args.save_total_limit
2022-09-28 17:54:32 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 17:57:04 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/config.json
2022-09-28 17:57:04 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500] due to args.save_total_limit
2022-09-28 18:02:03 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:04:27 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/config.json
2022-09-28 18:04:28 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000] due to args.save_total_limit
2022-09-28 18:09:13 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:11:36 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/config.json
2022-09-28 18:11:37 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000] due to args.save_total_limit
2022-09-28 18:16:20 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:18:44 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/config.json
2022-09-28 18:18:44 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500] due to args.save_total_limit
2022-09-28 18:23:27 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:25:51 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/config.json
2022-09-28 18:25:51 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000] due to args.save_total_limit
2022-09-28 18:30:34 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:32:58 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/config.json
2022-09-28 18:32:59 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500] due to args.save_total_limit
2022-09-28 18:37:43 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:40:07 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/config.json
2022-09-28 18:40:08 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000] due to args.save_total_limit
2022-09-28 18:44:51 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:47:16 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/config.json
2022-09-28 18:47:16 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500] due to args.save_total_limit
2022-09-28 18:51:58 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 18:54:22 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/config.json
2022-09-28 18:54:22 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000] due to args.save_total_limit
2022-09-28 18:59:06 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:01:29 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/config.json
2022-09-28 19:01:30 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500] due to args.save_total_limit
2022-09-28 19:05:29 [INFO]                                                  Name         Shape  \
0   distilbert.embeddings.word_embeddings.module.w...  [30522, 768]   
1   distilbert.embeddings.position_embeddings.modu...    [512, 768]   
2   distilbert.transformer.layer.0.attention.q_lin...    [768, 768]   
3   distilbert.transformer.layer.0.attention.k_lin...    [768, 768]   
4   distilbert.transformer.layer.0.attention.v_lin...    [768, 768]   
5   distilbert.transformer.layer.0.attention.out_l...    [768, 768]   
6      distilbert.transformer.layer.0.ffn.lin1.weight   [3072, 768]   
7      distilbert.transformer.layer.0.ffn.lin2.weight   [768, 3072]   
8   distilbert.transformer.layer.1.attention.q_lin...    [768, 768]   
9   distilbert.transformer.layer.1.attention.k_lin...    [768, 768]   
10  distilbert.transformer.layer.1.attention.v_lin...    [768, 768]   
11  distilbert.transformer.layer.1.attention.out_l...    [768, 768]   
12     distilbert.transformer.layer.1.ffn.lin1.weight   [3072, 768]   
13     distilbert.transformer.layer.1.ffn.lin2.weight   [768, 3072]   
14  distilbert.transformer.layer.2.attention.q_lin...    [768, 768]   
15  distilbert.transformer.layer.2.attention.k_lin...    [768, 768]   
16  distilbert.transformer.layer.2.attention.v_lin...    [768, 768]   
17  distilbert.transformer.layer.2.attention.out_l...    [768, 768]   
18     distilbert.transformer.layer.2.ffn.lin1.weight   [3072, 768]   
19     distilbert.transformer.layer.2.ffn.lin2.weight   [768, 3072]   
20  distilbert.transformer.layer.3.attention.q_lin...    [768, 768]   
21  distilbert.transformer.layer.3.attention.k_lin...    [768, 768]   
22  distilbert.transformer.layer.3.attention.v_lin...    [768, 768]   
23  distilbert.transformer.layer.3.attention.out_l...    [768, 768]   
24     distilbert.transformer.layer.3.ffn.lin1.weight   [3072, 768]   
25     distilbert.transformer.layer.3.ffn.lin2.weight   [768, 3072]   
26  distilbert.transformer.layer.4.attention.q_lin...    [768, 768]   
27  distilbert.transformer.layer.4.attention.k_lin...    [768, 768]   
28  distilbert.transformer.layer.4.attention.v_lin...    [768, 768]   
29  distilbert.transformer.layer.4.attention.out_l...    [768, 768]   
30     distilbert.transformer.layer.4.ffn.lin1.weight   [3072, 768]   
31     distilbert.transformer.layer.4.ffn.lin2.weight   [768, 3072]   
32  distilbert.transformer.layer.5.attention.q_lin...    [768, 768]   
33  distilbert.transformer.layer.5.attention.k_lin...    [768, 768]   
34  distilbert.transformer.layer.5.attention.v_lin...    [768, 768]   
35  distilbert.transformer.layer.5.attention.out_l...    [768, 768]   
36     distilbert.transformer.layer.5.ffn.lin1.weight   [3072, 768]   
37     distilbert.transformer.layer.5.ffn.lin2.weight   [768, 3072]   
38                       pre_classifier.module.weight    [768, 768]   
39                           classifier.module.weight      [2, 768]   
40                                    Total sparsity:      66892800   

   NNZ (dense)  NNZ (sparse)  Sparsity(%)   Std      Mean  Abs-Mean  
0     23440896             0         0.00  0.05 -3.83e-02      0.05  
1       393216             0         0.00  0.02 -4.15e-05      0.01  
2       589824             0         0.00  0.04  5.91e-05      0.03  
3       589824             0         0.00  0.04  2.07e-05      0.03  
4       589824             0         0.00  0.03 -7.84e-05      0.03  
5       589824             0         0.00  0.03 -1.93e-05      0.03  
6      2241331        117965         5.00  0.04 -4.72e-06      0.03  
7      2241331        117965         5.00  0.04 -9.20e-05      0.03  
8       589824             0         0.00  0.06 -3.91e-06      0.04  
9       589824             0         0.00  0.06  3.94e-05      0.04  
10      589824             0         0.00  0.04 -1.24e-04      0.03  
11      589824             0         0.00  0.04 -3.29e-06      0.03  
12     2241331        117965         5.00  0.04  1.62e-04      0.03  
13     2241331        117965         5.00  0.04 -3.65e-05      0.03  
14      589824             0         0.00  0.05 -1.90e-04      0.04  
15      589824             0         0.00  0.05  4.44e-05      0.04  
16      589824             0         0.00  0.04  8.39e-05      0.03  
17      589824             0         0.00  0.04  1.40e-05      0.03  
18     2241331        117965         5.00  0.05  2.58e-04      0.04  
19     2241331        117965         5.00  0.04 -2.50e-06      0.03  
20      589824             0         0.00  0.05  1.57e-06      0.04  
21      589824             0         0.00  0.05 -8.19e-06      0.04  
22      589824             0         0.00  0.05  1.83e-04      0.04  
23      589824             0         0.00  0.04 -2.85e-05      0.03  
24     2241331        117965         5.00  0.04  6.56e-04      0.03  
25     2241331        117965         5.00  0.04  1.49e-05      0.03  
26      589824             0         0.00  0.05 -2.16e-04      0.04  
27      589824             0         0.00  0.05  1.73e-07      0.04  
28      589824             0         0.00  0.05  2.35e-05      0.04  
29      589824             0         0.00  0.04  4.81e-06      0.04  
30     2241331        117965         5.00  0.04  8.08e-04      0.03  
31     2241331        117965         5.00  0.04 -1.81e-05      0.03  
32      589824             0         0.00  0.05 -2.06e-04      0.04  
33      589824             0         0.00  0.05  2.11e-04      0.04  
34      589824             0         0.00  0.05  2.19e-05      0.04  
35      589824             0         0.00  0.05  2.24e-06      0.04  
36     2241331        117965         5.00  0.04  5.40e-04      0.03  
37     2241331        117965         5.00  0.04 -6.37e-06      0.03  
38      589824             0         0.00  0.02 -3.73e-06      0.02  
39        1536             0         0.00  0.02 -3.04e-04      0.02  
40           -       1415580         2.12  0.00  0.00e+00      0.00  
2022-09-28 19:05:29 [INFO] 2.1161918771526977
2022-09-28 19:06:27 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:08:50 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/config.json
2022-09-28 19:08:50 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000] due to args.save_total_limit
2022-09-28 19:13:35 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:15:58 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/config.json
2022-09-28 19:15:58 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500] due to args.save_total_limit
2022-09-28 19:20:43 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:23:07 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/config.json
2022-09-28 19:23:08 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000] due to args.save_total_limit
2022-09-28 19:27:51 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:30:15 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/config.json
2022-09-28 19:30:15 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500] due to args.save_total_limit
2022-09-28 19:34:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:37:24 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/config.json
2022-09-28 19:37:24 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000] due to args.save_total_limit
2022-09-28 19:42:07 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:44:31 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/config.json
2022-09-28 19:44:31 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500] due to args.save_total_limit
2022-09-28 19:49:15 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:51:39 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/config.json
2022-09-28 19:51:39 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000] due to args.save_total_limit
2022-09-28 19:56:22 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 19:58:46 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/config.json
2022-09-28 19:58:46 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500] due to args.save_total_limit
2022-09-28 20:03:31 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:05:55 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/config.json
2022-09-28 20:05:55 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000] due to args.save_total_limit
2022-09-28 20:10:38 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:13:02 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/config.json
2022-09-28 20:13:03 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500] due to args.save_total_limit
2022-09-28 20:17:53 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:20:23 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/config.json
2022-09-28 20:20:23 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000] due to args.save_total_limit
2022-09-28 20:25:17 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:27:46 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/config.json
2022-09-28 20:27:46 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500] due to args.save_total_limit
2022-09-28 20:32:40 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:35:11 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/config.json
2022-09-28 20:35:11 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000] due to args.save_total_limit
2022-09-28 20:40:16 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:42:45 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/config.json
2022-09-28 20:42:46 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500] due to args.save_total_limit
2022-09-28 20:47:50 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:50:32 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/config.json
2022-09-28 20:50:32 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000] due to args.save_total_limit
2022-09-28 20:55:37 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 20:58:16 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/config.json
2022-09-28 20:58:16 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500] due to args.save_total_limit
2022-09-28 21:03:20 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 21:06:04 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/config.json
2022-09-28 21:06:05 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000] due to args.save_total_limit
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.0.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.0.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.1.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.1.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.2.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.2.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.3.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.3.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.4.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.4.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.5.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.5.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO]                                                  Name         Shape  \
0   distilbert.embeddings.word_embeddings.module.w...  [30522, 768]   
1   distilbert.embeddings.position_embeddings.modu...    [512, 768]   
2   distilbert.transformer.layer.0.attention.q_lin...    [768, 768]   
3   distilbert.transformer.layer.0.attention.k_lin...    [768, 768]   
4   distilbert.transformer.layer.0.attention.v_lin...    [768, 768]   
5   distilbert.transformer.layer.0.attention.out_l...    [768, 768]   
6      distilbert.transformer.layer.0.ffn.lin1.weight   [3072, 768]   
7      distilbert.transformer.layer.0.ffn.lin2.weight   [768, 3072]   
8   distilbert.transformer.layer.1.attention.q_lin...    [768, 768]   
9   distilbert.transformer.layer.1.attention.k_lin...    [768, 768]   
10  distilbert.transformer.layer.1.attention.v_lin...    [768, 768]   
11  distilbert.transformer.layer.1.attention.out_l...    [768, 768]   
12     distilbert.transformer.layer.1.ffn.lin1.weight   [3072, 768]   
13     distilbert.transformer.layer.1.ffn.lin2.weight   [768, 3072]   
14  distilbert.transformer.layer.2.attention.q_lin...    [768, 768]   
15  distilbert.transformer.layer.2.attention.k_lin...    [768, 768]   
16  distilbert.transformer.layer.2.attention.v_lin...    [768, 768]   
17  distilbert.transformer.layer.2.attention.out_l...    [768, 768]   
18     distilbert.transformer.layer.2.ffn.lin1.weight   [3072, 768]   
19     distilbert.transformer.layer.2.ffn.lin2.weight   [768, 3072]   
20  distilbert.transformer.layer.3.attention.q_lin...    [768, 768]   
21  distilbert.transformer.layer.3.attention.k_lin...    [768, 768]   
22  distilbert.transformer.layer.3.attention.v_lin...    [768, 768]   
23  distilbert.transformer.layer.3.attention.out_l...    [768, 768]   
24     distilbert.transformer.layer.3.ffn.lin1.weight   [3072, 768]   
25     distilbert.transformer.layer.3.ffn.lin2.weight   [768, 3072]   
26  distilbert.transformer.layer.4.attention.q_lin...    [768, 768]   
27  distilbert.transformer.layer.4.attention.k_lin...    [768, 768]   
28  distilbert.transformer.layer.4.attention.v_lin...    [768, 768]   
29  distilbert.transformer.layer.4.attention.out_l...    [768, 768]   
30     distilbert.transformer.layer.4.ffn.lin1.weight   [3072, 768]   
31     distilbert.transformer.layer.4.ffn.lin2.weight   [768, 3072]   
32  distilbert.transformer.layer.5.attention.q_lin...    [768, 768]   
33  distilbert.transformer.layer.5.attention.k_lin...    [768, 768]   
34  distilbert.transformer.layer.5.attention.v_lin...    [768, 768]   
35  distilbert.transformer.layer.5.attention.out_l...    [768, 768]   
36     distilbert.transformer.layer.5.ffn.lin1.weight   [3072, 768]   
37     distilbert.transformer.layer.5.ffn.lin2.weight   [768, 3072]   
38                       pre_classifier.module.weight    [768, 768]   
39                           classifier.module.weight      [2, 768]   
40                                    Total sparsity:      66892800   

   NNZ (dense)  NNZ (sparse)  Sparsity(%)   Std      Mean  Abs-Mean  
0     23440896             0         0.00  0.05 -3.83e-02      0.05  
1       393216             0         0.00  0.02 -4.17e-05      0.01  
2       589824             0         0.00  0.04  5.87e-05      0.03  
3       589824             0         0.00  0.04  2.50e-05      0.03  
4       589824             0         0.00  0.03 -7.74e-05      0.03  
5       589824             0         0.00  0.04 -2.00e-05      0.03  
6      2123366        235930        10.00  0.04 -4.01e-06      0.03  
7      2123366        235930        10.00  0.04 -9.18e-05      0.03  
8       589824             0         0.00  0.06 -7.05e-06      0.04  
9       589824             0         0.00  0.06  3.96e-05      0.04  
10      589824             0         0.00  0.04 -1.26e-04      0.03  
11      589824             0         0.00  0.04 -2.72e-06      0.03  
12     2123366        235930        10.00  0.04  1.62e-04      0.03  
13     2123366        235930        10.00  0.04 -3.54e-05      0.03  
14      589824             0         0.00  0.05 -1.91e-04      0.04  
15      589824             0         0.00  0.05  4.29e-05      0.04  
16      589824             0         0.00  0.04  8.37e-05      0.03  
17      589824             0         0.00  0.04  1.65e-05      0.03  
18     2123366        235930        10.00  0.05  2.58e-04      0.04  
19     2123366        235930        10.00  0.04 -2.79e-06      0.03  
20      589824             0         0.00  0.05  1.33e-06      0.04  
21      589824             0         0.00  0.05 -8.86e-06      0.04  
22      589824             0         0.00  0.05  1.80e-04      0.04  
23      589824             0         0.00  0.04 -2.84e-05      0.03  
24     2123366        235930        10.00  0.04  6.56e-04      0.03  
25     2123366        235930        10.00  0.04  1.45e-05      0.03  
26      589824             0         0.00  0.05 -2.18e-04      0.04  
27      589824             0         0.00  0.05 -2.13e-06      0.04  
28      589824             0         0.00  0.05  2.43e-05      0.04  
29      589824             0         0.00  0.04  4.51e-06      0.04  
30     2123366        235930        10.00  0.04  8.08e-04      0.03  
31     2123366        235930        10.00  0.04 -1.83e-05      0.03  
32      589824             0         0.00  0.05 -2.03e-04      0.04  
33      589824             0         0.00  0.05  2.12e-04      0.04  
34      589824             0         0.00  0.05  2.22e-05      0.04  
35      589824             0         0.00  0.05  2.52e-06      0.04  
36     2123366        235930        10.00  0.04  5.40e-04      0.03  
37     2123366        235930        10.00  0.04 -5.80e-06      0.03  
38      589824             0         0.00  0.02 -5.81e-05      0.02  
39        1536             0         0.00  0.02 -4.54e-04      0.02  
40           -       2831160         4.23  0.00  0.00e+00      0.00  
2022-09-28 21:09:32 [INFO] 4.2323837543053955
2022-09-28 21:11:14 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 21:14:10 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/config.json
2022-09-28 21:14:10 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500] due to args.save_total_limit
2022-09-28 21:19:30 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 21:22:17 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17500/config.json
2022-09-28 21:22:17 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500] due to args.save_total_limit
2022-09-28 21:27:27 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 21:30:18 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18000/config.json
2022-09-28 21:30:18 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17500] due to args.save_total_limit
2022-09-28 21:35:30 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 21:38:13 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18500/config.json
2022-09-28 21:38:13 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18000] due to args.save_total_limit
2022-09-28 21:43:16 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 21:46:00 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19000/config.json
2022-09-28 21:46:00 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-18500] due to args.save_total_limit
2022-09-28 21:51:07 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 21:53:45 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19500/config.json
2022-09-28 21:53:45 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19000] due to args.save_total_limit
2022-09-28 21:58:48 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:01:36 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20000/config.json
2022-09-28 22:01:36 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-19500] due to args.save_total_limit
2022-09-28 22:06:37 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:09:17 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20500/config.json
2022-09-28 22:09:17 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20000] due to args.save_total_limit
2022-09-28 22:14:37 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:17:26 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21000/config.json
2022-09-28 22:17:26 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-20500] due to args.save_total_limit
2022-09-28 22:22:35 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:25:11 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21500/config.json
2022-09-28 22:25:12 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21000] due to args.save_total_limit
2022-09-28 22:30:21 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:33:07 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22000/config.json
2022-09-28 22:33:07 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-21500] due to args.save_total_limit
2022-09-28 22:38:14 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:41:00 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22500/config.json
2022-09-28 22:41:00 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22000] due to args.save_total_limit
2022-09-28 22:46:01 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:48:41 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23000/config.json
2022-09-28 22:48:41 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-22500] due to args.save_total_limit
2022-09-28 22:53:50 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 22:56:38 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23500/config.json
2022-09-28 22:56:38 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23000] due to args.save_total_limit
2022-09-28 23:01:54 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 23:04:40 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24000/config.json
2022-09-28 23:04:41 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-23500] due to args.save_total_limit
2022-09-28 23:09:51 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 23:12:36 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24500/config.json
2022-09-28 23:12:36 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24000] due to args.save_total_limit
2022-09-28 23:17:39 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
2022-09-28 23:20:18 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/config.json
2022-09-28 23:20:18 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24500] due to args.save_total_limit
2022-09-28 23:22:57 [INFO]                                                  Name         Shape  \
0   distilbert.embeddings.word_embeddings.module.w...  [30522, 768]   
1   distilbert.embeddings.position_embeddings.modu...    [512, 768]   
2   distilbert.transformer.layer.0.attention.q_lin...    [768, 768]   
3   distilbert.transformer.layer.0.attention.k_lin...    [768, 768]   
4   distilbert.transformer.layer.0.attention.v_lin...    [768, 768]   
5   distilbert.transformer.layer.0.attention.out_l...    [768, 768]   
6      distilbert.transformer.layer.0.ffn.lin1.weight   [3072, 768]   
7      distilbert.transformer.layer.0.ffn.lin2.weight   [768, 3072]   
8   distilbert.transformer.layer.1.attention.q_lin...    [768, 768]   
9   distilbert.transformer.layer.1.attention.k_lin...    [768, 768]   
10  distilbert.transformer.layer.1.attention.v_lin...    [768, 768]   
11  distilbert.transformer.layer.1.attention.out_l...    [768, 768]   
12     distilbert.transformer.layer.1.ffn.lin1.weight   [3072, 768]   
13     distilbert.transformer.layer.1.ffn.lin2.weight   [768, 3072]   
14  distilbert.transformer.layer.2.attention.q_lin...    [768, 768]   
15  distilbert.transformer.layer.2.attention.k_lin...    [768, 768]   
16  distilbert.transformer.layer.2.attention.v_lin...    [768, 768]   
17  distilbert.transformer.layer.2.attention.out_l...    [768, 768]   
18     distilbert.transformer.layer.2.ffn.lin1.weight   [3072, 768]   
19     distilbert.transformer.layer.2.ffn.lin2.weight   [768, 3072]   
20  distilbert.transformer.layer.3.attention.q_lin...    [768, 768]   
21  distilbert.transformer.layer.3.attention.k_lin...    [768, 768]   
22  distilbert.transformer.layer.3.attention.v_lin...    [768, 768]   
23  distilbert.transformer.layer.3.attention.out_l...    [768, 768]   
24     distilbert.transformer.layer.3.ffn.lin1.weight   [3072, 768]   
25     distilbert.transformer.layer.3.ffn.lin2.weight   [768, 3072]   
26  distilbert.transformer.layer.4.attention.q_lin...    [768, 768]   
27  distilbert.transformer.layer.4.attention.k_lin...    [768, 768]   
28  distilbert.transformer.layer.4.attention.v_lin...    [768, 768]   
29  distilbert.transformer.layer.4.attention.out_l...    [768, 768]   
30     distilbert.transformer.layer.4.ffn.lin1.weight   [3072, 768]   
31     distilbert.transformer.layer.4.ffn.lin2.weight   [768, 3072]   
32  distilbert.transformer.layer.5.attention.q_lin...    [768, 768]   
33  distilbert.transformer.layer.5.attention.k_lin...    [768, 768]   
34  distilbert.transformer.layer.5.attention.v_lin...    [768, 768]   
35  distilbert.transformer.layer.5.attention.out_l...    [768, 768]   
36     distilbert.transformer.layer.5.ffn.lin1.weight   [3072, 768]   
37     distilbert.transformer.layer.5.ffn.lin2.weight   [768, 3072]   
38                       pre_classifier.module.weight    [768, 768]   
39                           classifier.module.weight      [2, 768]   
40                                    Total sparsity:      66892800   

   NNZ (dense)  NNZ (sparse)  Sparsity(%)   Std      Mean  Abs-Mean  
0     23440896             0         0.00  0.05 -3.83e-02      0.05  
1       393216             0         0.00  0.02 -4.15e-05      0.01  
2       589824             0         0.00  0.04  5.91e-05      0.03  
3       589824             0         0.00  0.04  2.48e-05      0.03  
4       589824             0         0.00  0.03 -7.52e-05      0.03  
5       589824             0         0.00  0.04 -1.98e-05      0.03  
6      2123366        235930        10.00  0.04 -4.01e-06      0.03  
7      2123366        235930        10.00  0.04 -9.18e-05      0.03  
8       589824             0         0.00  0.06 -7.70e-06      0.04  
9       589824             0         0.00  0.06  3.87e-05      0.04  
10      589824             0         0.00  0.04 -1.25e-04      0.03  
11      589824             0         0.00  0.04 -2.65e-06      0.03  
12     2123366        235930        10.00  0.04  1.62e-04      0.03  
13     2123366        235930        10.00  0.04 -3.54e-05      0.03  
14      589824             0         0.00  0.05 -1.92e-04      0.04  
15      589824             0         0.00  0.05  4.25e-05      0.04  
16      589824             0         0.00  0.04  8.36e-05      0.03  
17      589824             0         0.00  0.04  1.64e-05      0.03  
18     2123366        235930        10.00  0.05  2.58e-04      0.04  
19     2123366        235930        10.00  0.04 -2.79e-06      0.03  
20      589824             0         0.00  0.05  2.05e-06      0.04  
21      589824             0         0.00  0.05 -8.60e-06      0.04  
22      589824             0         0.00  0.05  1.81e-04      0.04  
23      589824             0         0.00  0.04 -2.81e-05      0.03  
24     2123366        235930        10.00  0.04  6.56e-04      0.03  
25     2123366        235930        10.00  0.04  1.45e-05      0.03  
26      589824             0         0.00  0.05 -2.19e-04      0.04  
27      589824             0         0.00  0.05 -2.40e-06      0.04  
28      589824             0         0.00  0.05  2.38e-05      0.04  
29      589824             0         0.00  0.04  4.27e-06      0.04  
30     2123366        235930        10.00  0.04  8.08e-04      0.03  
31     2123366        235930        10.00  0.04 -1.83e-05      0.03  
32      589824             0         0.00  0.05 -2.03e-04      0.04  
33      589824             0         0.00  0.05  2.12e-04      0.04  
34      589824             0         0.00  0.05  2.31e-05      0.04  
35      589824             0         0.00  0.05  2.60e-06      0.04  
36     2123366        235930        10.00  0.04  5.40e-04      0.03  
37     2123366        235930        10.00  0.04 -5.80e-06      0.03  
38      589824             0         0.00  0.02 -6.99e-05      0.02  
39        1536             0         0.00  0.02 -6.27e-04      0.02  
40           -       2831160         4.23  0.00  0.00e+00      0.00  
2022-09-28 23:22:57 [INFO] 4.2323837543053955
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:25: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(weight_qparams["scale"], dtype=torch.float, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:28: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(weight_qparams["zero_point"], dtype=zero_point_dtype, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/observer.py:176: UserWarning: Please use quant_min and quant_max to specify the range for observers.                     reduce_range will be deprecated in a future release of PyTorch.
  warnings.warn(
2022-09-28 23:22:59 [INFO] 

Training completed. Do not forget to share your model on huggingface.co/models =)


2022-09-28 23:22:59 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/config.json
2022-09-28 23:22:59 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/special_tokens_map.json
2022-09-28 23:22:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
***** train metrics *****
  epoch                    =        3.0
  total_flos               =  1445093GF
  train_loss               =     0.2358
  train_runtime            = 6:16:43.78
  train_samples_per_second =      8.939
  train_steps_per_second   =      1.117
[375/375 03:24]
2022-09-28 23:24:07 [INFO] Evaluated model score is 0.8473333120346069.
2022-09-28 23:24:07 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
eval_accuracy: 0.8473333120346069
Throughput: 44.556 samples/sec
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/config.json
eval_accuracy: 0.8473333120346069
Throughput: 43.912 samples/sec
2022-09-28 23:25:15 [INFO] Model weights saved to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model
loading configuration file /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/config.json
Model config DistilBertConfig {
  "_name_or_path": "/data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": 0,
    "1": 1
  },
  "initializer_range": 0.02,
  "label2id": {
    "0": 0,
    "1": 1
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "single_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.22.2",
  "vocab_size": 30522
}

loading configuration file /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/config.json
Model config DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": 0,
    "1": 1
  },
  "initializer_range": 0.02,
  "label2id": {
    "0": 0,
    "1": 1
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "single_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.22.2",
  "vocab_size": 30522
}

loading weights file /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/pytorch_model.bin
Optimized model with eval_accuracy of 0.8473333120346069 and sparsity of 4.5% saved to: /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs. Original model had an eval_accuracy of 0.5166666507720947.
All model checkpoint weights were used when initializing DistilBertForSequenceClassification.

All the weights of DistilBertForSequenceClassification were initialized from the model checkpoint at /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertForSequenceClassification for predictions without further training.
2022-09-28 23:25:16 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/observer.py:176: UserWarning: Please use quant_min and quant_max to specify the range for observers.                     reduce_range will be deprecated in a future release of PyTorch.
  warnings.warn(
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:25: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(weight_qparams["scale"], dtype=torch.float, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:28: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(weight_qparams["zero_point"], dtype=zero_point_dtype, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/utils.py:280: UserWarning: must run observer before calling calculate_qparams. Returning default values.
  warnings.warn(
2022-09-28 23:25:18 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 3000
  Batch size = 8
eval_accuracy: 0.8473333120346069
Throughput: 45.424 samples/sec
The quantized model was successfully loaded.
With the quantization aware training complete and the quantized model saved, I can load it back with `IncQuantizedModelForSequenceClassification` and evaluate it on CPU using the `evaluate` task evaluator.

Code
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification
from transformers import AutoTokenizer
import pandas as pd
import evaluate

# reload the quantized model that was saved to the example-model folder
model = IncQuantizedModelForSequenceClassification.from_pretrained(
    DATA_FOLDER / "example-model"
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

task_evaluator = evaluate.evaluator("text-classification")
accuracy = evaluate.load("accuracy")
pd.DataFrame([
    task_evaluator.compute(
        model_or_pipeline=model,
        tokenizer=tokenizer,
        data=data,  # the evaluation dataset loaded earlier in the post
        metric=accuracy,
        input_column="text",
        label_column="label",
        label_mapping={0: 0, 1: 1},
        device=-1, # CPU
    )
])
2022-09-29 00:16:42 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
Disabling tokenizer parallelism, we're using DataLoader multithreading already
|   | accuracy | total_time_in_seconds | samples_per_second | latency_in_seconds |
|---|----------|-----------------------|--------------------|--------------------|
| 0 | 0.84     | 42.08                 | 71.29              | 0.01               |
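
As a follow-up, here is a minimal sketch that I have not run as part of this post: the quantized model should behave like any other transformers model, so it can be dropped into a regular `pipeline` for a quick prediction, and the size of the saved `pytorch_model.bin` files can be compared on disk. The example sentence is made up, and I am assuming that the `runs` folder still holds the full-precision checkpoint that the trainer wrote at the end of training.

Code
# Sketch only, not executed for this post.
# Assumptions: `runs` holds the full-precision trainer checkpoint and the
# example sentence below is made up.
import os
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model=model,        # the quantized model loaded above
    tokenizer=tokenizer,
    device=-1,          # CPU
)
print(classifier("a thoroughly enjoyable film"))

# compare the size of the saved weights on disk
for folder in ["runs", "example-model"]:
    size_mb = os.path.getsize(DATA_FOLDER / folder / "pytorch_model.bin") / 1024**2
    print(f"{folder}/pytorch_model.bin: {size_mb:.1f} MB")

If the export behaved as expected the int8 weights should be noticeably smaller than the float32 checkpoint, which is the other main benefit of quantization besides CPU throughput.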