The readme already indicates that there are explicit flags for the interesting parts. It also suggests that quantization-aware training is incompatible with dynamic quantization, which makes sense: dynamic quantization computes activation ranges on the fly at inference time, so there are no fixed ranges for training to adapt to. I am going to review and implement the GLUE example here, still using the Stanford Sentiment Treebank (Socher et al. 2013) and Amazon Polarity (Zhang, Zhao, and LeCun 2015) datasets so I can compare results.
The first thing to do will be to review the code to see if I can spot where these flags are used and what changes they introduce.
The code is pretty wild: sprawling, but easy enough to read that using it as-is on arbitrary datasets seems reasonable. To keep things manageable I’m going to heavily crop the code and run it, below.
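Roughly, the cropped setup boils down to the following. This is a minimal sketch reconstructed from the log output below; the column renames, the 3,000-example evaluation subset, and the tokenization settings are my assumptions, not the script's exact code:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# SST-2 for training; Amazon Polarity as the held-out comparison evaluation set.
sst2 = load_dataset("sst2")
amazon = load_dataset("amazon_polarity")

# The log shows both datasets exposing a 'text' column, so I assume the script
# renames the native columns ('sentence' and 'content') to match.
sst2 = sst2.rename_column("sentence", "text")
amazon = amazon.rename_column("content", "text")

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = sst2["train"].map(tokenize, batched=True)
# The evaluation log reports 3,000 examples; the exact subset is an assumption.
eval_dataset = amazon["test"].select(range(3000)).map(tokenize, batched=True)
```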
Code
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in 'dataset_info': token. Will not be supported from version '0.12'.
warnings.warn(message, FutureWarning)
2022-09-28 17:04:53 [WARNING] Using custom data configuration default
2022-09-28 17:04:53 [WARNING] Found cached dataset sst2 (/home/matthew/.cache/huggingface/datasets/sst2/default/2.0.0/9896208a8d85db057ac50c72282bcb8fe755accc671a57dd8059d4e130961ed5)
Sample 41436 of the training set: {'text': 'most romantic comedies ', 'label': 1}.
Sample 44667 of the training set: {'text': "promises is one film that 's truly deserving of its oscar nomination . ", 'label': 1}.
Sample 34844 of the training set: {'text': 'himself funny ', 'label': 1}.
2022-09-28 17:04:56 [WARNING] Found cached dataset amazon_polarity (/home/matthew/.cache/huggingface/datasets/amazon_polarity/amazon_polarity/3.0.0/a27b32b7e7b88eb274a8fa8ba0f654f1fe998a87c22547557317793b5d2772dc)
loading configuration file config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/config.json
Model config DistilBertConfig {
"_name_or_path": "distilbert-base-uncased",
"activation": "gelu",
"architectures": [
"DistilBertForMaskedLM"
],
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"pad_token_id": 0,
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": false,
"tie_weights_": true,
"transformers_version": "4.22.2",
"vocab_size": 30522
}
loading configuration file config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/config.json
loading file vocab.txt from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/vocab.txt
loading file tokenizer.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/tokenizer_config.json
loading configuration file config.json from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/config.json
loading weights file pytorch_model.bin from cache at /home/matthew/.cache/huggingface/hub/models--distilbert-base-uncased/snapshots/043235d6088ecd3dd5fb5ca3592b6913fd516027/pytorch_model.bin
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PyTorch: setting up devices
2022-09-28 17:04:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
2022-09-28 17:06:14 [WARNING] Find different value 100 and 4 on key max_trials. Use first key-value (max_trials: 100) pair as default
2022-09-28 17:06:14 [WARNING] Find different value False and {'diagnosis_after_tuning': False, 'op_list': [], 'iteration_list': [1], 'inspect_type': 'activation', 'save_to_disk': True, 'save_path': './nc_workspace/inspect_saved/'} on key diagnosis. Use first key-value (diagnosis: False) pair as default
2022-09-28 17:06:14 [WARNING] Find different value 1978 and 9527 on key random_seed. Use first key-value (random_seed: 1978) pair as default
2022-09-28 17:06:14 [WARNING] Find different value 0.01 and 0.05 on key relative. Use first key-value (relative: 0.01) pair as default
2022-09-28 17:06:14 [INFO] Start sequential pipeline execution.
2022-09-28 17:06:14 [INFO] The 0th step being executing is COMBINATION OF PRUNING,QUANTIZATION.
2022-09-28 17:06:14 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
eval_accuracy: 0.5166666507720947
Throughput: 40.304 samples/sec
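That baseline is worth pausing on: 0.5167 accuracy on a binary task is essentially chance, which is expected here because the classification head is freshly initialized (see the warning about classifier.weight above) and no fine-tuning has happened yet. The ~40 samples/sec figure is presumably the unquantized baseline throughput to compare against later.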
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/observer.py:176: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
warnings.warn(
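The UserWarning above comes from the observers in torch.ao.quantization, which hints at what quantization-aware training sets up under the hood. As a rough, minimal sketch of eager-mode QAT in plain PyTorch (this is the standard torch.ao.quantization API, not necessarily the exact code path optimum-intel takes):

```python
import torch
from torch import nn
from torch.ao.quantization import convert, get_default_qat_qconfig, prepare_qat

# Toy stand-in for one DistilBERT FFN block; the real pipeline wraps the whole model.
model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))
model.train()

# Attach a QAT config: weights and activations get FakeQuantize modules with observers.
model.qconfig = get_default_qat_qconfig("fbgemm")
prepare_qat(model, inplace=True)

# ...fine-tune as usual here; fake-quant simulates int8 rounding in the forward
# pass while the observers record activation ranges...
_ = model(torch.randn(8, 768))

# After training, swap the fake-quantized float modules for real int8 kernels.
model.eval()
quantized = convert(model)
```

The point of the fake-quantization is that the model learns weights that survive rounding to the int8 grid, rather than being quantized blindly after training.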
2022-09-28 17:06:15 [INFO] The following columns in the training set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
2022-09-28 17:06:15 [INFO] ***** Running training *****
2022-09-28 17:06:15 [INFO] Num examples = 67349
2022-09-28 17:06:15 [INFO] Num Epochs = 3
2022-09-28 17:06:15 [INFO] Instantaneous batch size per device = 8
2022-09-28 17:06:15 [INFO] Total train batch size (w. parallel, distributed & accumulation) = 8
2022-09-28 17:06:15 [INFO] Gradient Accumulation steps = 1
2022-09-28 17:06:15 [INFO] Total optimization steps = 25257
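As a quick sanity check on that step count: 67,349 training examples at a per-device batch size of 8 gives ⌈67349 / 8⌉ = 8,419 steps per epoch, and 3 × 8,419 = 25,257 total optimization steps.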
[25257/25257 6:16:30, Epoch 3/3]
| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 500 | 0.580000 | 0.661112 | 0.688333 |
| 1000 | 0.419600 | 0.432076 | 0.810333 |
| 1500 | 0.394600 | 0.395575 | 0.829000 |
| 2000 | 0.361000 | 0.454810 | 0.811000 |
| 2500 | 0.355000 | 0.426880 | 0.822333 |
| 3000 | 0.340200 | 0.375353 | 0.849333 |
| 3500 | 0.322000 | 0.402735 | 0.858333 |
| 4000 | 0.342900 | 0.362576 | 0.850333 |
| 4500 | 0.307400 | 0.468966 | 0.849667 |
| 5000 | 0.292200 | 0.451365 | 0.852333 |
| 5500 | 0.308300 | 0.474243 | 0.839000 |
| 6000 | 0.300900 | 0.611464 | 0.830667 |
| 6500 | 0.292600 | 0.521944 | 0.845000 |
| 7000 | 0.311800 | 0.504542 | 0.850667 |
| 7500 | 0.287000 | 0.608976 | 0.838333 |
| 8000 | 0.285800 | 0.558398 | 0.848667 |
| 8500 | 0.270900 | 0.739248 | 0.845000 |
| 9000 | 0.223800 | 0.541361 | 0.858000 |
| 9500 | 0.250000 | 0.545745 | 0.854333 |
| 10000 | 0.235000 | 0.665810 | 0.855333 |
| 10500 | 0.220800 | 0.698272 | 0.839333 |
| 11000 | 0.220000 | 0.811272 | 0.839333 |
| 11500 | 0.225500 | 0.704798 | 0.848000 |
| 12000 | 0.225200 | 0.652409 | 0.855000 |
| 12500 | 0.196900 | 0.730247 | 0.840333 |
| 13000 | 0.210900 | 0.688768 | 0.840667 |
| 13500 | 0.200400 | 0.693728 | 0.851667 |
| 14000 | 0.216000 | 0.620037 | 0.851000 |
| 14500 | 0.204000 | 0.568831 | 0.852000 |
| 15000 | 0.218800 | 0.593239 | 0.845000 |
| 15500 | 0.203500 | 0.678790 | 0.843333 |
| 16000 | 0.189900 | 0.732859 | 0.842667 |
| 16500 | 0.217800 | 0.570437 | 0.850000 |
| 17000 | 0.189400 | 0.723809 | 0.861333 |
| 17500 | 0.148800 | 0.735274 | 0.843667 |
| 18000 | 0.156100 | 0.693834 | 0.845000 |
| 18500 | 0.131600 | 0.733690 | 0.844667 |
| 19000 | 0.156200 | 0.776869 | 0.840000 |
| 19500 | 0.157700 | 0.761233 | 0.832333 |
| 20000 | 0.169800 | 0.693654 | 0.845000 |
| 20500 | 0.169200 | 0.704324 | 0.843333 |
| 21000 | 0.140800 | 0.768836 | 0.839667 |
| 21500 | 0.164700 | 0.791290 | 0.830000 |
| 22000 | 0.135400 | 0.723575 | 0.848667 |
| 22500 | 0.146100 | 0.790115 | 0.836000 |
| 23000 | 0.133600 | 0.739328 | 0.847000 |
| 23500 | 0.157800 | 0.732798 | 0.847667 |
| 24000 | 0.141700 | 0.726812 | 0.847333 |
| 24500 | 0.164400 | 0.736584 | 0.845667 |
| 25000 | 0.152700 | 0.716430 | 0.851333 |
2022-09-28 17:11:11 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 17:13:35 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/config.json
2022-09-28 17:13:35 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000] due to args.save_total_limit
2022-09-28 17:18:20 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 17:21:08 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/config.json
2022-09-28 17:21:08 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000] due to args.save_total_limit
2022-09-28 17:25:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 17:28:23 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/config.json
2022-09-28 17:28:23 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-500] due to args.save_total_limit
2022-09-28 17:33:08 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 17:35:32 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/config.json
2022-09-28 17:35:32 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1000] due to args.save_total_limit
2022-09-28 17:40:16 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 17:42:40 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/config.json
2022-09-28 17:42:40 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2000] due to args.save_total_limit
2022-09-28 17:47:24 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 17:49:48 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/config.json
2022-09-28 17:49:49 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-1500] due to args.save_total_limit
2022-09-28 17:54:32 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 17:57:04 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/config.json
2022-09-28 17:57:04 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-2500] due to args.save_total_limit
2022-09-28 18:02:03 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:04:27 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/config.json
2022-09-28 18:04:28 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3000] due to args.save_total_limit
2022-09-28 18:09:13 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:11:36 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/config.json
2022-09-28 18:11:37 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4000] due to args.save_total_limit
2022-09-28 18:16:20 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:18:44 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/config.json
2022-09-28 18:18:44 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-4500] due to args.save_total_limit
2022-09-28 18:23:27 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:25:51 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/config.json
2022-09-28 18:25:51 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5000] due to args.save_total_limit
2022-09-28 18:30:34 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:32:58 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/config.json
2022-09-28 18:32:59 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-5500] due to args.save_total_limit
2022-09-28 18:37:43 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:40:07 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/config.json
2022-09-28 18:40:08 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6000] due to args.save_total_limit
2022-09-28 18:44:51 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:47:16 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/config.json
2022-09-28 18:47:16 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-6500] due to args.save_total_limit
2022-09-28 18:51:58 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 18:54:22 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/config.json
2022-09-28 18:54:22 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7000] due to args.save_total_limit
2022-09-28 18:59:06 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:01:29 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/config.json
2022-09-28 19:01:30 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-7500] due to args.save_total_limit
2022-09-28 19:05:29 [INFO]

| # | Name | Shape | NNZ (dense) | NNZ (sparse) | Sparsity(%) | Std | Mean | Abs-Mean |
|---|------|-------|-------------|--------------|-------------|-----|------|----------|
| 0 | distilbert.embeddings.word_embeddings.module.w... | [30522, 768] | 23440896 | 0 | 0.00 | 0.05 | -3.83e-02 | 0.05 |
| 1 | distilbert.embeddings.position_embeddings.modu... | [512, 768] | 393216 | 0 | 0.00 | 0.02 | -4.15e-05 | 0.01 |
| 2 | distilbert.transformer.layer.0.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 5.91e-05 | 0.03 |
| 3 | distilbert.transformer.layer.0.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 2.07e-05 | 0.03 |
| 4 | distilbert.transformer.layer.0.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.03 | -7.84e-05 | 0.03 |
| 5 | distilbert.transformer.layer.0.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.03 | -1.93e-05 | 0.03 |
| 6 | distilbert.transformer.layer.0.ffn.lin1.weight | [3072, 768] | 2241331 | 117965 | 5.00 | 0.04 | -4.72e-06 | 0.03 |
| 7 | distilbert.transformer.layer.0.ffn.lin2.weight | [768, 3072] | 2241331 | 117965 | 5.00 | 0.04 | -9.20e-05 | 0.03 |
| 8 | distilbert.transformer.layer.1.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.06 | -3.91e-06 | 0.04 |
| 9 | distilbert.transformer.layer.1.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.06 | 3.94e-05 | 0.04 |
| 10 | distilbert.transformer.layer.1.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | -1.24e-04 | 0.03 |
| 11 | distilbert.transformer.layer.1.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | -3.29e-06 | 0.03 |
| 12 | distilbert.transformer.layer.1.ffn.lin1.weight | [3072, 768] | 2241331 | 117965 | 5.00 | 0.04 | 1.62e-04 | 0.03 |
| 13 | distilbert.transformer.layer.1.ffn.lin2.weight | [768, 3072] | 2241331 | 117965 | 5.00 | 0.04 | -3.65e-05 | 0.03 |
| 14 | distilbert.transformer.layer.2.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -1.90e-04 | 0.04 |
| 15 | distilbert.transformer.layer.2.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 4.44e-05 | 0.04 |
| 16 | distilbert.transformer.layer.2.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 8.39e-05 | 0.03 |
| 17 | distilbert.transformer.layer.2.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 1.40e-05 | 0.03 |
| 18 | distilbert.transformer.layer.2.ffn.lin1.weight | [3072, 768] | 2241331 | 117965 | 5.00 | 0.05 | 2.58e-04 | 0.04 |
| 19 | distilbert.transformer.layer.2.ffn.lin2.weight | [768, 3072] | 2241331 | 117965 | 5.00 | 0.04 | -2.50e-06 | 0.03 |
| 20 | distilbert.transformer.layer.3.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 1.57e-06 | 0.04 |
| 21 | distilbert.transformer.layer.3.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -8.19e-06 | 0.04 |
| 22 | distilbert.transformer.layer.3.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 1.83e-04 | 0.04 |
| 23 | distilbert.transformer.layer.3.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | -2.85e-05 | 0.03 |
| 24 | distilbert.transformer.layer.3.ffn.lin1.weight | [3072, 768] | 2241331 | 117965 | 5.00 | 0.04 | 6.56e-04 | 0.03 |
| 25 | distilbert.transformer.layer.3.ffn.lin2.weight | [768, 3072] | 2241331 | 117965 | 5.00 | 0.04 | 1.49e-05 | 0.03 |
| 26 | distilbert.transformer.layer.4.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -2.16e-04 | 0.04 |
| 27 | distilbert.transformer.layer.4.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 1.73e-07 | 0.04 |
| 28 | distilbert.transformer.layer.4.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.35e-05 | 0.04 |
| 29 | distilbert.transformer.layer.4.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 4.81e-06 | 0.04 |
| 30 | distilbert.transformer.layer.4.ffn.lin1.weight | [3072, 768] | 2241331 | 117965 | 5.00 | 0.04 | 8.08e-04 | 0.03 |
| 31 | distilbert.transformer.layer.4.ffn.lin2.weight | [768, 3072] | 2241331 | 117965 | 5.00 | 0.04 | -1.81e-05 | 0.03 |
| 32 | distilbert.transformer.layer.5.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -2.06e-04 | 0.04 |
| 33 | distilbert.transformer.layer.5.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.11e-04 | 0.04 |
| 34 | distilbert.transformer.layer.5.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.19e-05 | 0.04 |
| 35 | distilbert.transformer.layer.5.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.24e-06 | 0.04 |
| 36 | distilbert.transformer.layer.5.ffn.lin1.weight | [3072, 768] | 2241331 | 117965 | 5.00 | 0.04 | 5.40e-04 | 0.03 |
| 37 | distilbert.transformer.layer.5.ffn.lin2.weight | [768, 3072] | 2241331 | 117965 | 5.00 | 0.04 | -6.37e-06 | 0.03 |
| 38 | pre_classifier.module.weight | [768, 768] | 589824 | 0 | 0.00 | 0.02 | -3.73e-06 | 0.02 |
| 39 | classifier.module.weight | [2, 768] | 1536 | 0 | 0.00 | 0.02 | -3.04e-04 | 0.02 |
| 40 | Total sparsity: | 66892800 | - | 1415580 | 2.12 | 0.00 | 0.00e+00 | 0.00 |
2022-09-28 19:05:29 [INFO] 2.1161918771526977
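The trailing 2.1161918771526977 is the overall sparsity percentage, and the arithmetic checks out: only the twelve ffn.lin1/ffn.lin2 matrices are being pruned at this point, each at 5% (117,965 of 2,359,296 weights), giving 12 × 117,965 = 1,415,580 zeroed parameters out of 66,892,800 tracked, i.e. 100 × 1,415,580 / 66,892,800 ≈ 2.116%.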
2022-09-28 19:06:27 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:08:50 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/config.json
2022-09-28 19:08:50 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8000] due to args.save_total_limit
2022-09-28 19:13:35 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:15:58 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/config.json
2022-09-28 19:15:58 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-8500] due to args.save_total_limit
2022-09-28 19:20:43 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:23:07 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/config.json
2022-09-28 19:23:08 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9000] due to args.save_total_limit
2022-09-28 19:27:51 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:30:15 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/config.json
2022-09-28 19:30:15 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-9500] due to args.save_total_limit
2022-09-28 19:34:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:37:24 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/config.json
2022-09-28 19:37:24 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10000] due to args.save_total_limit
2022-09-28 19:42:07 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:44:31 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/config.json
2022-09-28 19:44:31 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-10500] due to args.save_total_limit
2022-09-28 19:49:15 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:51:39 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/config.json
2022-09-28 19:51:39 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11000] due to args.save_total_limit
2022-09-28 19:56:22 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 19:58:46 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/config.json
2022-09-28 19:58:46 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-11500] due to args.save_total_limit
2022-09-28 20:03:31 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:05:55 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/config.json
2022-09-28 20:05:55 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12000] due to args.save_total_limit
2022-09-28 20:10:38 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:13:02 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/config.json
2022-09-28 20:13:03 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-12500] due to args.save_total_limit
2022-09-28 20:17:53 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:20:23 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/config.json
2022-09-28 20:20:23 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13000] due to args.save_total_limit
2022-09-28 20:25:17 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:27:46 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/config.json
2022-09-28 20:27:46 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-13500] due to args.save_total_limit
2022-09-28 20:32:40 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:35:11 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/config.json
2022-09-28 20:35:11 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14000] due to args.save_total_limit
2022-09-28 20:40:16 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:42:45 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/config.json
2022-09-28 20:42:46 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-14500] due to args.save_total_limit
2022-09-28 20:47:50 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:50:32 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/config.json
2022-09-28 20:50:32 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15000] due to args.save_total_limit
2022-09-28 20:55:37 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 20:58:16 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/config.json
2022-09-28 20:58:16 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-15500] due to args.save_total_limit
2022-09-28 21:03:20 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 21:06:04 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/config.json
2022-09-28 21:06:05 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16500/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-16000] due to args.save_total_limit
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.0.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.0.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.1.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.1.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.2.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.2.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.3.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.3.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.4.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.4.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.5.ffn.lin1.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO] Set distilbert.transformer.layer.5.ffn.lin2.weight sparsity with mask 2359296 2123366 0.10000016954210067.
2022-09-28 21:09:32 [INFO]

| # | Name | Shape | NNZ (dense) | NNZ (sparse) | Sparsity(%) | Std | Mean | Abs-Mean |
|---|------|-------|-------------|--------------|-------------|-----|------|----------|
| 0 | distilbert.embeddings.word_embeddings.module.w... | [30522, 768] | 23440896 | 0 | 0.00 | 0.05 | -3.83e-02 | 0.05 |
| 1 | distilbert.embeddings.position_embeddings.modu... | [512, 768] | 393216 | 0 | 0.00 | 0.02 | -4.17e-05 | 0.01 |
| 2 | distilbert.transformer.layer.0.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 5.87e-05 | 0.03 |
| 3 | distilbert.transformer.layer.0.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 2.50e-05 | 0.03 |
| 4 | distilbert.transformer.layer.0.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.03 | -7.74e-05 | 0.03 |
| 5 | distilbert.transformer.layer.0.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | -2.00e-05 | 0.03 |
| 6 | distilbert.transformer.layer.0.ffn.lin1.weight | [3072, 768] | 2123366 | 235930 | 10.00 | 0.04 | -4.01e-06 | 0.03 |
| 7 | distilbert.transformer.layer.0.ffn.lin2.weight | [768, 3072] | 2123366 | 235930 | 10.00 | 0.04 | -9.18e-05 | 0.03 |
| 8 | distilbert.transformer.layer.1.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.06 | -7.05e-06 | 0.04 |
| 9 | distilbert.transformer.layer.1.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.06 | 3.96e-05 | 0.04 |
| 10 | distilbert.transformer.layer.1.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | -1.26e-04 | 0.03 |
| 11 | distilbert.transformer.layer.1.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | -2.72e-06 | 0.03 |
| 12 | distilbert.transformer.layer.1.ffn.lin1.weight | [3072, 768] | 2123366 | 235930 | 10.00 | 0.04 | 1.62e-04 | 0.03 |
| 13 | distilbert.transformer.layer.1.ffn.lin2.weight | [768, 3072] | 2123366 | 235930 | 10.00 | 0.04 | -3.54e-05 | 0.03 |
| 14 | distilbert.transformer.layer.2.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -1.91e-04 | 0.04 |
| 15 | distilbert.transformer.layer.2.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 4.29e-05 | 0.04 |
| 16 | distilbert.transformer.layer.2.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 8.37e-05 | 0.03 |
| 17 | distilbert.transformer.layer.2.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 1.65e-05 | 0.03 |
| 18 | distilbert.transformer.layer.2.ffn.lin1.weight | [3072, 768] | 2123366 | 235930 | 10.00 | 0.05 | 2.58e-04 | 0.04 |
| 19 | distilbert.transformer.layer.2.ffn.lin2.weight | [768, 3072] | 2123366 | 235930 | 10.00 | 0.04 | -2.79e-06 | 0.03 |
| 20 | distilbert.transformer.layer.3.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 1.33e-06 | 0.04 |
| 21 | distilbert.transformer.layer.3.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -8.86e-06 | 0.04 |
| 22 | distilbert.transformer.layer.3.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 1.80e-04 | 0.04 |
| 23 | distilbert.transformer.layer.3.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | -2.84e-05 | 0.03 |
| 24 | distilbert.transformer.layer.3.ffn.lin1.weight | [3072, 768] | 2123366 | 235930 | 10.00 | 0.04 | 6.56e-04 | 0.03 |
| 25 | distilbert.transformer.layer.3.ffn.lin2.weight | [768, 3072] | 2123366 | 235930 | 10.00 | 0.04 | 1.45e-05 | 0.03 |
| 26 | distilbert.transformer.layer.4.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -2.18e-04 | 0.04 |
| 27 | distilbert.transformer.layer.4.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -2.13e-06 | 0.04 |
| 28 | distilbert.transformer.layer.4.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.43e-05 | 0.04 |
| 29 | distilbert.transformer.layer.4.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.04 | 4.51e-06 | 0.04 |
| 30 | distilbert.transformer.layer.4.ffn.lin1.weight | [3072, 768] | 2123366 | 235930 | 10.00 | 0.04 | 8.08e-04 | 0.03 |
| 31 | distilbert.transformer.layer.4.ffn.lin2.weight | [768, 3072] | 2123366 | 235930 | 10.00 | 0.04 | -1.83e-05 | 0.03 |
| 32 | distilbert.transformer.layer.5.attention.q_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | -2.03e-04 | 0.04 |
| 33 | distilbert.transformer.layer.5.attention.k_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.12e-04 | 0.04 |
| 34 | distilbert.transformer.layer.5.attention.v_lin... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.22e-05 | 0.04 |
| 35 | distilbert.transformer.layer.5.attention.out_l... | [768, 768] | 589824 | 0 | 0.00 | 0.05 | 2.52e-06 | 0.04 |
| 36 | distilbert.transformer.layer.5.ffn.lin1.weight | [3072, 768] | 2123366 | 235930 | 10.00 | 0.04 | 5.40e-04 | 0.03 |
| 37 | distilbert.transformer.layer.5.ffn.lin2.weight | [768, 3072] | 2123366 | 235930 | 10.00 | 0.04 | -5.80e-06 | 0.03 |
| 38 | pre_classifier.module.weight | [768, 768] | 589824 | 0 | 0.00 | 0.02 | -5.81e-05 | 0.02 |
| 39 | classifier.module.weight | [2, 768] | 1536 | 0 | 0.00 | 0.02 | -4.54e-04 | 0.02 |
| 40 | Total sparsity: | 66892800 | - | 2831160 | 4.23 | 0.00 | 0.00e+00 | 0.00 |
2022-09-28 21:09:32 [INFO] 4.2323837543053955
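The 4.23% total sparsity reconciles exactly with the per-layer figures above: pruning only targets the twelve FFN matrices (lin1 and lin2 in each of the six layers, 2,359,296 weights each, 28,311,552 in total), and each is masked to exactly 10% sparsity (235,930 zeros per matrix, 2,831,160 overall). Against the 66,892,800 weights covered by the report, that is 2,831,160 / 66,892,800 ≈ 4.2324%, which is the figure logged. A report like this can be reproduced with plain PyTorch; the sketch below is purely illustrative, not the Intel Neural Compressor reporting code, and assumes `model` is the DistilBertForSequenceClassification instance being trained:

```python
# Illustrative re-implementation of the sparsity report above using plain
# PyTorch. A sketch only; `model` is assumed to be the
# DistilBertForSequenceClassification instance being trained.
import torch

def sparsity_report(model: torch.nn.Module) -> None:
    total_weights, total_zeros = 0, 0
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and LayerNorm parameters
            continue
        numel = param.numel()
        zeros = int((param == 0).sum().item())
        total_weights += numel
        total_zeros += zeros
        print(
            f"{name} {list(param.shape)} "
            f"nnz={numel - zeros} sparsity={100 * zeros / numel:.2f}% "
            f"std={param.std().item():.2f} mean={param.mean().item():.1e}"
        )
    print(f"Total sparsity: {100 * total_zeros / total_weights:.10f}%")
```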
2022-09-28 21:11:14 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 21:14:10 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/config.json
2022-09-28 21:14:10 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-17000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-3500] due to args.save_total_limit
... (the save/evaluate cycle above repeats every 500 steps; the near-identical log output for checkpoints 17500 through 24500 is elided) ...
2022-09-28 23:17:39 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
2022-09-28 23:20:18 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/config.json
2022-09-28 23:20:18 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-25000/special_tokens_map.json
Deleting older checkpoint [/data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/checkpoint-24500] due to args.save_total_limit
2022-09-28 23:22:57 [INFO] Name Shape \
0 distilbert.embeddings.word_embeddings.module.w... [30522, 768]
1 distilbert.embeddings.position_embeddings.modu... [512, 768]
2 distilbert.transformer.layer.0.attention.q_lin... [768, 768]
3 distilbert.transformer.layer.0.attention.k_lin... [768, 768]
4 distilbert.transformer.layer.0.attention.v_lin... [768, 768]
5 distilbert.transformer.layer.0.attention.out_l... [768, 768]
6 distilbert.transformer.layer.0.ffn.lin1.weight [3072, 768]
7 distilbert.transformer.layer.0.ffn.lin2.weight [768, 3072]
8 distilbert.transformer.layer.1.attention.q_lin... [768, 768]
9 distilbert.transformer.layer.1.attention.k_lin... [768, 768]
10 distilbert.transformer.layer.1.attention.v_lin... [768, 768]
11 distilbert.transformer.layer.1.attention.out_l... [768, 768]
12 distilbert.transformer.layer.1.ffn.lin1.weight [3072, 768]
13 distilbert.transformer.layer.1.ffn.lin2.weight [768, 3072]
14 distilbert.transformer.layer.2.attention.q_lin... [768, 768]
15 distilbert.transformer.layer.2.attention.k_lin... [768, 768]
16 distilbert.transformer.layer.2.attention.v_lin... [768, 768]
17 distilbert.transformer.layer.2.attention.out_l... [768, 768]
18 distilbert.transformer.layer.2.ffn.lin1.weight [3072, 768]
19 distilbert.transformer.layer.2.ffn.lin2.weight [768, 3072]
20 distilbert.transformer.layer.3.attention.q_lin... [768, 768]
21 distilbert.transformer.layer.3.attention.k_lin... [768, 768]
22 distilbert.transformer.layer.3.attention.v_lin... [768, 768]
23 distilbert.transformer.layer.3.attention.out_l... [768, 768]
24 distilbert.transformer.layer.3.ffn.lin1.weight [3072, 768]
25 distilbert.transformer.layer.3.ffn.lin2.weight [768, 3072]
26 distilbert.transformer.layer.4.attention.q_lin... [768, 768]
27 distilbert.transformer.layer.4.attention.k_lin... [768, 768]
28 distilbert.transformer.layer.4.attention.v_lin... [768, 768]
29 distilbert.transformer.layer.4.attention.out_l... [768, 768]
30 distilbert.transformer.layer.4.ffn.lin1.weight [3072, 768]
31 distilbert.transformer.layer.4.ffn.lin2.weight [768, 3072]
32 distilbert.transformer.layer.5.attention.q_lin... [768, 768]
33 distilbert.transformer.layer.5.attention.k_lin... [768, 768]
34 distilbert.transformer.layer.5.attention.v_lin... [768, 768]
35 distilbert.transformer.layer.5.attention.out_l... [768, 768]
36 distilbert.transformer.layer.5.ffn.lin1.weight [3072, 768]
37 distilbert.transformer.layer.5.ffn.lin2.weight [768, 3072]
38 pre_classifier.module.weight [768, 768]
39 classifier.module.weight [2, 768]
40 Total sparsity: 66892800
NNZ (dense) NNZ (sparse) Sparsity(%) Std Mean Abs-Mean
0 23440896 0 0.00 0.05 -3.83e-02 0.05
1 393216 0 0.00 0.02 -4.15e-05 0.01
2 589824 0 0.00 0.04 5.91e-05 0.03
3 589824 0 0.00 0.04 2.48e-05 0.03
4 589824 0 0.00 0.03 -7.52e-05 0.03
5 589824 0 0.00 0.04 -1.98e-05 0.03
6 2123366 235930 10.00 0.04 -4.01e-06 0.03
7 2123366 235930 10.00 0.04 -9.18e-05 0.03
8 589824 0 0.00 0.06 -7.70e-06 0.04
9 589824 0 0.00 0.06 3.87e-05 0.04
10 589824 0 0.00 0.04 -1.25e-04 0.03
11 589824 0 0.00 0.04 -2.65e-06 0.03
12 2123366 235930 10.00 0.04 1.62e-04 0.03
13 2123366 235930 10.00 0.04 -3.54e-05 0.03
14 589824 0 0.00 0.05 -1.92e-04 0.04
15 589824 0 0.00 0.05 4.25e-05 0.04
16 589824 0 0.00 0.04 8.36e-05 0.03
17 589824 0 0.00 0.04 1.64e-05 0.03
18 2123366 235930 10.00 0.05 2.58e-04 0.04
19 2123366 235930 10.00 0.04 -2.79e-06 0.03
20 589824 0 0.00 0.05 2.05e-06 0.04
21 589824 0 0.00 0.05 -8.60e-06 0.04
22 589824 0 0.00 0.05 1.81e-04 0.04
23 589824 0 0.00 0.04 -2.81e-05 0.03
24 2123366 235930 10.00 0.04 6.56e-04 0.03
25 2123366 235930 10.00 0.04 1.45e-05 0.03
26 589824 0 0.00 0.05 -2.19e-04 0.04
27 589824 0 0.00 0.05 -2.40e-06 0.04
28 589824 0 0.00 0.05 2.38e-05 0.04
29 589824 0 0.00 0.04 4.27e-06 0.04
30 2123366 235930 10.00 0.04 8.08e-04 0.03
31 2123366 235930 10.00 0.04 -1.83e-05 0.03
32 589824 0 0.00 0.05 -2.03e-04 0.04
33 589824 0 0.00 0.05 2.12e-04 0.04
34 589824 0 0.00 0.05 2.31e-05 0.04
35 589824 0 0.00 0.05 2.60e-06 0.04
36 2123366 235930 10.00 0.04 5.40e-04 0.03
37 2123366 235930 10.00 0.04 -5.80e-06 0.03
38 589824 0 0.00 0.02 -6.99e-05 0.02
39 1536 0 0.00 0.02 -6.27e-04 0.02
40 - 2831160 4.23 0.00 0.00e+00 0.00
2022-09-28 23:22:57 [INFO] 4.2323837543053955
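This final report matches the mid-training one apart from small drifts in the weight statistics: the pruner holds each FFN matrix at exactly 10% sparsity throughout, so the total stays pinned at 4.23%.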
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:25: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(weight_qparams["scale"], dtype=torch.float, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:28: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(weight_qparams["zero_point"], dtype=zero_point_dtype, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/observer.py:176: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
warnings.warn(
2022-09-28 23:22:59 [INFO]
Training completed. Do not forget to share your model on huggingface.co/models =)
2022-09-28 23:22:59 [INFO] Saving model checkpoint to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/config.json
2022-09-28 23:22:59 [INFO] Model weights saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/pytorch_model.bin
tokenizer config file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/tokenizer_config.json
Special tokens file saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs/special_tokens_map.json
2022-09-28 23:22:59 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
***** train metrics *****
epoch = 3.0
total_flos = 1445093GF
train_loss = 0.2358
train_runtime = 6:16:43.78
train_samples_per_second = 8.939
train_steps_per_second = 1.117
2022-09-28 23:24:07 [INFO] Evaluated model score is 0.8473333120346069.
2022-09-28 23:24:07 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
eval_accuracy: 0.8473333120346069
Throughput: 44.556 samples/sec
Configuration saved in /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/config.json
eval_accuracy: 0.8473333120346069
Throughput: 43.912 samples/sec
2022-09-28 23:25:15 [INFO] Model weights saved to /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model
loading configuration file /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/config.json
Model config DistilBertConfig {
"_name_or_path": "/data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model",
"activation": "gelu",
"architectures": [
"DistilBertForSequenceClassification"
],
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"id2label": {
"0": 0,
"1": 1
},
"initializer_range": 0.02,
"label2id": {
"0": 0,
"1": 1
},
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"pad_token_id": 0,
"problem_type": "single_label_classification",
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": false,
"tie_weights_": true,
"torch_dtype": "float32",
"transformers_version": "4.22.2",
"vocab_size": 30522
}
loading configuration file /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/config.json
Model config DistilBertConfig {
"_name_or_path": "distilbert-base-uncased",
"activation": "gelu",
"architectures": [
"DistilBertForSequenceClassification"
],
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"id2label": {
"0": 0,
"1": 1
},
"initializer_range": 0.02,
"label2id": {
"0": 0,
"1": 1
},
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"pad_token_id": 0,
"problem_type": "single_label_classification",
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": false,
"tie_weights_": true,
"torch_dtype": "float32",
"transformers_version": "4.22.2",
"vocab_size": 30522
}
loading weights file /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model/pytorch_model.bin
Optimized model with eval_accuracy of 0.8473333120346069 and sparsity of 4.5% saved to: /data/blog/2022-09-26-optimum-intel-quantization-aware-training/runs. Original model had an eval_accuracy of 0.5166666507720947.
All model checkpoint weights were used when initializing DistilBertForSequenceClassification.
All the weights of DistilBertForSequenceClassification were initialized from the model checkpoint at /data/blog/2022-09-26-optimum-intel-quantization-aware-training/example-model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use DistilBertForSequenceClassification for predictions without further training.
2022-09-28 23:25:16 [INFO] Fx trace of the entire model failed. We will conduct auto quantization
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/observer.py:176: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
warnings.warn(
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:25: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(weight_qparams["scale"], dtype=torch.float, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/nn/quantized/_reference/modules/utils.py:28: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
torch.tensor(weight_qparams["zero_point"], dtype=zero_point_dtype, device=device))
/home/matthew/.local/share/virtualenvs/blog-1tuLwbZm/lib/python3.10/site-packages/torch/ao/quantization/utils.py:280: UserWarning: must run observer before calling calculate_qparams. Returning default values.
warnings.warn(
2022-09-28 23:25:18 [INFO] The following columns in the evaluation set don't have a corresponding argument in `DistilBertForSequenceClassification.forward` and have been ignored: text. If text are not expected by `DistilBertForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 3000
Batch size = 8
eval_accuracy: 0.8473333120346069
Throughput: 45.424 samples/sec
The quantized model was successfully loaded.
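Two details worth pulling out of the tail of this log: FX tracing of the whole model failed, so the toolchain fell back to its auto quantization path, and the quantized model still reproduces the 0.8473 eval accuracy at roughly 45 samples/sec. Those throughput lines come from the example's own benchmark, but something equivalent is easy to measure by hand. A minimal sketch, assuming `model` and `tokenizer` have been loaded from the example-model directory and `texts` is a list of evaluation sentences:

```python
# Rough throughput measurement in the spirit of the
# "Throughput: N samples/sec" lines above. Illustrative only; `model`,
# `tokenizer`, and `texts` are assumed to already exist.
import time
import torch

def measure_throughput(model, tokenizer, texts, batch_size=8):
    model.eval()
    start = time.perf_counter()
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(
                texts[i : i + batch_size],
                padding=True, truncation=True, return_tensors="pt",
            )
            model(**batch)
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed

# e.g. print(f"Throughput: {measure_throughput(model, tokenizer, texts):.3f} samples/sec")
```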