Is ONNX really a reliable way to translate code from Python to Java?
quantization
Published
November 23, 2022
One of the attractive things about ONNX is the ability to translate code from Python to Java. This allows a production model to run in Java, which engineers often prefer, while data scientists can work on it in Python. If this is going to work then the Java version needs to produce the same output as the Python version. Does it?
This post is an investigation of that. For most of the posts on this blog I include everything required to reproduce the results. Since Jupyter cannot run Java code, I have created a separate repo that loads an ONNX model and lets you perform inference over a simple REST API. The code is available here.
Model
The specific task does not matter much, so I am going to use a pretrained sentiment model from the Hugging Face hub. It is DistilBERT base uncased (Sanh et al. 2019), fine-tuned for sentiment on SST-2 (Socher et al. 2013). I am going to export it to the ONNX format and load it with plain onnxruntime code. The Hugging Face optimum library is not available in Java, so I will not be using it on the Python side either, to keep the comparison as fair as possible.
Sanh, Victor, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. “DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter.” arXiv. https://doi.org/10.48550/ARXIV.1910.01108.
Socher, Richard, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. “Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1631–42. Seattle, Washington, USA: Association for Computational Linguistics. https://aclanthology.org/D13-1170.
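The export itself is only a few lines of code. Here is a minimal sketch, assuming the distilbert-base-uncased-finetuned-sst-2-english checkpoint and a sentiment.onnx output path (both names are assumptions, the post's own values may differ):

from pathlib import Path

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint
MODEL_FILE = Path("sentiment.onnx")  # assumed output path

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

# trace the model with a small example input and write the graph to disk
example = tokenizer("I love this movie", return_tensors="pt")
torch.onnx.export(
    model,
    (example["input_ids"], example["attention_mask"]),
    str(MODEL_FILE),
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=14,
)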
Let’s start by comparing the unoptimized ONNX model to the original PyTorch / Hugging Face version. We can then run the same comparison against the Java version.
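The code for this unoptimized comparison is not reproduced in this section. A minimal sketch, reusing the tokenizer, model and MODEL_FILE names from the export sketch above, might look like this:

import numpy as np
import onnxruntime as ort
import torch

text = "I love this movie"
encoded = tokenizer(text, return_tensors="pt")

# logits from the original pytorch model
with torch.no_grad():
    logits = model(**encoded).logits

# logits from the exported, unoptimized ONNX model
unoptimized_session = ort.InferenceSession(str(MODEL_FILE))
ort_logits = unoptimized_session.run(
    ["logits"],
    {
        "input_ids": encoded["input_ids"].numpy(),
        "attention_mask": encoded["attention_mask"].numpy(),
    },
)[0]

# largest absolute difference between the two sets of logits
print(np.abs(logits.numpy() - ort_logits).max())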
2022-11-23 17:35:04.025416476 [W:onnxruntime:, inference_session.cc:1458 Initialize] Serializing optimized model with Graph Optimization level greater than ORT_ENABLE_EXTENDED and the NchwcTransformer enabled. The generated model may contain hardware specific optimizations, and should only be used in the same environment the model was optimized in.
Code
import onnxruntime as ort
import numpy as np
import pandas as pd

ort_session = ort.InferenceSession(str(MODEL_QUANTIZED_FILE))
ort_quantized_logits = python_inference(ort_session=ort_session, **inputs)

pd.DataFrame([
    {
        "label": label,
        "pytorch": logits[0, index].item(),
        "py-quantized": ort_quantized_logits[0, index],
        "pytorch py-quantized Δ": logits[0, index].item() - ort_quantized_logits[0, index],
    }
    for index, label in model.config.id2label.items()
])
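The MODEL_QUANTIZED_FILE used above comes from a quantization step that is not reproduced in this section. As a rough sketch, dynamic quantization with onnxruntime looks something like the following (the output path is an assumption):

from pathlib import Path

from onnxruntime.quantization import QuantType, quantize_dynamic

MODEL_QUANTIZED_FILE = Path("sentiment-quantized.onnx")  # assumed path

# rewrite the float32 weights as int8; activations are quantized on the fly at runtime
quantize_dynamic(
    model_input=str(MODEL_FILE),
    model_output=str(MODEL_QUANTIZED_FILE),
    weight_type=QuantType.QInt8,
)

Because the weights are stored at lower precision, it is expected that the quantized logits drift further from the float32 PyTorch values than the plain ONNX export did.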
The quantization process has dramatically increased the size of the differences; however, the Java version of the code is still consistent with the Python version.
Systematic Testing
Let’s make this a more thorough test by expanding the dataset and trying to find sentences that cause greater differences. I care about consistency rather than classification accuracy, so a dataset with plenty of text is all that I require. For now I am going to use the IMDB dataset from Hugging Face (Maas et al. 2011).
Maas, Andrew L., Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. “Learning Word Vectors for Sentiment Analysis.” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 142–50. Portland, Oregon, USA: Association for Computational Linguistics. http://www.aclweb.org/anthology/P11-1015.
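Loading the dataset only needs a couple of lines; a minimal sketch using the train split (an assumption) is:

from datasets import load_dataset

# the train split of IMDB has 25,000 reviews
imdb_dataset = load_dataset("imdb", split="train")
print(len(imdb_dataset))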
Downloading and preparing dataset imdb/plain_text to /home/matthew/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1...
Dataset imdb downloaded and prepared to /home/matthew/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1. Subsequent calls will reuse this data.
25000
Code
imdb_dataset["text"][0]
'I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn\'t match the background, and painfully one-dimensional characters cannot be overcome with a \'sci-fi\' setting. (I\'m sure there are those of you out there who think Babylon 5 is good sci-fi TV. It\'s not. It\'s clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. It\'s really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it\'s rubbish as they have to always say "Gene Roddenberry\'s Earth..." otherwise people would not continue watching. Roddenberry\'s ashes must be turning in their orbit as this dull, cheap, poorly edited (watching it without advert breaks really brings this home) trudging Trabant of a show lumbers into space. Spoiler. So, kill off a main character. And then bring him back as another actor. Jeeez! Dallas all over again.'
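The comparison loop itself is not reproduced here. A minimal sketch, assuming the quantized ort_session and tokenizer from above and a hypothetical /predict endpoint on the Java REST server (the real repo's API may well differ), could look like this:

import numpy as np
import requests

# hypothetical endpoint for the Java onnxruntime REST server
JAVA_URL = "http://localhost:8080/predict"

def python_logits(text: str) -> np.ndarray:
    # quantized inference in python, truncating to the model's maximum length
    encoded = tokenizer(text, truncation=True, return_tensors="np")
    return ort_session.run(
        ["logits"],
        {
            "input_ids": encoded["input_ids"],
            "attention_mask": encoded["attention_mask"],
        },
    )[0][0]

def java_logits(text: str) -> np.ndarray:
    # hypothetical request and response shape for the Java server
    response = requests.post(JAVA_URL, json={"text": text})
    return np.array(response.json()["logits"])

# track the largest absolute difference between the two runtimes over the dataset
largest_difference = max(
    np.abs(python_logits(text) - java_logits(text)).max()
    for text in imdb_dataset["text"]
)
print(largest_difference)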
When the largest difference between the Python and Java runs of the quantized model is on the order of \(\frac{2}{10000000}\), I think that the Java version of ONNX is a safe substitute for Python.