Breaking text down into atomic facts with FactScore
Customizing a pretrained model while keeping the handy utility methods
How to perform Retrieval Augmented Generation over graph data structures?
What’s the worst way I can generate art with deep learning models?
What I had to do to set up my printer
Changing ssh configuration based on laptop location
How fast is Whisper v3? How fast can I make it?
Measuring the difference between floating point embeddings and the binarized form
Can I use an LLM to discuss a paper?
Creating a conversational interface to an API, Part 2
Creating a conversational interface to an API
How multilingual is Multi-Genre Natural Language Inference?
Generating text based on a skill list to match an advert
Using the latest whisper model to transcribe speech
Can the effect of an intervention be measured when the intervention is dependent on the measured value?
Uploading some Aspect Sentiment datasets to the Huggingface hub
What I did to clear up my install
Is a Cosine Similarity of 0.5 discriminative for high dimensional vectors?
Is KL Divergence better than Cross Entropy?
What is a scoring rule and how does it work?
How does CTranslate2 compare to Transformers for speed?
Changing GPT2 model to use ALiBi positional embeddings
Are large language models unidirectional? Can they benefit from saving intermediate state of previous runs?
A comparison of two 7B language models
Ways to restrict the output of Large Language Models
A reminder for me about how I configure my keyboard
Investigation of the technique for training large models
Checking out the new composition of deep learning models by Huggingface
Can I recommend perfumes based on comments?
Can I find relationships between scents in perfumes using Bayesian Networks?
First run at using LLaMA-7B
Working through the basics before getting onto networks
Evaluating different visualization techniques for this blog
Can we show what matters by calculating over missing values?
How can we show the influence of the input to the output?
Looking at a transformer model for tabular data
Reviewing a paper on Blocking Adverts based on image content
Is ONNX really a reliable way to translate code from Python to Java?
What parts of an image contribute to the CLIP classification?
How good is the OpenAI Whisper speech to text model? How easy is it to quantize?
Sentence Transformer models create embeddings out of text, can they be trained with the Huggingface Trainer?
Using the Intel Neural Compressor to perform Quantization Aware Training
Using the built-in benchmarking code to measure quantized model performance
This model can generate images from a prompt, like DALL-E
Different ways to optimize a model, with performance comparisons
Training a model with the Wikipedia Dataset
Different ways to quantize a model, with performance comparisons
Using Wikipedia and Wikidata to link other languages to English articles
Creating an evaluation framework for Word Sense Induction
Changing the blog framework
A bit of fun
Visualizing the different clustering metrics
Resolving words based on model features and Wikipedia Synonyms
Clustering the Wikipedia linked article features for Word Sense Induction
Can we use Wikipedia articles as targets for word sense induction?
Trying to reduce model collapse by simultaneous language modelling
Does XLM-RoBERTa collapse into a fixed set of outputs if it is trained for a long time?
I’ve been using the wrong model all this time!
Excluding or punishing the most common tokens for XLM-RoBERTa
Calculating the frequency statistics for token predictions
Using the large Spacy models to preprocess the Tatoeba dataset
Using CLIP with VQGAN to turn text into an image
Another reminder for myself about getting CUDA working on Ubuntu
Using the recent kuprel rewrite to understand and implement DALL-E
Trying different models for the multilingual prediction
Trying another way to calculate the multi-token prediction
Turn off Fn key behaviour on Ubuntu
Getting the DALL-E Mini model from Huggingface running
Using the Tatoeba cross-language dataset to internalize a multilingual prompt in a single space
How well do image captioning models work?
Can I update the model used for persona based conversations?
Can I use prompt internalization to perform Word Sense Induction?
Can I use KL Divergence to teach a model to match a prompted model?
Create a distilled model with a different adjustment to the teacher
Create a distilled model
Is it possible to make a better speech recognizer?
It’s about time that I thoroughly explored the Attention mechanism
Can I shift the domain of a sentiment model by changing embeddings only?
Can I do the homework that has been set by a cruel and unusual math tutor?
Can we unshuffle the shuffled words we made?
Can I implement a full shuffler of a number of arbitrary 8-bit values?
How do Quantum Circuits actually work?
How does QuTiP compare to Qiskit for Quantum Computing?
Can I perform the simplest operation using a simulated Quantum Computer?
I’ve been using graphviz to create the flow charts for these posts. Can I use LaTeX instead?
GPT-2 can take previous activations to process the following tokens in the sentence. How does this work with a bi-directional model like RoBERTa?
How can we know what we talk about?
JINA is a content-agnostic document index. Can I use it to index images with CLIP and then search for them?
YOLO detects bounding boxes; CLIP can classify images using natural language prompts. Can they be combined?
See how easy it is to get some images from this zoo of different datasets
I want to see how easy Simple Transformers is to use for context-sensitive text generation
Train the Wikipedia link recognition model using mean squared error and reduce the number of tokens considered
Train the Wikipedia link recognition model using mean squared error
Someone linked me a 2015 paper on using pigeons to classify mammograms. I want to reproduce this using a neural network
How to prepare data, customize a model, and train it
Arranging the recent work on Wikipedia Link recognition into a paper
Trying to train the model using the class-based approach hinted at by the perplexity metric
Training is slow and performance varies wildly. A way to track the quality of the model early could make progress more systematic
Link spotting seems weak; could using inside-outside-beginning (IOB) tagging as the classifier help?
Try to improve the preprocessing performance
First go at training a Wikipedia Link recognizer
Trying to extract meaningful tokens from a page using mutual information
This seems to be bigger than I anticipated - let’s further investigate and prepare the data
I can use the aspect sentiment technique to recognize and cluster Wikipedia-linked text
Using a different system for entity extraction allows for a more focused model
Predicting the presence of hashtags on tweets
How to evaluate the performance?
How well does the token classification approach work?
Can I find a dataset that can work with my approach?
Can the Prophet predict the most erratic of all time series?
Marking up entities in a single pass
Can Sequence to Sequence models be trained to extract entities and mark up sentiment?
A review of a comprehensive blog post on prompt training
Prompt Training GPT2 works well until GPT2-large
Handling pipe/spread/collect patterns across process boundaries in PyTorch
Using Huggingface Trainer and Dataset to do Prompt Training
I’ve done enough research on Prompt Training to try to write a paper about it
The solution to a recent problem I had with NVIDIA and CUDA
Is Dagster a good fit for a data science pipeline?
Using the techniques from ULMFiT to do image classification
Train a linear classifier per prompt
Try to calculate the moving centroid for each class
Clustering the trained prompts to try to extract the different classes
Classification with trained prompts by clustering the token confidence
Classification with trained prompts by clustering the raw transformer output instead
Evaluating prompt training on a more popular dataset
Google published a paper on training a prompt for a language model
Another Calculus lesson for Neural Networks
Using DeepDream techniques to generate a language model prompt
Actually training GPT-2 for another language properly
Making a chatbot listen to you
A chatbot that thinks it is a house
A primer in Calculus as it relates to Neural Networks
Train GPT-2 with some Spanish data
Prepare some Spanish data using Wikipedia dumps
Some difficult questions I found
How to update Kaggle Kernels with make commands
How to use Kaggle Kernels with your regular development setup
How to explain the output of a Deep Learning model
See if I can report training progress to Weights and Biases
Convert GPT-2 to int8 and check performance
Another questionnaire on deep learning
A model to classify pictures of people
Evaluation of an error visualizer for tensor operations
Quick evaluation of different models introduced in the first lesson
Convert floating point models into int8 models
Visualize differences between two dataframes
CMU lecture on fundamentals of Machine Translation - part 2
Questionnaire on Deep Learning
CMU lecture on fundamentals of Machine Translation
What is worth investigating in Data Science and Deep Learning?
Data investigation for Kidney Image Segmentation challenge
Checking out a network visualizer
Checking out spaCy part of speech tagging
How I originally set this blog up
Checking out a new model from OpenAI