A blog post has been published on prompt training with an accompanying paper (Gao, Fisch, and Chen 2021). It’s a really nice post that covers the general theory as well as the approach that they have taken. Reading it, I noticed that they approach it as a masked-token-prediction problem: they use a preset prompt template containing a [MASK] token, and then compare the output probabilities of the N tokens corresponding to their desired classification classes.
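To make that concrete, here is a minimal sketch of cloze-style classification with a masked LM. The model (`bert-base-uncased`), the template ("... It was [MASK]."), and the label tokens "great"/"terrible" are my own illustrative choices, not the ones from the paper.

```python
# Minimal sketch: classify by comparing label-token probabilities at [MASK].
# Assumptions (mine, not from the paper): bert-base-uncased, the template
# "<text> It was [MASK].", and the label tokens "great" / "terrible".
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The film was a complete waste of two hours."
prompt = f"{text} It was {tokenizer.mask_token}."

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # [batch, seq_len, vocab_size]

# Vocabulary distribution at the [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
probs = torch.softmax(logits[0, mask_pos, :].squeeze(0), dim=-1)

# Compare the probabilities of the N label tokens (here N = 2).
label_tokens = {"positive": "great", "negative": "terrible"}
scores = {label: probs[tokenizer.convert_tokens_to_ids(tok)].item()
          for label, tok in label_tokens.items()}
print(scores, "->", max(scores, key=scores.get))
```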
They are able to train effective prompts + label tokens given a small number of examples, and both the prompt and the tokens remain legible. There is also a discussion of deriving prompts given fixed tokens.
Reading this, I think I need to refocus my work on the following:
- Semantic search techniques to align the token outputs with the given prompt, as well as training the prompt itself
- Discrete prompts, and finding the best (most distinguishing) tokens for a given prompt (see the sketch after this list)
- Training over a small number of examples, and the performance that achieves
- Comparison to fine-tuning
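On the second point, a rough sketch of what "most distinguishing tokens" could look like in code: score each candidate token by its mean log-probability at the [MASK] position over a class's examples, and keep the top scorers per class. This reuses the model, tokenizer, and template from the sketch above; the scoring rule is a simple stand-in of my own, not the paper's search procedure.

```python
# Rough sketch: pick label tokens that score highly for each class under a
# fixed prompt template. Reuses `model` / `tokenizer` from the sketch above.
import torch

def mask_log_probs(model, tokenizer, text, template="{} It was {}."):
    """Log-probabilities over the vocabulary at the [MASK] position."""
    prompt = template.format(text, tokenizer.mask_token)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return torch.log_softmax(logits[0, mask_pos, :].squeeze(0), dim=-1)

def best_label_tokens(model, tokenizer, examples, candidates, top_k=3):
    """Rank candidate tokens per class by mean log-probability over that
    class's examples; return the top-k candidates for each class."""
    cand_ids = tokenizer.convert_tokens_to_ids(candidates)
    ranked = {}
    for label, texts in examples.items():
        per_text = [mask_log_probs(model, tokenizer, t)[cand_ids] for t in texts]
        mean_scores = torch.stack(per_text).mean(dim=0)
        ranked[label] = [candidates[i] for i in mean_scores.topk(top_k).indices.tolist()]
    return ranked

# Hypothetical usage with a tiny hand-made candidate list:
# examples = {"positive": ["Loved every minute."], "negative": ["Dull and overlong."]}
# candidates = ["great", "good", "fine", "bad", "terrible", "boring"]
# print(best_label_tokens(model, tokenizer, examples, candidates))
```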
I need to clean up this post but I wanted to quickly note my thoughts.