A blog post has been published on prompt training with an accompanying paper (Gao, Fisch, and Chen 2021). It’s a really nice post that covers the general theory as well as the approach that they have taken. Reading it, I noticed that they approach it as a masked-token-prediction problem: they use a preset prompt template containing a [MASK] token, and then compare the output probabilities of the N tokens corresponding to their desired classification classes.
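To make that concrete, here is a minimal sketch of cloze-style classification with a masked LM. The model (`bert-base-uncased`), the template ("... It was [MASK]."), and the label tokens "great"/"terrible" are my own illustrative choices, not the ones from the paper.

```python
# Minimal sketch: classify by comparing label-token probabilities at [MASK].
# Assumptions (mine, not from the paper): bert-base-uncased, the template
# "<text> It was [MASK].", and the label tokens "great" / "terrible".
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The film was a complete waste of two hours."
prompt = f"{text} It was {tokenizer.mask_token}."

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # [batch, seq_len, vocab_size]

# Vocabulary distribution at the [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
probs = torch.softmax(logits[0, mask_pos, :].squeeze(0), dim=-1)

# Compare the probabilities of the N label tokens (here N = 2).
label_tokens = {"positive": "great", "negative": "terrible"}
scores = {label: probs[tokenizer.convert_tokens_to_ids(tok)].item()
          for label, tok in label_tokens.items()}
print(scores, "->", max(scores, key=scores.get))
```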
They are able to train effective prompts + label tokens given a small number of examples, and both the prompt and the tokens remain legible. There is also a discussion of deriving prompts given fixed tokens.
Reading this, I think I need to refocus my work on the following:
- Semantic search techniques to align the token outputs with the given prompt, as well as training the prompt itself
- Discrete prompts, and finding the best (most distinguishing) tokens for a given prompt (see the sketch after this list)
- Training over a small number of examples, and the performance that achieves
- Comparison to fine-tuning
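On the second point, a rough sketch of what "most distinguishing tokens" could look like in code: score each candidate token by its mean log-probability at the [MASK] position over a class's examples, and keep the top scorers per class. This reuses the model, tokenizer, and template from the sketch above; the scoring rule is a simple stand-in of my own, not the paper's search procedure.

```python
# Rough sketch: pick label tokens that score highly for each class under a
# fixed prompt template. Reuses `model` / `tokenizer` from the sketch above.
import torch

def mask_log_probs(model, tokenizer, text, template="{} It was {}."):
    """Log-probabilities over the vocabulary at the [MASK] position."""
    prompt = template.format(text, tokenizer.mask_token)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return torch.log_softmax(logits[0, mask_pos, :].squeeze(0), dim=-1)

def best_label_tokens(model, tokenizer, examples, candidates, top_k=3):
    """Rank candidate tokens per class by mean log-probability over that
    class's examples; return the top-k candidates for each class."""
    cand_ids = tokenizer.convert_tokens_to_ids(candidates)
    ranked = {}
    for label, texts in examples.items():
        per_text = [mask_log_probs(model, tokenizer, t)[cand_ids] for t in texts]
        mean_scores = torch.stack(per_text).mean(dim=0)
        ranked[label] = [candidates[i] for i in mean_scores.topk(top_k).indices.tolist()]
    return ranked

# Hypothetical usage with a tiny hand-made candidate list:
# examples = {"positive": ["Loved every minute."], "negative": ["Dull and overlong."]}
# candidates = ["great", "good", "fine", "bad", "terrible", "boring"]
# print(best_label_tokens(model, tokenizer, examples, candidates))
```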
I need to clean up this post but I wanted to quickly note my thoughts.