Matthew’s Blog - Domain Shift by Language Pre-Training of Embeddings

An NLP model performs a specific task. To do this it has expectations about the words and phrases used. This is why training a task specific model from a general language model is so good - the general language model has already learned the expectations, and then they can be used to perform the task.

This all becomes a problem when the expectations are wrong. If we consider science and cooking then the word chemical has very different meanings. In science it is merely a descriptive word, while describing food with the word chemical has strong negative connotations. A model that has expectations that are appropriate for scientific writing will not perform well when used on an article by a food critic.

The word chemical hasn’t changed it’s strict meaning - it still means a basic substance. Cooking can even be considered chemistry. Words are more than their dictionary definition though. The difference between the meanings is because Scientists and Cooks are in two different Domains (A sphere of activity, influence, or knowledge).

If we want to have a model that works well in one domain and then transfer it to another, we have to shift domains. This post is an exploration of that process.

Code

from pathlib import Path

PROJECT_NAME = "domain-shift"

DATA_FOLDER = Path("/data/blog/2022-01-28-domain-shift-by-embedding-replacement")
DATA_FOLDER.mkdir(parents=True, exist_ok=True)

Hypothesis

My idea is that a general purpose sentiment model that is trained without altering the embedding layer can be shifted to a specific domain by retraining only the embeddings of the original language model.

Experimental Setup

The task will be sentiment analysis. The general purpose sentiment model will be trained using the Sentiment 140 dataset (Go, Bhayani, and Huang 2009). This trained model will then be transformed into a domain specific model using the Multi-Domain Sentiment dataset {% cite blitzer-etal-2007-biographies %}. These datasets only have positive and negative sentiment text.

Go, Alec, Richa Bhayani, and Lei Huang. 2009. “Twitter Sentiment Classification Using Distant Supervision.” CS224N Project Report, Stanford 1 (12): 2009.

This experiment will evaluate several different models to see how they perform against the domain specific dataset with and without retraining.

Data Preparation

The datasets need to be restructured to have the text to classify and the target sentiment.

Data Preparation - Sentiment140

This has sentiment as an integer with values 0 (negative) and 4 (positive) along with several bits of metadata that are not interesting for this task.

Code

# from src/main/python/blog/domain_shift/data/sentiment140.py
from pathlib import Path

import pandas as pd


def load_sentiment140(path: Path) -> pd.DataFrame:
    df = pd.read_csv(
        path,
        names=["sentiment", "id", "date", "query", "user", "text"],
        encoding="ISO-8859-1",
    )
    df = df[["sentiment", "text"]].copy()

    # The sentiment column contains two values, 0 and 4.
    # There are 80,000 rows of each.
    # Example sentiment 0 row: my whole body feels itchy and like its on fire
    # Example sentiment 4 row: Happy 38th Birthday to my boo of alll time!!!
    df["sentiment"] = df.sentiment.map({0: "negative", 4: "positive"})

    return df

Code

#collapse
from pathlib import Path

GENERAL_DATASET = Path("/data/sentiment/sentiment140/sentiment140.zip")

general_df = load_sentiment140(GENERAL_DATASET)
general_df.to_parquet(
    "/data/sentiment/sentiment140/sentiment.gz.parquet",
    compression="gzip"
)

display(
    general_df.sentiment
        .value_counts()
        .to_frame()
)
general_df

	sentiment
negative	800000
positive	800000

	sentiment	text
0	negative	@switchfoot http://twitpic.com/2y1zl - Awww, t...
1	negative	is upset that he can't update his Facebook by ...
2	negative	@Kenichan I dived many times for the ball. Man...
3	negative	my whole body feels itchy and like its on fire
4	negative	@nationwideclass no, it's not behaving at all....
...	...	...
1599995	positive	Just woke up. Having no school is the best fee...
1599996	positive	TheWDB.com - Very cool to hear old Walt interv...
1599997	positive	Are you ready for your MoJo Makeover? Ask me f...
1599998	positive	Happy 38th Birthday to my boo of alll time!!! ...
1599999	positive	happy #charitytuesday @theNSPCC @SparksCharity...

1600000 rows × 2 columns

Code

#collapse
import datasets

general_ds = (
    datasets.Dataset.from_pandas(general_df)
        .train_test_split(test_size=10_000)
)
general_ds.save_to_disk("/data/sentiment/sentiment140/sentiment.dataset")

Data Preparation - Multi-Domain Sentiment

This is encoded in an almost xml file that needs some preprocessing. The xml file mixes & which is illegal in XML with the correctly encoded &, there are characters that are out of range for the default pandas xml parser, and finally there is no root node for the file.

Once all of these have been fixed the dataset is quite rich with \(rating \in \{ 1, 2, 4, 5 \}\). I’m going to consider \(negative \in \{ 1, 2 \}\) and \(positive \in \{ 4, 5 \}\). I’m only taking the text of the review, not the title, and some of them have no text.

Finally the domains are not evenly distributed. To ensure that there is enough training and evaluation data the top 5 domains are being used.

Code

# from src/main/python/blog/domain_shift/data/multi_domain_sentiment.py
from pathlib import Path
from typing import Tuple, Union

import pandas as pd
from lxml import etree


def load_multi_domain_sentiment(folder: Path) -> pd.DataFrame:
    # Loading these files requires quite a lot of preprocessing. This is split
    # into individual cleaning methods which are composed in the load_file
    # method below. The last section reads all of the different reviews and
    # filters them to the top 5 by domain volume.

    def load_all_files(folder: Path) -> pd.DataFrame:
        files = sorted(folder.glob("*/all.review"))
        df = pd.concat([load_file(path) for path in files])
        df = filter_to_top_5_domains(df)
        return df

    def filter_to_top_5_domains(df: pd.DataFrame) -> pd.DataFrame:
        # The number of reviews for each domain vary, from a few hundred to over
        # ten thousand. To ensure that there is enough data to train a model we
        # will take the top 5 domains by volume.
        top_5_domains = df.domain.value_counts()[:5].index
        df = df[df.domain.isin(top_5_domains)]
        return df

    def load_file(path: Path) -> pd.DataFrame:
        df = read_file(path)
        df = split_helpful_column(df)
        df = clean_rating(df)
        df = clean_text(df)
        df = parse_date(df)
        df = drop_unrelated_columns(df)
        df = rating_to_sentiment(df)
        return df

    def read_file(path: Path) -> pd.DataFrame:
        # This reads the file from disk. There are three problems with the data
        # that need to be addressed before it can be loaded:

        # The data is stored in an xml-like structure where each data row is contained in a node.
        # There is no root node so the document is not valid xml.
        xml = path.read_text(encoding="ISO-8859-1")
        xml = f"<node>{xml}</node>"

        # Secondly the ampersand symbol is not consistently escaped, sometimes
        # appearing as &amp; and sometimes appearing as &.
        xml = xml.replace("&amp;", "&").replace("&", "&amp;")

        # Finally there are invalid characters in the document as the document
        # seems to lack a consistent encoding. It is possible that the xml like
        # structure is in ISO-8859-1 and the contents of each field are in
        # UTF-8?
        parser = etree.XMLParser(ns_clean=True, recover=True)
        tree = etree.fromstring(xml, parser=parser)
        xml = etree.tostring(tree, encoding="utf-8")

        # Pandas can load dataframes from xml. In order to get the character
        # re-encoding to work we must use the etree parser that we used to
        # reencode the xml, instead of the libxml parser (which is faster and
        # is the default).
        return pd.read_xml(xml, parser="etree")

    def split_helpful_column(df: pd.DataFrame) -> pd.DataFrame:
        # There is a "helpful" column which is a review of the review by other
        # users. A review that is more consistently marked as helpful may be
        # higher quality.

        # If this is present then it is a string of the form "N of M" where N
        # is the number of users that considered the review helpful. If no-one
        # has reviewed the column then this value is missing.

        def parse_helpful(value: Union[str, float]) -> Tuple[int, int]:
            if not isinstance(value, str):
                return (0, 0)
            helpful, total = value.split(" of ")
            return int(helpful), int(total)

        def get_helpful(row: Tuple[int, int]) -> int:
            return row[0]

        def get_unhelpful(row: Tuple[int, int]) -> int:
            helpful, total = row
            return total - helpful

        helpful_total = df.helpful.apply(parse_helpful)
        df["helpful"] = helpful_total.apply(get_helpful)
        df["unhelpful"] = helpful_total.apply(get_unhelpful)

        return df

    def clean_rating(df: pd.DataFrame) -> pd.DataFrame:
        # The rating column is the sentiment proxy for the text. Some rows are
        # missing a value for this, and so cannot be used. Once they have been
        # dropped the column can be converted from a float to an int.
        df = df.dropna(subset=["rating"]).copy()
        df["rating"] = df.rating.astype(int)
        return df

    def clean_text(df: pd.DataFrame) -> pd.DataFrame:
        # Some reviews only have a title and no text body. We are not
        # considering the title for this so any rows that are considered blank
        # or too short have to be dropped.
        df = df[df.review_text.str.len() > 10]
        return df

    def parse_date(df: pd.DataFrame) -> pd.DataFrame:
        df["date"] = pd.to_datetime(df.date)
        return df

    def drop_unrelated_columns(df: pd.DataFrame) -> pd.DataFrame:
        # The product_type column is the domain, the rating is the sentiment
        # and the review_text is the text.
        df = df[["product_type", "rating", "review_text"]].copy()
        df = df.rename(
            columns={
                "product_type": "domain",
                "review_text": "text",
            }
        )
        return df

    def rating_to_sentiment(df: pd.DataFrame) -> pd.DataFrame:
        # The rating contains values 1, 2, 4, and 5, being the number of stars
        # assigned to the review by the reviewer.
        df["sentiment"] = df.rating.map(
            {1: "negative", 2: "negative", 4: "positive", 5: "positive"}
        )
        df = df.drop(columns=["rating"])
        return df

    return load_all_files(folder)

Code

#collapse
from pathlib import Path

DOMAIN_DATASET_FOLDER = Path("/data/sentiment/multi-domain-sentiment/sorted_data")

domain_df = load_multi_domain_sentiment(DOMAIN_DATASET_FOLDER)
domain_df.to_parquet(
    "/data/sentiment/multi-domain-sentiment/sentiment-top-5.gz.parquet",
    compression="gzip"
)
display(
    domain_df[["domain", "sentiment"]]
        .value_counts()
        .to_frame()
        .rename(columns={0: "count"})
        .reset_index()
        .sort_values(by=["domain", "sentiment"], ascending=[True, True])
        .set_index(["domain", "sentiment"])
)
domain_df

		count
domain	sentiment
electronics	negative	5048
electronics	positive	17959
kitchen & housewares	negative	4119
kitchen & housewares	positive	15737
music	negative	2441
music	positive	14587
toys & games	negative	2568
toys & games	positive	10579
video	negative	2587
video	positive	12764

	domain	text	sentiment
0	electronics	I have bought and returned three of these unit...	negative
1	electronics	I used a 25 pack of these doing DVD backups, a...	negative
2	electronics	I bought these discs at CompUSA because I need...	negative
3	electronics	The DVDs I burned successfully showed the movi...	negative
4	electronics	Please don't expect to get the cash back from ...	negative
...	...	...	...
15346	video	After watching this documentary, I was left th...	positive
15347	video	I finally made my first purchase from Amazon's...	positive
15348	video	Don't buy this disc unless you are a real Jack...	negative
15349	video	Oh my goodness, they've outlawed sex! That is ...	positive
15350	video	In this erotic science fiction film from the f...	positive

88389 rows × 3 columns

Code

# from src/main/python/blog/domain_shift/data/balance_domain.py
import datasets
import pandas as pd


def make_product_dataset(domain_df: pd.DataFrame, domain: str) -> datasets.DatasetDict:
    """
    This creates a balanced dataset that is limited to the specified domain.
    """

    df = domain_df[domain_df.domain == domain]

    # sample the dataframe to balance the sentiment classes
    positive_df = df[df.sentiment == "positive"]
    negative_df = df[df.sentiment == "negative"]

    smaller_size = min(len(positive_df), len(negative_df))
    positive_df = positive_df.sample(n=smaller_size)
    negative_df = negative_df.sample(n=smaller_size)

    # recombine and shuffle
    df = pd.concat([positive_df, negative_df]).sample(frac=1)

    test_size = min(1_000, len(df) // 4)

    return datasets.Dataset.from_pandas(df).train_test_split(test_size=test_size)

Code

#collapse

electronics_ds = make_product_dataset(domain_df, "electronics")
electronics_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-electronics.dataset"
)

kitchen_ds = make_product_dataset(domain_df, "kitchen & housewares")
kitchen_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-kitchen.dataset"
)

music_ds = make_product_dataset(domain_df, "music")
music_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-music.dataset"
)

toys_ds = make_product_dataset(domain_df, "toys & games")
toys_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-toys.dataset"
)

video_ds = make_product_dataset(domain_df, "video")
video_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-video.dataset"
)

Reload Data

This is more for me to make it easy to work with this notebook. By having this step I can run parts of this more easily.

Code

#collapse
import datasets

general_ds = datasets.load_from_disk(
    "/data/sentiment/sentiment140/sentiment.dataset"
)

electronics_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-electronics.dataset"
)
kitchen_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-kitchen.dataset"
)
music_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-music.dataset"
)
toys_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-toys.dataset"
)
video_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-video.dataset"
)

Methods

Training the different models needs to be consistent and the easiest way to produce consistency is to use the same code. Here I am defining the different methods that are required to train and evaluate the models.

Methods - Training Functions

To consistently train the different models we have a set of methods:

train_classifier_full This trains a normal classifier on the dataset. The classifier can adjust any parameters in the entire model. This provides a baseline to measure against as this domain specific classifier should be the best achievable performance.
train_classifier_base This trains a classifier with a frozen embedding layer. The classifier can be adjusted to become domain specific by swapping out the embedding layer. This provides the base for the domain specific classifier.
train_language_model_embedding This trains an embedding by language model pretraining. The embedding layer is the only part of the model that can be adjusted. This can be swapped into the base classifier to make it domain specific.
get_embedding_parameters_bert This method returns all of the parameters in the model that form the embedding layer. The train_classifier_base and train_language_model_embedding methods use this to either freeze the embedding layer or freeze the model and unfreeze the embedding layer.

Code

# from src/main/python/blog/domain_shift/model/train_classifier.py
from pathlib import Path
from typing import Callable, Dict, List, Optional

import datasets
import torch
import wandb
from transformers import (
    AutoModel,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from transformers.trainer_utils import EvalPrediction


def train_classifier_full(
    ds: datasets.Dataset,
    *,
    project_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    metric: Callable[[EvalPrediction], Dict[str, float]],
    batch_size: int,
    epochs: float = 5,
    **settings,
) -> None:
    """
    This trains the classifier for the single purpose of classifying this
    dataset. The training process has full freedom to alter any and all
    parameters in the model. This should produce a model with the best
    performance possible.
    """
    train_classifier(
        ds=ds,
        train_name="full",
        project_name=project_name,
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        metric=metric,
        batch_size=batch_size,
        epochs=epochs,
        **settings,
    )


def train_classifier_base(
    ds: datasets.Dataset,
    *,
    project_name: str,
    model_name: str,
    embedding_accessor: Callable[[AutoModel], List[torch.nn.Parameter]],
    dataset_name: str,
    data_folder: Path,
    metric: Callable[[EvalPrediction], Dict[str, float]],
    batch_size: int,
    epochs: float = 5,
    **settings,
) -> None:
    """
    This trains the classifier for the single purpose of classifying this
    dataset. The training process can alter any parameters except for the
    initial embedding layer. This should produce a model with good
    performance which is compatible with a retrained embedding layer.
    """

    def model_preparation(model: AutoModelForSequenceClassification) -> None:
        for parameter in embedding_accessor(model):
            parameter.requires_grad_(False)

    train_classifier(
        ds=ds,
        train_name="no-embedding",
        project_name=project_name,
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        metric=metric,
        batch_size=batch_size,
        epochs=epochs,
        model_preparation=model_preparation,
        **settings,
    )


def train_classifier(
    ds: datasets.Dataset,
    *,
    train_name: str,
    project_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    metric: Callable[[EvalPrediction], Dict[str, float]],
    batch_size: int,
    epochs: float = 5,
    model_preparation: Optional[
        Callable[[AutoModelForSequenceClassification], None]
    ] = None,
    **settings,
) -> None:
    """
    This trains the classifier for the single purpose of classifying this
    dataset. The model_preparation function, if provided, can alter the model
    to freeze or alter layers as appropriate.
    """

    # Set default values for training, which can be overridden with the settings
    training_arguments = {
        "per_device_train_batch_size": batch_size,
        "per_device_eval_batch_size": batch_size,
        "num_train_epochs": epochs,
        "learning_rate": 5e-5,
        "warmup_ratio": 0.06,
        "logging_steps": 1_000,
        "save_steps": 1_000,
        "eval_steps": 1_000,
        "metric_for_best_model": "accuracy",
        "greater_is_better": True,
    } | settings

    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    model_run_folder = data_folder / "runs" / run_name
    model_run_folder.mkdir(parents=True, exist_ok=True)
    best_model_folder = data_folder / "best-model" / run_name
    best_model_folder.mkdir(parents=True, exist_ok=True)

    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    if model_preparation is not None:
        model_preparation(model)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    with wandb.init(
        project=project_name,
        name=run_name,
        mode="online",
    ):
        training_args = TrainingArguments(
            report_to=["wandb"],
            output_dir=model_run_folder / "output",
            logging_dir=model_run_folder / "output",
            overwrite_output_dir=True,
            evaluation_strategy="steps",
            load_best_model_at_end=True,
            **training_arguments,
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=ds["train"],
            eval_dataset=ds["test"],
            tokenizer=tokenizer,
            compute_metrics=metric,
        )

        trainer.train()

    model.save_pretrained(best_model_folder)



# from src/main/python/blog/domain_shift/model/train_language_model.py
from pathlib import Path
from typing import Callable, Dict, List, Optional

import datasets
import torch
import wandb
from transformers import (
    AutoModel,
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from transformers.trainer_utils import EvalPrediction


def train_language_model_embedding(
    ds: datasets.Dataset,
    *,
    project_name: str,
    model_name: str,
    embedding_accessor: Callable[[AutoModel], List[torch.nn.Parameter]],
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
    **settings,
) -> None:
    """
    This trains the embedding layer of the language model using language model
    pretraining. This involves adjusting the model to better match the domain
    specific language use.
    """

    def model_preparation(model: AutoModelForMaskedLM) -> None:
        # disable gradient updates on the model
        model.requires_grad_(False)
        # enable gradient updates on the embedding
        for parameter in embedding_accessor(model):
            parameter.requires_grad_(True)

    train_language_model(
        ds=ds,
        train_name="embedding",
        project_name=project_name,
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
        metric=metric,
        model_preparation=model_preparation,
        **settings,
    )


def train_language_model(
    ds: datasets.Dataset,
    *,
    train_name: str,
    project_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
    model_preparation: Optional[Callable[[AutoModelForMaskedLM], None]] = None,
    save_preparation: Optional[Callable[[AutoModelForMaskedLM], None]] = None,
    **settings,
) -> None:
    # Set default values for training, which can be overridden with the settings
    training_arguments = {
        "per_device_train_batch_size": batch_size,
        "per_device_eval_batch_size": batch_size,
        "num_train_epochs": epochs,
        "learning_rate": 5e-5,
        "warmup_ratio": 0.06,
        "logging_steps": 1_000,
        "save_steps": 1_000,
        "eval_steps": 1_000,
        "metric_for_best_model": "loss",
        "greater_is_better": False,
    } | settings

    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    model_run_folder = data_folder / "runs" / run_name
    model_run_folder.mkdir(parents=True, exist_ok=True)
    best_model_folder = data_folder / "best-model" / run_name
    best_model_folder.mkdir(parents=True, exist_ok=True)

    model = AutoModelForMaskedLM.from_pretrained(model_name)
    if model_preparation is not None:
        model_preparation(model)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    non_text_columns = set(ds["train"].column_names) - set(["input_ids"])
    ds = ds.remove_columns(non_text_columns)

    # there is a problem running the evaluation over more than 100 rows
    test_ds = datasets.Dataset.from_dict(ds["test"][:100])

    with wandb.init(
        project=project_name,
        name=run_name,
        mode="online",
    ):
        training_args = TrainingArguments(
            report_to=["wandb"],
            output_dir=model_run_folder / "output",
            logging_dir=model_run_folder / "output",
            overwrite_output_dir=True,
            evaluation_strategy="steps",
            load_best_model_at_end=True,
            **training_arguments,
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=ds["train"],
            eval_dataset=test_ds,
            data_collator=DataCollatorForLanguageModeling(
                tokenizer=tokenizer,
                mlm=True,
            ),
            tokenizer=tokenizer,
            compute_metrics=metric,
        )

        trainer.train()

    if save_preparation is not None:
        save_preparation(model)
    model.save_pretrained(best_model_folder)



# from src/main/python/blog/domain_shift/model/embedding.py
from typing import List

import torch
from transformers import BertModel


def get_embedding_parameters_bert(model: BertModel) -> List[torch.nn.Parameter]:
    # Given a classification model, base_model returns the core bert model
    # without the classification head. Given the core bert model, base_model
    # returns the core bert model again! This means this approach works with
    # any kind of bert model.
    return list(model.base_model.embeddings.parameters())

Methods - Evaluation

After training the model we need a way to evaluate it.

evaluate_classifier This evaluates a classification model without altering it.
evaluate_combined_classifier This evaluates a classification model made from a base model combined with the embedding layer of a pretrained language model.

Code

# from src/main/python/blog/domain_shift/model/evaluate.py
from pathlib import Path
from typing import Callable, Dict, List, Optional

import datasets
import torch
from transformers import (
    AutoModel,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from transformers.trainer_utils import EvalPrediction


@torch.no_grad()
def evaluate_classifier(
    ds: datasets.Dataset,
    *,
    model_name: str,
    model: AutoModelForSequenceClassification,
    batch_size: int,
    data_folder: Path,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
) -> Dict[str, float]:
    model_run_folder = data_folder / "evaluation"
    model_run_folder.mkdir(parents=True, exist_ok=True)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model.eval()

    training_args = TrainingArguments(
        report_to=[],
        output_dir=model_run_folder / "output",
        logging_dir=model_run_folder / "output",
        overwrite_output_dir=True,
        num_train_epochs=1,
        per_device_eval_batch_size=batch_size,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=ds["train"],
        eval_dataset=ds["test"],
        tokenizer=tokenizer,
        compute_metrics=metric,
    )

    return trainer.evaluate()


@torch.no_grad()
def evaluate_combined_classifier(
    ds: datasets.Dataset,
    *,
    model_name: str,
    base_model: AutoModelForSequenceClassification,
    embedding_model: AutoModel,
    embedding_accessor: Callable[[AutoModel], List[torch.nn.Parameter]],
    batch_size: int,
    data_folder: Path,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
) -> Dict[str, float]:
    model_run_folder = data_folder / "evaluation"
    model_run_folder.mkdir(parents=True, exist_ok=True)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    for model_parameter, embedding_parameter in zip(
        embedding_accessor(base_model),
        embedding_accessor(embedding_model),
    ):
        model_parameter.data = embedding_parameter.data
    base_model.eval()

    training_args = TrainingArguments(
        report_to=[],
        output_dir=model_run_folder / "output",
        logging_dir=model_run_folder / "output",
        overwrite_output_dir=True,
        num_train_epochs=1,
        per_device_eval_batch_size=batch_size,
    )

    trainer = Trainer(
        model=base_model,
        args=training_args,
        train_dataset=ds["train"],
        eval_dataset=ds["test"],
        tokenizer=tokenizer,
        compute_metrics=metric,
    )

    return trainer.evaluate()

To load the models we also have:

load_classifier_full This loads a classifier model created by train_classifier_full.
load_classifier_base This loads a classifier model created by train_classifier_base.
load_language_model_embedding This loads a language model created by train_language_model_embedding.

Code

# from src/main/python/blog/domain_shift/model/load.py
from pathlib import Path

from transformers import AutoModelForMaskedLM, AutoModelForSequenceClassification


def load_classifier_full(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_classifier(
        train_name="full",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_classifier_base(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_classifier(
        train_name="no-embedding",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_classifier(
    *,
    train_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    best_model_folder = data_folder / "best-model" / run_name
    return AutoModelForSequenceClassification.from_pretrained(best_model_folder)


def load_language_model_embedding(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_language_model(
        train_name="embedding",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_language_model_embedding_overlay(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_language_model(
        train_name="embedding-overlay",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_language_model(
    *,
    train_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForMaskedLM:
    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    best_model_folder = data_folder / "best-model" / run_name
    return AutoModelForMaskedLM.from_pretrained(best_model_folder)



# from src/main/python/blog/domain_shift/model/layer/embedding_overlay.py
import torch
from torch import nn
from transformers.models.bert.modeling_bert import BertEmbeddings, BertModel


class EmbeddingOverlay(nn.Module):
    def __init__(self, embedding: BertEmbeddings, device: str) -> None:
        super().__init__()
        self.embedding = embedding
        self.base = embedding.word_embeddings.weight
        self.overlay = torch.zeros_like(self.base, device=device)
        embedding.word_embeddings.weight = nn.Parameter(
            torch.zeros_like(self.base, device=device)
        )

    @classmethod
    def update_model(cls, model: BertModel) -> None:
        embedding = cls(
            model.base_model.embeddings,
            device="cuda" if torch.cuda.is_available() else "cpu",
        )
        model.base_model.embeddings = embedding
        model.requires_grad_(False)
        embedding.overlay.requires_grad_(True)

    @staticmethod
    def restore_model(model: BertModel) -> None:
        embedding = model.base_model.embeddings.to_embedding()
        model.base_model.embeddings = embedding

    def forward(self, *args, **kwargs) -> torch.Tensor:
        return self.to_embedding().forward(*args, **kwargs)

    def to_embedding(self) -> BertEmbeddings:
        self.embedding.word_embeddings.weight = nn.Parameter(self.base + self.overlay)
        return self.embedding

Experiments

First we need to define the code to train and evaluate the models, then we can run the experiments.

Code

import blog.transformers_logging

Experiment - Define Metrics

We need a way to measure the performance of the model. Since this is a two class problem accuracy is a sufficient metric.

Code

# from src/main/python/blog/metrics/accuracy.py
from typing import Dict

from sklearn.metrics import accuracy_score
from transformers.trainer_utils import EvalPrediction


def metric_accuracy(model_output: EvalPrediction) -> Dict[str, float]:
    predictions = model_output.predictions.argmax(axis=1)
    targets = model_output.label_ids
    accuracy = accuracy_score(targets, predictions)
    return {"accuracy": accuracy}



# from src/main/python/blog/metrics/perplexity.py
import torch
import torch.nn.functional as F
from transformers.trainer_utils import EvalPrediction


def metric_perplexity_bert(model_output: EvalPrediction, vocab_size: int = 30_522):
    # This loss calculation comes directly from the BERT forward method
    labels = torch.tensor(model_output.label_ids)
    lm_logits = torch.tensor(model_output.predictions)

    loss = F.cross_entropy(lm_logits.view(-1, vocab_size), labels.view(-1))

    perplexity = torch.exp(loss)
    return {"perplexity": perplexity.item()}


def metric_perplexity_gpt2(model_output: EvalPrediction):
    # This loss calculation comes directly from the GPT2 forward method
    # that handles correctly offsetting the labels to match the positions that are predicting

    labels = torch.tensor(model_output.label_ids)
    lm_logits = torch.tensor(model_output.predictions)

    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
    )

    perplexity = torch.exp(loss)
    return {"perplexity": perplexity.item()}

For the language model pretraining we need a perplexity measure.

Experiment - BERT

This is going to evaluate BERT for this task.

The first stage will be to evaluate a pure classifier trained on the general dataset and on each domain dataset. This will establish a baseline.

Then the base sentiment model will be trained which will have a frozen embedding layer. After that the masked language model can be pretrained with the domain specific text, only training the embedding layer. I can also try restricting that to having weight decay only over the alteration to the base embeddings.

Code

MODEL_NAME = "bert-base-uncased"

Experiment - BERT - Encode Datasets

The raw datasets need to be encoded using the BERT tokenizer.

Code

#collapse
#hide_output
from typing import Any, Dict
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)

sentiment_index = {
    "negative": 0,
    "positive": 1,
}

def encode(row: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "input_ids": tokenizer(row["text"], truncation=True).input_ids,
        "label": sentiment_index[row["sentiment"]],
    }

general_ds = general_ds.map(encode)
electronics_ds = electronics_ds.map(encode)
kitchen_ds = kitchen_ds.map(encode)
music_ds = music_ds.map(encode)
toys_ds = toys_ds.map(encode)
video_ds = video_ds.map(encode)

Experiment - BERT - Establish Baselines

This trains a separate model for each task to provide a baseline for comparison.

Code

#hide_output
train_classifier_full(
    ds=general_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="general",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=64,
    epochs=5,
)

wandb: Currently logged in as: brandwatch-ml (use `wandb login --relogin` to force relogin)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-general-64bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/eyn03xlb
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220212_193000-eyn03xlb

Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[124220/124220 7:43:16, Epoch 5/5]

Step	Training Loss	Validation Loss	Accuracy
1000	0.513200	0.403615	0.824300
2000	0.386900	0.369579	0.838800
3000	0.362800	0.352850	0.849500
4000	0.353500	0.352404	0.850100
5000	0.351500	0.345889	0.853700
6000	0.349400	0.340666	0.854200
7000	0.345300	0.344296	0.854400
8000	0.340700	0.352917	0.850300
9000	0.341600	0.341545	0.852500
10000	0.336900	0.332418	0.858500
11000	0.337300	0.334889	0.857700
12000	0.328600	0.343519	0.853100
13000	0.330900	0.335796	0.862200
14000	0.323400	0.325259	0.861300
15000	0.327800	0.337436	0.855200
16000	0.325600	0.340725	0.861600
17000	0.324700	0.329951	0.866400
18000	0.324500	0.316455	0.868000
19000	0.317800	0.314791	0.867000
20000	0.317200	0.312795	0.868500
21000	0.321100	0.312556	0.868100
22000	0.318400	0.315015	0.868300
23000	0.319700	0.313522	0.867100
24000	0.315900	0.311461	0.869200
25000	0.311100	0.322506	0.873000
26000	0.277000	0.310173	0.870000
27000	0.280600	0.319567	0.867000
28000	0.277400	0.314650	0.870900
29000	0.278500	0.311275	0.870000
30000	0.278100	0.318782	0.871400
31000	0.277300	0.306667	0.870800
32000	0.276500	0.309154	0.870300
33000	0.277400	0.320674	0.868100
34000	0.279300	0.318263	0.872000
35000	0.282800	0.307125	0.873200
36000	0.277600	0.320166	0.873300
37000	0.278900	0.309924	0.870800
38000	0.280500	0.312997	0.870400
39000	0.275800	0.317793	0.868100
40000	0.280100	0.305172	0.871300
41000	0.282300	0.312909	0.871100
42000	0.282300	0.305909	0.873700
43000	0.277500	0.310162	0.869600
44000	0.277100	0.313971	0.869600
45000	0.275900	0.319665	0.872500
46000	0.277900	0.318283	0.868900
47000	0.281000	0.313041	0.872800
48000	0.276200	0.300781	0.877500
49000	0.281300	0.307026	0.875800
50000	0.259300	0.325227	0.871600
51000	0.221100	0.325990	0.871500
52000	0.223000	0.337404	0.874800
53000	0.215900	0.340205	0.872400
54000	0.217100	0.325959	0.873400
55000	0.219700	0.329393	0.872000
56000	0.221900	0.325739	0.871300
57000	0.224400	0.327636	0.870800
58000	0.222300	0.341080	0.872400
59000	0.226100	0.332482	0.869100
60000	0.223500	0.323489	0.874300
61000	0.226700	0.316245	0.872000
62000	0.222300	0.325050	0.875100
63000	0.226700	0.325248	0.874400
64000	0.225100	0.322689	0.871700
65000	0.225300	0.321286	0.871200
66000	0.224800	0.334543	0.870800
67000	0.226700	0.326299	0.869100
68000	0.224900	0.329174	0.870800
69000	0.221800	0.323690	0.871900
70000	0.225800	0.318156	0.872000
71000	0.224500	0.326445	0.873700
72000	0.226000	0.328679	0.869900
73000	0.224500	0.316885	0.876000
74000	0.222900	0.316419	0.873000
75000	0.196700	0.378959	0.870200
76000	0.159600	0.375573	0.873600
77000	0.155600	0.379308	0.872300
78000	0.159400	0.365117	0.872600
79000	0.159700	0.387099	0.870200
80000	0.163400	0.381868	0.868100
81000	0.163000	0.369293	0.871300
82000	0.161500	0.361120	0.866600
83000	0.161300	0.381293	0.869200
84000	0.161400	0.381637	0.869200
85000	0.163700	0.378771	0.867900
86000	0.165200	0.372763	0.868500
87000	0.164500	0.372205	0.869500
88000	0.164000	0.387928	0.869300
89000	0.163700	0.366503	0.871200
90000	0.165600	0.371311	0.870200
91000	0.165700	0.368546	0.870400
92000	0.161900	0.390258	0.866100
93000	0.160700	0.373525	0.868200
94000	0.162000	0.359105	0.869300
95000	0.164800	0.380203	0.868400
96000	0.161400	0.366745	0.871500
97000	0.160300	0.379058	0.873300
98000	0.163600	0.377647	0.869500
99000	0.162400	0.376305	0.872400
100000	0.133200	0.441574	0.870600
101000	0.112200	0.450519	0.868300
102000	0.111800	0.461471	0.867600
103000	0.111000	0.453476	0.868500
104000	0.112100	0.464585	0.870000
105000	0.108200	0.461612	0.869700
106000	0.115700	0.450644	0.870300
107000	0.112700	0.463014	0.868500
108000	0.112600	0.454774	0.870100
109000	0.113800	0.455071	0.869800
110000	0.113400	0.474106	0.866800
111000	0.113100	0.450578	0.868300
112000	0.113700	0.453555	0.868100
113000	0.109200	0.453771	0.869300
114000	0.114500	0.449492	0.869100
115000	0.109200	0.450538	0.867900
116000	0.110700	0.463325	0.870000
117000	0.109900	0.464494	0.869500
118000	0.104600	0.462927	0.867900
119000	0.109000	0.455559	0.870600
120000	0.112200	0.450070	0.869400
121000	0.109800	0.447597	0.869500
122000	0.109600	0.451558	0.869200
123000	0.109600	0.451785	0.869400
124000	0.108400	0.454002	0.869700

Waiting for W&B process to finish, PID 2581287
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220212_193000-eyn03xlb/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220212_193000-eyn03xlb/logs/debug-internal.log

Run summary:

train/loss	0.1084
train/learning_rate	0.0
train/epoch	5.0
train/global_step	124220
_runtime	27799
_timestamp	1644721999
_step	248
eval/loss	0.454
eval/accuracy	0.8697
eval/runtime	10.5481
eval/samples_per_second	948.04
eval/steps_per_second	14.884
train/train_runtime	27796.6567
train/train_samples_per_second	286.006
train/train_steps_per_second	4.469
train/total_flos	2.0687274460977792e+17
train/train_loss	0.2234

Run history:

train/loss	█▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁
train/learning_rate	▂▅████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁
train/epoch	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/global_step	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_runtime	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_timestamp	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_step	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
eval/loss	▅▃▃▂▂▃▂▂▁▁▁▁▂▁▂▁▂▂▂▂▂▂▂▂▄▅▄▄▄▅▄▄▇██▇▇█▇█
eval/accuracy	▁▄▅▆▆▆▇▇▇▇▇▇▇█▇█▇▇▇█▇▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
eval/runtime	▁▅▆▆▇▆▇▇▇▇▇▇▇▇▇▇▇▇█▇██▇▇█▇█▇█▇███▇▇█▇██▇
eval/samples_per_second	█▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▁▂▁▂▁▂▁▁▁▂▂▁▁▁▁▂
eval/steps_per_second	█▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▁▂▁▂▁▂▁▁▁▂▂▁▁▁▁▂
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-general-64bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/eyn03xlb

Code

#hide_output
train_classifier_full(
    ds=electronics_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-electronics",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-electronics-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2cqwf0qp
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_033833-2cqwf0qp

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[2845/2845 20:24, Epoch 5/5]

Step	Training Loss	Validation Loss	Accuracy
1000	0.258500	0.168365	0.950000
2000	0.068000	0.226074	0.958000

Waiting for W&B process to finish, PID 2608181
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_033833-2cqwf0qp/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_033833-2cqwf0qp/logs/debug-internal.log

Run summary:

train/loss	0.068
train/learning_rate	2e-05
train/epoch	5.0
train/global_step	2845
_runtime	1226
_timestamp	1644724739
_step	4
eval/loss	0.22607
eval/accuracy	0.958
eval/runtime	8.4727
eval/samples_per_second	118.026
eval/steps_per_second	7.436
train/train_runtime	1225.0684
train/train_samples_per_second	37.124
train/train_steps_per_second	2.322
train/total_flos	9167977279601760.0
train/train_loss	0.1197

Run history:

train/loss	█▁
train/learning_rate	█▁
train/epoch	▁▁▅▅█
train/global_step	▁▁▅▅█
_runtime	▁▁▅▅█
_timestamp	▁▁▅▅█
_step	▁▃▅▆█
eval/loss	▁█
eval/accuracy	▁█
eval/runtime	▁█
eval/samples_per_second	█▁
eval/steps_per_second	█▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-electronics-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2cqwf0qp

Code

#hide_output
train_classifier_full(
    ds=kitchen_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-kitchen",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-kitchen-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/22i0d0ij
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_035925-22i0d0ij

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[2265/2265 14:05, Epoch 5/5]

Step	Training Loss	Validation Loss	Accuracy
1000	0.208400	0.314208	0.928000
2000	0.040300	0.361787	0.940000

Waiting for W&B process to finish, PID 2609298
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_035925-22i0d0ij/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_035925-22i0d0ij/logs/debug-internal.log

Run summary:

train/loss	0.0403
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	2265
_runtime	846
_timestamp	1644725611
_step	4
eval/loss	0.36179
eval/accuracy	0.94
eval/runtime	7.5391
eval/samples_per_second	132.642
eval/steps_per_second	8.356
train/train_runtime	845.3493
train/train_samples_per_second	42.811
train/train_steps_per_second	2.679
train/total_flos	6108018316871280.0
train/train_loss	0.11069

Run history:

train/loss	█▁
train/learning_rate	█▁
train/epoch	▁▁▇▇█
train/global_step	▁▁▇▇█
_runtime	▁▁▆▇█
_timestamp	▁▁▆▇█
_step	▁▃▅▆█
eval/loss	▁█
eval/accuracy	▁█
eval/runtime	█▁
eval/samples_per_second	▁█
eval/steps_per_second	▁█
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-kitchen-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/22i0d0ij

Code

#hide_output
train_classifier_full(
    ds=music_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-music",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-music-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2ynufcz4
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_041355-2ynufcz4

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1215/1215 10:40, Epoch 5/5]

Step	Training Loss	Validation Loss	Accuracy
1000	0.165200	0.389923	0.929000

Waiting for W&B process to finish, PID 2610061
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_041355-2ynufcz4/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_041355-2ynufcz4/logs/debug-internal.log

Run summary:

train/loss	0.1652
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1215
_runtime	642
_timestamp	1644726277
_step	2
eval/loss	0.38992
eval/accuracy	0.929
eval/runtime	11.275
eval/samples_per_second	88.692
eval/steps_per_second	5.588
train/train_runtime	641.3766
train/train_samples_per_second	30.263
train/train_steps_per_second	1.894
train/total_flos	4571028364769280.0
train/train_loss	0.1399

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▂█
_timestamp	▁▂█
_step	▁▅█
eval/loss	▁
eval/accuracy	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-music-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2ynufcz4

Code

#hide_output
train_classifier_full(
    ds=toys_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-toys",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-toys-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/16f10f6f
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_042455-16f10f6f

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1295/1295 07:28, Epoch 5/5]

Step	Training Loss	Validation Loss	Accuracy
1000	0.178000	0.477703	0.910000

Waiting for W&B process to finish, PID 2610605
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_042455-16f10f6f/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_042455-16f10f6f/logs/debug-internal.log

Run summary:

train/loss	0.178
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1295
_runtime	450
_timestamp	1644726745
_step	2
eval/loss	0.4777
eval/accuracy	0.91
eval/runtime	7.1618
eval/samples_per_second	139.631
eval/steps_per_second	8.797
train/train_runtime	449.3178
train/train_samples_per_second	46.025
train/train_steps_per_second	2.882
train/total_flos	3259571864878560.0
train/train_loss	0.14495

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/accuracy	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-toys-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/16f10f6f

Code

#hide_output
train_classifier_full(
    ds=video_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-video",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-video-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/iouz3zx9
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_043243-iouz3zx9

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1305/1305 11:59, Epoch 5/5]

Step	Training Loss	Validation Loss	Accuracy
1000	0.143700	0.310546	0.944000

Waiting for W&B process to finish, PID 2611053
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_043243-iouz3zx9/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_043243-iouz3zx9/logs/debug-internal.log

Run summary:

train/loss	0.1437
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1305
_runtime	721
_timestamp	1644727485
_step	2
eval/loss	0.31055
eval/accuracy	0.944
eval/runtime	11.8683
eval/samples_per_second	84.258
eval/steps_per_second	5.308
train/train_runtime	720.419
train/train_samples_per_second	28.969
train/train_steps_per_second	1.811
train/total_flos	5153342463603840.0
train/train_loss	0.11579

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/accuracy	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-video-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/iouz3zx9

Experiment - BERT - Train Fixed Embedding Base

This trains a model for general sentiment classification with a frozen embedding layer. The resulting model will be used as the base for the domain adjusted models.

Code

#hide_output
train_classifier_base(
    ds=general_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="general",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=64,
    epochs=5,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run no-embedding-bert-base-uncased-general-64bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1c4qi8r3
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_044503-1c4qi8r3

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[124220/124220 7:29:53, Epoch 5/5]

Step	Training Loss	Validation Loss	Accuracy
1000	0.511300	0.406170	0.822400
2000	0.388700	0.371869	0.837000
3000	0.364600	0.354540	0.848000
4000	0.355000	0.356526	0.849200
5000	0.353200	0.346191	0.852400
6000	0.350800	0.341020	0.855100
7000	0.345700	0.343301	0.853700
8000	0.341700	0.352299	0.853900
9000	0.341200	0.339812	0.855800
10000	0.337600	0.333934	0.858800
11000	0.337800	0.333922	0.858500
12000	0.328800	0.339086	0.856300
13000	0.331700	0.329456	0.863300
14000	0.322400	0.325021	0.860800
15000	0.326400	0.341783	0.855300
16000	0.326400	0.338934	0.862700
17000	0.323400	0.331051	0.866300
18000	0.325000	0.319770	0.865600
19000	0.315500	0.309761	0.870200
20000	0.318200	0.311332	0.869200
21000	0.320600	0.314540	0.866800
22000	0.317600	0.313203	0.869900
23000	0.319400	0.311689	0.869800
24000	0.315400	0.306948	0.870800
25000	0.310800	0.316544	0.871000
26000	0.280500	0.305228	0.873900
27000	0.284600	0.315119	0.869700
28000	0.282100	0.316112	0.871200
29000	0.281500	0.311524	0.870400
30000	0.280600	0.318128	0.872500
31000	0.280600	0.310752	0.871400
32000	0.278800	0.309930	0.873400
33000	0.281400	0.318493	0.872800
34000	0.283400	0.321827	0.872100
35000	0.286400	0.307348	0.872800
36000	0.280200	0.323695	0.870700
37000	0.282700	0.306045	0.871800
38000	0.284300	0.306637	0.869500
39000	0.279700	0.313403	0.869800
40000	0.282500	0.301105	0.873000
41000	0.283600	0.306601	0.874800
42000	0.285000	0.307430	0.873400
43000	0.279200	0.305863	0.871100
44000	0.277800	0.307858	0.871100
45000	0.279200	0.306109	0.871500
46000	0.280300	0.312159	0.870600
47000	0.282800	0.310434	0.875600
48000	0.281400	0.302623	0.878500
49000	0.282900	0.302531	0.880000
50000	0.264000	0.309589	0.873400
51000	0.229000	0.321463	0.873500
52000	0.230600	0.341251	0.871700
53000	0.227100	0.330379	0.872000
54000	0.227500	0.321092	0.871200
55000	0.229700	0.319525	0.873400
56000	0.231000	0.317524	0.872500
57000	0.234000	0.315405	0.871000
58000	0.230900	0.317642	0.871800
59000	0.233800	0.321154	0.872600
60000	0.231900	0.318218	0.874600
61000	0.233800	0.316618	0.875300
62000	0.232700	0.325526	0.875100
63000	0.235600	0.320183	0.873000
64000	0.235500	0.319495	0.871600
65000	0.234800	0.314187	0.873500
66000	0.232800	0.325829	0.873500
67000	0.234400	0.311478	0.871800
68000	0.232200	0.317831	0.873600
69000	0.232300	0.311483	0.874300
70000	0.235600	0.305324	0.877200
71000	0.234200	0.315522	0.872300
72000	0.232900	0.319899	0.868900
73000	0.234100	0.308927	0.874900
74000	0.230300	0.308656	0.872400
75000	0.208900	0.360322	0.872700
76000	0.175900	0.357876	0.870900
77000	0.172800	0.349695	0.872500
78000	0.174800	0.351596	0.874000
79000	0.173700	0.359032	0.873700
80000	0.177600	0.355052	0.872600
81000	0.177800	0.353396	0.869900
82000	0.177200	0.346814	0.870100
83000	0.174200	0.354160	0.874200
84000	0.177300	0.355552	0.868400
85000	0.180100	0.355578	0.869500
86000	0.178800	0.358164	0.869500
87000	0.177000	0.355007	0.871100
88000	0.180300	0.348180	0.872800
89000	0.176000	0.346942	0.869900
90000	0.179300	0.344493	0.871400
91000	0.181100	0.344836	0.870600
92000	0.180300	0.353251	0.872400
93000	0.175700	0.354549	0.870000
94000	0.176000	0.353133	0.870100
95000	0.179900	0.374774	0.868700
96000	0.173200	0.355910	0.870800
97000	0.176100	0.361503	0.872800
98000	0.176800	0.356372	0.869200
99000	0.176300	0.360096	0.874800
100000	0.148100	0.418238	0.870900
101000	0.129100	0.416172	0.870400
102000	0.131000	0.419957	0.868900
103000	0.129700	0.409167	0.869400
104000	0.127100	0.430586	0.872300
105000	0.126700	0.428001	0.867700
106000	0.130300	0.415673	0.868400
107000	0.132300	0.414793	0.868600
108000	0.128100	0.438585	0.869700
109000	0.129900	0.424922	0.869000
110000	0.128400	0.429560	0.868200
111000	0.129300	0.420508	0.869400
112000	0.128200	0.426265	0.869900
113000	0.126300	0.419459	0.868500
114000	0.130100	0.412851	0.869100
115000	0.127900	0.417601	0.869100
116000	0.129400	0.421481	0.869300
117000	0.126300	0.432261	0.868900
118000	0.122100	0.426714	0.868600
119000	0.127300	0.419353	0.869800
120000	0.127800	0.418641	0.868600
121000	0.123400	0.421648	0.867300
122000	0.127500	0.419938	0.867600
123000	0.125900	0.421461	0.867100
124000	0.123900	0.422832	0.867600

wandb: Network error (ReadTimeout), entering retry loop.
wandb: Network error resolved after 0:00:42.418510, resuming normal operation.

Waiting for W&B process to finish, PID 2611682
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_044503-1c4qi8r3/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_044503-1c4qi8r3/logs/debug-internal.log

Run summary:

train/loss	0.1239
train/learning_rate	0.0
train/epoch	5.0
train/global_step	124220
_runtime	26994
_timestamp	1644754497
_step	248
eval/loss	0.42283
eval/accuracy	0.8676
eval/runtime	10.5191
eval/samples_per_second	950.653
eval/steps_per_second	14.925
train/train_runtime	26993.4931
train/train_samples_per_second	294.515
train/train_steps_per_second	4.602
train/total_flos	2.0687274460977792e+17
train/train_loss	0.2321

Run history:

train/loss	█▅▅▅▅▅▄▅▄▄▄▄▄▄▄▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁
train/learning_rate	▂▅████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁
train/epoch	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/global_step	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_runtime	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_timestamp	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_step	▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
eval/loss	▆▄▃▃▂▃▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▁▁▁▄▄▄▄▃▄▅▄▇▇█▇▇█▇▇
eval/accuracy	▁▄▅▆▆▆▇▇▇▇▇▇▇▇▇█▇▇▇█▇▇██▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
eval/runtime	▄▃▃▂▃▂▃▂▂▂▂▃▄▄▂▃▅▅▆█▇▆▅▆▅▃▄▆▄▄▂▁▄▃▃▂▁▁▄▄
eval/samples_per_second	▅▆▆▇▆▇▆▇▇▇▇▆▅▅▇▆▄▄▃▁▂▃▄▃▄▆▅▃▅▅▇█▅▆▆▇██▅▅
eval/steps_per_second	▅▆▆▇▆▇▆▇▇▇▇▆▅▅▇▆▄▄▃▁▂▃▄▃▄▆▅▃▅▅▇█▅▆▆▇██▅▅
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced no-embedding-bert-base-uncased-general-64bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1c4qi8r3

Since this is the model that will be adjusted we can check that the evaluate method produces consistent results. The best evaluation was this one:

Step	Training Loss	Validation Loss	Accuracy
49000	0.282900	0.302531	0.880000

Code

evaluate_classifier(
    ds=general_ds,
    model_name=MODEL_NAME,
    model=load_classifier_base(
        model_name=MODEL_NAME,
        dataset_name="general",
        data_folder=DATA_FOLDER,
        batch_size=64,
        epochs=5,
    ),
    batch_size=64,
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
)

PyTorch: setting up devices

[157/157 00:09]

{'eval_loss': 0.30253100395202637,
 'eval_accuracy': 0.88,
 'eval_runtime': 9.1462,
 'eval_samples_per_second': 1093.35,
 'eval_steps_per_second': 17.166}

We haven’t run the training here so the training loss metric is not produced. The validation loss and accuracy match, accounting for rounding, so I am satisfied that the evaluation code works.

Experiment - BERT - Language Model Pretraining

This adjust the embeddings for a model to match the domain distribution.

Code

#hide_output
train_language_model_embedding(
    ds=electronics_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-electronics",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-electronics-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/17z6ree0
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_124044-17z6ree0

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[2845/2845 19:10, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.525900	2.186024	9.104887
2000	2.339300	2.050233	7.756434

Waiting for W&B process to finish, PID 2634489
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_124044-17z6ree0/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_124044-17z6ree0/logs/debug-internal.log

Run summary:

train/loss	2.3393
train/learning_rate	2e-05
train/epoch	5.0
train/global_step	2845
_runtime	1152
_timestamp	1644757196
_step	4
eval/loss	2.05023
eval/perplexity	7.75643
eval/runtime	4.6223
eval/samples_per_second	21.634
eval/steps_per_second	1.514
train/train_runtime	1151.0397
train/train_samples_per_second	39.512
train/train_steps_per_second	2.472
train/total_flos	9171244212184800.0
train/train_loss	2.39209

Run history:

train/loss	█▁
train/learning_rate	█▁
train/epoch	▁▁▅▅█
train/global_step	▁▁▅▅█
_runtime	▁▁▅▅█
_timestamp	▁▁▅▅█
_step	▁▃▅▆█
eval/loss	█▁
eval/perplexity	█▁
eval/runtime	█▁
eval/samples_per_second	▁█
eval/steps_per_second	▁█
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced embedding-bert-base-uncased-domain-electronics-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/17z6ree0

Code

#hide_output
train_language_model_embedding(
    ds=kitchen_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-kitchen",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-kitchen-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/3clte4rs
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_130021-3clte4rs

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[2265/2265 13:03, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.462700	2.134396	8.371171
2000	2.259400	2.161706	8.692562

Waiting for W&B process to finish, PID 2635617
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_130021-3clte4rs/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_130021-3clte4rs/logs/debug-internal.log

Run summary:

train/loss	2.2594
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	2265
_runtime	785
_timestamp	1644758006
_step	4
eval/loss	2.16171
eval/perplexity	8.69256
eval/runtime	4.6087
eval/samples_per_second	21.698
eval/steps_per_second	1.519
train/train_runtime	784.1761
train/train_samples_per_second	46.15
train/train_steps_per_second	2.888
train/total_flos	6110194858484400.0
train/train_loss	2.34641

Run history:

train/loss	█▁
train/learning_rate	█▁
train/epoch	▁▁▇▇█
train/global_step	▁▁▇▇█
_runtime	▁▁▆▇█
_timestamp	▁▁▆▇█
_step	▁▃▅▆█
eval/loss	▁█
eval/perplexity	▁█
eval/runtime	█▁
eval/samples_per_second	▁▁
eval/steps_per_second	▁▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced embedding-bert-base-uncased-domain-kitchen-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/3clte4rs

Code

#hide_output
train_language_model_embedding(
    ds=music_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-music",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-music-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2g7nseso
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_131348-2g7nseso

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1215/1215 10:03, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.479700	2.364102	10.497963

Waiting for W&B process to finish, PID 2636367
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_131348-2g7nseso/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_131348-2g7nseso/logs/debug-internal.log

Run summary:

train/loss	2.4797
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1215
_runtime	605
_timestamp	1644758633
_step	2
eval/loss	2.3641
eval/perplexity	10.49796
eval/runtime	5.052
eval/samples_per_second	19.794
eval/steps_per_second	1.386
train/train_runtime	604.3056
train/train_samples_per_second	32.12
train/train_steps_per_second	2.011
train/total_flos	4572657212774400.0
train/train_loss	2.46521

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/perplexity	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced embedding-bert-base-uncased-domain-music-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2g7nseso

Code

#hide_output
train_language_model_embedding(
    ds=toys_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-toys",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-toys-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/3ubu4oj0
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_132411-3ubu4oj0

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1295/1295 06:56, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.387300	2.285632	9.708434

Waiting for W&B process to finish, PID 2636916
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_132411-3ubu4oj0/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_132411-3ubu4oj0/logs/debug-internal.log

Run summary:

train/loss	2.3873
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1295
_runtime	418
_timestamp	1644759069
_step	2
eval/loss	2.28563
eval/perplexity	9.70843
eval/runtime	4.7261
eval/samples_per_second	21.159
eval/steps_per_second	1.481
train/train_runtime	416.9974
train/train_samples_per_second	49.593
train/train_steps_per_second	3.106
train/total_flos	3260733386248800.0
train/train_loss	2.35955

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/perplexity	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced embedding-bert-base-uncased-domain-toys-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/3ubu4oj0

Code

#hide_output
train_language_model_embedding(
    ds=video_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-video",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-video-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2v07t57q
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_133126-2v07t57q

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1305/1305 11:22, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.390500	2.200149	8.946708

Waiting for W&B process to finish, PID 2637293
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_133126-2v07t57q/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_133126-2v07t57q/logs/debug-internal.log

Run summary:

train/loss	2.3905
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1305
_runtime	684
_timestamp	1644759770
_step	2
eval/loss	2.20015
eval/perplexity	8.94671
eval/runtime	5.1369
eval/samples_per_second	19.467
eval/steps_per_second	1.363
train/train_runtime	683.0861
train/train_samples_per_second	30.553
train/train_steps_per_second	1.91
train/total_flos	5155178814403200.0
train/train_loss	2.36121

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/perplexity	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced embedding-bert-base-uncased-domain-video-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2v07t57q

Experiment - BERT - Language Model Pretraining Overlay

This uses an overlay on the word embeddings to adjust them to the domain distribution. The overlay means that weight decay will act only on the adjustment instead of the pretrained model embedding.

Code

#hide_output
train_language_model(
    ds=electronics_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-electronics",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-electronics-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/ligwgdfe
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_214813-ligwgdfe

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[2845/2845 17:53, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.791700	2.655851	14.515713
2000	2.784700	2.461272	11.607203

Waiting for W&B process to finish, PID 2844382
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_214813-ligwgdfe/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_214813-ligwgdfe/logs/debug-internal.log

Run summary:

train/loss	2.7847
train/learning_rate	2e-05
train/epoch	5.0
train/global_step	2845
_runtime	1075
_timestamp	1645049168
_step	4
eval/loss	2.46127
eval/perplexity	11.6072
eval/runtime	4.5188
eval/samples_per_second	22.13
eval/steps_per_second	1.549
train/train_runtime	1073.6786
train/train_samples_per_second	42.359
train/train_steps_per_second	2.65
train/total_flos	1.1680412853012192e+16
train/train_loss	2.7854

Run history:

train/loss	█▁
train/learning_rate	█▁
train/epoch	▁▁▅▅█
train/global_step	▁▁▅▅█
_runtime	▁▁▅▅█
_timestamp	▁▁▅▅█
_step	▁▃▅▆█
eval/loss	█▁
eval/perplexity	█▁
eval/runtime	█▁
eval/samples_per_second	▁█
eval/steps_per_second	▁█
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 1 other file(s)

Synced embedding-overlay-bert-base-uncased-domain-electronics-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/ligwgdfe

Code

#hide_output
train_language_model(
    ds=kitchen_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-kitchen",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-kitchen-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1l77hyg0
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_220718-1l77hyg0

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[2265/2265 12:08, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.751600	2.558401	12.753925
2000	2.722900	2.673736	14.132344

Waiting for W&B process to finish, PID 2845425
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_220718-1l77hyg0/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_220718-1l77hyg0/logs/debug-internal.log

Run summary:

train/loss	2.7229
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	2265
_runtime	730
_timestamp	1645049968
_step	4
eval/loss	2.67374
eval/perplexity	14.13234
eval/runtime	4.4861
eval/samples_per_second	22.291
eval/steps_per_second	1.56
train/train_runtime	728.6837
train/train_samples_per_second	49.665
train/train_steps_per_second	3.108
train/total_flos	7781888357593776.0
train/train_loss	2.73636

Run history:

train/loss	█▁
train/learning_rate	█▁
train/epoch	▁▁▇▇█
train/global_step	▁▁▇▇█
_runtime	▁▁▇▇█
_timestamp	▁▁▇▇█
_step	▁▃▅▆█
eval/loss	▁█
eval/perplexity	▁█
eval/runtime	█▁
eval/samples_per_second	▁█
eval/steps_per_second	▁▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 1 other file(s)

Synced embedding-overlay-bert-base-uncased-domain-kitchen-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1l77hyg0

Code

#hide_output
train_language_model(
    ds=music_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-music",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-music-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1bc3396y
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222004-1bc3396y

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1215/1215 09:27, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.698700	2.667158	14.097964

Waiting for W&B process to finish, PID 2846045
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222004-1bc3396y/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222004-1bc3396y/logs/debug-internal.log

Run summary:

train/loss	2.6987
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1215
_runtime	570
_timestamp	1645050574
_step	2
eval/loss	2.66716
eval/perplexity	14.09796
eval/runtime	5.3364
eval/samples_per_second	18.739
eval/steps_per_second	1.312
train/train_runtime	568.3681
train/train_samples_per_second	34.15
train/train_steps_per_second	2.138
train/total_flos	5823694456805376.0
train/train_loss	2.69958

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/perplexity	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 1 other file(s)

Synced embedding-overlay-bert-base-uncased-domain-music-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1bc3396y

Code

#hide_output
train_language_model(
    ds=toys_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-toys",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-toys-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/3fsvsvap
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222958-3fsvsvap

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1295/1295 06:31, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.684900	2.626420	13.626650

Waiting for W&B process to finish, PID 2846570
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222958-3fsvsvap/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222958-3fsvsvap/logs/debug-internal.log

Run summary:

train/loss	2.6849
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1295
_runtime	393
_timestamp	1645050991
_step	2
eval/loss	2.62642
eval/perplexity	13.62665
eval/runtime	4.6422
eval/samples_per_second	21.542
eval/steps_per_second	1.508
train/train_runtime	392.1879
train/train_samples_per_second	52.73
train/train_steps_per_second	3.302
train/total_flos	4152840255238752.0
train/train_loss	2.68775

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/perplexity	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 1 other file(s)

Synced embedding-overlay-bert-base-uncased-domain-toys-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/3fsvsvap

Code

#hide_output
train_language_model(
    ds=video_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-video",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade

Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-video-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1wwaq6od
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_223653-1wwaq6od

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"

[1305/1305 10:43, Epoch 5/5]

Step	Training Loss	Validation Loss	Perplexity
1000	2.609000	2.516007	12.213467

Waiting for W&B process to finish, PID 2846975
Program ended successfully.

Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_223653-1wwaq6od/logs/debug.log

Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_223653-1wwaq6od/logs/debug-internal.log

Run summary:

train/loss	2.609
train/learning_rate	1e-05
train/epoch	5.0
train/global_step	1305
_runtime	646
_timestamp	1645051659
_step	2
eval/loss	2.51601
eval/perplexity	12.21347
eval/runtime	5.1202
eval/samples_per_second	19.53
eval/steps_per_second	1.367
train/train_runtime	643.9198
train/train_samples_per_second	32.411
train/train_steps_per_second	2.027
train/total_flos	6565588647539328.0
train/train_loss	2.60432

Run history:

train/loss	▁
train/learning_rate	▁
train/epoch	▁▁█
train/global_step	▁▁█
_runtime	▁▁█
_timestamp	▁▁█
_step	▁▅█
eval/loss	▁
eval/perplexity	▁
eval/runtime	▁
eval/samples_per_second	▁
eval/steps_per_second	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁
train/total_flos	▁
train/train_loss	▁

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 1 other file(s)

Synced embedding-overlay-bert-base-uncased-domain-video-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1wwaq6od

Evaluation - BERT

Now that the models have been trained we can evaluate the different models against each dataset.

Code

#collapse
from typing import Dict, Union
import datasets

def cross_evaluation(
    ds: datasets.Dataset,
    model_name: str,
    domain: str,
    classifier_batch_size: int,
    domain_batch_size: int,
    lm_batch_size: int,
    epochs: int
) -> Dict[str, Union[str, float]]:
    specific_name = f"full-{model_name}-domain-{domain}-{domain_batch_size}bs-{epochs}e"
    full_name = f"full-{model_name}-general-{classifier_batch_size}bs-{epochs}e"
    no_embedding_name = f"no-embedding-{model_name}-general-{classifier_batch_size}bs-{epochs}e"
    embedding_name = f"embedding-{model_name}-domain-{domain}-{lm_batch_size}bs-{epochs}e"

    specific_results = evaluate_classifier(
        ds=ds,
        model_name=model_name,
        model=load_classifier_full(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=domain_batch_size,
            epochs=epochs,
        ),
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    specific_combined_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=domain_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    specific_overlay_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=domain_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding_overlay(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    
    full_results = evaluate_classifier(
        ds=ds,
        model_name=model_name,
        model=load_classifier_full(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    full_combined_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    full_overlay_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding_overlay(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    
    no_embedding_results = evaluate_classifier(
        ds=ds,
        model_name=model_name,
        model=load_classifier_base(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    base_combined_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_base(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    base_overlay_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_base(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding_overlay(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )

    return {
        "domain": domain,
        "specific_accuracy": specific_results["eval_accuracy"],
        "specific_combined_accuracy": specific_combined_results["eval_accuracy"],
        "specific_overlay_accuracy": specific_overlay_results["eval_accuracy"],
        "full_accuracy": full_results["eval_accuracy"],
        "full_combined_accuracy": full_combined_results["eval_accuracy"],
        "full_overlay_accuracy": full_overlay_results["eval_accuracy"],
        "base_accuracy": no_embedding_results["eval_accuracy"],
        "base_combined_accuracy": base_combined_results["eval_accuracy"],
        "base_overlay_accuracy": base_overlay_results["eval_accuracy"],
    }

Code

base_model=load_classifier_base(
    model_name=MODEL_NAME,
    dataset_name="general",
    data_folder=DATA_FOLDER,
    batch_size=64,
    epochs=5,
)
embedding_model=load_language_model_embedding_overlay(
    model_name=MODEL_NAME,
    dataset_name=f"domain-electronics",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
)

Code

import torch
torch.all(torch.eq(base_model.bert.embeddings.word_embeddings.weight, embedding_model.bert.embeddings.word_embeddings.weight))

tensor(True)

Code

base_model

(BertForSequenceClassification(
   (bert): BertModel(
     (embeddings): BertEmbeddings(
       (word_embeddings): Embedding(30522, 768, padding_idx=0)
       (position_embeddings): Embedding(512, 768)
       (token_type_embeddings): Embedding(2, 768)
       (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
       (dropout): Dropout(p=0.1, inplace=False)
     )
     (encoder): BertEncoder(
       (layer): ModuleList(
         (0): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (1): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (2): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (3): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (4): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (5): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (6): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (7): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (8): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (9): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (10): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
         (11): BertLayer(
           (attention): BertAttention(
             (self): BertSelfAttention(
               (query): Linear(in_features=768, out_features=768, bias=True)
               (key): Linear(in_features=768, out_features=768, bias=True)
               (value): Linear(in_features=768, out_features=768, bias=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
             (output): BertSelfOutput(
               (dense): Linear(in_features=768, out_features=768, bias=True)
               (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
               (dropout): Dropout(p=0.1, inplace=False)
             )
           )
           (intermediate): BertIntermediate(
             (dense): Linear(in_features=768, out_features=3072, bias=True)
           )
           (output): BertOutput(
             (dense): Linear(in_features=3072, out_features=768, bias=True)
             (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
             (dropout): Dropout(p=0.1, inplace=False)
           )
         )
       )
     )
     (pooler): BertPooler(
       (dense): Linear(in_features=768, out_features=768, bias=True)
       (activation): Tanh()
     )
   )
   (dropout): Dropout(p=0.1, inplace=False)
   (classifier): Linear(in_features=768, out_features=2, bias=True)
 ),)

Code

#hide_output
import pandas as pd

result_df = pd.DataFrame([
    cross_evaluation(
        ds=ds,
        model_name=MODEL_NAME,
        domain=domain,
        classifier_batch_size=64,
        domain_batch_size=16,
        lm_batch_size=16,
        epochs=5,
    )
    for ds, domain in [
        (electronics_ds, "electronics"),
        (kitchen_ds, "kitchen"),
        (music_ds, "music"),
        (toys_ds, "toys"),
        (video_ds, "video"),
    ]
])

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:09]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:08]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:07]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

PyTorch: setting up devices

[16/16 00:10]

Code

result_df

	domain	specific_accuracy	specific_combined_accuracy	specific_overlay_accuracy	full_accuracy	full_combined_accuracy	full_overlay_accuracy	base_accuracy	base_combined_accuracy	base_overlay_accuracy
0	electronics	0.958	0.952	0.957	0.781	0.795	0.787	0.768	0.774	0.768
1	kitchen	0.940	0.944	0.941	0.800	0.797	0.804	0.794	0.796	0.794
2	music	0.929	0.925	0.927	0.773	0.771	0.777	0.759	0.757	0.759
3	toys	0.910	0.904	0.908	0.828	0.833	0.827	0.836	0.837	0.836
4	video	0.944	0.949	0.947	0.745	0.722	0.733	0.737	0.721	0.737

Evaluation - BERT - Domain Specific Model

Here we review the domain specific model and evaluate how replacing the embedding layer affects performance.

	Domain	Domain Sentiment Model	with Replacement Embeddings
0	electronics	0.958	0.952
1	kitchen	0.94	0.944
2	music	0.929	0.925
3	toys	0.91	0.904
4	video	0.944	0.949

We can see that the domain specific model has high base accuracy. Replacing the embedding layer harms performance more often than it improves it. The difference when replacing the embeddings is not great.

Evaluation - BERT - General Model

Here we review the general sentiment model and evaluate how replacing the embedding layer affects performance.

	Domain	General Sentiment Model	with Replacement Embeddings
0	electronics	0.781	0.795
1	kitchen	0.8	0.797
2	music	0.773	0.771
3	toys	0.828	0.833
4	video	0.745	0.722

Replacing the embedding layer produces only a small change, and harms performance more often than improving it.

Evaluation - BERT - General Model with Frozen Embeddings

Here we review the general sentiment model that has been trained with a frozen embedding layer. The intent of this is to ensure that the retraining of the embeddings for the domain is consistent with how the sentiment classifier works with the base embedding.

	Domain	General Sentiment Model with Frozen Embeddings	with Replacement Embeddings
0	electronics	0.768	0.774
1	kitchen	0.794	0.796
2	music	0.759	0.757
3	toys	0.836	0.837
4	video	0.737	0.721

Once again the performance changes are marginal. It appears that this technique has not worked as I expected.

Conclusion - BERT

These results are not great. The use of the language model pretrained embedding layer does not result in a consistent improvement.

These results do show that the datasets significantly differ. I wonder if the amazon dataset could be considered sentiment?

One thing that I want to evaluate is the way that the domain specific embedding layer is trained. If I separate the adjustment to the weights then weight decay can apply strictly to the change to the embeddings. This would be interesting as it could then provide a concrete insight about which tokens have different meaning.