Domain Shift by Language Pre-Training of Embeddings

Can I shift the domain of a sentiment model by changing embeddings only?
Published

January 28, 2022

An NLP model performs a specific task. To do this it has expectations about the words and phrases it will encounter. This is why training a task specific model from a general language model works so well - the general language model has already learned those expectations, and the task specific model can then use them to perform the task.

This all becomes a problem when the expectations are wrong. If we consider science and cooking then the word chemical carries very different associations. In science it is merely a descriptive word, while describing food with the word chemical has strong negative connotations. A model with expectations appropriate for scientific writing will not perform well when used on an article by a food critic.

The word chemical hasn’t changed its strict meaning - it still refers to a basic substance, and cooking can even be considered chemistry. Words are more than their dictionary definition though. The meanings differ because scientists and cooks operate in two different domains (a domain being a sphere of activity, influence, or knowledge).

If we want to have a model that works well in one domain and then transfer it to another, we have to shift domains. This post is an exploration of that process.

Code
from pathlib import Path

PROJECT_NAME = "domain-shift"

DATA_FOLDER = Path("/data/blog/2022-01-28-domain-shift-by-embedding-replacement")
DATA_FOLDER.mkdir(parents=True, exist_ok=True)

Hypothesis

My idea is that a general purpose sentiment model, trained without altering the embedding layer, can be shifted to a specific domain by retraining only the embeddings of the original language model and swapping them into the classifier.
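
To make this concrete, here is a minimal sketch of the intended mechanism, assuming a BERT style model where base_model.embeddings holds the embedding layer (the same access path that the code later in this post uses): the classifier is trained with these embedding parameters frozen, a masked language model retrains only these parameters on the domain text, and the retrained weights are then copied into the classifier.

Code
from transformers import AutoModelForMaskedLM, AutoModelForSequenceClassification

classifier = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
domain_lm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# 1. train the classifier with its embedding parameters frozen (general sentiment data)
# 2. train the language model with only its embedding parameters unfrozen (domain text)
# 3. swap the domain specific embeddings into the general classifier:
for classifier_parameter, lm_parameter in zip(
    classifier.base_model.embeddings.parameters(),
    domain_lm.base_model.embeddings.parameters(),
):
    classifier_parameter.data = lm_parameter.data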

Experimental Setup

The task will be sentiment analysis. The general purpose sentiment model will be trained using the Sentiment140 dataset (Go, Bhayani, and Huang 2009). This trained model will then be transformed into a domain specific model using the Multi-Domain Sentiment dataset (Blitzer, Dredze, and Pereira 2007). These datasets only have positive and negative sentiment text.

Go, Alec, Richa Bhayani, and Lei Huang. 2009. “Twitter Sentiment Classification Using Distant Supervision.” CS224N Project Report, Stanford 1 (12): 2009.
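
Blitzer, John, Mark Dredze, and Fernando Pereira. 2007. “Biographies, Bollywood, Boom-Boxes and Blenders: Domain Adaptation for Sentiment Classification.” In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics.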

This experiment will evaluate several different models to see how they perform against the domain specific dataset with and without retraining.

Data Preparation

The datasets need to be restructured to have the text to classify and the target sentiment.

Data Preparation - Sentiment140

This has sentiment as an integer with values 0 (negative) and 4 (positive) along with several bits of metadata that are not interesting for this task.

Code
# from src/main/python/blog/domain_shift/data/sentiment140.py
from pathlib import Path

import pandas as pd


def load_sentiment140(path: Path) -> pd.DataFrame:
    df = pd.read_csv(
        path,
        names=["sentiment", "id", "date", "query", "user", "text"],
        encoding="ISO-8859-1",
    )
    df = df[["sentiment", "text"]].copy()

    # The sentiment column contains two values, 0 and 4.
    # There are 800,000 rows of each.
    # Example sentiment 0 row: my whole body feels itchy and like its on fire
    # Example sentiment 4 row: Happy 38th Birthday to my boo of alll time!!!
    df["sentiment"] = df.sentiment.map({0: "negative", 4: "positive"})

    return df
Code
#collapse
from pathlib import Path

GENERAL_DATASET = Path("/data/sentiment/sentiment140/sentiment140.zip")

general_df = load_sentiment140(GENERAL_DATASET)
general_df.to_parquet(
    "/data/sentiment/sentiment140/sentiment.gz.parquet",
    compression="gzip"
)

display(
    general_df.sentiment
        .value_counts()
        .to_frame()
)
general_df
sentiment
negative 800000
positive 800000
sentiment text
0 negative @switchfoot http://twitpic.com/2y1zl - Awww, t...
1 negative is upset that he can't update his Facebook by ...
2 negative @Kenichan I dived many times for the ball. Man...
3 negative my whole body feels itchy and like its on fire
4 negative @nationwideclass no, it's not behaving at all....
... ... ...
1599995 positive Just woke up. Having no school is the best fee...
1599996 positive TheWDB.com - Very cool to hear old Walt interv...
1599997 positive Are you ready for your MoJo Makeover? Ask me f...
1599998 positive Happy 38th Birthday to my boo of alll time!!! ...
1599999 positive happy #charitytuesday @theNSPCC @SparksCharity...

1600000 rows × 2 columns

Code
#collapse
import datasets

general_ds = (
    datasets.Dataset.from_pandas(general_df)
        .train_test_split(test_size=10_000)
)
general_ds.save_to_disk("/data/sentiment/sentiment140/sentiment.dataset")

Data Preparation - Multi-Domain Sentiment

This is encoded in an almost-xml file that needs some preprocessing. The file mixes the bare ampersand &, which is illegal in XML, with the correctly escaped &amp;; there are characters that are out of range for the default pandas xml parser; and finally there is no root node for the file.

Once all of these have been fixed the dataset is quite rich with \(rating \in \{ 1, 2, 4, 5 \}\). I’m going to consider \(negative \in \{ 1, 2 \}\) and \(positive \in \{ 4, 5 \}\). I’m only taking the text of the review, not the title, and some of them have no text.

Finally the domains are not evenly distributed. To ensure that there is enough training and evaluation data the top 5 domains are being used.

Code
# from src/main/python/blog/domain_shift/data/multi_domain_sentiment.py
from pathlib import Path
from typing import Tuple, Union

import pandas as pd
from lxml import etree


def load_multi_domain_sentiment(folder: Path) -> pd.DataFrame:
    # Loading these files requires quite a lot of preprocessing. This is split
    # into individual cleaning methods which are composed in the load_file
    # method below. The last section reads all of the different reviews and
    # filters them to the top 5 by domain volume.

    def load_all_files(folder: Path) -> pd.DataFrame:
        files = sorted(folder.glob("*/all.review"))
        df = pd.concat([load_file(path) for path in files])
        df = filter_to_top_5_domains(df)
        return df

    def filter_to_top_5_domains(df: pd.DataFrame) -> pd.DataFrame:
        # The number of reviews for each domain vary, from a few hundred to over
        # ten thousand. To ensure that there is enough data to train a model we
        # will take the top 5 domains by volume.
        top_5_domains = df.domain.value_counts()[:5].index
        df = df[df.domain.isin(top_5_domains)]
        return df

    def load_file(path: Path) -> pd.DataFrame:
        df = read_file(path)
        df = split_helpful_column(df)
        df = clean_rating(df)
        df = clean_text(df)
        df = parse_date(df)
        df = drop_unrelated_columns(df)
        df = rating_to_sentiment(df)
        return df

    def read_file(path: Path) -> pd.DataFrame:
        # This reads the file from disk. There are three problems with the data
        # that need to be addressed before it can be loaded:

        # The data is stored in an xml-like structure where each data row is contained in a node.
        # There is no root node so the document is not valid xml.
        xml = path.read_text(encoding="ISO-8859-1")
        xml = f"<node>{xml}</node>"

        # Secondly the ampersand symbol is not consistently escaped, sometimes
        # appearing as &amp; and sometimes appearing as &.
        xml = xml.replace("&amp;", "&").replace("&", "&amp;")

        # Finally there are invalid characters in the document as the document
        # seems to lack a consistent encoding. It is possible that the xml like
        # structure is in ISO-8859-1 and the contents of each field are in
        # UTF-8?
        parser = etree.XMLParser(ns_clean=True, recover=True)
        tree = etree.fromstring(xml, parser=parser)
        xml = etree.tostring(tree, encoding="utf-8")

        # Pandas can load dataframes from xml. In order to get the character
        # re-encoding to work we must use the etree parser that we used to
        # reencode the xml, instead of the libxml parser (which is faster and
        # is the default).
        return pd.read_xml(xml, parser="etree")

    def split_helpful_column(df: pd.DataFrame) -> pd.DataFrame:
        # There is a "helpful" column which is a review of the review by other
        # users. A review that is more consistently marked as helpful may be
        # higher quality.

        # If this is present then it is a string of the form "N of M" where N
        # is the number of users that considered the review helpful. If no-one
        # has reviewed the column then this value is missing.

        def parse_helpful(value: Union[str, float]) -> Tuple[int, int]:
            if not isinstance(value, str):
                return (0, 0)
            helpful, total = value.split(" of ")
            return int(helpful), int(total)

        def get_helpful(row: Tuple[int, int]) -> int:
            return row[0]

        def get_unhelpful(row: Tuple[int, int]) -> int:
            helpful, total = row
            return total - helpful

        helpful_total = df.helpful.apply(parse_helpful)
        df["helpful"] = helpful_total.apply(get_helpful)
        df["unhelpful"] = helpful_total.apply(get_unhelpful)

        return df

    def clean_rating(df: pd.DataFrame) -> pd.DataFrame:
        # The rating column is the sentiment proxy for the text. Some rows are
        # missing a value for this, and so cannot be used. Once they have been
        # dropped the column can be converted from a float to an int.
        df = df.dropna(subset=["rating"]).copy()
        df["rating"] = df.rating.astype(int)
        return df

    def clean_text(df: pd.DataFrame) -> pd.DataFrame:
        # Some reviews only have a title and no text body. We are not
        # considering the title for this so any rows that are considered blank
        # or too short have to be dropped.
        df = df[df.review_text.str.len() > 10]
        return df

    def parse_date(df: pd.DataFrame) -> pd.DataFrame:
        df["date"] = pd.to_datetime(df.date)
        return df

    def drop_unrelated_columns(df: pd.DataFrame) -> pd.DataFrame:
        # The product_type column is the domain, the rating is the sentiment
        # and the review_text is the text.
        df = df[["product_type", "rating", "review_text"]].copy()
        df = df.rename(
            columns={
                "product_type": "domain",
                "review_text": "text",
            }
        )
        return df

    def rating_to_sentiment(df: pd.DataFrame) -> pd.DataFrame:
        # The rating contains values 1, 2, 4, and 5, being the number of stars
        # assigned to the review by the reviewer.
        df["sentiment"] = df.rating.map(
            {1: "negative", 2: "negative", 4: "positive", 5: "positive"}
        )
        df = df.drop(columns=["rating"])
        return df

    return load_all_files(folder)
Code
#collapse
from pathlib import Path

DOMAIN_DATASET_FOLDER = Path("/data/sentiment/multi-domain-sentiment/sorted_data")

domain_df = load_multi_domain_sentiment(DOMAIN_DATASET_FOLDER)
domain_df.to_parquet(
    "/data/sentiment/multi-domain-sentiment/sentiment-top-5.gz.parquet",
    compression="gzip"
)
display(
    domain_df[["domain", "sentiment"]]
        .value_counts()
        .to_frame()
        .rename(columns={0: "count"})
        .reset_index()
        .sort_values(by=["domain", "sentiment"], ascending=[True, True])
        .set_index(["domain", "sentiment"])
)
domain_df
domain                sentiment    count
electronics           negative      5048
electronics           positive     17959
kitchen & housewares  negative      4119
kitchen & housewares  positive     15737
music                 negative      2441
music                 positive     14587
toys & games          negative      2568
toys & games          positive     10579
video                 negative      2587
video                 positive     12764
domain text sentiment
0 electronics I have bought and returned three of these unit... negative
1 electronics I used a 25 pack of these doing DVD backups, a... negative
2 electronics I bought these discs at CompUSA because I need... negative
3 electronics The DVDs I burned successfully showed the movi... negative
4 electronics Please don't expect to get the cash back from ... negative
... ... ... ...
15346 video After watching this documentary, I was left th... positive
15347 video I finally made my first purchase from Amazon's... positive
15348 video Don't buy this disc unless you are a real Jack... negative
15349 video Oh my goodness, they've outlawed sex! That is ... positive
15350 video In this erotic science fiction film from the f... positive

88389 rows × 3 columns

Code
# from src/main/python/blog/domain_shift/data/balance_domain.py
import datasets
import pandas as pd


def make_product_dataset(domain_df: pd.DataFrame, domain: str) -> datasets.DatasetDict:
    """
    This creates a balanced dataset that is limited to the specified domain.
    """

    df = domain_df[domain_df.domain == domain]

    # sample the dataframe to balance the sentiment classes
    positive_df = df[df.sentiment == "positive"]
    negative_df = df[df.sentiment == "negative"]

    smaller_size = min(len(positive_df), len(negative_df))
    positive_df = positive_df.sample(n=smaller_size)
    negative_df = negative_df.sample(n=smaller_size)

    # recombine and shuffle
    df = pd.concat([positive_df, negative_df]).sample(frac=1)

    test_size = min(1_000, len(df) // 4)

    return datasets.Dataset.from_pandas(df).train_test_split(test_size=test_size)
Code
#collapse

electronics_ds = make_product_dataset(domain_df, "electronics")
electronics_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-electronics.dataset"
)

kitchen_ds = make_product_dataset(domain_df, "kitchen & housewares")
kitchen_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-kitchen.dataset"
)

music_ds = make_product_dataset(domain_df, "music")
music_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-music.dataset"
)

toys_ds = make_product_dataset(domain_df, "toys & games")
toys_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-toys.dataset"
)

video_ds = make_product_dataset(domain_df, "video")
video_ds.save_to_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-video.dataset"
)

Reload Data

This step is mostly for my own convenience: by reloading the saved datasets here I can re-run later parts of this notebook without repeating the data preparation.

Code
#collapse
import datasets

general_ds = datasets.load_from_disk(
    "/data/sentiment/sentiment140/sentiment.dataset"
)

electronics_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-electronics.dataset"
)
kitchen_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-kitchen.dataset"
)
music_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-music.dataset"
)
toys_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-toys.dataset"
)
video_ds = datasets.load_from_disk(
    "/data/sentiment/multi-domain-sentiment/sentiment-video.dataset"
)

Methods

Training the different models needs to be consistent, and the easiest way to produce consistency is to use the same code. Here I define the different methods that are required to train and evaluate the models.

Methods - Training Functions

To consistently train the different models we have a set of methods:

  • train_classifier_full: This trains a normal classifier on the dataset. The classifier can adjust any parameters in the entire model. This provides a baseline to measure against, as this domain specific classifier should be the best achievable performance.

  • train_classifier_base: This trains a classifier with a frozen embedding layer. The classifier can be adjusted to become domain specific by swapping out the embedding layer. This provides the base for the domain specific classifier.

  • train_language_model_embedding: This trains an embedding by language model pretraining. The embedding layer is the only part of the model that can be adjusted. This can be swapped into the base classifier to make it domain specific.

  • get_embedding_parameters_bert: This method returns all of the parameters in the model that form the embedding layer. The train_classifier_base and train_language_model_embedding methods use this to either freeze the embedding layer or freeze the model and unfreeze the embedding layer.

Code
# from src/main/python/blog/domain_shift/model/train_classifier.py
from pathlib import Path
from typing import Callable, Dict, List, Optional

import datasets
import torch
import wandb
from transformers import (
    AutoModel,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from transformers.trainer_utils import EvalPrediction


def train_classifier_full(
    ds: datasets.Dataset,
    *,
    project_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    metric: Callable[[EvalPrediction], Dict[str, float]],
    batch_size: int,
    epochs: float = 5,
    **settings,
) -> None:
    """
    This trains the classifier for the single purpose of classifying this
    dataset. The training process has full freedom to alter any and all
    parameters in the model. This should produce a model with the best
    performance possible.
    """
    train_classifier(
        ds=ds,
        train_name="full",
        project_name=project_name,
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        metric=metric,
        batch_size=batch_size,
        epochs=epochs,
        **settings,
    )


def train_classifier_base(
    ds: datasets.Dataset,
    *,
    project_name: str,
    model_name: str,
    embedding_accessor: Callable[[AutoModel], List[torch.nn.Parameter]],
    dataset_name: str,
    data_folder: Path,
    metric: Callable[[EvalPrediction], Dict[str, float]],
    batch_size: int,
    epochs: float = 5,
    **settings,
) -> None:
    """
    This trains the classifier for the single purpose of classifying this
    dataset. The training process can alter any parameters except for the
    initial embedding layer. This should produce a model with good
    performance which is compatible with a retrained embedding layer.
    """

    def model_preparation(model: AutoModelForSequenceClassification) -> None:
        for parameter in embedding_accessor(model):
            parameter.requires_grad_(False)

    train_classifier(
        ds=ds,
        train_name="no-embedding",
        project_name=project_name,
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        metric=metric,
        batch_size=batch_size,
        epochs=epochs,
        model_preparation=model_preparation,
        **settings,
    )


def train_classifier(
    ds: datasets.Dataset,
    *,
    train_name: str,
    project_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    metric: Callable[[EvalPrediction], Dict[str, float]],
    batch_size: int,
    epochs: float = 5,
    model_preparation: Optional[
        Callable[[AutoModelForSequenceClassification], None]
    ] = None,
    **settings,
) -> None:
    """
    This trains the classifier for the single purpose of classifying this
    dataset. The model_preparation function, if provided, can alter the model
    to freeze or alter layers as appropriate.
    """

    # Set default values for training, which can be overridden with the settings
    training_arguments = {
        "per_device_train_batch_size": batch_size,
        "per_device_eval_batch_size": batch_size,
        "num_train_epochs": epochs,
        "learning_rate": 5e-5,
        "warmup_ratio": 0.06,
        "logging_steps": 1_000,
        "save_steps": 1_000,
        "eval_steps": 1_000,
        "metric_for_best_model": "accuracy",
        "greater_is_better": True,
    } | settings

    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    model_run_folder = data_folder / "runs" / run_name
    model_run_folder.mkdir(parents=True, exist_ok=True)
    best_model_folder = data_folder / "best-model" / run_name
    best_model_folder.mkdir(parents=True, exist_ok=True)

    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    if model_preparation is not None:
        model_preparation(model)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    with wandb.init(
        project=project_name,
        name=run_name,
        mode="online",
    ):
        training_args = TrainingArguments(
            report_to=["wandb"],
            output_dir=model_run_folder / "output",
            logging_dir=model_run_folder / "output",
            overwrite_output_dir=True,
            evaluation_strategy="steps",
            load_best_model_at_end=True,
            **training_arguments,
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=ds["train"],
            eval_dataset=ds["test"],
            tokenizer=tokenizer,
            compute_metrics=metric,
        )

        trainer.train()

    model.save_pretrained(best_model_folder)



# from src/main/python/blog/domain_shift/model/train_language_model.py
from pathlib import Path
from typing import Callable, Dict, List, Optional

import datasets
import torch
import wandb
from transformers import (
    AutoModel,
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from transformers.trainer_utils import EvalPrediction


def train_language_model_embedding(
    ds: datasets.Dataset,
    *,
    project_name: str,
    model_name: str,
    embedding_accessor: Callable[[AutoModel], List[torch.nn.Parameter]],
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
    **settings,
) -> None:
    """
    This trains the embedding layer of the language model using language model
    pretraining. This involves adjusting the model to better match the domain
    specific language use.
    """

    def model_preparation(model: AutoModelForMaskedLM) -> None:
        # disable gradient updates on the model
        model.requires_grad_(False)
        # enable gradient updates on the embedding
        for parameter in embedding_accessor(model):
            parameter.requires_grad_(True)

    train_language_model(
        ds=ds,
        train_name="embedding",
        project_name=project_name,
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
        metric=metric,
        model_preparation=model_preparation,
        **settings,
    )


def train_language_model(
    ds: datasets.Dataset,
    *,
    train_name: str,
    project_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
    model_preparation: Optional[Callable[[AutoModelForMaskedLM], None]] = None,
    save_preparation: Optional[Callable[[AutoModelForMaskedLM], None]] = None,
    **settings,
) -> None:
    # Set default values for training, which can be overridden with the settings
    training_arguments = {
        "per_device_train_batch_size": batch_size,
        "per_device_eval_batch_size": batch_size,
        "num_train_epochs": epochs,
        "learning_rate": 5e-5,
        "warmup_ratio": 0.06,
        "logging_steps": 1_000,
        "save_steps": 1_000,
        "eval_steps": 1_000,
        "metric_for_best_model": "loss",
        "greater_is_better": False,
    } | settings

    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    model_run_folder = data_folder / "runs" / run_name
    model_run_folder.mkdir(parents=True, exist_ok=True)
    best_model_folder = data_folder / "best-model" / run_name
    best_model_folder.mkdir(parents=True, exist_ok=True)

    model = AutoModelForMaskedLM.from_pretrained(model_name)
    if model_preparation is not None:
        model_preparation(model)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # the masked language model only needs the tokenized text, so drop the
    # label and any other columns before collation
    non_text_columns = set(ds["train"].column_names) - set(["input_ids"])
    ds = ds.remove_columns(non_text_columns)

    # there is a problem running the evaluation over more than 100 rows
    test_ds = datasets.Dataset.from_dict(ds["test"][:100])

    with wandb.init(
        project=project_name,
        name=run_name,
        mode="online",
    ):
        training_args = TrainingArguments(
            report_to=["wandb"],
            output_dir=model_run_folder / "output",
            logging_dir=model_run_folder / "output",
            overwrite_output_dir=True,
            evaluation_strategy="steps",
            load_best_model_at_end=True,
            **training_arguments,
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=ds["train"],
            eval_dataset=test_ds,
            data_collator=DataCollatorForLanguageModeling(
                tokenizer=tokenizer,
                mlm=True,
            ),
            tokenizer=tokenizer,
            compute_metrics=metric,
        )

        trainer.train()

    if save_preparation is not None:
        save_preparation(model)
    model.save_pretrained(best_model_folder)



# from src/main/python/blog/domain_shift/model/embedding.py
from typing import List

import torch
from transformers import BertModel


def get_embedding_parameters_bert(model: BertModel) -> List[torch.nn.Parameter]:
    # Given a classification model, base_model returns the core bert model
    # without the classification head. Given the core bert model, base_model
    # returns the core bert model again! This means this approach works with
    # any kind of bert model.
    return list(model.base_model.embeddings.parameters())

Methods - Evaluation

After training the model we need a way to evaluate it.

  • evaluate_classifier: This evaluates a classification model without altering it.

  • evaluate_combined_classifier: This evaluates a classification model made from a base model combined with the embedding layer of a pretrained language model.

Code
# from src/main/python/blog/domain_shift/model/evaluate.py
from pathlib import Path
from typing import Callable, Dict, List, Optional

import datasets
import torch
from transformers import (
    AutoModel,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from transformers.trainer_utils import EvalPrediction


@torch.no_grad()
def evaluate_classifier(
    ds: datasets.Dataset,
    *,
    model_name: str,
    model: AutoModelForSequenceClassification,
    batch_size: int,
    data_folder: Path,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
) -> Dict[str, float]:
    model_run_folder = data_folder / "evaluation"
    model_run_folder.mkdir(parents=True, exist_ok=True)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model.eval()

    training_args = TrainingArguments(
        report_to=[],
        output_dir=model_run_folder / "output",
        logging_dir=model_run_folder / "output",
        overwrite_output_dir=True,
        num_train_epochs=1,
        per_device_eval_batch_size=batch_size,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=ds["train"],
        eval_dataset=ds["test"],
        tokenizer=tokenizer,
        compute_metrics=metric,
    )

    return trainer.evaluate()


@torch.no_grad()
def evaluate_combined_classifier(
    ds: datasets.Dataset,
    *,
    model_name: str,
    base_model: AutoModelForSequenceClassification,
    embedding_model: AutoModel,
    embedding_accessor: Callable[[AutoModel], List[torch.nn.Parameter]],
    batch_size: int,
    data_folder: Path,
    metric: Optional[Callable[[EvalPrediction], Dict[str, float]]] = None,
) -> Dict[str, float]:
    model_run_folder = data_folder / "evaluation"
    model_run_folder.mkdir(parents=True, exist_ok=True)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    for model_parameter, embedding_parameter in zip(
        embedding_accessor(base_model),
        embedding_accessor(embedding_model),
    ):
        # overwrite the base classifier embedding weights with the
        # domain specific ones from the pretrained language model
        model_parameter.data = embedding_parameter.data
    base_model.eval()

    training_args = TrainingArguments(
        report_to=[],
        output_dir=model_run_folder / "output",
        logging_dir=model_run_folder / "output",
        overwrite_output_dir=True,
        num_train_epochs=1,
        per_device_eval_batch_size=batch_size,
    )

    trainer = Trainer(
        model=base_model,
        args=training_args,
        train_dataset=ds["train"],
        eval_dataset=ds["test"],
        tokenizer=tokenizer,
        compute_metrics=metric,
    )

    return trainer.evaluate()

To load the models we also have:

  • load_classifier_full: This loads a classifier model created by train_classifier_full.

  • load_classifier_base: This loads a classifier model created by train_classifier_base.

  • load_language_model_embedding: This loads a language model created by train_language_model_embedding.

Code
# from src/main/python/blog/domain_shift/model/load.py
from pathlib import Path

from transformers import AutoModelForMaskedLM, AutoModelForSequenceClassification


def load_classifier_full(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_classifier(
        train_name="full",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_classifier_base(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_classifier(
        train_name="no-embedding",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_classifier(
    *,
    train_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    best_model_folder = data_folder / "best-model" / run_name
    return AutoModelForSequenceClassification.from_pretrained(best_model_folder)


def load_language_model_embedding(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_language_model(
        train_name="embedding",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_language_model_embedding_overlay(
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForSequenceClassification:
    return load_language_model(
        train_name="embedding-overlay",
        model_name=model_name,
        dataset_name=dataset_name,
        data_folder=data_folder,
        batch_size=batch_size,
        epochs=epochs,
    )


def load_language_model(
    *,
    train_name: str,
    model_name: str,
    dataset_name: str,
    data_folder: Path,
    batch_size: int,
    epochs: float = 5,
) -> AutoModelForMaskedLM:
    run_name = f"{train_name}-{model_name}-{dataset_name}-{batch_size}bs-{epochs}e"
    best_model_folder = data_folder / "best-model" / run_name
    return AutoModelForMaskedLM.from_pretrained(best_model_folder)



# from src/main/python/blog/domain_shift/model/layer/embedding_overlay.py
import torch
from torch import nn
from transformers.models.bert.modeling_bert import BertEmbeddings, BertModel


class EmbeddingOverlay(nn.Module):
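    # Represents the word embeddings as the frozen base weights plus a separate
    # overlay tensor. The original weights are kept in self.base, the live
    # weight matrix is zeroed out, and forward() reconstructs it as base +
    # overlay. Only the overlay is marked as trainable, so weight decay over it
    # would penalize the deviation from the original embeddings rather than the
    # embeddings themselves.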
    def __init__(self, embedding: BertEmbeddings, device: str) -> None:
        super().__init__()
        self.embedding = embedding
        self.base = embedding.word_embeddings.weight
        self.overlay = torch.zeros_like(self.base, device=device)
        embedding.word_embeddings.weight = nn.Parameter(
            torch.zeros_like(self.base, device=device)
        )

    @classmethod
    def update_model(cls, model: BertModel) -> None:
        embedding = cls(
            model.base_model.embeddings,
            device="cuda" if torch.cuda.is_available() else "cpu",
        )
        model.base_model.embeddings = embedding
        model.requires_grad_(False)
        embedding.overlay.requires_grad_(True)

    @staticmethod
    def restore_model(model: BertModel) -> None:
        embedding = model.base_model.embeddings.to_embedding()
        model.base_model.embeddings = embedding

    def forward(self, *args, **kwargs) -> torch.Tensor:
        return self.to_embedding().forward(*args, **kwargs)

    def to_embedding(self) -> BertEmbeddings:
        self.embedding.word_embeddings.weight = nn.Parameter(self.base + self.overlay)
        return self.embedding

Experiments

First we need to define the code to train and evaluate the models, then we can run the experiments.

Code
import blog.transformers_logging

Experiment - Define Metrics

We need a way to measure the performance of the model. Since this is a two class problem, accuracy is a sufficient metric.

Code
# from src/main/python/blog/metrics/accuracy.py
from typing import Dict

from sklearn.metrics import accuracy_score
from transformers.trainer_utils import EvalPrediction


def metric_accuracy(model_output: EvalPrediction) -> Dict[str, float]:
    predictions = model_output.predictions.argmax(axis=1)
    targets = model_output.label_ids
    accuracy = accuracy_score(targets, predictions)
    return {"accuracy": accuracy}



# from src/main/python/blog/metrics/perplexity.py
import torch
import torch.nn.functional as F
from transformers.trainer_utils import EvalPrediction


def metric_perplexity_bert(model_output: EvalPrediction, vocab_size: int = 30_522):
    # This loss calculation comes directly from the BERT forward method
    labels = torch.tensor(model_output.label_ids)
    lm_logits = torch.tensor(model_output.predictions)

    loss = F.cross_entropy(lm_logits.view(-1, vocab_size), labels.view(-1))

    perplexity = torch.exp(loss)
    return {"perplexity": perplexity.item()}


def metric_perplexity_gpt2(model_output: EvalPrediction):
    # This loss calculation comes directly from the GPT2 forward method
    # that handles correctly offsetting the labels to match the positions that are predicting

    labels = torch.tensor(model_output.label_ids)
    lm_logits = torch.tensor(model_output.predictions)

    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
    )

    perplexity = torch.exp(loss)
    return {"perplexity": perplexity.item()}

The perplexity metrics defined above will be used for the language model pretraining.

Experiment - BERT

This is going to evaluate BERT for this task.

The first stage will be to evaluate a pure classifier trained on the general dataset and on each domain dataset. This will establish a baseline.

Then the base sentiment model will be trained with a frozen embedding layer. After that the masked language model can be pretrained on the domain specific text, training only the embedding layer. I can also try restricting the weight decay to cover only the alteration to the base embeddings.
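
Concretely, if \(E_0\) is the original embedding matrix and the retrained embeddings are written as \(E = E_0 + \Delta\), restricting the weight decay to the alteration means penalising \(\lVert \Delta \rVert^2 = \lVert E - E_0 \rVert^2\) rather than \(\lVert E \rVert^2\). The EmbeddingOverlay layer defined earlier splits the weights into \(E_0\) and \(\Delta\) to make that possible.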

Code
MODEL_NAME = "bert-base-uncased"

Experiment - BERT - Encode Datasets

The raw datasets need to be encoded using the BERT tokenizer.

Code
#collapse
#hide_output
from typing import Any, Dict
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)

sentiment_index = {
    "negative": 0,
    "positive": 1,
}

def encode(row: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "input_ids": tokenizer(row["text"], truncation=True).input_ids,
        "label": sentiment_index[row["sentiment"]],
    }

general_ds = general_ds.map(encode)
electronics_ds = electronics_ds.map(encode)
kitchen_ds = kitchen_ds.map(encode)
music_ds = music_ds.map(encode)
toys_ds = toys_ds.map(encode)
video_ds = video_ds.map(encode)

Experiment - BERT - Establish Baselines

This trains a separate model for each task to provide a baseline for comparison.

Code
#hide_output
train_classifier_full(
    ds=general_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="general",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=64,
    epochs=5,
)
wandb: Currently logged in as: brandwatch-ml (use `wandb login --relogin` to force relogin)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-general-64bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/eyn03xlb
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220212_193000-eyn03xlb

Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[124220/124220 7:43:16, Epoch 5/5]
Step Training Loss Validation Loss Accuracy
1000 0.513200 0.403615 0.824300
2000 0.386900 0.369579 0.838800
3000 0.362800 0.352850 0.849500
4000 0.353500 0.352404 0.850100
5000 0.351500 0.345889 0.853700
6000 0.349400 0.340666 0.854200
7000 0.345300 0.344296 0.854400
8000 0.340700 0.352917 0.850300
9000 0.341600 0.341545 0.852500
10000 0.336900 0.332418 0.858500
11000 0.337300 0.334889 0.857700
12000 0.328600 0.343519 0.853100
13000 0.330900 0.335796 0.862200
14000 0.323400 0.325259 0.861300
15000 0.327800 0.337436 0.855200
16000 0.325600 0.340725 0.861600
17000 0.324700 0.329951 0.866400
18000 0.324500 0.316455 0.868000
19000 0.317800 0.314791 0.867000
20000 0.317200 0.312795 0.868500
21000 0.321100 0.312556 0.868100
22000 0.318400 0.315015 0.868300
23000 0.319700 0.313522 0.867100
24000 0.315900 0.311461 0.869200
25000 0.311100 0.322506 0.873000
26000 0.277000 0.310173 0.870000
27000 0.280600 0.319567 0.867000
28000 0.277400 0.314650 0.870900
29000 0.278500 0.311275 0.870000
30000 0.278100 0.318782 0.871400
31000 0.277300 0.306667 0.870800
32000 0.276500 0.309154 0.870300
33000 0.277400 0.320674 0.868100
34000 0.279300 0.318263 0.872000
35000 0.282800 0.307125 0.873200
36000 0.277600 0.320166 0.873300
37000 0.278900 0.309924 0.870800
38000 0.280500 0.312997 0.870400
39000 0.275800 0.317793 0.868100
40000 0.280100 0.305172 0.871300
41000 0.282300 0.312909 0.871100
42000 0.282300 0.305909 0.873700
43000 0.277500 0.310162 0.869600
44000 0.277100 0.313971 0.869600
45000 0.275900 0.319665 0.872500
46000 0.277900 0.318283 0.868900
47000 0.281000 0.313041 0.872800
48000 0.276200 0.300781 0.877500
49000 0.281300 0.307026 0.875800
50000 0.259300 0.325227 0.871600
51000 0.221100 0.325990 0.871500
52000 0.223000 0.337404 0.874800
53000 0.215900 0.340205 0.872400
54000 0.217100 0.325959 0.873400
55000 0.219700 0.329393 0.872000
56000 0.221900 0.325739 0.871300
57000 0.224400 0.327636 0.870800
58000 0.222300 0.341080 0.872400
59000 0.226100 0.332482 0.869100
60000 0.223500 0.323489 0.874300
61000 0.226700 0.316245 0.872000
62000 0.222300 0.325050 0.875100
63000 0.226700 0.325248 0.874400
64000 0.225100 0.322689 0.871700
65000 0.225300 0.321286 0.871200
66000 0.224800 0.334543 0.870800
67000 0.226700 0.326299 0.869100
68000 0.224900 0.329174 0.870800
69000 0.221800 0.323690 0.871900
70000 0.225800 0.318156 0.872000
71000 0.224500 0.326445 0.873700
72000 0.226000 0.328679 0.869900
73000 0.224500 0.316885 0.876000
74000 0.222900 0.316419 0.873000
75000 0.196700 0.378959 0.870200
76000 0.159600 0.375573 0.873600
77000 0.155600 0.379308 0.872300
78000 0.159400 0.365117 0.872600
79000 0.159700 0.387099 0.870200
80000 0.163400 0.381868 0.868100
81000 0.163000 0.369293 0.871300
82000 0.161500 0.361120 0.866600
83000 0.161300 0.381293 0.869200
84000 0.161400 0.381637 0.869200
85000 0.163700 0.378771 0.867900
86000 0.165200 0.372763 0.868500
87000 0.164500 0.372205 0.869500
88000 0.164000 0.387928 0.869300
89000 0.163700 0.366503 0.871200
90000 0.165600 0.371311 0.870200
91000 0.165700 0.368546 0.870400
92000 0.161900 0.390258 0.866100
93000 0.160700 0.373525 0.868200
94000 0.162000 0.359105 0.869300
95000 0.164800 0.380203 0.868400
96000 0.161400 0.366745 0.871500
97000 0.160300 0.379058 0.873300
98000 0.163600 0.377647 0.869500
99000 0.162400 0.376305 0.872400
100000 0.133200 0.441574 0.870600
101000 0.112200 0.450519 0.868300
102000 0.111800 0.461471 0.867600
103000 0.111000 0.453476 0.868500
104000 0.112100 0.464585 0.870000
105000 0.108200 0.461612 0.869700
106000 0.115700 0.450644 0.870300
107000 0.112700 0.463014 0.868500
108000 0.112600 0.454774 0.870100
109000 0.113800 0.455071 0.869800
110000 0.113400 0.474106 0.866800
111000 0.113100 0.450578 0.868300
112000 0.113700 0.453555 0.868100
113000 0.109200 0.453771 0.869300
114000 0.114500 0.449492 0.869100
115000 0.109200 0.450538 0.867900
116000 0.110700 0.463325 0.870000
117000 0.109900 0.464494 0.869500
118000 0.104600 0.462927 0.867900
119000 0.109000 0.455559 0.870600
120000 0.112200 0.450070 0.869400
121000 0.109800 0.447597 0.869500
122000 0.109600 0.451558 0.869200
123000 0.109600 0.451785 0.869400
124000 0.108400 0.454002 0.869700


Waiting for W&B process to finish, PID 2581287
Program ended successfully.
Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220212_193000-eyn03xlb/logs/debug.log
Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220212_193000-eyn03xlb/logs/debug-internal.log

Run summary:


train/loss 0.1084
train/learning_rate 0.0
train/epoch 5.0
train/global_step 124220
_runtime 27799
_timestamp 1644721999
_step 248
eval/loss 0.454
eval/accuracy 0.8697
eval/runtime 10.5481
eval/samples_per_second 948.04
eval/steps_per_second 14.884
train/train_runtime 27796.6567
train/train_samples_per_second 286.006
train/train_steps_per_second 4.469
train/total_flos 2.0687274460977792e+17
train/train_loss 0.2234

Run history:


train/loss█▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁
train/learning_rate▂▅████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁
train/epoch▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/global_step▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_runtime▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_timestamp▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_step▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
eval/loss▅▃▃▂▂▃▂▂▁▁▁▁▂▁▂▁▂▂▂▂▂▂▂▂▄▅▄▄▄▅▄▄▇██▇▇█▇█
eval/accuracy▁▄▅▆▆▆▇▇▇▇▇▇▇█▇█▇▇▇█▇▇▇█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
eval/runtime▁▅▆▆▇▆▇▇▇▇▇▇▇▇▇▇▇▇█▇██▇▇█▇█▇█▇███▇▇█▇██▇
eval/samples_per_second█▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▁▂▁▂▁▂▁▁▁▂▂▁▁▁▁▂
eval/steps_per_second█▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▁▂▁▂▁▂▁▁▁▂▂▁▁▁▁▂
train/train_runtime
train/train_samples_per_second
train/train_steps_per_second
train/total_flos
train/train_loss

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-general-64bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/eyn03xlb
Code
#hide_output
train_classifier_full(
    ds=electronics_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-electronics",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-electronics-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2cqwf0qp
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_033833-2cqwf0qp

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[2845/2845 20:24, Epoch 5/5]
Step Training Loss Validation Loss Accuracy
1000 0.258500 0.168365 0.950000
2000 0.068000 0.226074 0.958000


Waiting for W&B process to finish, PID 2608181
Program ended successfully.
Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_033833-2cqwf0qp/logs/debug.log
Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_033833-2cqwf0qp/logs/debug-internal.log

Run summary:


train/loss 0.068
train/learning_rate 2e-05
train/epoch 5.0
train/global_step 2845
_runtime 1226
_timestamp 1644724739
_step 4
eval/loss 0.22607
eval/accuracy 0.958
eval/runtime 8.4727
eval/samples_per_second 118.026
eval/steps_per_second 7.436
train/train_runtime 1225.0684
train/train_samples_per_second 37.124
train/train_steps_per_second 2.322
train/total_flos 9167977279601760.0
train/train_loss 0.1197

Run history:


train/loss█▁
train/learning_rate█▁
train/epoch▁▁▅▅█
train/global_step▁▁▅▅█
_runtime▁▁▅▅█
_timestamp▁▁▅▅█
_step▁▃▅▆█
eval/loss▁█
eval/accuracy▁█
eval/runtime▁█
eval/samples_per_second█▁
eval/steps_per_second█▁
train/train_runtime
train/train_samples_per_second
train/train_steps_per_second
train/total_flos
train/train_loss

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-electronics-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2cqwf0qp
Code
#hide_output
train_classifier_full(
    ds=kitchen_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-kitchen",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-kitchen-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/22i0d0ij
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_035925-22i0d0ij

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[2265/2265 14:05, Epoch 5/5]
Step Training Loss Validation Loss Accuracy
1000 0.208400 0.314208 0.928000
2000 0.040300 0.361787 0.940000


Waiting for W&B process to finish, PID 2609298
Program ended successfully.
Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_035925-22i0d0ij/logs/debug.log
Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_035925-22i0d0ij/logs/debug-internal.log

Run summary:


train/loss 0.0403
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 2265
_runtime 846
_timestamp 1644725611
_step 4
eval/loss 0.36179
eval/accuracy 0.94
eval/runtime 7.5391
eval/samples_per_second 132.642
eval/steps_per_second 8.356
train/train_runtime 845.3493
train/train_samples_per_second 42.811
train/train_steps_per_second 2.679
train/total_flos 6108018316871280.0
train/train_loss 0.11069

Run history:


train/loss█▁
train/learning_rate█▁
train/epoch▁▁▇▇█
train/global_step▁▁▇▇█
_runtime▁▁▆▇█
_timestamp▁▁▆▇█
_step▁▃▅▆█
eval/loss▁█
eval/accuracy▁█
eval/runtime█▁
eval/samples_per_second▁█
eval/steps_per_second▁█
train/train_runtime
train/train_samples_per_second
train/train_steps_per_second
train/total_flos
train/train_loss

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-kitchen-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/22i0d0ij
Code
#hide_output
train_classifier_full(
    ds=music_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-music",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-music-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2ynufcz4
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_041355-2ynufcz4

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1215/1215 10:40, Epoch 5/5]
Step Training Loss Validation Loss Accuracy
1000 0.165200 0.389923 0.929000


Waiting for W&B process to finish, PID 2610061
Program ended successfully.
Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_041355-2ynufcz4/logs/debug.log
Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_041355-2ynufcz4/logs/debug-internal.log

Run summary:


train/loss 0.1652
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1215
_runtime 642
_timestamp 1644726277
_step 2
eval/loss 0.38992
eval/accuracy 0.929
eval/runtime 11.275
eval/samples_per_second 88.692
eval/steps_per_second 5.588
train/train_runtime 641.3766
train/train_samples_per_second 30.263
train/train_steps_per_second 1.894
train/total_flos 4571028364769280.0
train/train_loss 0.1399

Run history:


train/loss
train/learning_rate
train/epoch▁▁█
train/global_step▁▁█
_runtime▁▂█
_timestamp▁▂█
_step▁▅█
eval/loss
eval/accuracy
eval/runtime
eval/samples_per_second
eval/steps_per_second
train/train_runtime
train/train_samples_per_second
train/train_steps_per_second
train/total_flos
train/train_loss

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-music-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2ynufcz4
Code
#hide_output
train_classifier_full(
    ds=toys_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-toys",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-toys-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/16f10f6f
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_042455-16f10f6f

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1295/1295 07:28, Epoch 5/5]
Step Training Loss Validation Loss Accuracy
1000 0.178000 0.477703 0.910000


Waiting for W&B process to finish, PID 2610605
Program ended successfully.
Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_042455-16f10f6f/logs/debug.log
Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_042455-16f10f6f/logs/debug-internal.log

Run summary:


train/loss 0.178
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1295
_runtime 450
_timestamp 1644726745
_step 2
eval/loss 0.4777
eval/accuracy 0.91
eval/runtime 7.1618
eval/samples_per_second 139.631
eval/steps_per_second 8.797
train/train_runtime 449.3178
train/train_samples_per_second 46.025
train/train_steps_per_second 2.882
train/total_flos 3259571864878560.0
train/train_loss 0.14495

Run history:


train/loss
train/learning_rate
train/epoch▁▁█
train/global_step▁▁█
_runtime▁▁█
_timestamp▁▁█
_step▁▅█
eval/loss
eval/accuracy
eval/runtime
eval/samples_per_second
eval/steps_per_second
train/train_runtime
train/train_samples_per_second
train/train_steps_per_second
train/total_flos
train/train_loss

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-toys-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/16f10f6f
Code
#hide_output
train_classifier_full(
    ds=video_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-video",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=16,
    epochs=5,
)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: wandb version 0.12.10 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
Tracking run with wandb version 0.10.33
Syncing run full-bert-base-uncased-domain-video-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/iouz3zx9
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_043243-iouz3zx9

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1305/1305 11:59, Epoch 5/5]
Step Training Loss Validation Loss Accuracy
1000 0.143700 0.310546 0.944000


Waiting for W&B process to finish, PID 2611053
Program ended successfully.
Find user logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_043243-iouz3zx9/logs/debug.log
Find internal logs for this run at: /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_043243-iouz3zx9/logs/debug-internal.log

Run summary:


train/loss 0.1437
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1305
_runtime 721
_timestamp 1644727485
_step 2
eval/loss 0.31055
eval/accuracy 0.944
eval/runtime 11.8683
eval/samples_per_second 84.258
eval/steps_per_second 5.308
train/train_runtime 720.419
train/train_samples_per_second 28.969
train/train_steps_per_second 1.811
train/total_flos 5153342463603840.0
train/train_loss 0.11579

Run history:


train/loss
train/learning_rate
train/epoch▁▁█
train/global_step▁▁█
_runtime▁▁█
_timestamp▁▁█
_step▁▅█
eval/loss
eval/accuracy
eval/runtime
eval/samples_per_second
eval/steps_per_second
train/train_runtime
train/train_samples_per_second
train/train_steps_per_second
train/total_flos
train/train_loss

Synced 6 W&B file(s), 1 media file(s), 0 artifact file(s) and 2 other file(s)

Synced full-bert-base-uncased-domain-video-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/iouz3zx9

Experiment - BERT - Train Fixed Embedding Base

This trains a model for general sentiment classification with a frozen embedding layer. The resulting model will be used as the base for the domain-adjusted models.
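
A minimal sketch of what freezing the embedding layer looks like, assuming the usual transformers classes (the helper name and setup below are illustrative rather than the actual train_classifier_base implementation):

Code
from transformers import AutoModelForSequenceClassification

def freeze_word_embeddings(model):
    # keep the pretrained word embedding matrix fixed during fine-tuning
    model.bert.embeddings.word_embeddings.weight.requires_grad = False
    return model

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
model = freeze_word_embeddings(model)
# fine-tuning then proceeds as normal, for example with the transformers Trainer

The actual run uses the train_classifier_base helper: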

Code
#hide_output
train_classifier_base(
    ds=general_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="general",
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
    batch_size=64,
    epochs=5,
)
Tracking run with wandb version 0.10.33
Syncing run no-embedding-bert-base-uncased-general-64bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1c4qi8r3
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_044503-1c4qi8r3

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[124220/124220 7:29:53, Epoch 5/5]
Step Training Loss Validation Loss Accuracy
1000 0.511300 0.406170 0.822400
2000 0.388700 0.371869 0.837000
3000 0.364600 0.354540 0.848000
4000 0.355000 0.356526 0.849200
5000 0.353200 0.346191 0.852400
6000 0.350800 0.341020 0.855100
7000 0.345700 0.343301 0.853700
8000 0.341700 0.352299 0.853900
9000 0.341200 0.339812 0.855800
10000 0.337600 0.333934 0.858800
11000 0.337800 0.333922 0.858500
12000 0.328800 0.339086 0.856300
13000 0.331700 0.329456 0.863300
14000 0.322400 0.325021 0.860800
15000 0.326400 0.341783 0.855300
16000 0.326400 0.338934 0.862700
17000 0.323400 0.331051 0.866300
18000 0.325000 0.319770 0.865600
19000 0.315500 0.309761 0.870200
20000 0.318200 0.311332 0.869200
21000 0.320600 0.314540 0.866800
22000 0.317600 0.313203 0.869900
23000 0.319400 0.311689 0.869800
24000 0.315400 0.306948 0.870800
25000 0.310800 0.316544 0.871000
26000 0.280500 0.305228 0.873900
27000 0.284600 0.315119 0.869700
28000 0.282100 0.316112 0.871200
29000 0.281500 0.311524 0.870400
30000 0.280600 0.318128 0.872500
31000 0.280600 0.310752 0.871400
32000 0.278800 0.309930 0.873400
33000 0.281400 0.318493 0.872800
34000 0.283400 0.321827 0.872100
35000 0.286400 0.307348 0.872800
36000 0.280200 0.323695 0.870700
37000 0.282700 0.306045 0.871800
38000 0.284300 0.306637 0.869500
39000 0.279700 0.313403 0.869800
40000 0.282500 0.301105 0.873000
41000 0.283600 0.306601 0.874800
42000 0.285000 0.307430 0.873400
43000 0.279200 0.305863 0.871100
44000 0.277800 0.307858 0.871100
45000 0.279200 0.306109 0.871500
46000 0.280300 0.312159 0.870600
47000 0.282800 0.310434 0.875600
48000 0.281400 0.302623 0.878500
49000 0.282900 0.302531 0.880000
50000 0.264000 0.309589 0.873400
51000 0.229000 0.321463 0.873500
52000 0.230600 0.341251 0.871700
53000 0.227100 0.330379 0.872000
54000 0.227500 0.321092 0.871200
55000 0.229700 0.319525 0.873400
56000 0.231000 0.317524 0.872500
57000 0.234000 0.315405 0.871000
58000 0.230900 0.317642 0.871800
59000 0.233800 0.321154 0.872600
60000 0.231900 0.318218 0.874600
61000 0.233800 0.316618 0.875300
62000 0.232700 0.325526 0.875100
63000 0.235600 0.320183 0.873000
64000 0.235500 0.319495 0.871600
65000 0.234800 0.314187 0.873500
66000 0.232800 0.325829 0.873500
67000 0.234400 0.311478 0.871800
68000 0.232200 0.317831 0.873600
69000 0.232300 0.311483 0.874300
70000 0.235600 0.305324 0.877200
71000 0.234200 0.315522 0.872300
72000 0.232900 0.319899 0.868900
73000 0.234100 0.308927 0.874900
74000 0.230300 0.308656 0.872400
75000 0.208900 0.360322 0.872700
76000 0.175900 0.357876 0.870900
77000 0.172800 0.349695 0.872500
78000 0.174800 0.351596 0.874000
79000 0.173700 0.359032 0.873700
80000 0.177600 0.355052 0.872600
81000 0.177800 0.353396 0.869900
82000 0.177200 0.346814 0.870100
83000 0.174200 0.354160 0.874200
84000 0.177300 0.355552 0.868400
85000 0.180100 0.355578 0.869500
86000 0.178800 0.358164 0.869500
87000 0.177000 0.355007 0.871100
88000 0.180300 0.348180 0.872800
89000 0.176000 0.346942 0.869900
90000 0.179300 0.344493 0.871400
91000 0.181100 0.344836 0.870600
92000 0.180300 0.353251 0.872400
93000 0.175700 0.354549 0.870000
94000 0.176000 0.353133 0.870100
95000 0.179900 0.374774 0.868700
96000 0.173200 0.355910 0.870800
97000 0.176100 0.361503 0.872800
98000 0.176800 0.356372 0.869200
99000 0.176300 0.360096 0.874800
100000 0.148100 0.418238 0.870900
101000 0.129100 0.416172 0.870400
102000 0.131000 0.419957 0.868900
103000 0.129700 0.409167 0.869400
104000 0.127100 0.430586 0.872300
105000 0.126700 0.428001 0.867700
106000 0.130300 0.415673 0.868400
107000 0.132300 0.414793 0.868600
108000 0.128100 0.438585 0.869700
109000 0.129900 0.424922 0.869000
110000 0.128400 0.429560 0.868200
111000 0.129300 0.420508 0.869400
112000 0.128200 0.426265 0.869900
113000 0.126300 0.419459 0.868500
114000 0.130100 0.412851 0.869100
115000 0.127900 0.417601 0.869100
116000 0.129400 0.421481 0.869300
117000 0.126300 0.432261 0.868900
118000 0.122100 0.426714 0.868600
119000 0.127300 0.419353 0.869800
120000 0.127800 0.418641 0.868600
121000 0.123400 0.421648 0.867300
122000 0.127500 0.419938 0.867600
123000 0.125900 0.421461 0.867100
124000 0.123900 0.422832 0.867600

Run summary:

train/loss 0.1239
train/learning_rate 0.0
train/epoch 5.0
train/global_step 124220
eval/loss 0.42283
eval/accuracy 0.8676
eval/runtime 10.5191
eval/samples_per_second 950.653
eval/steps_per_second 14.925
train/train_runtime 26993.4931
train/train_samples_per_second 294.515
train/train_steps_per_second 4.602
train/total_flos 2.0687274460977792e+17
train/train_loss 0.2321

Synced no-embedding-bert-base-uncased-general-64bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1c4qi8r3

Since this is the model that will be adjusted, we can check that the evaluation code produces consistent results. The best evaluation during training was this one:

Step Training Loss Validation Loss Accuracy
49000 0.282900 0.302531 0.880000
Code
evaluate_classifier(
    ds=general_ds,
    model_name=MODEL_NAME,
    model=load_classifier_base(
        model_name=MODEL_NAME,
        dataset_name="general",
        data_folder=DATA_FOLDER,
        batch_size=64,
        epochs=5,
    ),
    batch_size=64,
    data_folder=DATA_FOLDER,
    metric=metric_accuracy,
)
PyTorch: setting up devices
[157/157 00:09]
{'eval_loss': 0.30253100395202637,
 'eval_accuracy': 0.88,
 'eval_runtime': 9.1462,
 'eval_samples_per_second': 1093.35,
 'eval_steps_per_second': 17.166}

We haven’t run the training here so the training loss metric is not produced. The validation loss and accuracy match, accounting for rounding, so I am satisfied that the evaluation code works.

Experiment - BERT - Language Model Pretraining

This adjusts the embeddings of the model to match the domain distribution by training it as a language model over the domain text while updating only the embedding parameters.
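
A minimal sketch of the assumed approach: train the BERT masked language model over the domain text with every parameter frozen except the word embeddings. The accessor below is a guess at what get_embedding_parameters_bert returns, not the actual implementation:

Code
from transformers import AutoModelForMaskedLM

def get_embedding_parameters_bert(model):
    # assumed: the input word embedding module of a BERT model
    return model.bert.embeddings.word_embeddings

def unfreeze_only_embeddings(model):
    for parameter in model.parameters():
        parameter.requires_grad = False
    # BERT ties the masked language model output layer to the input word
    # embeddings, so unfreezing the word embeddings is enough for the
    # language model loss to update them
    get_embedding_parameters_bert(model).weight.requires_grad = True
    return model

model = unfreeze_only_embeddings(
    AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
)

The actual runs use the train_language_model_embedding helper for each domain: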

Code
#hide_output
train_language_model_embedding(
    ds=electronics_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-electronics",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-electronics-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/17z6ree0
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_124044-17z6ree0

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[2845/2845 19:10, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.525900 2.186024 9.104887
2000 2.339300 2.050233 7.756434


Run summary:

train/loss 2.3393
train/learning_rate 2e-05
train/epoch 5.0
train/global_step 2845
eval/loss 2.05023
eval/perplexity 7.75643
eval/runtime 4.6223
eval/samples_per_second 21.634
eval/steps_per_second 1.514
train/train_runtime 1151.0397
train/train_samples_per_second 39.512
train/train_steps_per_second 2.472
train/total_flos 9171244212184800.0
train/train_loss 2.39209

Synced embedding-bert-base-uncased-domain-electronics-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/17z6ree0
Code
#hide_output
train_language_model_embedding(
    ds=kitchen_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-kitchen",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-kitchen-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/3clte4rs
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_130021-3clte4rs

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[2265/2265 13:03, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.462700 2.134396 8.371171
2000 2.259400 2.161706 8.692562


Run summary:

train/loss 2.2594
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 2265
eval/loss 2.16171
eval/perplexity 8.69256
eval/runtime 4.6087
eval/samples_per_second 21.698
eval/steps_per_second 1.519
train/train_runtime 784.1761
train/train_samples_per_second 46.15
train/train_steps_per_second 2.888
train/total_flos 6110194858484400.0
train/train_loss 2.34641

Synced embedding-bert-base-uncased-domain-kitchen-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/3clte4rs
Code
#hide_output
train_language_model_embedding(
    ds=music_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-music",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-music-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2g7nseso
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_131348-2g7nseso

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1215/1215 10:03, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.479700 2.364102 10.497963


Run summary:

train/loss 2.4797
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1215
eval/loss 2.3641
eval/perplexity 10.49796
eval/runtime 5.052
eval/samples_per_second 19.794
eval/steps_per_second 1.386
train/train_runtime 604.3056
train/train_samples_per_second 32.12
train/train_steps_per_second 2.011
train/total_flos 4572657212774400.0
train/train_loss 2.46521

Synced embedding-bert-base-uncased-domain-music-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2g7nseso
Code
#hide_output
train_language_model_embedding(
    ds=toys_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-toys",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-toys-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/3ubu4oj0
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_132411-3ubu4oj0

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1295/1295 06:56, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.387300 2.285632 9.708434


Run summary:

train/loss 2.3873
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1295
eval/loss 2.28563
eval/perplexity 9.70843
eval/runtime 4.7261
eval/samples_per_second 21.159
eval/steps_per_second 1.481
train/train_runtime 416.9974
train/train_samples_per_second 49.593
train/train_steps_per_second 3.106
train/total_flos 3260733386248800.0
train/train_loss 2.35955

Synced embedding-bert-base-uncased-domain-toys-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/3ubu4oj0
Code
#hide_output
train_language_model_embedding(
    ds=video_ds,
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    embedding_accessor=get_embedding_parameters_bert,
    dataset_name="domain-video",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-bert-base-uncased-domain-video-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/2v07t57q
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220213_133126-2v07t57q

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1305/1305 11:22, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.390500 2.200149 8.946708


Run summary:

train/loss 2.3905
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1305
eval/loss 2.20015
eval/perplexity 8.94671
eval/runtime 5.1369
eval/samples_per_second 19.467
eval/steps_per_second 1.363
train/train_runtime 683.0861
train/train_samples_per_second 30.553
train/train_steps_per_second 1.91
train/total_flos 5155178814403200.0
train/train_loss 2.36121

Synced embedding-bert-base-uncased-domain-video-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/2v07t57q

Experiment - BERT - Language Model Pretraining Overlay

This uses an overlay on the word embeddings to adjust them to the domain distribution. Because only the overlay is trained, weight decay acts on the adjustment rather than on the pretrained embedding weights, so regularization pulls the embeddings back towards their pretrained values instead of towards zero.
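
A minimal sketch of the overlay idea, assuming the EmbeddingOverlay class works along these lines (the class below is illustrative, not the actual implementation): freeze the pretrained embedding matrix and add a trainable, zero-initialised delta on top.

Code
import torch
from torch import nn

class OverlayEmbedding(nn.Module):
    """Pretrained embeddings plus a trainable adjustment (illustrative)."""

    def __init__(self, base: nn.Embedding) -> None:
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False  # frozen pretrained weights
        self.delta = nn.Parameter(torch.zeros_like(base.weight))  # trainable adjustment

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # weight decay on self.delta pulls the effective embedding back towards
        # the pretrained values rather than towards zero
        weight = self.base.weight + self.delta
        return nn.functional.embedding(
            input_ids, weight, padding_idx=self.base.padding_idx
        )

The update_model and restore_model hooks presumably install a wrapper like this before training and put the plain embedding module back before the model is saved. The actual runs use the train_language_model helper with those hooks for each domain: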

Code
#hide_output
train_language_model(
    ds=electronics_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-electronics",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-electronics-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/ligwgdfe
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_214813-ligwgdfe

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[2845/2845 17:53, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.791700 2.655851 14.515713
2000 2.784700 2.461272 11.607203


Run summary:

train/loss 2.7847
train/learning_rate 2e-05
train/epoch 5.0
train/global_step 2845
eval/loss 2.46127
eval/perplexity 11.6072
eval/runtime 4.5188
eval/samples_per_second 22.13
eval/steps_per_second 1.549
train/train_runtime 1073.6786
train/train_samples_per_second 42.359
train/train_steps_per_second 2.65
train/total_flos 1.1680412853012192e+16
train/train_loss 2.7854

Synced embedding-overlay-bert-base-uncased-domain-electronics-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/ligwgdfe
Code
#hide_output
train_language_model(
    ds=kitchen_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-kitchen",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-kitchen-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1l77hyg0
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_220718-1l77hyg0

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[2265/2265 12:08, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.751600 2.558401 12.753925
2000 2.722900 2.673736 14.132344


Run summary:

train/loss 2.7229
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 2265
eval/loss 2.67374
eval/perplexity 14.13234
eval/runtime 4.4861
eval/samples_per_second 22.291
eval/steps_per_second 1.56
train/train_runtime 728.6837
train/train_samples_per_second 49.665
train/train_steps_per_second 3.108
train/total_flos 7781888357593776.0
train/train_loss 2.73636

Synced embedding-overlay-bert-base-uncased-domain-kitchen-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1l77hyg0
Code
#hide_output
train_language_model(
    ds=music_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-music",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-music-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1bc3396y
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222004-1bc3396y

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1215/1215 09:27, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.698700 2.667158 14.097964


Run summary:

train/loss 2.6987
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1215
eval/loss 2.66716
eval/perplexity 14.09796
eval/runtime 5.3364
eval/samples_per_second 18.739
eval/steps_per_second 1.312
train/train_runtime 568.3681
train/train_samples_per_second 34.15
train/train_steps_per_second 2.138
train/total_flos 5823694456805376.0
train/train_loss 2.69958

Synced embedding-overlay-bert-base-uncased-domain-music-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1bc3396y
Code
#hide_output
train_language_model(
    ds=toys_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-toys",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-toys-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/3fsvsvap
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_222958-3fsvsvap

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1295/1295 06:31, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.684900 2.626420 13.626650


Run summary:

train/loss 2.6849
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1295
eval/loss 2.62642
eval/perplexity 13.62665
eval/runtime 4.6422
eval/samples_per_second 21.542
eval/steps_per_second 1.508
train/train_runtime 392.1879
train/train_samples_per_second 52.73
train/train_steps_per_second 3.302
train/total_flos 4152840255238752.0
train/train_loss 2.68775

Synced embedding-overlay-bert-base-uncased-domain-toys-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/3fsvsvap
Code
#hide_output
train_language_model(
    ds=video_ds,
    train_name="embedding-overlay",
    project_name=PROJECT_NAME,
    model_name=MODEL_NAME,
    dataset_name="domain-video",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
    metric=metric_perplexity_bert,
    model_preparation=EmbeddingOverlay.update_model,
    save_preparation=EmbeddingOverlay.restore_model,
)
Tracking run with wandb version 0.10.33
Syncing run embedding-overlay-bert-base-uncased-domain-video-16bs-5e to Weights & Biases (Documentation).
Project page: https://wandb.ai/brandwatch-ml/domain-shift
Run page: https://wandb.ai/brandwatch-ml/domain-shift/runs/1wwaq6od
Run data is saved locally in /home/matthew/Programming/Blog/blog/_notebooks/wandb/run-20220216_223653-1wwaq6od

PyTorch: setting up devices
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[1305/1305 10:43, Epoch 5/5]
Step Training Loss Validation Loss Perplexity
1000 2.609000 2.516007 12.213467


Run summary:

train/loss 2.609
train/learning_rate 1e-05
train/epoch 5.0
train/global_step 1305
eval/loss 2.51601
eval/perplexity 12.21347
eval/runtime 5.1202
eval/samples_per_second 19.53
eval/steps_per_second 1.367
train/train_runtime 643.9198
train/train_samples_per_second 32.411
train/train_steps_per_second 2.027
train/total_flos 6565588647539328.0
train/train_loss 2.60432

Synced embedding-overlay-bert-base-uncased-domain-video-16bs-5e: https://wandb.ai/brandwatch-ml/domain-shift/runs/1wwaq6od

Evaluation - BERT

Now that the models have been trained, we can evaluate every combination against each domain dataset: the domain-specific, fully trained general, and fixed-embedding general classifiers, each with their original embeddings, with the retrained domain embeddings, and with the overlay domain embeddings.
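
The combined evaluations rely on evaluate_combined_classifier, which takes a base classifier and a separately trained embedding model. A hypothetical sketch of the substitution it is assumed to perform, copying the domain-adapted word embeddings into the classifier before evaluating it (the helper name below is illustrative, and it assumes the embedding accessor returns the word embedding module):

Code
import torch

def combine_classifier_and_embeddings(base_model, embedding_model, embedding_accessor):
    # copy the domain adapted word embeddings from the language model into the
    # sentiment classifier, leaving every other parameter untouched
    with torch.no_grad():
        embedding_accessor(base_model).weight.copy_(
            embedding_accessor(embedding_model).weight
        )
    return base_model

The cross_evaluation helper below runs every combination for a single domain: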

Code
#collapse
from typing import Dict, Union
import datasets

def cross_evaluation(
    ds: datasets.Dataset,
    model_name: str,
    domain: str,
    classifier_batch_size: int,
    domain_batch_size: int,
    lm_batch_size: int,
    epochs: int
) -> Dict[str, Union[str, float]]:
    specific_name = f"full-{model_name}-domain-{domain}-{domain_batch_size}bs-{epochs}e"
    full_name = f"full-{model_name}-general-{classifier_batch_size}bs-{epochs}e"
    no_embedding_name = f"no-embedding-{model_name}-general-{classifier_batch_size}bs-{epochs}e"
    embedding_name = f"embedding-{model_name}-domain-{domain}-{lm_batch_size}bs-{epochs}e"

    specific_results = evaluate_classifier(
        ds=ds,
        model_name=model_name,
        model=load_classifier_full(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=domain_batch_size,
            epochs=epochs,
        ),
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    specific_combined_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=domain_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    specific_overlay_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=domain_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding_overlay(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    
    full_results = evaluate_classifier(
        ds=ds,
        model_name=model_name,
        model=load_classifier_full(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    full_combined_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    full_overlay_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_full(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding_overlay(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    
    no_embedding_results = evaluate_classifier(
        ds=ds,
        model_name=model_name,
        model=load_classifier_base(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    base_combined_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_base(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )
    base_overlay_results = evaluate_combined_classifier(
        ds=ds,
        model_name=model_name,
        base_model=load_classifier_base(
            model_name=model_name,
            dataset_name="general",
            data_folder=DATA_FOLDER,
            batch_size=classifier_batch_size,
            epochs=epochs,
        ),
        embedding_model=load_language_model_embedding_overlay(
            model_name=model_name,
            dataset_name=f"domain-{domain}",
            data_folder=DATA_FOLDER,
            batch_size=lm_batch_size,
            epochs=epochs,
        ),
        embedding_accessor=get_embedding_parameters_bert,
        batch_size=64,
        data_folder=DATA_FOLDER,
        metric=metric_accuracy,
    )

    return {
        "domain": domain,
        "specific_accuracy": specific_results["eval_accuracy"],
        "specific_combined_accuracy": specific_combined_results["eval_accuracy"],
        "specific_overlay_accuracy": specific_overlay_results["eval_accuracy"],
        "full_accuracy": full_results["eval_accuracy"],
        "full_combined_accuracy": full_combined_results["eval_accuracy"],
        "full_overlay_accuracy": full_overlay_results["eval_accuracy"],
        "base_accuracy": no_embedding_results["eval_accuracy"],
        "base_combined_accuracy": base_combined_results["eval_accuracy"],
        "base_overlay_accuracy": base_overlay_results["eval_accuracy"],
    }
Code
base_model = load_classifier_base(
    model_name=MODEL_NAME,
    dataset_name="general",
    data_folder=DATA_FOLDER,
    batch_size=64,
    epochs=5,
)
embedding_model = load_language_model_embedding_overlay(
    model_name=MODEL_NAME,
    dataset_name="domain-electronics",
    data_folder=DATA_FOLDER,
    batch_size=16,
    epochs=5,
)
Code
import torch

# check whether the word embeddings of the overlay language model are identical
# to those of the fixed-embedding base classifier
torch.all(
    torch.eq(
        base_model.bert.embeddings.word_embeddings.weight,
        embedding_model.bert.embeddings.word_embeddings.weight,
    )
)
tensor(True)
Code
base_model
(BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (1-11): eleven more BertLayer blocks, identical in structure to (0)
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
),)
Code
#hide_output
import pandas as pd

# Run the cross evaluation for each Amazon review domain; each call returns a
# row of accuracy scores for the different models and embedding combinations.
result_df = pd.DataFrame([
    cross_evaluation(
        ds=ds,
        model_name=MODEL_NAME,
        domain=domain,
        classifier_batch_size=64,
        domain_batch_size=16,
        lm_batch_size=16,
        epochs=5,
    )
    for ds, domain in [
        (electronics_ds, "electronics"),
        (kitchen_ds, "kitchen"),
        (music_ds, "music"),
        (toys_ds, "toys"),
        (video_ds, "video"),
    ]
])
Code
result_df
| domain | specific_accuracy | specific_combined_accuracy | specific_overlay_accuracy | full_accuracy | full_combined_accuracy | full_overlay_accuracy | base_accuracy | base_combined_accuracy | base_overlay_accuracy |
|---|---|---|---|---|---|---|---|---|---|
| electronics | 0.958 | 0.952 | 0.957 | 0.781 | 0.795 | 0.787 | 0.768 | 0.774 | 0.768 |
| kitchen | 0.940 | 0.944 | 0.941 | 0.800 | 0.797 | 0.804 | 0.794 | 0.796 | 0.794 |
| music | 0.929 | 0.925 | 0.927 | 0.773 | 0.771 | 0.777 | 0.759 | 0.757 | 0.759 |
| toys | 0.910 | 0.904 | 0.908 | 0.828 | 0.833 | 0.827 | 0.836 | 0.837 | 0.836 |
| video | 0.944 | 0.949 | 0.947 | 0.745 | 0.722 | 0.733 | 0.737 | 0.721 | 0.737 |

Evaluation - BERT - Domain Specific Model

Here we review the domain specific model and evaluate how replacing the embedding layer affects performance.
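As a reminder of what "replacing the embedding layer" involves, the sketch below shows one way to do it with the Hugging Face transformers API. It is illustrative only - the checkpoint names are placeholders standing in for the models trained in this post, not the actual helper used here.

```python
# A minimal sketch of the embedding replacement: copy the embedding weights
# from a language model that has been pre-trained on the domain text into the
# trained sentiment classifier. The checkpoints below are placeholders.
from transformers import AutoModelForMaskedLM, AutoModelForSequenceClassification

classifier = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # stands in for the trained sentiment model
)
domain_lm = AutoModelForMaskedLM.from_pretrained(
    "bert-base-uncased"  # stands in for the domain pre-trained language model
)

# Copy the word, position and token-type embeddings across.
# Every other layer of the classifier is left untouched.
classifier.bert.embeddings.load_state_dict(domain_lm.bert.embeddings.state_dict())
```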

| Domain | Domain Sentiment Model | with Replacement Embeddings |
|---|---|---|
| electronics | 0.958 | 0.952 |
| kitchen | 0.940 | 0.944 |
| music | 0.929 | 0.925 |
| toys | 0.910 | 0.904 |
| video | 0.944 | 0.949 |

We can see that the domain specific model has high accuracy on its own domain. Replacing the embedding layer harms performance slightly more often than it improves it, and in every case the difference is small.

Evaluation - BERT - General Model

Here we review the general sentiment model and evaluate how replacing the embedding layer affects performance.

| Domain | General Sentiment Model | with Replacement Embeddings |
|---|---|---|
| electronics | 0.781 | 0.795 |
| kitchen | 0.800 | 0.797 |
| music | 0.773 | 0.771 |
| toys | 0.828 | 0.833 |
| video | 0.745 | 0.722 |

Replacing the embedding layer again produces only small changes, and it harms performance more often than it improves it.

Evaluation - BERT - General Model with Frozen Embeddings

Here we review the general sentiment model that was trained with a frozen embedding layer. The intent is that, because the classifier was never allowed to alter the base embeddings, the embeddings retrained on the domain text should remain consistent with how the sentiment classifier expects the embedding layer to behave.
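Freezing the embedding layer is a small change to the training setup; a minimal sketch, assuming a BERT sequence classifier from the transformers library, looks like this:

```python
from transformers import AutoModelForSequenceClassification

# assumed stand-in for the general sentiment classifier used in this post
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# freeze the embedding layer so that only the encoder and the classification
# head are updated during sentiment training
for parameter in model.bert.embeddings.parameters():
    parameter.requires_grad = False
```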

| Domain | General Sentiment Model with Frozen Embeddings | with Replacement Embeddings |
|---|---|---|
| electronics | 0.768 | 0.774 |
| kitchen | 0.794 | 0.796 |
| music | 0.759 | 0.757 |
| toys | 0.836 | 0.837 |
| video | 0.737 | 0.721 |

Once again the performance changes are marginal. It appears that this technique has not worked as I expected.

Conclusion - BERT

These results are disappointing. Using the embedding layer that was pre-trained with the domain language model does not produce a consistent improvement.

These results do show that the two datasets differ significantly. I wonder whether the Amazon dataset can really be considered a sentiment task.

One thing that I want to evaluate is the way that the domain specific embedding layer is trained. If I express the domain embeddings as the original embeddings plus a separate adjustment, then weight decay can be applied strictly to that adjustment. This would be interesting because it would give a concrete indication of which tokens change meaning between the domains; a rough sketch of the idea follows.
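This is a sketch only, assuming a standard PyTorch `nn.Embedding` base layer; the `DeltaEmbedding` name and structure are my own illustration, not code used in this post.

```python
import torch
from torch import nn


class DeltaEmbedding(nn.Module):
    """Domain embeddings expressed as frozen base embeddings plus a trainable delta."""

    def __init__(self, base_embedding: nn.Embedding) -> None:
        super().__init__()
        self.base = base_embedding
        self.base.weight.requires_grad = False  # the original embeddings never change
        self.delta = nn.Parameter(torch.zeros_like(base_embedding.weight))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # equivalent to looking the tokens up in (base + delta);
        # weight decay on self.delta penalizes only the change from the base
        return nn.functional.embedding(input_ids, self.base.weight + self.delta)
```

Sorting the rows of `delta` by their norm would then surface the tokens whose meaning shifts most between the two domains.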