Code
import blog.transformers_logging
July 12, 2022
The XLM-RoBERTa model appears to be a good candidate for the multilingual prompt internalization task. When trained the model frequently loses the ability to differentiate between different words or phrases in the text. This leads to it returning predictions for different tokens that are extremely similar.
I believe that the training set is the source of this problem. Due to my inability to correctly associate different nouns across the English sentence and the translation each training row has only a single noun in it. This means that the training process does not punish the model for producing the same output for every token, as it only checks a single token.
To fix this I can think of the following options:
Make the model a double head model where one head is used for masked language modelling. This would ensure that the model retains an ability to distinguish detail at different points in the sentence.
Add a loss function which checks that there is variation across the output tokens for a row. A number of other tokens could be compared to the noun token with something like cosine similarity loss where the model should maximize the difference.
Ideally a better training set could be found which would have noun markup and association. There are several datasets available on the Statistical Machine Translation website, so reviewing them would be productive.
Until then I am going to overcomplicate the model to fix a deficiency in the dataset.
The conversational model that I recently retrained used the GPT2DoubleHeadsModel which has a classification head and a language modelling head. By reviewing this code I should be able to split the head of the XLM-RoBERTa model so that one can be used for language describing, and one can be used for masked language modelling. It’s likely that the training process will involve submitting the input twice as the masked language modelling requires providing masked inputs, while the language describing process needs all of the input to provide context.
# from src/main/python/blog/prompt_internalization/multilingual/double_head/model.py
import logging
from dataclasses import dataclass
from typing import Optional, Tuple, Union
import torch
from transformers import RobertaConfig, RobertaPreTrainedModel
from transformers.models.roberta.modeling_roberta import RobertaLMHead, RobertaModel
logger = logging.getLogger(__name__)
@dataclass
class RobertaDoubleHeadsOutput:
ld_logits: torch.FloatTensor
lm_logits: torch.FloatTensor
hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[torch.FloatTensor]] = None
# Heavily copied from RobertaForMaskedLM
class RobertaDoubleHeadsModel(RobertaPreTrainedModel):
def __init__(self, config: RobertaConfig) -> None:
super().__init__(config)
if config.is_decoder:
logger.warning(
"If you want to use `RobertaDoubleHeadsModel` make sure "
"`config.is_decoder=False` for bi-directional self-attention."
)
self.roberta = RobertaModel(config, add_pooling_layer=False)
self.lm_head = RobertaLMHead(config)
self.ld_head = RobertaLMHead(config)
# The LM head weights require special treatment only when they are tied
# with the word embeddings
self.update_keys_to_ignore(config, ["lm_head.decoder.weight"])
# Initialize weights and apply final processing
self.post_init()
def copy_lm_weights_to_ld(self) -> None:
# copy the language modelling weights to the language describing head
print("initializing language describing head with language modelling weights")
self.ld_head.load_state_dict(self.lm_head.state_dict())
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.FloatTensor] = None,
*,
token_type_ids: Optional[torch.LongTensor] = None,
position_ids: Optional[torch.LongTensor] = None,
head_mask: Optional[torch.FloatTensor] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
encoder_hidden_states: Optional[torch.FloatTensor] = None,
encoder_attention_mask: Optional[torch.FloatTensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple[torch.Tensor], RobertaDoubleHeadsOutput]:
# all loss calculations will be in the trainer
return_dict = (
return_dict if return_dict is not None else self.config.use_return_dict
)
outputs = self.roberta(
input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
head_mask=head_mask,
inputs_embeds=inputs_embeds,
encoder_hidden_states=encoder_hidden_states,
encoder_attention_mask=encoder_attention_mask,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
)
sequence_output = outputs[0]
ld_output = self.ld_head(sequence_output)
lm_output = self.lm_head(sequence_output)
if not return_dict:
return (ld_output, lm_output) + outputs[2:]
return RobertaDoubleHeadsOutput(
ld_logits=ld_output,
lm_logits=lm_output,
hidden_states=outputs.hidden_states,
attentions=outputs.attentions,
)
This now needs to perform the language modelling task as well as the language describing task. Evaluation also needs to change as the model response uses different fields.
# from src/main/python/blog/prompt_internalization/multilingual/double_head/trainer.py
from itertools import starmap
from typing import Any, Dict, Tuple, Union
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer, Trainer, TrainingArguments
from .model import RobertaDoubleHeadsOutput
class DoubleHeadsPromptInternalizationTrainingArguments(TrainingArguments):
def __init__(
self,
*args,
temperature: float = 2.0,
mean_prediction: bool = True,
alpha: float = 0.5, # loss is alpha * lm_loss + (1 - alpha) * ld_loss
**kwargs,
) -> None:
assert 0 <= alpha <= 1
super().__init__(*args, **kwargs)
self.temperature = temperature
self.mean_prediction = mean_prediction
self.alpha = alpha
class DoubleHeadsPromptInternalizationTrainer(Trainer):
def __init__(
self,
*args,
teacher_model: AutoModelForMaskedLM = None,
tokenizer: AutoTokenizer = None,
**kwargs,
) -> None:
super().__init__(*args, **kwargs)
self.teacher = teacher_model
self._move_model_to_device(self.teacher, self.model.device)
self.teacher.eval()
self.mask_token_id = tokenizer.mask_token_id
self.vocab_size = tokenizer.vocab_size
def compute_loss(
self,
model: AutoModelForMaskedLM,
inputs: Dict[str, Union[torch.Tensor, Any]],
return_outputs: bool = False,
) -> Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor]:
outputs: RobertaDoubleHeadsOutput = model(
input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
)
if self.args.mean_prediction:
ld_predictions = self._student_predictions_mean(
outputs=outputs, labels=inputs["ld_labels"]
)
else:
ld_predictions = self._student_predictions_first(
outputs=outputs, labels=inputs["ld_labels"]
)
ld_targets = self._teacher_predictions(
input_ids=inputs["teacher_input_ids"],
attention_mask=inputs["teacher_attention_mask"],
)
ld_loss = self._ld_loss(predictions=ld_predictions, targets=ld_targets)
lm_loss = self._lm_loss(
predictions=outputs.lm_logits, targets=inputs["lm_labels"]
)
loss = self.args.alpha * lm_loss + (1 - self.args.alpha) * ld_loss
if not return_outputs:
return loss
# This directly calculates the kl_div and overlap metrics.
# It's much faster to do this using CUDA operations instead of waiting for cpu numpy.
with torch.inference_mode():
kl_div = F.kl_div(
input=F.log_softmax(ld_predictions.to(torch.float32), dim=-1),
target=F.softmax(ld_targets.to(torch.float32), dim=-1),
reduction="none",
log_target=False,
)
kl_div = kl_div.sum(dim=1)
overlap = starmap(
torch.isin,
zip(
ld_predictions.argsort(descending=True)[:, :10],
ld_targets.argsort(descending=True)[:, :10],
),
)
overlap = map(torch.sum, overlap)
overlap = torch.tensor(list(overlap), device=self.model.device)
overlap = overlap / 10
# calculating exact cross entropy values per row is hard with unattended tokens
# I can just repeat the rows to get it working with the mean in compute_metrics
# This will reshape the metrics to be [batch_size, 2] which will then
# get correctly passed to the metric calculation
metric_output = torch.cat(
[
kl_div[:, None],
overlap[:, None],
lm_loss.broadcast_to(inputs["input_ids"].shape[0], 1),
],
dim=1,
)
return loss, metric_output
@torch.inference_mode()
def _teacher_predictions(
self, input_ids: torch.Tensor, attention_mask: torch.Tensor
) -> torch.Tensor:
outputs_teacher = self.teacher(
input_ids=input_ids,
attention_mask=attention_mask,
)
mask_indices = input_ids == self.mask_token_id
teacher_predictions = outputs_teacher.logits[mask_indices]
return teacher_predictions
def _student_predictions_mean(
self, outputs: RobertaDoubleHeadsOutput, labels: torch.Tensor
) -> torch.Tensor:
# When calculating this it is very important to avoid breaking back propagation.
# torch.cat will break back propagation, so the prediction is added per row to a holder
logits = outputs.ld_logits
predictions = torch.zeros(logits.shape[0], device=logits.device)
for index, (start, length) in enumerate(labels):
prediction = logits[index, start : start + length]
prediction = prediction.mean(dim=0)
predictions[index] += prediction
return predictions
def _student_predictions_first(
self,
outputs: RobertaDoubleHeadsOutput,
labels: torch.Tensor,
) -> torch.Tensor:
return outputs.ld_logits[range(outputs.ld_logits.shape[0]), labels[:, 0]]
def _ld_loss(
self, predictions: torch.Tensor, targets: torch.Tensor
) -> torch.Tensor:
predictions = F.log_softmax(
predictions.to(torch.float32) / self.args.temperature, dim=-1
)
targets = F.softmax(targets.to(torch.float32) / self.args.temperature, dim=-1)
loss = F.kl_div(
input=predictions,
target=targets,
reduction="batchmean",
log_target=False,
)
return loss * (self.args.temperature**2)
def _lm_loss(
self, predictions: torch.Tensor, targets: torch.Tensor
) -> torch.Tensor:
return F.cross_entropy(
predictions.view(-1, self.vocab_size),
targets.view(-1),
)
# from src/main/python/blog/prompt_internalization/multilingual/double_head/collator.py
from typing import Any, Dict, List
import torch
from transformers import AutoTokenizer, DataCollatorForLanguageModeling
class DoubleHeadsCollator:
"""
The teacher inputs need to be padded and have an associated attention mask.
The student inputs need to be masked.
The student needs two labels -
lm_labels for language modelling, and
ld_labels for language describing
"""
def __init__(self, tokenizer: AutoTokenizer) -> None:
self.tokenizer = tokenizer
self.collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=True,
mlm_probability=0.15,
return_tensors="pt",
)
def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
teacher_inputs = self._teacher_inputs(features)
student_inputs = self._student_inputs(features)
batch = {**teacher_inputs, **student_inputs}
return batch
def _teacher_inputs(self, features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
teacher_inputs = [{"input_ids": row["teacher_input_ids"]} for row in features]
teacher_batch = self.tokenizer.pad(
teacher_inputs,
padding=True,
return_tensors="pt",
)
return {
"teacher_input_ids": teacher_batch["input_ids"],
"teacher_attention_mask": teacher_batch["attention_mask"],
}
def _student_inputs(self, features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
student_inputs = [{"input_ids": row["input_ids"]} for row in features]
student_labels = torch.tensor(
[row["labels"][0] for row in features],
dtype=torch.long,
)
lm_inputs = self.collator(student_inputs)
return {
"input_ids": lm_inputs["input_ids"],
"attention_mask": lm_inputs["attention_mask"],
"ld_labels": student_labels,
"lm_labels": lm_inputs["labels"],
}
# from src/main/python/blog/prompt_internalization/multilingual/double_head/metrics.py
from typing import Dict
from transformers import EvalPrediction
def compute_metrics(model_output: EvalPrediction) -> Dict[str, float]:
kl_div = model_output.predictions[:, 0].mean()
overlap = model_output.predictions[:, 1].mean()
cross_entropy = model_output.predictions[:, 2].mean()
return {
"kl_div": kl_div,
"overlap": overlap,
"cross_entropy": cross_entropy,
}
# from src/main/python/blog/prompt_internalization/multilingual/double_head/train.py
from pathlib import Path
from typing import Optional
import datasets
from transformers import AutoModelForMaskedLM, AutoTokenizer
from .collator import DoubleHeadsCollator
from .metrics import compute_metrics
from .model import RobertaDoubleHeadsModel
from .trainer import (
DoubleHeadsPromptInternalizationTrainer,
DoubleHeadsPromptInternalizationTrainingArguments,
)
DATASET_FOLDER = Path("/data/tatoeba/2022-06-18/dataset/")
MODEL_FOLDER = Path("/data/prompt-internalization/multilingual/")
RUN_FOLDER = Path("/tmp/runs")
MODEL_FOLDER.mkdir(parents=True, exist_ok=True)
RUN_FOLDER.mkdir(parents=True, exist_ok=True)
def train(
*,
model_name: str = "xlm-roberta-base",
dataset_name: str = "xlm-roberta",
batch_size: int = 64,
learning_rate: float = 1e-4,
temperature: float = 2,
alpha: float = 0.5,
fp16: bool = False,
mean_prediction: bool = False,
epochs: Optional[float] = 2,
max_steps: int = -1,
evaluation_steps: int = 500,
) -> Path:
run_name = "-".join(
[
f"{model_name}",
f"e{epochs}" if max_steps == -1 else f"ms{max_steps}",
f"bs{batch_size}",
f"lr{learning_rate}",
f"t{temperature}",
f"a{alpha}",
]
+ (["fp16"] if fp16 else [])
+ (["mean"] if mean_prediction else [])
)
print(f"Starting {run_name}")
train_ds = datasets.load_from_disk(DATASET_FOLDER / f"{dataset_name}-train.dataset")
test_ds = datasets.load_from_disk(DATASET_FOLDER / f"{dataset_name}-test.dataset")
training_args = DoubleHeadsPromptInternalizationTrainingArguments(
report_to="none",
output_dir=RUN_FOLDER,
num_train_epochs=epochs,
max_steps=max_steps,
seed=33,
# number of steps before moving evaluation results from GPU to CPU see
# https://discuss.huggingface.co/t/cuda-out-of-memory-when-using-trainer-with-compute-metrics/2941
eval_accumulation_steps=5,
#
# hyperparameters
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=batch_size,
fp16=fp16,
temperature=temperature,
alpha=alpha,
mean_prediction=mean_prediction,
learning_rate=learning_rate,
#
# evaluation settings
evaluation_strategy="steps",
logging_steps=evaluation_steps,
eval_steps=evaluation_steps,
save_steps=evaluation_steps,
#
# checkpoint settings
logging_dir=RUN_FOLDER / "logs",
save_total_limit=2,
load_best_model_at_end=True,
metric_for_best_model="overlap",
greater_is_better=True,
remove_unused_columns=False,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
teacher_model = AutoModelForMaskedLM.from_pretrained(model_name)
student_model = RobertaDoubleHeadsModel.from_pretrained(model_name)
student_model.copy_lm_weights_to_ld()
data_collator = DoubleHeadsCollator(tokenizer=tokenizer)
trainer = DoubleHeadsPromptInternalizationTrainer(
model=student_model,
args=training_args,
teacher_model=teacher_model,
train_dataset=train_ds,
eval_dataset=test_ds,
data_collator=data_collator,
tokenizer=tokenizer,
compute_metrics=compute_metrics,
)
trainer.train()
student_model.save_pretrained(MODEL_FOLDER / run_name)
return MODEL_FOLDER / run_name
# from src/main/python/blog/prompt_internalization/multilingual/double_head/evaluate.py
from pathlib import Path
from typing import Dict, List, Optional, Tuple
import torch
from transformers import AutoTokenizer
from .model import RobertaDoubleHeadsModel
def evaluate(
model_name: str,
model_path: Path,
ignore_tokens: Optional[Dict[str, List[int]]] = None,
) -> None:
if ignore_tokens is None:
ignore_tokens = {}
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = RobertaDoubleHeadsModel.from_pretrained(model_path)
model.eval()
bass_evaluation(model=model, tokenizer=tokenizer, ignore_tokens=ignore_tokens)
friday_evaluation(model=model, tokenizer=tokenizer, ignore_tokens=ignore_tokens)
malibu_evaluation(model=model, tokenizer=tokenizer, ignore_tokens=ignore_tokens)
football_evaluation(model=model, tokenizer=tokenizer, ignore_tokens=ignore_tokens)
def bass_evaluation(
model: RobertaDoubleHeadsModel,
tokenizer: AutoTokenizer,
ignore_tokens: Dict[str, List[int]],
) -> None:
first_phrase = "We spotted a large bass in the ocean."
second_phrase = "The bass player did not receive the acknowledgment she deserves."
third_phrase = "The black sea bass, is a member of the wreckfish family."
first_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=first_phrase,
noun="bass",
ignore_tokens=ignore_tokens,
)
second_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=second_phrase,
noun="bass",
ignore_tokens=ignore_tokens,
)
third_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=third_phrase,
noun="bass",
ignore_tokens=ignore_tokens,
)
print("=== BASS EVALUATION ===")
print(f"First Phrase is: {first_phrase} Target is: bass")
print(f"Second Phrase is: {second_phrase} Target is: bass")
print(f"Third Phrase is: {third_phrase} Target is: bass")
print()
for key in first_description:
first_words = first_description[key]
second_words = second_description[key]
third_words = third_description[key]
first_second_overlap = set(first_words) & set(second_words)
first_third_overlap = set(first_words) & set(third_words)
second_third_overlap = set(second_words) & set(third_words)
print(f"First {key} description is: {', '.join(first_words)}")
print(f"Second {key} description is: {', '.join(second_words)}")
print(f"Third {key} description is: {', '.join(third_words)}")
print(
f"First & Second: {sorted(first_second_overlap)} ({len(first_second_overlap)})"
)
print(
f"First & Third: {sorted(first_third_overlap)} ({len(first_third_overlap)})"
)
print(
f"Second & Third: {sorted(second_third_overlap)} ({len(second_third_overlap)})"
)
print()
def friday_evaluation(
model: RobertaDoubleHeadsModel,
tokenizer: AutoTokenizer,
ignore_tokens: Dict[str, List[int]],
) -> None:
spanish_text = "Friday es mi canción favorita."
english_text = "Friday is my favourite song."
spanish_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=spanish_text,
noun="Friday",
ignore_tokens=ignore_tokens,
)
english_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=english_text,
noun="Friday",
ignore_tokens=ignore_tokens,
)
print("=== FRIDAY EVALUATION ===")
print(f"Spanish Phrase is: {spanish_text}")
print(f"English Phrase is: {english_text}")
print()
for key in spanish_description:
spanish_words = spanish_description[key]
english_words = english_description[key]
overlap = set(spanish_words) & set(english_words)
difference = set(spanish_words) ^ set(english_words)
print(f"Spanish {key} description is: {', '.join(spanish_words)}")
print(f"English {key} description is: {', '.join(english_words)}")
print(f"Overlap is: {', '.join(sorted(overlap))} ({len(overlap)})")
print(f"Difference is: {', '.join(sorted(difference))} ({len(difference)})")
print()
def malibu_evaluation(
model: RobertaDoubleHeadsModel,
tokenizer: AutoTokenizer,
ignore_tokens: Dict[str, List[int]],
) -> None:
text = "I like to drive my Malibu while drinking Malibu."
first_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=text,
noun="Malibu",
ignore_tokens=ignore_tokens,
)
second_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=text,
noun="Malibu",
index=1,
ignore_tokens=ignore_tokens,
)
print("=== MALIBU EVALUATION ===")
print(f"Phrase is: {text}")
print()
for key in first_description:
first_words = first_description[key]
second_words = second_description[key]
overlap = set(first_words) & set(second_words)
difference = set(first_words) ^ set(second_words)
print(f"First Malibu (car) {key} description is: {', '.join(first_words)}")
print(f"Second Malibu (drink) {key} description is: {', '.join(second_words)}")
print(f"Overlap is: {', '.join(sorted(overlap))} ({len(overlap)})")
print(f"Difference is: {', '.join(sorted(difference))} ({len(difference)})")
print()
def football_evaluation(
model: RobertaDoubleHeadsModel,
tokenizer: AutoTokenizer,
ignore_tokens: Dict[str, List[int]],
) -> None:
spanish_phrase = (
"Retiremos el equipo de la cancha, "
"Boca no merece jugar esta copa que "
"hace tiempo viene siendo desprestigiada.\n"
"Ya no se juega al futbol."
)
english_phrase = (
"Let's remove the team from the field, "
"Boca does not deserve to play this cup that "
"has long been discredited. "
"Football is no longer played."
)
print("=== FOOTBALL EVALUATION ===")
print(f"Spanish Phrase is: {spanish_phrase}")
print(f"English Phrase is: {english_phrase}")
print()
for spanish_noun, english_noun in [
["equipo", "team"],
["Boca", "Boca"],
["copa", "cup"],
["tiempo", "long"],
["futbol", "Football"],
]:
spanish_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=spanish_phrase,
noun=spanish_noun,
ignore_tokens=ignore_tokens,
)
english_description = get_predictions(
model=model,
tokenizer=tokenizer,
text=english_phrase,
noun=english_noun,
ignore_tokens=ignore_tokens,
)
print(f"Spanish word is: {spanish_noun}, English word is: {english_noun}")
for key in spanish_description:
spanish_words = spanish_description[key]
english_words = english_description[key]
overlap = set(spanish_words) & set(english_words)
difference = set(spanish_words) ^ set(english_words)
print(f"Spanish {key} description is: {', '.join(spanish_words)}")
print(f"English {key} description is: {', '.join(english_words)}")
print(f"Overlap is: {', '.join(sorted(overlap))} ({len(overlap)})")
print(f"Difference is: {', '.join(sorted(difference))} ({len(difference)})")
print()
@torch.inference_mode()
def get_predictions(
*,
model: RobertaDoubleHeadsModel,
tokenizer: AutoTokenizer,
text: str,
noun: str,
ignore_tokens: Dict[str, List[int]],
index: int = 0,
) -> List[List[str]]:
tokens = tokenizer(text, return_tensors="pt")
start, _end = get_noun(
tokenizer=tokenizer, tokens=tokens.input_ids[0], noun=noun, index=index
)
output = model(**tokens)
predictions = output.ld_logits[0, start]
return get_filtered_tokens(
tokenizer=tokenizer, predictions=predictions, ignore_tokens=ignore_tokens
)
def get_noun(
tokenizer: AutoTokenizer, tokens: torch.Tensor, noun: str, index: int
) -> Tuple[int, int]:
length = tokens.shape[0]
current_index = index
for start_index in range(length):
word = tokenizer.decode(tokens[start_index]).strip()
if not noun.startswith(word):
continue
for end_index in range(start_index + 1, length):
word = tokenizer.decode(tokens[start_index:end_index]).strip()
if not noun == word:
continue
if current_index > 0:
current_index -= 1
else:
return start_index, end_index
raise AssertionError(f"Did not find {noun}[{index}] in {tokenizer.decode(tokens)}")
def get_filtered_tokens(
tokenizer: AutoTokenizer,
predictions: torch.Tensor,
ignore_tokens: Dict[str, List[int]],
) -> Dict[str, List[str]]:
return {
"none": get_tokens(
tokenizer=tokenizer, predictions=predictions, ignore_tokens=[]
),
**{
key: get_tokens(
tokenizer=tokenizer, predictions=predictions, ignore_tokens=tokens
)
for key, tokens in ignore_tokens.items()
},
}
def get_tokens(
tokenizer: AutoTokenizer, predictions: torch.Tensor, ignore_tokens: List[int]
) -> List[str]:
predictions = predictions.clone()
predictions[ignore_tokens] = predictions.min()
tokens = predictions.argsort(descending=True)[:10]
return [word.strip() for word in tokenizer.batch_decode(tokens)]
Now to try training a model with this. As before we will start with XLM-RoBERTa-base and move on to XLM-RoBERTa-large.
Starting xlm-roberta-base-e2-bs32-lr0.0001-t2-a0.5
initializing language describing head with language modelling weights
/home/matthew/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.10/lib/python3.10/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Step | Training Loss | Validation Loss | Kl Div | Overlap | Cross Entropy |
---|---|---|---|---|---|
1000 | 1.511200 | 2.791588 | 0.438695 | 0.597767 | 5.104256 |
2000 | 1.265800 | 2.933053 | 0.375079 | 0.622868 | 5.453454 |
3000 | 1.165200 | 3.201085 | 0.359807 | 0.631752 | 6.016028 |
4000 | 1.037100 | 3.280622 | 0.339257 | 0.644167 | 6.192915 |
5000 | 0.981800 | 3.002947 | 0.329574 | 0.650252 | 5.650171 |
6000 | 0.944000 | 3.063303 | 0.319838 | 0.654045 | 5.778174 |
The default value for alpha (the ratio between language modelling loss and language describing loss) may well be off. The cross entropy loss over the validation set appears to be around 10x the language describing loss so I may want to put alpha down to ~0.1.
=== BASS EVALUATION ===
First Phrase is: We spotted a large bass in the ocean. Target is: bass
Second Phrase is: The bass player did not receive the acknowledgment she deserves. Target is: bass
Third Phrase is: The black sea bass, is a member of the wreckfish family. Target is: bass
First none description is: Location, Type, Description, Name, Color, Status, Owner, Material, Area, Size
Second none description is: Description, Type, Name, Status, Position, Title, Owner, Language, Location, Rating
Third none description is: Owner, Name, Type, Color, Description, Status, Country, Race, Age, Animal
First & Second: ['Description', 'Location', 'Name', 'Owner', 'Status', 'Type'] (6)
First & Third: ['Color', 'Description', 'Name', 'Owner', 'Status', 'Type'] (6)
Second & Third: ['Description', 'Name', 'Owner', 'Status', 'Type'] (5)
First top 10 description is: Color, Area, Size, View, Title, Feature, Application, Cat, Position, Weight
Second top 10 description is: Position, Title, Rating, Feature, Model, Color, Application, Instrument, Details, Motor
Third top 10 description is: Color, Race, Animal, Cat, Weight, Title, Model, Size, Family, Food
First & Second: ['Application', 'Color', 'Feature', 'Position', 'Title'] (5)
First & Third: ['Cat', 'Color', 'Size', 'Title', 'Weight'] (5)
Second & Third: ['Color', 'Model', 'Title'] (3)
First top .5 description is: Area, View, Feature, Application, Cat, Theme, Subject, Category, Source, Weather
Second top .5 description is: Rating, Feature, Model, Application, Instrument, Motor, Level, Style, Control, Item
Third top .5 description is: Race, Animal, Cat, Model, Family, Food, Gen, Subject, Feature, Profile
First & Second: ['Application', 'Feature'] (2)
First & Third: ['Cat', 'Feature', 'Subject'] (3)
Second & Third: ['Feature', 'Model'] (2)
=== FRIDAY EVALUATION ===
Spanish Phrase is: Friday es mi canción favorita.
English Phrase is: Friday is my favourite song.
Spanish none description is: Name, Description, Owner, Location, Comment, Title, Color, Status, Country, Tags
English none description is: Description, Tag, Tags, Name, Album, Country, Title, Status, Language, Theme
Overlap is: Country, Description, Name, Status, Tags, Title (6)
Difference is: Album, Color, Comment, Language, Location, Owner, Tag, Theme (8)
Spanish top 10 description is: Comment, Title, Color, Tags, Details, Photo, Date, Text, Album, Keyword
English top 10 description is: Tag, Tags, Album, Title, Theme, Keyword, Details, Music, Label, Labels
Overlap is: Album, Details, Keyword, Tags, Title (5)
Difference is: Color, Comment, Date, Label, Labels, Music, Photo, Tag, Text, Theme (10)
Spanish top .5 description is: Comment, Text, Album, Video, Subject, Year, Birthday, Motor, Credit, Home
English top .5 description is: Album, Theme, Music, Label, Labels, Motto, Song, Comments, Video, Category
Overlap is: Album, Video (2)
Difference is: Birthday, Category, Comment, Comments, Credit, Home, Label, Labels, Motor, Motto, Music, Song, Subject, Text, Theme, Year (16)
=== MALIBU EVALUATION ===
Phrase is: I like to drive my Malibu while drinking Malibu.
First Malibu (car) none description is: Country, Location, Land, Language, Status, Type, Region, Area, Name, City
Second Malibu (drink) none description is: Country, Location, Land, Language, Region, Status, Type, Area, Name, City
Overlap is: Area, City, Country, Land, Language, Location, Name, Region, Status, Type (10)
Difference is: (0)
First Malibu (car) top 10 description is: Land, Region, Area, City, Service, Address, State, Keyword, Style, Nation
Second Malibu (drink) top 10 description is: Land, Region, Area, City, Service, Food, Market, State, Address, Nation
Overlap is: Address, Area, City, Land, Nation, Region, Service, State (8)
Difference is: Food, Keyword, Market, Style (4)
First Malibu (car) top .5 description is: Region, Area, City, Service, State, Style, Nation, Food, Culture, Theme
Second Malibu (drink) top .5 description is: Region, Area, City, Service, Food, Market, State, Nation, Culture, Style
Overlap is: Area, City, Culture, Food, Nation, Region, Service, State, Style (9)
Difference is: Market, Theme (2)
=== FOOTBALL EVALUATION ===
Spanish Phrase is: Retiremos el equipo de la cancha, Boca no merece jugar esta copa que hace tiempo viene siendo desprestigiada.
Ya no se juega al futbol.
English Phrase is: Let's remove the team from the field, Boca does not deserve to play this cup that has long been discredited. Football is no longer played.
Spanish word is: equipo, English word is: team
Spanish none description is: Description, Type, Location, Name, Title, Owner, Game, Status, Sport, Position
English none description is: Type, Description, Location, Name, Title, Owner, Game, Material, Status, Position
Overlap is: Description, Game, Location, Name, Owner, Position, Status, Title, Type (9)
Difference is: Material, Sport (2)
Spanish top 10 description is: Title, Game, Sport, Position, Style, Theme, Rating, Category, Application, Size
English top 10 description is: Title, Game, Position, Size, Color, Team, Category, Sport, Rating, Brand
Overlap is: Category, Game, Position, Rating, Size, Sport, Title (7)
Difference is: Application, Brand, Color, Style, Team, Theme (6)
Spanish top .5 description is: Game, Sport, Style, Theme, Rating, Category, Application, Sports, Organization, Team
English top .5 description is: Game, Team, Category, Sport, Rating, Group, Organization, Style, Theme, Application
Overlap is: Application, Category, Game, Organization, Rating, Sport, Style, Team, Theme (9)
Difference is: Group, Sports (2)
Spanish word is: Boca, English word is: Boca
Spanish none description is: Owner, Name, Comment, Photo, Location, Brand, Color, Description, Author, ID
English none description is: Owner, Name, Location, Company, Color, Author, Photo, Family, Address, Comment
Overlap is: Author, Color, Comment, Location, Name, Owner, Photo (7)
Difference is: Address, Brand, Company, Description, Family, ID (6)
Spanish top 10 description is: Comment, Photo, Brand, Color, Author, ID, Details, Title, Nick, Song
English top 10 description is: Company, Color, Author, Photo, Family, Address, Comment, Motor, Person, User
Overlap is: Author, Color, Comment, Photo (4)
Difference is: Address, Brand, Company, Details, Family, ID, Motor, Nick, Person, Song, Title, User (12)
Spanish top .5 description is: Comment, Author, Nick, Song, Singer, Motor, Car, Company, About, Family
English top .5 description is: Company, Author, Family, Comment, Motor, Person, User, Member, Nick, Vehicle
Overlap is: Author, Comment, Company, Family, Motor, Nick (6)
Difference is: About, Car, Member, Person, Singer, Song, User, Vehicle (8)
Spanish word is: copa, English word is: cup
Spanish none description is: Type, Title, Game, Description, Sport, Sports, Location, Style, Status, Name
English none description is: Type, Description, Title, Game, Sport, Material, Sports, Style, Name, Status
Overlap is: Description, Game, Name, Sport, Sports, Status, Style, Title, Type (9)
Difference is: Location, Material (2)
Spanish top 10 description is: Title, Game, Sport, Sports, Style, Category, Series, Keyword, Theme, Color
English top 10 description is: Title, Game, Sport, Sports, Style, Series, Color, Category, Size, Theme
Overlap is: Category, Color, Game, Series, Sport, Sports, Style, Theme, Title (9)
Difference is: Keyword, Size (2)
Spanish top .5 description is: Game, Sport, Sports, Style, Category, Series, Theme, Games, Tip, Application
English top .5 description is: Game, Sport, Sports, Style, Series, Category, Theme, Application, Item, Feature
Overlap is: Application, Category, Game, Series, Sport, Sports, Style, Theme (8)
Difference is: Feature, Games, Item, Tip (4)
Spanish word is: tiempo, English word is: long
Spanish none description is: Age, Duration, Description, Time, Weight, Size, Rating, Year, Location, Date
English none description is: Age, Long, long, Weight, Status, Year, Life, Rating, Description, age
Overlap is: Age, Description, Rating, Weight, Year (5)
Difference is: Date, Duration, Life, Location, Long, Size, Status, Time, age, long (10)
Spanish top 10 description is: Duration, Time, Weight, Size, Rating, Year, Date, Range, Alter, Life
English top 10 description is: Long, long, Weight, Year, Life, Rating, age, Duration, Title, Race
Overlap is: Duration, Life, Rating, Weight, Year (5)
Difference is: Alter, Date, Long, Race, Range, Size, Time, Title, age, long (10)
Spanish top .5 description is: Duration, Rating, Year, Range, Alter, Life, Price, Game, Period, Race
English top .5 description is: Long, long, Year, Life, Rating, age, Duration, Race, Price, Quality
Overlap is: Duration, Life, Price, Race, Rating, Year (6)
Difference is: Alter, Game, Long, Period, Quality, Range, age, long (8)
Spanish word is: futbol, English word is: Football
Spanish none description is: Sport, Sports, Game, Type, Style, Hobby, Football, Description, Games, Country
English none description is: Sport, Sports, Type, Game, Description, Football, Style, Hobby, Title, Category
Overlap is: Description, Football, Game, Hobby, Sport, Sports, Style, Type (8)
Difference is: Category, Country, Games, Title (4)
Spanish top 10 description is: Sport, Sports, Game, Style, Hobby, Football, Games, Theme, Application, Series
English top 10 description is: Sport, Sports, Game, Football, Style, Hobby, Title, Category, Theme, Games
Overlap is: Football, Game, Games, Hobby, Sport, Sports, Style, Theme (8)
Difference is: Application, Category, Series, Title (4)
Spanish top .5 description is: Sport, Sports, Game, Style, Hobby, Football, Games, Theme, Application, Series
English top .5 description is: Sport, Sports, Game, Football, Style, Hobby, Category, Theme, Games, Application
Overlap is: Application, Football, Game, Games, Hobby, Sport, Sports, Style, Theme (9)
Difference is: Category, Series (2)
Starting xlm-roberta-large-e2-bs8-lr0.0001-t2-a0.5
initializing language describing head with language modelling weights
/home/matthew/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.10/lib/python3.10/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Step | Training Loss | Validation Loss | Kl Div | Overlap | Cross Entropy |
---|---|---|---|---|---|
1000 | 4.485700 | nan | 1.408290 | 0.383782 | nan |
2000 | 4.765600 | 5.876565 | 1.386561 | 0.399743 | 10.665840 |
3000 | 4.725900 | 6.066762 | 1.386035 | 0.392967 | 11.045374 |
4000 | 4.655300 | 6.264757 | 1.401647 | 0.378980 | 11.443541 |
5000 | 4.663600 | 6.235996 | 1.399746 | 0.361958 | 11.384928 |
6000 | 4.618500 | 6.168671 | 1.380730 | 0.380718 | 11.256695 |
7000 | 4.573900 | 6.172040 | 1.381824 | 0.366568 | 11.266456 |
8000 | 4.593900 | 6.212361 | 1.382200 | 0.380718 | 11.342365 |
9000 | 4.533700 | 6.132387 | 1.382104 | 0.380718 | 11.183510 |
10000 | 4.573600 | 6.185318 | 1.373810 | 0.366568 | 11.290386 |
11000 | 4.520700 | 6.170875 | 1.390312 | 0.388165 | 11.261891 |
12000 | 4.564600 | nan | 1.379688 | 0.380718 | nan |
13000 | 4.494800 | 6.219527 | 1.394951 | 0.387430 | 11.360277 |
14000 | 4.483600 | 6.078000 | 1.388268 | 0.380718 | 11.073941 |
15000 | 4.480900 | 6.189854 | 1.384314 | 0.382721 | 11.301337 |
16000 | 4.417300 | 6.161162 | 1.378112 | 0.399743 | 11.245730 |
17000 | 4.422100 | 6.119100 | 1.384519 | 0.376409 | 11.160753 |
18000 | 4.454800 | 6.121497 | 1.380128 | 0.383782 | 11.166265 |
19000 | 4.441200 | 6.126425 | 1.390149 | 0.378980 | 11.174656 |
20000 | 4.441400 | 6.090848 | 1.388660 | 0.399743 | 11.104525 |
21000 | 4.442400 | 6.121112 | 1.377978 | 0.383782 | 11.165845 |
22000 | 4.450800 | 6.118627 | 1.385170 | 0.399743 | 11.159862 |
23000 | 4.468500 | 6.133310 | 1.378750 | 0.380718 | 11.189851 |
24000 | 4.451300 | 6.110619 | 1.376852 | 0.399743 | 11.145279 |
=== BASS EVALUATION ===
First Phrase is: We spotted a large bass in the ocean. Target is: bass
Second Phrase is: The bass player did not receive the acknowledgment she deserves. Target is: bass
Third Phrase is: The black sea bass, is a member of the wreckfish family. Target is: bass
First none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Second none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Third none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
First & Second: ['Age', 'Author', 'Category', 'Description', 'Location', 'Name', 'Other', 'Owner', 'Title', 'Type'] (10)
First & Third: ['Age', 'Author', 'Category', 'Description', 'Location', 'Name', 'Other', 'Owner', 'Title', 'Type'] (10)
Second & Third: ['Age', 'Author', 'Category', 'Description', 'Location', 'Name', 'Other', 'Owner', 'Title', 'Type'] (10)
First top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Second top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Third top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
First & Second: ['Author', 'Color', 'Country', 'Job', 'Language', 'Other', 'Place', 'Product', 'Service', 'Tags'] (10)
First & Third: ['Author', 'Color', 'Country', 'Job', 'Language', 'Other', 'Place', 'Product', 'Service', 'Tags'] (10)
Second & Third: ['Author', 'Color', 'Country', 'Job', 'Language', 'Other', 'Place', 'Product', 'Service', 'Tags'] (10)
First top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Second top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Third top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
First & Second: ['Color', 'Company', 'Family', 'Model', 'Photo', 'Position', 'Race', 'Size', 'Style', 'Subject'] (10)
First & Third: ['Color', 'Company', 'Family', 'Model', 'Photo', 'Position', 'Race', 'Size', 'Style', 'Subject'] (10)
Second & Third: ['Color', 'Company', 'Family', 'Model', 'Photo', 'Position', 'Race', 'Size', 'Style', 'Subject'] (10)
=== FRIDAY EVALUATION ===
Spanish Phrase is: Friday es mi canción favorita.
English Phrase is: Friday is my favourite song.
Spanish none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
English none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Overlap is: Age, Author, Category, Description, Location, Name, Other, Owner, Title, Type (10)
Difference is: (0)
Spanish top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
English top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Overlap is: Author, Color, Country, Job, Language, Other, Place, Product, Service, Tags (10)
Difference is: (0)
Spanish top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
English top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Overlap is: Color, Company, Family, Model, Photo, Position, Race, Size, Style, Subject (10)
Difference is: (0)
=== MALIBU EVALUATION ===
Phrase is: I like to drive my Malibu while drinking Malibu.
First Malibu (car) none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Second Malibu (drink) none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Overlap is: Age, Author, Category, Description, Location, Name, Other, Owner, Title, Type (10)
Difference is: (0)
First Malibu (car) top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Second Malibu (drink) top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Overlap is: Author, Color, Country, Job, Language, Other, Place, Product, Service, Tags (10)
Difference is: (0)
First Malibu (car) top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Second Malibu (drink) top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Overlap is: Color, Company, Family, Model, Photo, Position, Race, Size, Style, Subject (10)
Difference is: (0)
=== FOOTBALL EVALUATION ===
Spanish Phrase is: Retiremos el equipo de la cancha, Boca no merece jugar esta copa que hace tiempo viene siendo desprestigiada.
Ya no se juega al futbol.
English Phrase is: Let's remove the team from the field, Boca does not deserve to play this cup that has long been discredited. Football is no longer played.
Spanish word is: equipo, English word is: team
Spanish none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
English none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Overlap is: Age, Author, Category, Description, Location, Name, Other, Owner, Title, Type (10)
Difference is: (0)
Spanish top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
English top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Overlap is: Author, Color, Country, Job, Language, Other, Place, Product, Service, Tags (10)
Difference is: (0)
Spanish top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
English top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Overlap is: Color, Company, Family, Model, Photo, Position, Race, Size, Style, Subject (10)
Difference is: (0)
Spanish word is: Boca, English word is: Boca
Spanish none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
English none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Overlap is: Age, Author, Category, Description, Location, Name, Other, Owner, Title, Type (10)
Difference is: (0)
Spanish top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
English top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Overlap is: Author, Color, Country, Job, Language, Other, Place, Product, Service, Tags (10)
Difference is: (0)
Spanish top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
English top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Overlap is: Color, Company, Family, Model, Photo, Position, Race, Size, Style, Subject (10)
Difference is: (0)
Spanish word is: copa, English word is: cup
Spanish none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
English none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Overlap is: Age, Author, Category, Description, Location, Name, Other, Owner, Title, Type (10)
Difference is: (0)
Spanish top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
English top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Overlap is: Author, Color, Country, Job, Language, Other, Place, Product, Service, Tags (10)
Difference is: (0)
Spanish top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
English top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Overlap is: Color, Company, Family, Model, Photo, Position, Race, Size, Style, Subject (10)
Difference is: (0)
Spanish word is: tiempo, English word is: long
Spanish none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
English none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Overlap is: Age, Author, Category, Description, Location, Name, Other, Owner, Title, Type (10)
Difference is: (0)
Spanish top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
English top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Overlap is: Author, Color, Country, Job, Language, Other, Place, Product, Service, Tags (10)
Difference is: (0)
Spanish top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
English top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Overlap is: Color, Company, Family, Model, Photo, Position, Race, Size, Style, Subject (10)
Difference is: (0)
Spanish word is: futbol, English word is: Football
Spanish none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
English none description is: Name, Type, Location, Owner, Description, Age, Title, Category, Other, Author
Overlap is: Age, Author, Category, Description, Location, Name, Other, Owner, Title, Type (10)
Difference is: (0)
Spanish top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
English top 10 description is: Other, Author, Product, Color, Place, Service, Tags, Country, Job, Language
Overlap is: Author, Color, Country, Job, Language, Other, Place, Product, Service, Tags (10)
Difference is: (0)
Spanish top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
English top .5 description is: Color, Family, Subject, Size, Model, Company, Position, Style, Race, Photo
Overlap is: Color, Company, Family, Model, Photo, Position, Race, Size, Style, Subject (10)
Difference is: (0)
The large model has collapsed terribly.
Starting xlm-roberta-base-e2-bs32-lr0.0001-t2-a0.1
initializing language describing head with language modelling weights
/home/matthew/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.10/lib/python3.10/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
Step | Training Loss | Validation Loss | Kl Div | Overlap | Cross Entropy |
---|---|---|---|---|---|
1000 | 0.701300 | 0.856570 | 0.335526 | 0.646788 | 5.372933 |
2000 | 0.538200 | 0.878939 | 0.303974 | 0.664985 | 5.863934 |
3000 | 0.494800 | 0.856900 | 0.291485 | 0.675407 | 5.824556 |
4000 | 0.440000 | 0.870678 | 0.282658 | 0.683110 | 5.997384 |
5000 | 0.420700 | 0.854028 | 0.277266 | 0.687389 | 5.925179 |
6000 | 0.405200 | 0.836515 | 0.266587 | 0.693247 | 5.836579 |
Could not locate the tokenizer configuration file, will try to use the model config instead.
=== BASS EVALUATION ===
First Phrase is: We spotted a large bass in the ocean. Target is: bass
Second Phrase is: The bass player did not receive the acknowledgment she deserves. Target is: bass
Third Phrase is: The black sea bass, is a member of the wreckfish family. Target is: bass
First none description is: Location, Description, Type, Name, Area, Status, Color, Material, View, Country
Second none description is: Description, Name, Type, Status, Title, Owner, Location, Details, Position, Material
Third none description is: Type, Name, Description, Color, Owner, Weight, Status, Size, Country, Age
First & Second: ['Description', 'Location', 'Material', 'Name', 'Status', 'Type'] (6)
First & Third: ['Color', 'Country', 'Description', 'Name', 'Status', 'Type'] (6)
Second & Third: ['Description', 'Name', 'Owner', 'Status', 'Type'] (5)
First top 10 description is: Area, Color, View, Size, Title, Position, Category, Application, Cat, Land
Second top 10 description is: Title, Details, Position, Rating, Application, Contact, Item, Information, Service, Model
Third top 10 description is: Color, Weight, Size, Race, Food, Model, Cat, Style, Profile, Feature
First & Second: ['Application', 'Position', 'Title'] (3)
First & Third: ['Cat', 'Color', 'Size'] (3)
Second & Third: ['Model'] (1)
First top .5 description is: Area, View, Category, Application, Cat, Feature, Subject, Views, Theme, Source
Second top .5 description is: Rating, Application, Item, Information, Service, Model, Feature, Image, Comment, Driver
Third top .5 description is: Race, Food, Model, Cat, Style, Profile, Feature, Animal, Gene, Subject
First & Second: ['Application', 'Feature'] (2)
First & Third: ['Cat', 'Feature', 'Subject'] (3)
Second & Third: ['Feature', 'Model'] (2)
=== FRIDAY EVALUATION ===
Spanish Phrase is: Friday es mi canción favorita.
English Phrase is: Friday is my favourite song.
Spanish none description is: Name, Comment, Owner, Description, Title, Country, Photo, Tags, Color, Location
English none description is: Description, Tags, Title, Tag, Name, Comment, Album, Comments, Country, Status
Overlap is: Comment, Country, Description, Name, Tags, Title (6)
Difference is: Album, Color, Comments, Location, Owner, Photo, Status, Tag (8)
Spanish top 10 description is: Comment, Title, Photo, Tags, Color, Keyword, Video, Address, Details, Date
English top 10 description is: Tags, Title, Tag, Comment, Album, Comments, Details, Keyword, Labels, Photo
Overlap is: Comment, Details, Keyword, Photo, Tags, Title (6)
Difference is: Address, Album, Color, Comments, Date, Labels, Tag, Video (8)
Spanish top .5 description is: Comment, Video, Album, Text, Subject, Source, Home, Comments, Cat, Email
English top .5 description is: Comment, Album, Comments, Labels, Theme, Video, ..., Label, More, Text
Overlap is: Album, Comment, Comments, Text, Video (5)
Difference is: ..., Cat, Email, Home, Label, Labels, More, Source, Subject, Theme (10)
=== MALIBU EVALUATION ===
Phrase is: I like to drive my Malibu while drinking Malibu.
First Malibu (car) none description is: Name, Type, Owner, Description, Color, Material, Product, Brand, Country, Food
Second Malibu (drink) none description is: Name, Type, Country, Owner, Food, Description, Product, Language, Color, Brand
Overlap is: Brand, Color, Country, Description, Food, Name, Owner, Product, Type (9)
Difference is: Language, Material (2)
First Malibu (car) top 10 description is: Color, Product, Brand, Food, Style, Application, Cat, Keyword, Model, Theme
Second Malibu (drink) top 10 description is: Food, Product, Color, Brand, Application, Keyword, Land, Style, Theme, Cat
Overlap is: Application, Brand, Cat, Color, Food, Keyword, Product, Style, Theme (9)
Difference is: Land, Model (2)
First Malibu (car) top .5 description is: Food, Style, Application, Cat, Model, Theme, Tip, Motor, Category, Animal
Second Malibu (drink) top .5 description is: Food, Application, Style, Theme, Cat, Service, Model, Animal, Category, Root
Overlap is: Animal, Application, Cat, Category, Food, Model, Style, Theme (8)
Difference is: Motor, Root, Service, Tip (4)
=== FOOTBALL EVALUATION ===
Spanish Phrase is: Retiremos el equipo de la cancha, Boca no merece jugar esta copa que hace tiempo viene siendo desprestigiada.
Ya no se juega al futbol.
English Phrase is: Let's remove the team from the field, Boca does not deserve to play this cup that has long been discredited. Football is no longer played.
Spanish word is: equipo, English word is: team
Spanish none description is: Description, Type, Name, Title, Location, Owner, Game, Status, Age, Country
English none description is: Type, Description, Name, Title, Game, Location, Sport, Owner, Sports, Status
Overlap is: Description, Game, Location, Name, Owner, Status, Title, Type (8)
Difference is: Age, Country, Sport, Sports (4)
Spanish top 10 description is: Title, Game, Organization, Rating, Category, Team, Theme, Color, Brand, Sport
English top 10 description is: Title, Game, Sport, Sports, Category, Theme, Application, Style, Team, Organization
Overlap is: Category, Game, Organization, Sport, Team, Theme, Title (7)
Difference is: Application, Brand, Color, Rating, Sports, Style (6)
Spanish top .5 description is: Game, Organization, Rating, Category, Team, Theme, Sport, Application, Style, Sports
English top .5 description is: Game, Sport, Sports, Category, Theme, Application, Style, Team, Organization, Sponsor
Overlap is: Application, Category, Game, Organization, Sport, Sports, Style, Team, Theme (9)
Difference is: Rating, Sponsor (2)
Spanish word is: Boca, English word is: Boca
Spanish none description is: Owner, Name, Color, Location, Song, Comment, Nick, Title, Author, ID
English none description is: Owner, Name, Color, Author, Family, Details, Location, Comment, Company, Photo
Overlap is: Author, Color, Comment, Location, Name, Owner (6)
Difference is: Company, Details, Family, ID, Nick, Photo, Song, Title (8)
Spanish top 10 description is: Color, Song, Comment, Nick, Title, Author, ID, Photo, Details, Content
English top 10 description is: Color, Author, Family, Details, Comment, Company, Photo, Police, Nick, ID
Overlap is: Author, Color, Comment, Details, ID, Nick, Photo (7)
Difference is: Company, Content, Family, Police, Song, Title (6)
Spanish top .5 description is: Song, Comment, Nick, Author, Content, Singer, Logo, Family, Car, Service
English top .5 description is: Author, Family, Comment, Company, Police, Nick, User, Person, Car, Customer
Overlap is: Author, Car, Comment, Family, Nick (5)
Difference is: Company, Content, Customer, Logo, Person, Police, Service, Singer, Song, User (10)
Spanish word is: copa, English word is: cup
Spanish none description is: Type, Game, Title, Description, Sport, Status, Sports, Name, Location, Age
English none description is: Type, Game, Sport, Sports, Title, Description, Style, Application, Category, Theme
Overlap is: Description, Game, Sport, Sports, Title, Type (6)
Difference is: Age, Application, Category, Location, Name, Status, Style, Theme (8)
Spanish top 10 description is: Game, Title, Sport, Sports, Category, Application, Theme, Series, Games, Style
English top 10 description is: Game, Sport, Sports, Title, Style, Application, Category, Theme, Series, Games
Overlap is: Application, Category, Game, Games, Series, Sport, Sports, Style, Theme, Title (10)
Difference is: (0)
Spanish top .5 description is: Game, Sport, Sports, Category, Application, Theme, Series, Games, Style, Rating
English top .5 description is: Game, Sport, Sports, Style, Application, Category, Theme, Series, Games, Football
Overlap is: Application, Category, Game, Games, Series, Sport, Sports, Style, Theme (9)
Difference is: Football, Rating (2)
Spanish word is: tiempo, English word is: long
Spanish none description is: Age, Description, Duration, Year, Game, Time, Location, Status, Date, Rating
English none description is: long, Long, Age, age, Year, Country, Price, Title, Far, Racing
Overlap is: Age, Year (2)
Difference is: Country, Date, Description, Duration, Far, Game, Location, Long, Price, Racing, Rating, Status, Time, Title, age, long (16)
Spanish top 10 description is: Duration, Year, Game, Time, Date, Rating, Size, Season, Weight, Race
English top 10 description is: long, Long, age, Year, Price, Title, Far, Racing, Weight, Contract
Overlap is: Weight, Year (2)
Difference is: Contract, Date, Duration, Far, Game, Long, Price, Race, Racing, Rating, Season, Size, Time, Title, age, long (16)
Spanish top .5 description is: Duration, Year, Game, Rating, Season, Race, Level, Application, Price, Games
English top .5 description is: long, Long, age, Year, Price, Far, Racing, Contract, Brown, Song
Overlap is: Price, Year (2)
Difference is: Application, Brown, Contract, Duration, Far, Game, Games, Level, Long, Race, Racing, Rating, Season, Song, age, long (16)
Spanish word is: futbol, English word is: Football
Spanish none description is: Sport, Sports, Game, Type, Style, Hobby, Football, Games, Application, Country
English none description is: Sport, Sports, Game, Type, Style, Football, Hobby, Theme, Application, Games
Overlap is: Application, Football, Game, Games, Hobby, Sport, Sports, Style, Type (9)
Difference is: Country, Theme (2)
Spanish top 10 description is: Sport, Sports, Game, Style, Hobby, Football, Games, Application, Theme, Title
English top 10 description is: Sport, Sports, Game, Style, Football, Hobby, Theme, Application, Games, Category
Overlap is: Application, Football, Game, Games, Hobby, Sport, Sports, Style, Theme (9)
Difference is: Category, Title (2)
Spanish top .5 description is: Sport, Sports, Game, Style, Hobby, Football, Games, Application, Theme, Series
English top .5 description is: Sport, Sports, Game, Style, Football, Hobby, Theme, Application, Games, Category
Overlap is: Application, Football, Game, Games, Hobby, Sport, Sports, Style, Theme (9)
Difference is: Category, Series (2)
Reviewing these evaluations is becoming quite difficult. In each one I can see things that I do not like and things that I do. A more objective way to assess the quality of the model is required.