Bird Brain Mammogram Classifier

Someone linked me a 2015 paper on using pigeons to classify mammograms. I want to reproduce this using a neural network
Published

September 24, 2021

My friends know of my interest in deep learning, and someone linked me a paper about using pigeons to classify mammograms to identify breast cancer (Levenson et al. 2015). When I saw this I immediately wanted to recreate the test using a computer vision model. The trained pigeons reach 85% accuracy after 15 days - can I do better than that?

Levenson, Richard M., Elizabeth A. Krupinski, Victor M. Navarro, and Edward A. Wasserman. 2015. “Pigeons (Columba Livia) as Trainable Observers of Pathology and Radiology Breast Cancer Images.” PLOS ONE 10 (11): 1–21. https://doi.org/10.1371/journal.pone.0141357.

Dataset

The dataset was 144 images at three levels of magnification. This is quite a small number for training a model from an ImageNet base, so it would be good to find a model that has been pretrained on medical images.
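If such a checkpoint turned up, one way to use it would be to wrap it in an architecture function that can stand in for resnet34 when building the learner later on. This is only a sketch under that assumption: the medical_resnet34.pth file name is hypothetical, and cnn_learner should accept a resnet-like callable with a pretrained argument.

Code
import torch
from torchvision.models import resnet34

# Hypothetical file name - a stand-in for whatever medically
# pretrained weights are actually available.
MEDICAL_WEIGHTS = "medical_resnet34.pth"

def medical_resnet34(pretrained: bool = True):
    # Start from a randomly initialised resnet34, then load the medical
    # weights instead of the ImageNet ones.
    model = resnet34(pretrained=False)
    if pretrained:
        model.load_state_dict(torch.load(MEDICAL_WEIGHTS))
    return model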

The other problem with replicating this study is that the images are provided in PDF files. I have to extract them and ensure that they are correctly labelled.

PDF Image Extraction

Looking at the PDF there are 288 images, split into set A and set B. Let's start by extracting the images and labelling them.

Code
from pathlib import Path

DATA_FOLDER = Path("/data/blog/2021-09-24-bird-brain/")
IMAGE_PDF = DATA_FOLDER / "images.pdf"

IMAGES_FOLDER = DATA_FOLDER / "images"
IMAGES_FOLDER.mkdir(parents=True, exist_ok=True)
Code
import fitz

pages = [
    "4x-normal-a",
    "4x-normal-b",
    "4x-cancer-a",
    "4x-cancer-b",
    "10x-normal-a",
    "10x-normal-b",
    "10x-cancer-a",
    "10x-cancer-b",
    "20x-normal-a",
    "20x-normal-b",
    "20x-cancer-a",
    "20x-cancer-b",
    "10x-normal-monochrome-a",
    "10x-normal-monochrome-b",
    "10x-cancer-monochrome-a",
    "10x-cancer-monochrome-b",
    "10x-normal-monochrome-115-a",
    "10x-normal-monochrome-115-b",
    "10x-cancer-monochrome-115-a",
    "10x-cancer-monochrome-115-b",
    "10x-normal-monochrome-127-a",
    "10x-normal-monochrome-127-b",
    "10x-cancer-monochrome-127-a",
    "10x-cancer-monochrome-127-b",
]

doc = fitz.open(IMAGE_PDF)
assert len(doc) == len(pages), "one entry in pages per PDF page"
for page, name in enumerate(pages):
    # each page holds a grid of embedded images; extract them in order
    for index, image in enumerate(doc.getPageImageList(page)):
        path = IMAGES_FOLDER / f"{name}-{index:02d}.png"
        xref = image[0]  # the cross-reference number identifies the image
        pix = fitz.Pixmap(doc, xref)
        if pix.n >= 5: # CMYK: convert to RGB first
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.writePNG(str(path))
        pix = None  # release the pixmap
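
As a quick sanity check (mine, not part of the original pipeline), the number of extracted files should match the 288 images counted in the PDF: 24 pages of 12 images each.

Code
extracted = sorted(IMAGES_FOLDER.glob("*.png"))
# 24 pages x 12 images per page = 288 images in total
assert len(extracted) == 288, f"expected 288 images, found {len(extracted)}"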

We can have a quick look at one to ensure it has loaded correctly:

Code
from PIL import Image

image_file = sorted(IMAGES_FOLDER.glob("*.png"))[0]
image = Image.open(image_file)
print(f"Loaded {image_file.name} ({image.height}, {image.width})")
image
Loaded 10x-cancer-a-00.png (288, 288)

This can be seen as the first image on page 7, so it looks like the extraction has worked perfectly. Let’s check the image dimensions.

Code
from PIL import Image

{(image.width, image.height) for image in [Image.open(path) for path in IMAGES_FOLDER.glob("*.png")]}
{(288, 287), (288, 288), (288, 294), (384, 384)}

These are probably close enough. I think the bigger problem is the variance in magnification level across the dataset.
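
As an exploratory check of my own (not in the original post), the sizes can be grouped by the magnification prefix in each file name to see whether the odd sizes cluster by magnification:

Code
from collections import defaultdict
from PIL import Image

# Group (width, height) by the magnification prefix of each file name,
# e.g. "4x", "10x" or "20x".
sizes = defaultdict(set)
for path in IMAGES_FOLDER.glob("*.png"):
    magnification = path.name.split("-")[0]
    with Image.open(path) as image:
        sizes[magnification].add((image.width, image.height))
print(dict(sizes))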


FastAI DataLoaders

I want to create a set of DataLoaders (what fastai v1 called a databunch) out of these images. I can tell from the path what the label is, so it should be possible to create them with one of the builder methods. from_name_func fits well: it takes the list of files and a function that maps a file name to its label.

Code
from fastai.vision.all import (
    ImageDataLoaders,
    aug_transforms,
    Resize,
)

def image_label(path: str) -> str:
    # file names look like "10x-cancer-a-00.png" or "10x-normal-a-00.png"
    return "cancer" if "cancer" in path else "normal"

dls = ImageDataLoaders.from_name_func(
    path=IMAGES_FOLDER,
    fnames=sorted(IMAGES_FOLDER.glob("*.png")),
    label_func=image_label,
    item_tfms=[Resize(224)], # 224 for resnet
    batch_tfms=aug_transforms(
        mult=1.0,
        do_flip=True,
        flip_vert=True,
        max_rotate=10.0,
        min_zoom=1.0,
        max_zoom=1.1,
        max_lighting=0.2,
        max_warp=0., # all images are fixed perspective
        p_affine=0.75,
        p_lighting=0.75,
        xtra_tfms=[Resize(224)],
        size=None,
        mode='bilinear',
        pad_mode='reflection',
        align_corners=True,
        batch=False,
        min_scale=1.0,
    ),
)
Code
dls.show_batch()


FastAI Training

Let’s give it a go.

Code
from fastai.vision.all import (
    cnn_learner,
    error_rate,
    accuracy,
    resnet34
)

learn = cnn_learner(dls, resnet34, metrics=[accuracy, error_rate])
Code
learn.lr_find()
SuggestedLRs(valley=0.0008317637839354575)

Code
learn.fine_tune(epochs=5, freeze_epochs=2, base_lr=0.0008)
epoch train_loss valid_loss accuracy error_rate time
0 1.084662 0.609927 0.771930 0.228070 00:02
1 1.015930 0.627861 0.736842 0.263158 00:02
epoch train_loss valid_loss accuracy error_rate time
0 0.709828 0.471480 0.824561 0.175439 00:02
1 0.605629 0.371742 0.912281 0.087719 00:02
2 0.573627 0.277116 0.912281 0.087719 00:02
3 0.500686 0.233057 0.912281 0.087719 00:02
4 0.443576 0.211415 0.929825 0.070175 00:02

So this claims about 93% accuracy. I could recreate the original data split to check whether the random split here is easier than what the pigeons got. Either way this seems pretty good?

Code
learn.show_results()

I would never ship this though. A roughly 7% error rate, combined with the uneven class distribution in the real world (far more people are fine than have cancer), means that most cancer predictions would be false positives.
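
To make that concrete, here is a back-of-the-envelope check. The prevalence figure is an assumption for illustration, and I treat sensitivity and specificity as both equal to the overall accuracy, which the roughly balanced validation set here cannot actually tell us:

Code
# Assumed numbers for illustration only - not measured from this model.
sensitivity = 0.93   # P(predict cancer | cancer), assumed = accuracy
specificity = 0.93   # P(predict normal | normal), assumed = accuracy
prevalence = 0.005   # assume ~0.5% of screened patients have cancer

true_positives = sensitivity * prevalence
false_positives = (1 - specificity) * (1 - prevalence)
precision = true_positives / (true_positives + false_positives)
print(f"precision at {prevalence:.1%} prevalence: {precision:.1%}")
# prints roughly 6%: about 94% of "cancer" predictions would be wrong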


Experiment Replication

So the original experiment has refinements that this first “replication” lacks, which makes it rather poor. The different magnification levels do not correspond to different source material: image 1 at 4x magnification shows the same tissue as image 1 at 10x magnification. A random split therefore leaks near-duplicate images between the training and validation sets.

You can clearly see that this is the same image at different levels of magnification. Since the dataset is split into set A and set B, I can guarantee that no validation image has a counterpart in the training set by using one side to train and the other to validate.

Code
from fastai.vision.all import (
    ImageDataLoaders,
    aug_transforms,
    Resize,
)
import torch

def image_label(path: str) -> str:
    return "cancer" if "cancer" in path else "normal"

item_transforms = [Resize(224)] # 224 for resnet
batch_transforms = aug_transforms(
    mult=1.0,
    do_flip=True,
    flip_vert=True,
    max_rotate=10.0,
    min_zoom=1.0,
    max_zoom=1.1,
    max_lighting=0.2,
    max_warp=0., # all images are fixed perspective
    p_affine=0.75,
    p_lighting=0.75,
    xtra_tfms=[Resize(224)],
    size=None,
    mode='bilinear',
    pad_mode='reflection',
    align_corners=True,
    batch=False,
    min_scale=1.0,
)

# Build one set of loaders per side and keep only the train loader from
# each: set A for training, set B for validation.
train_dl = ImageDataLoaders.from_name_func(
    path=IMAGES_FOLDER,
    fnames=sorted(IMAGES_FOLDER.glob("*-a-*.png")),
    label_func=image_label,
    item_tfms=item_transforms,
    batch_tfms=batch_transforms,
    valid_pct=0.,
).train
valid_dl = ImageDataLoaders.from_name_func(
    path=IMAGES_FOLDER,
    fnames=sorted(IMAGES_FOLDER.glob("*-b-*.png")),
    label_func=image_label,
    item_tfms=item_transforms,
    batch_tfms=batch_transforms,
    valid_pct=0.,
).train

dls = ImageDataLoaders(
    train_dl,
    valid_dl,
    path=IMAGES_FOLDER,
    device=torch.device("cuda"),
)
Code
dls.train.n, dls.valid.n
(144, 144)
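
As a sanity check of my own, the two halves really are disjoint by construction, since the side letter is baked into every file name:

Code
# Set A and set B file names never overlap, so no validation image
# can appear in the training set at any magnification.
train_names = {path.name for path in IMAGES_FOLDER.glob("*-a-*.png")}
valid_names = {path.name for path in IMAGES_FOLDER.glob("*-b-*.png")}
assert len(train_names) == len(valid_names) == 144
assert not train_names & valid_names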
Code
dls.train.show_batch()

Code
dls.valid.show_batch()

Code
from fastai.vision.all import (
    cnn_learner,
    error_rate,
    accuracy,
    resnet34
)

learn = cnn_learner(dls, resnet34, metrics=[accuracy, error_rate])
learn.lr_find()
SuggestedLRs(valley=0.0012022644514217973)

Code
learn.fine_tune(epochs=10, freeze_epochs=5, base_lr=0.0012)
epoch train_loss valid_loss accuracy error_rate time
0 1.210299 0.611843 0.648438 0.351562 00:02
1 1.056809 0.486527 0.796875 0.203125 00:02
2 0.943421 0.422140 0.835938 0.164062 00:02
3 0.842579 0.305688 0.875000 0.125000 00:02
4 0.759867 0.342617 0.867188 0.132812 00:02
epoch train_loss valid_loss accuracy error_rate time
0 0.318117 0.338621 0.882812 0.117188 00:02
1 0.263593 0.223039 0.914062 0.085938 00:02
2 0.273602 0.352680 0.882812 0.117188 00:02
3 0.235429 0.256570 0.890625 0.109375 00:02
4 0.213778 0.328133 0.890625 0.109375 00:02
5 0.190749 0.325845 0.914062 0.085938 00:02
6 0.173745 0.304393 0.921875 0.078125 00:02
7 0.165102 0.269543 0.921875 0.078125 00:02
8 0.150334 0.298939 0.921875 0.078125 00:02
9 0.137877 0.309571 0.914062 0.085938 00:02
Code
learn.show_results()

This gets to the same accuracy even with the leak-free split. I had to train it a bit longer though. I think the original training run could also have gone for more epochs to reach a higher peak performance.
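
Given the false-positive concern above, the per-class errors are worth a look. fastai's ClassificationInterpretation makes this easy; this is an extra step I would add, not something from the original run:

Code
from fastai.vision.all import ClassificationInterpretation

# Which class do the remaining ~8% of errors fall on?
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(9)  # the hardest validation images
print(interp.most_confused())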