FiftyOne Dataset Zoo

See how easy it is to get some images from this zoo of different datasets
Published

October 18, 2021

I want to try out using YOLO + CLIP for object detection. To do this I need images that have bounding boxes around the objects in them. Normally the Open Images dataset is good for this, but I haven’t downloaded it recently and I’ve got less than 500 GB of download left for the month, so it would be hard to fit that in and still use the internet normally.

While it’s possible to download the images using AWS, there is also the FiftyOne Dataset Zoo. I’m going to trigger the download for some of the images while I check out the zoo. If you want to download from AWS you can find the URLs here.

During that download I’m going to look at the library. One immediate problem is that the latest version of kaleido fails to install on Python 3.9. I can’t find any ticket about this, so I’ve resorted to downgrading it to version 0.2.1. After doing that the library installed successfully.

Code
#hide_output
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
dataset.persistent = True
Dataset already downloaded
Loading existing dataset 'quickstart'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
Code
dataset
Name:        quickstart
Media type:  image
Num samples: 200
Persistent:  False
Tags:        ['validation']
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    uniqueness:   fiftyone.core.fields.FloatField
    predictions:  fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
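
The `ground_truth` and `predictions` fields hold the bounding boxes I am after. FiftyOne stores each box as `[top-left-x, top-left-y, width, height]`, relative to the image size and in the range 0 to 1. Cropping objects out to pass to CLIP would need pixel corners instead, so here is a minimal sketch of that conversion. The helper name `to_pixel_box` and the example image size are my own assumptions, not part of the library.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a relative [x, y, width, height] box to pixel corners.

    Hypothetical helper: takes a box in FiftyOne's relative format and
    returns (left, top, right, bottom) in whole pixels.
    """
    x, y, w, h = bounding_box
    left = round(x * image_width)
    top = round(y * image_height)
    right = round((x + w) * image_width)
    bottom = round((y + h) * image_height)
    return (left, top, right, bottom)


# Example values chosen for illustration, not taken from the dataset.
print(to_pixel_box([0.25, 0.5, 0.5, 0.25], 640, 480))  # (160, 240, 480, 360)
```

The resulting tuple is in the order that image libraries such as Pillow expect for a crop.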
Code
dataset.first().filepath
'/home/matthew/fiftyone/quickstart/data/000880.jpg'

I kinda hate libraries that put things in my home folder. So that’s already a black mark against this. I want to be able to customize the download location.

Code
import fiftyone

fiftyone.config.dataset_zoo_dir = "/data/fiftyone"
fiftyone.config.dataset_zoo_dir
'/data/fiftyone'
Subprocess ['/home/matthew/.cache/pypoetry/virtualenvs/blog-HrtMnrOS-py3.9/lib/python3.9/site-packages/fiftyone/db/bin/mongod', '--dbpath', '/home/matthew/.fiftyone/var/lib/mongo', '--logpath', '/home/matthew/.fiftyone/var/lib/mongo/log/mongo.log', '--port', '0', '--nounixsocket'] exited with error -6:
{"t":{"$date":"2021-10-18T21:19:50.809Z"},"s":"I",  "c":"CONTROL",  "id":20697,   "ctx":"main","msg":"Renamed existing log file","attr":{"oldLogPath":"/home/matthew/.fiftyone/var/lib/mongo/log/mongo.log","newLogPath":"/home/matthew/.fiftyone/var/lib/mongo/log/mongo.log.2021-10-18T21-19-50"}}
Code
#hide_output
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
dataset.persistent = True
Dataset already downloaded
Loading existing dataset 'quickstart'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
Code
dataset.first().filepath
'/home/matthew/fiftyone/quickstart/data/000880.jpg'
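
To make the mismatch explicit, a small check can test whether the sample’s file sits under the configured directory. This is only a sketch using the two paths shown above; the helper name `is_under` is hypothetical.

```python
from pathlib import PurePosixPath


def is_under(filepath: str, root: str) -> bool:
    """Return True if filepath is located somewhere inside root."""
    return PurePosixPath(root) in PurePosixPath(filepath).parents


configured_dir = "/data/fiftyone"
sample_path = "/home/matthew/fiftyone/quickstart/data/000880.jpg"

print(is_under(sample_path, configured_dir))            # False
print(is_under(sample_path, "/home/matthew/fiftyone"))  # True
```

The log above says the existing 'quickstart' dataset was reused rather than downloaded again, so this check shows where the files are but not, by itself, whether a fresh download would honour the setting.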

I’ve opened a bug about this as it directly contradicts the documented behaviour. The configuration modification is documented here. The specific field is documented here. I have updated the field and the update has been applied, yet it is ignored by the dataset.
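
The same configuration documentation also describes setting values before the library is imported, either through a `FIFTYONE_`-prefixed environment variable or through a JSON file at `~/.fiftyone/config.json`. The sketch below prepares both for `dataset_zoo_dir`. I have not checked whether either route avoids the problem above, and the helper name `config_json` is hypothetical.

```python
import json
import os


def config_json(zoo_dir: str) -> str:
    """Build the JSON text for a config file that sets dataset_zoo_dir."""
    return json.dumps({"dataset_zoo_dir": zoo_dir}, indent=4)


# Option 1: environment variable, set before `import fiftyone` runs.
os.environ["FIFTYONE_DATASET_ZOO_DIR"] = "/data/fiftyone"

# Option 2: contents to save as ~/.fiftyone/config.json.
# Printed here rather than written, to avoid touching the home folder.
print(config_json("/data/fiftyone"))
```

Both options only take effect for a Python process that imports fiftyone afterwards, so a running notebook kernel would need a restart.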

So that’s a hard pass on this library.