How to use on custom dataset? #3

sarmientoj24 · 2021-07-06T07:35:48Z

For example, I only want to use one class in OpenImages, say the car class, and I want to use the InceptionV3 embedding.

How do I use it on a custom dataset?

TDeVries · 2021-07-06T12:31:46Z

It should be fairly easy to apply to custom datasets. You need to create a dataset object that contains your data and then pass it to the select_instances function. See https://github.com/uoguelph-mlrg/instance_selection_for_gans#applying-instance-selection-to-your-own-dataset or https://github.com/uoguelph-mlrg/instance_selection_for_gans/blob/master/instance_selection.py#L91. To use the InceptionV3 embedding, pass 'inceptionv3' as the embedding argument.

If you only want a single class from a larger dataset (like OpenImages) you may need to create a custom dataset object that only loads images from that class. If all car images are in a single folder you could maybe use an ImageFolder dataset (https://pytorch.org/vision/stable/datasets.html#torchvision.datasets.ImageFolder).
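A minimal sketch of the "custom dataset object that only loads images from that class" idea, in case it helps. It assumes a hypothetical annotations CSV with `image_id` and `label` columns and images stored as `<image_id>.jpg`; neither that layout nor the `SingleClassDataset` name comes from the repository or from OpenImages. It is a plain map-style dataset (`__len__` and `__getitem__`), and it returns `(image, 0)` pairs to mimic what `ImageFolder` yields for a single class folder.

```python
import csv
from pathlib import Path


class SingleClassDataset:
    """Map-style dataset exposing only the images of one class.

    Hypothetical helper, not part of instance_selection_for_gans.
    """

    def __init__(self, image_dir, annotations_csv, wanted_label,
                 loader=None, transform=None):
        self.image_dir = Path(image_dir)
        self.loader = loader or self._pil_loader
        self.transform = transform
        # Assumed CSV layout: one row per annotation, columns image_id,label.
        with open(annotations_csv, newline="") as f:
            ids = {row["image_id"] for row in csv.DictReader(f)
                   if row["label"] == wanted_label}
        # Sort so that index -> file path is stable between runs.
        self.samples = sorted(self.image_dir / f"{i}.jpg" for i in ids)

    @staticmethod
    def _pil_loader(path):
        from PIL import Image  # lazy import; needs Pillow installed
        with Image.open(path) as img:
            return img.convert("RGB")

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image = self.loader(self.samples[index])
        if self.transform is not None:
            image = self.transform(image)
        return image, 0  # dummy label, since there is only one class
```

An instance of this class, built with a transform that resizes the images and converts them to tensors, could then be passed to select_instances in place of an ImageFolder dataset. Keeping a sorted `samples` list also makes it easier to map selected indices back to files later.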

sarmientoj24 · 2021-07-07T09:10:13Z

For the custom dataset, is it fine if I only have one class that I have already loaded from, say, a folder?

TDeVries · 2021-07-07T14:07:15Z

Yes, that should work fine.

kc-puttagunta · 2021-07-21T12:47:43Z

Is there a place where I can access this subset of selected instances? I would like to inspect these images and pass them on to StyleGAN2 as my training dataset.
I have an unlabelled custom dataset in a single folder and I have applied the select_instances function to it with InceptionV3 embeddings, but I don't see any output, and the original folder has not been reduced.
Thanks!

TDeVries · 2021-07-21T13:34:26Z

The select_instances function returns a Subset dataset object, so if you want to view the images it selected you could iterate through it to generate some sample sheets or even save each image separately to file, if that's what you are looking for. It works by keeping track of the indices from the original dataset that it needs for the reduced dataset, so your original folder won't be changed, and it doesn't save any new files.

If you want to save the indices that it selected you can pass the function a file path ending in .pkl (https://github.com/uoguelph-mlrg/instance_selection_for_gans/blob/master/instance_selection.py#L95), and that will save a pickle file of indices. However, that might be a bit tricky to interpret, since the ordering is with reference to the file paths in the original dataset object, which may not line up with how files are listed in your directory. You might be able to dig into the dataset attributes to find a list of file paths though. For example, if your original dataset is a DatasetFolder or ImageFolder object then it should have a samples attribute (self.samples inside the class) containing file paths, in an order that lines up with the selected indices.
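As a rough sketch of how the saved indices could be turned into an actual folder of images (for example, to feed another training pipeline): the helper names `load_indices` and `copy_selected` are hypothetical, and the code assumes the pickle file holds a flat sequence of integer indices and that `samples` follows the `ImageFolder` convention of `(path, class_index)` entries. Check both assumptions against your own files before relying on it.

```python
import pickle
import shutil
from pathlib import Path


def load_indices(pkl_path):
    """Read the saved indices (assumed: a flat sequence of integers)."""
    with open(pkl_path, "rb") as f:
        return [int(i) for i in pickle.load(f)]


def copy_selected(samples, indices, out_dir):
    """Copy the files referenced by `indices` into `out_dir`.

    `samples` is a list of file paths or (path, class_index) tuples,
    in the same order as the dataset that was passed to selection.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for i in indices:
        entry = samples[i]
        src = Path(entry[0] if isinstance(entry, (tuple, list)) else entry)
        # Prefix with the dataset index so files with the same name
        # in different source folders do not overwrite each other.
        dst = out_dir / f"{i:08d}_{src.name}"
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

If you still have the returned Subset in memory, its `indices` attribute and the `samples` attribute of its underlying `dataset` should supply the same two inputs without going through the pickle file.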

kc-puttagunta · 2021-08-14T13:52:09Z

Thank you for the prompt response. This was very helpful.
I am applying your code to a generation problem in the low-data regime, and it has proven somewhat useful in bringing about convergence. I have a few questions about identifying clusters within the data, so as to more precisely retain high-density data belonging to a single homogeneous cluster. Currently, the retention ratio seems somewhat arbitrary and can only be validated after the generation task.
If this piques your interest, are you available for a chat sometime? :)
Thanks in advance!

kc-puttagunta · 2021-08-14T13:54:45Z

Also, would it help to extract embeddings from more recent and deeper architectures that perform better than InceptionV3 and others?
