Minimalist COCO dataset for instance segmentation: cannot create a valid one
03 August 2020

I am trying to convert a dataset for instance segmentation using only pycocotools. The original dataset consists of pairs of a greyscale image (left) and a ground-truth image (middle). The ground-truth image yields two binary instances (red):

The conversion to COCO format must generate one dictionary, saved as a json file, for each greyscale image.

As I didn't understand how to use imantics or pycococreator, I tried to build a minimalist example with one image and two instances by hand. The whole thing is available in a notebook. Here the 130th image was chosen and the dictionary was built as follows:

N = 130
NUM_CATEGORIES = 2 # chrom:1, background :0
grey = data[N,:,:,0]
# dictionary for image 130
## path to greyscale image
dataset_root = os.path.join('.','dataset','shapes','train')
subset ='shapes_train'+'2020'
#annotations = 'annotations'
image_id = "{:04d}".format(N) #Possible bug here since
grey_file_name = os.path.join(image_id+'.png')
path_to_grey = os.path.join(dataset_root,subset, grey_file_name)

dict_to_130 = {}
dict_to_130['file_name']= path_to_grey

## grey shape
dict_to_130['height']= grey.shape[0]
dict_to_130['width']= grey.shape[1]

## the image id could be different from its index, here choose id=index=N
dict_to_130['image_id'] = N

### Prepare the dicts for annotation
#### bounding boxes: there are two instances in image 130: first instance
dict_to_130['annotations']= []
annotation_instance_01_dict = {}
Bbox_0130_01 = mask_to_bbox_corners(mask1, mode='XYXY')

print("     ", type(Bbox_0130_01), type(Bbox_0130_01[0]))

annotation_instance_01_dict['bbox'] = Bbox_0130_01
annotation_instance_01_dict['bbox_mode']=0 #XYXY
annotation_instance_01_dict['category_id'] = NUM_CATEGORIES-1

annotation_instance_01_dict['segmentation']=None # A dict is used, How to handle several instances?
mask1 = mask1 > 0
### rle_instance_1 is a dict
### <byte> type issue !!!

rle_instance_1 = encode(np.asarray(mask1, order="F"))

print("rle_instance1 ",rle_instance_1)
print("rle_instance1['counts'] is of type:",type(rle_instance_1['counts']))

print("rle_instance1 ",rle_instance_1['counts'].decode("utf-8"))

counts_byte_to_utf8 = rle_instance_1['counts'].decode("utf-8")
rle_instance_1['counts'] = counts_byte_to_utf8
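annotation_instance_01_dict['segmentation'] = rle_instance_1
dict_to_130['annotations'].append(annotation_instance_01_dict) # the printed output contains this entry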


#### bounding boxes: there are two instances in image 130: second instance

annotation_instance_02_dict = {}
Bbox_0130_02 = mask_to_bbox_corners(mask2, mode='XYXY')
print("     ", type(Bbox_0130_02))
annotation_instance_02_dict['bbox'] = Bbox_0130_02
annotation_instance_02_dict['bbox_mode']=0 #XYXY
annotation_instance_02_dict['category_id'] = NUM_CATEGORIES-1

annotation_instance_02_dict['segmentation']=None # A dict is used, How to handle several instances?
mask2 = mask2 > 0
### rle_instance_2 is a dict
rle_instance_2 = encode(np.asarray(mask2, order="F"))

### <byte> type issue !!!
rle_instance_2['counts'] = rle_instance_2['counts'].decode("utf-8")
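annotation_instance_02_dict['segmentation'] = rle_instance_2
dict_to_130['annotations'].append(annotation_instance_02_dict) # the printed output contains this entry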
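
The `<byte> type issue` noted in the comments can be reproduced without pycocotools. The sketch below assumes only that `counts` arrives as a `bytes` object, as the `.decode("utf-8")` calls suggest; the payload `b"abc123"` is a made-up placeholder, not a real RLE string.

```python
import json

# Hypothetical RLE-like dict: 'counts' is bytes, as it is before decoding.
rle = {"size": [190, 189], "counts": b"abc123"}

try:
    json.dumps(rle)
    serializable_before = True
except TypeError:
    # The json module cannot serialize bytes objects.
    serializable_before = False

# Decode the bytes to str so the dict becomes JSON-serializable.
rle_fixed = dict(rle, counts=rle["counts"].decode("utf-8"))
as_json = json.dumps(rle_fixed)
print(serializable_before, as_json)
```

This is why the decoded string, rather than the raw bytes, is stored back into `counts` before the dictionary is dumped.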

The dictionary can be inspected:

print("    ", type(dict_to_130['height']), dict_to_130['height'])
print("    ", type(dict_to_130['width']), dict_to_130['width'])
print("    ", type(dict_to_130['image_id']), dict_to_130['image_id'])

print("    ",dict_to_130['annotations'][0]['segmentation']['size'],"---",type(dict_to_130['annotations'][0]['segmentation']['size'][0]))
print("    ",dict_to_130['annotations'][0]['segmentation']['counts'])
print("    ",type(dict_to_130['annotations'][0]['segmentation']['counts']))

Which gives:

dict_keys(['file_name', 'height', 'width', 'image_id', 'annotations'])
     <class 'int'> 190
     <class 'int'> 189
     <class 'int'> 130
<class 'list'>
[{'bbox': [98, 61, 131, 124], 'bbox_mode': 0, 'category_id': 1, 'segmentation': {'size': [190, 189], 'counts': 'cXb0:_57K5K6QKUOa4[1I3L4M3N2O1N2O0O2O001O001O00000O11N10O02O0O1N3J5D=I7J7E;I9HY`:'}}, {'bbox': [98, 61, 131, 124], 'bbox_mode': 0, 'category_id': 1, 'segmentation': {'size': [190, 189], 'counts': 'oU46f52^JI\\5c0I2N2N101N101O0000000O100O1000000O010O010O10O100000O11O00001O0000O02O00O02O000O100000000000O10O101O00O1000O0101O0000O10001OO11N10000O100O10O02O00O1O1000000O101N100000O02O000000O010001N010000000000000000O100000000000000O11O01OO10000000O10O1001O000000010OO10O10000000001O001N101O1N2N3M8D[JOUU4'}}]
dict_keys(['bbox', 'bbox_mode', 'category_id', 'segmentation'])
{'size': [190, 189], 'counts': 'cXb0:_57K5K6QKUOa4[1I3L4M3N2O1N2O0O2O001O001O00000O11N10O02O0O1N3J5D=I7J7E;I9HY`:'}
{'size': [190, 189], 'counts': 'cXb0:_57K5K6QKUOa4[1I3L4M3N2O1N2O0O2O001O001O00000O11N10O02O0O1N3J5D=I7J7E;I9HY`:'}
dict_keys(['size', 'counts'])
     [190, 189] --- <class 'int'>
     <class 'str'>
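
The output shows the boxes stored as XYXY corners (`bbox_mode` 0), while the COCO json format stores a box as `[x, y, width, height]`. A minimal sketch of the conversion follows; `xyxy_to_xywh` is a hypothetical helper, and it assumes the corners are plain coordinates, so that the width is `x_max - x_min` with no `+1`. Whether that matches `mask_to_bbox_corners` is not known from the post.

```python
def xyxy_to_xywh(box):
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]

# Box values taken from the printed dictionary.
bbox_xywh = xyxy_to_xywh([98, 61, 131, 124])
print(bbox_xywh)
```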

The dictionary is then saved as a json file (from a Colab notebook):

with open(os.path.join('../gdrive','My Drive','Science','Data Science','dataset','shapes','train','annotations','instances_0130_data.json'), 'w') as f:
    json.dump(dict_to_130, f)

The problem arises when I try to check whether the json file is a valid COCO dataset:

#import pycocotools.coco as coco
from pycocotools.coco import COCO
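dataDir= os.path.join('../gdrive','My Drive','Science','Data Science','dataset','shapes','train')
# assumed: annFile points to the file written by json.dump above
annFile = os.path.join(dataDir,'annotations','instances_0130_data.json')
coco=COCO(annFile)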

Here pycocotools complains as follows:

loading annotations into memory...
Done (t=0.00s)
creating index...


KeyError                                  Traceback (most recent call last)

<ipython-input-19-bea8d533e4f4> in <module>()
----> 1 coco=COCO(annFile)

1 frames

/usr/local/lib/python3.6/dist-packages/pycocotools/coco.py in createIndex(self)
     95         if 'annotations' in self.dataset:
     96             for ann in self.dataset['annotations']:
---> 97                 imgToAnns[ann['image_id']].append(ann)
     98                 anns[ann['id']] = ann

KeyError: 'image_id'
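
The traceback shows that `createIndex` iterates over a top-level `annotations` list and reads `ann['image_id']` and `ann['id']` from each entry, whereas the dictionary built here keeps `image_id` on the image record only. The following is a minimal sketch of a layout with top-level `images`, `annotations` and `categories` lists, as the COCO annotation format describes. The helper name `to_coco_layout`, the category name `chrom` (taken from the comment on `NUM_CATEGORIES`), the placeholder `counts` string and the XYXY-to-XYWH conversion are all assumptions, and the final loop only mimics the two lookups from the traceback rather than calling pycocotools.

```python
from collections import defaultdict

def to_coco_layout(record, category_names):
    """Hypothetical helper: reshape one per-image record into a COCO-style dict."""
    image = {
        "id": record["image_id"],
        "file_name": record["file_name"],
        "height": record["height"],
        "width": record["width"],
    }
    annotations = []
    for ann_id, ann in enumerate(record["annotations"], start=1):
        x0, y0, x1, y1 = ann["bbox"]  # assumed XYXY corners
        annotations.append({
            "id": ann_id,                        # unique annotation id
            "image_id": record["image_id"],      # key read by createIndex
            "category_id": ann["category_id"],
            "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO stores [x, y, w, h]
            "segmentation": ann["segmentation"],
        })
    categories = [{"id": cid, "name": name} for cid, name in category_names.items()]
    return {"images": [image], "annotations": annotations, "categories": categories}

record = {
    "file_name": "0130.png", "height": 190, "width": 189, "image_id": 130,
    "annotations": [
        {"bbox": [98, 61, 131, 124], "category_id": 1,
         "segmentation": {"size": [190, 189], "counts": "placeholder"}},
    ],
}
dataset = to_coco_layout(record, {1: "chrom"})

# Mimic the two lookups from the traceback; neither raises KeyError here.
img_to_anns = defaultdict(list)
anns = {}
for ann in dataset["annotations"]:
    img_to_anns[ann["image_id"]].append(ann)
    anns[ann["id"]] = ann
print(sorted(dataset), list(img_to_anns), list(anns))
```

Fields such as `area` and `iscrowd`, which the COCO format also defines for each annotation, are omitted from this sketch.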