How do I prepare my images and annotations for RetinaNet training?

Question

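I followed this tutorial to train an object detection model on the COCO dataset. The tutorial includes steps for downloading the COCO dataset and its annotations and converting them to TFRecord.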

I need to train on my own custom data. I annotated it with the labelImg tool, which produces XML files containing (w, h, xmin, ymin, xmax, ymax) for each image.

However, the COCO dataset uses a JSON format with an image segmentation field that is used when creating the TFRecord.

Is segmentation mandatory for training ResNet or RetinaNet?

So, can anyone guide me through creating JSON annotations from my XML annotations, without segmentation values?

XML:

<annotation>
    <folder>frames</folder>
    <filename>83.jpg</filename>
    <path>/home/tdadmin/Downloads/large/f/frames/83.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>246</xmin>
            <ymin>48</ymin>
            <xmax>350</xmax>
            <ymax>165</ymax>
        </bndbox>
    </object>
</annotation>

Answer 1

What you are doing now is similar to a project I worked on before, so I have a few suggestions for you.

When I trained my Mask RCNN model, I used VGG Image Annotator (you can easily find that on Google). With that tool it is easy to create JSON annotation files, which you can then plug into your training.

Hope that helps. If you still have questions, feel free to comment on this.

罗文

Answer 2

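The annotation format actually does not matter. I have created tfrecords from txt files before. To create a custom tfrecord, you have to write your own create_custom_tf_record.py, just like the other scripts shown in this folder.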

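But since you are using COCO-like annotations, you can make use of the file create_coco_tf_record.py. The important thing you need to implement yourself is the annotations_list. annotations_list is just a dictionary, so your goal is to parse your xml file into a dictionary of key-value pairs, pass the correct values to feature_dict, and then construct a tf.train.Example from feature_dict. Once you have the tf.train.Example created, you can easily create the tfrecord.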

So, for your exact example, first parse the xml file.

import xml.etree.ElementTree as ET
tree = ET.parse('annotations.xml')

Then construct annotations_list from the tree, like this:

annotations_list = {}
it = tree.iter()
for key in it:
    annotations_list[str(key.tag)] = key.text
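Note that the loop above flattens every tag into one dictionary, so an image with several <object> elements keeps only the values of the last one. A minimal sketch of a per-object parse, assuming the labelImg layout shown in the question (the function name parse_voc_xml and the sample string are hypothetical, not part of any library):

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Parse one labelImg (Pascal VOC style) annotation into a dict."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    annotation = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    # Each <object> carries one class name and one bounding box.
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        annotation["objects"].append({
            "name": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return annotation


# Trimmed copy of the annotation from the question, used as sample input.
SAMPLE_XML = """<annotation>
    <filename>83.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>person</name>
        <bndbox><xmin>246</xmin><ymin>48</ymin><xmax>350</xmax><ymax>165</ymax></bndbox>
    </object>
</annotation>"""

parsed = parse_voc_xml(SAMPLE_XML)
print(parsed["width"], parsed["height"], parsed["objects"][0]["name"])
```

Converting the text values to int at parse time also avoids passing strings to the int64 features later on.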

Then you can create the feature_dict from the annotations_list:

feature_dict = {
  'image/height':
      dataset_util.int64_feature(int(annotations_list['height'])),
  'image/width':
      dataset_util.int64_feature(...),
  'image/filename':
      dataset_util.bytes_feature(...),
  'image/source_id':
      dataset_util.bytes_feature(...),
  'image/key/sha256':
      dataset_util.bytes_feature(...),
  'image/encoded':
      dataset_util.bytes_feature(...),
  'image/format':
      dataset_util.bytes_feature(...),
  'image/object/bbox/xmin':
      dataset_util.float_list_feature(...),
  'image/object/bbox/xmax':
      dataset_util.float_list_feature(...),
  'image/object/bbox/ymin':
      dataset_util.float_list_feature(...),
  'image/object/bbox/ymax':
      dataset_util.float_list_feature(...),
  'image/object/class/text':
      dataset_util.bytes_list_feature(...),
  'image/object/is_crowd':
      dataset_util.int64_list_feature(...),
  'image/object/area':
      dataset_util.float_list_feature(...),
  }

Just make sure that the fields of feature_dict correspond to the correct fields in annotations_list and label_map.
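The list-valued fields can be built along these lines. This is only a sketch, assuming (as the conversion scripts in the TensorFlow Object Detection API do) that box coordinates are stored normalized by image width and height, and that label map ids start at 1. LABEL_MAP and build_box_lists are hypothetical names used for illustration:

```python
# Hypothetical label map: class name -> integer id (ids assumed to start at 1).
LABEL_MAP = {"person": 1}


def build_box_lists(width, height, objects, label_map):
    """Turn pixel boxes into normalized lists for the list-valued features."""
    lists = {"xmins": [], "xmaxs": [], "ymins": [], "ymaxs": [],
             "classes_text": [], "classes": []}
    for obj in objects:
        # Normalize x by image width and y by image height.
        lists["xmins"].append(obj["xmin"] / width)
        lists["xmaxs"].append(obj["xmax"] / width)
        lists["ymins"].append(obj["ymin"] / height)
        lists["ymaxs"].append(obj["ymax"] / height)
        # Byte-list features take bytes, so encode the class name.
        lists["classes_text"].append(obj["name"].encode("utf8"))
        lists["classes"].append(label_map[obj["name"]])
    return lists


objects = [{"name": "person", "xmin": 246, "ymin": 48, "xmax": 350, "ymax": 165}]
lists = build_box_lists(640, 480, objects, LABEL_MAP)
print(lists["xmins"], lists["ymins"], lists["classes"])
```

A class name missing from the label map raises a KeyError here, which is usually preferable to silently writing a wrong id.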

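You may wonder why exactly these fields in feature_dict are necessary. According to the official documentation, Using your own dataset, the following fields are required and the others are optional.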

  'image/height': dataset_util.int64_feature(height),
  'image/width': dataset_util.int64_feature(width),
  'image/filename': dataset_util.bytes_feature(filename),
  'image/source_id': dataset_util.bytes_feature(filename),
  'image/encoded': dataset_util.bytes_feature(encoded_image_data),
  'image/format': dataset_util.bytes_feature(image_format),
  'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
  'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
  'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
  'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
  'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
  'image/object/class/label': dataset_util.int64_list_feature(classes),
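If you would rather keep using create_coco_tf_record.py, the XML boxes can first be written out as a COCO-style JSON file. Below is a minimal sketch, assuming the usual COCO detection layout (images, annotations and categories, with bbox given as [x, y, width, height] in pixels) and leaving segmentation empty. The function voc_to_coco and its input format are hypothetical, and whether your version of the script accepts an empty segmentation list is something to verify against the script itself:

```python
import json


def voc_to_coco(images, category_ids):
    """Build a COCO-style detection dict from parsed VOC annotations.

    `images` is a list of dicts with filename, width, height and objects
    (pixel xmin/ymin/xmax/ymax plus a class name).
    """
    coco = {"images": [], "annotations": [],
            "categories": [{"id": cid, "name": name}
                           for name, cid in category_ids.items()]}
    next_annotation_id = 1
    for image_id, image in enumerate(images, start=1):
        coco["images"].append({"id": image_id,
                               "file_name": image["filename"],
                               "width": image["width"],
                               "height": image["height"]})
        for obj in image["objects"]:
            box_w = obj["xmax"] - obj["xmin"]
            box_h = obj["ymax"] - obj["ymin"]
            coco["annotations"].append({
                "id": next_annotation_id,
                "image_id": image_id,
                "category_id": category_ids[obj["name"]],
                "bbox": [obj["xmin"], obj["ymin"], box_w, box_h],
                "area": box_w * box_h,  # box area, since there is no mask
                "iscrowd": 0,
                "segmentation": [],  # labelImg provides no polygons
            })
            next_annotation_id += 1
    return coco


images = [{"filename": "83.jpg", "width": 640, "height": 480,
           "objects": [{"name": "person", "xmin": 246, "ymin": 48,
                        "xmax": 350, "ymax": 165}]}]
coco = voc_to_coco(images, {"person": 1})
print(json.dumps(coco["annotations"][0]))
```

The resulting dict can be saved with json.dump and passed to the script as the annotations file.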
