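When uploading image files, I would like to upload the corresponding bounding box data as Pascal VOC files. How can I do this on Vertex AI, or do I need to convert the Pascal VOC data in some way?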
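You can use AutoML Vision on Vertex AI for object detection. Preparing a training dataset for object detection in AutoML Vision requires the data to be in .csv format. You can refer to this link to convert your Pascal VOC files to a Cloud AutoML Vision CSV.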
Preparing and formatting your data for object detection:
For more details on how to build an object detection model with AutoML Vision, follow this quickstart.
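To make the conversion concrete, here is a minimal sketch of how one Pascal VOC box could be turned into a CSV row. It assumes the two-vertex row layout `image_uri,label,x_min,y_min,,,x_max,y_max,,` with coordinates normalized to the [0, 1] range. The helper name `voc_box_to_csv_row` and the example URI are hypothetical, not part of any library.

```python
def voc_box_to_csv_row(image_uri, label, box, image_size):
    """Turn one Pascal VOC pixel box into a normalized CSV row (list of fields)."""
    xmin, ymin, xmax, ymax = box      # pixel coordinates from <bndbox>
    width, height = image_size        # pixel dimensions from <size>
    # Assumed layout: x_min,y_min,,,x_max,y_max,, (empty fields for unused vertices)
    return [image_uri, label,
            xmin / width, ymin / height, "", "",
            xmax / width, ymax / height, "", ""]


row = voc_box_to_csv_row("gs://your-bucket-name/dog.jpg", "dog",
                         (100, 50, 300, 200), (400, 400))
print(",".join(str(field) for field in row))
# prints: gs://your-bucket-name/dog.jpg,dog,0.25,0.125,,,0.75,0.5,,
```

Dividing x values by the image width and y values by the image height is what converts VOC pixel coordinates into the relative coordinates the CSV expects.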
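"You can refer to this link to convert your Pascal VOC files to a Cloud AutoML Vision CSV."
That requires upgrading your roboflow.com membership. The following code does the job locally on your own machine!
Note: it requires your images and annotations (XML files) to be stored in a GCS bucket.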
import csv
import os
import xml.etree.ElementTree as ET

from google.cloud import storage

# Point the client at a service-account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path-to-credentials-json-file'
bucket_name = 'your-bucket-name'

client = storage.Client()
bucket = client.get_bucket(bucket_name)

# delimiter='/' restricts the listing to objects at the top level of the bucket
blob_names = [blob.name for blob in client.list_blobs(bucket_name, prefix=None, delimiter='/')]

# Download all XML files to the current working directory. Alternatively, copy
# the XML files there yourself and comment out this download loop.
print('Downloading xml files...')
for name in blob_names:
    if name.endswith('.xml'):
        bucket.blob(name).download_to_filename(name)

allowed_extensions = {'jpg', 'png', 'jpeg', 'gif', 'bmp', 'ico'}
csv_file_name = 'vertex_ai_annos.csv'

with open(csv_file_name, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    for name in blob_names:
        stem, ext = os.path.splitext(name)
        if ext.lstrip('.') not in allowed_extensions:
            continue
        # Each image is expected to have an annotation file with the same base name
        root = ET.parse(f'{stem}.xml').getroot()
        width = int(root.find('size/width').text)
        height = int(root.find('size/height').text)
        for obj in root.findall('object'):
            label = obj.find('name').text
            box = obj.find('bndbox')
            # Normalize pixel coordinates to the [0, 1] range
            xmin = int(box.find('xmin').text) / width
            ymin = int(box.find('ymin').text) / height
            xmax = int(box.find('xmax').text) / width
            ymax = int(box.find('ymax').text) / height
            # Two-vertex form: x_min,y_min,,,x_max,y_max,,
            csvwriter.writerow([f'gs://{bucket_name}/{name}', label,
                                xmin, ymin, '', '', xmax, ymax, '', ''])

print(f'{csv_file_name} has been created!')
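If the CSV should live next to the images, one option is to upload it to the same bucket. The following is a minimal sketch, assuming `bucket` is a `google.cloud.storage` bucket object like the one created above; the helper name `upload_annotations` is hypothetical.

```python
def upload_annotations(bucket, local_path, destination_name):
    """Upload a local CSV file to the bucket and return its gs:// URI."""
    # bucket.blob() creates a handle for the destination object;
    # upload_from_filename() sends the contents of the local file to it.
    blob = bucket.blob(destination_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket.name}/{destination_name}"
```

With the variables from the script above, a call could look like `upload_annotations(bucket, csv_file_name, csv_file_name)`.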
import csv
import os
import xml.etree.ElementTree as ET

# Local directory where your XML files are located
local_xml_directory = 'pathtoxmlfiles'
# Google Cloud Storage bucket name
bucket_name = 'googlebucketname'
# The path to the folder in the bucket where the images are stored
bucket_image_folder = 'pathwherefilesstoreinbucket'

xml_files = [f for f in os.listdir(local_xml_directory) if f.endswith('.xml')]
csv_file_name = 'vertex_ai_annos.csv'

with open(csv_file_name, 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    for xml_file in xml_files:
        root = ET.parse(os.path.join(local_xml_directory, xml_file)).getroot()
        image_filename = root.find('filename').text
        # Join the parts with exactly one slash each, skipping an empty folder
        parts = [bucket_name, bucket_image_folder.strip('/'), image_filename]
        image_gcs_url = 'gs://' + '/'.join(part for part in parts if part)

        size = root.find('size')
        width = int(size.find('width').text)
        height = int(size.find('height').text)

        for obj in root.findall('object'):
            label = obj.find('name').text
            bndbox = obj.find('bndbox')
            # Normalize pixel coordinates to the [0, 1] range
            xmin = int(bndbox.find('xmin').text) / width
            ymin = int(bndbox.find('ymin').text) / height
            xmax = int(bndbox.find('xmax').text) / width
            ymax = int(bndbox.find('ymax').text) / height
            # Four-vertex form: top-left, top-right, bottom-right, bottom-left
            csvwriter.writerow([image_gcs_url, label,
                                xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax])

print(f'{csv_file_name} has been created!')
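This code generates the CSV file locally from Pascal VOC XML files on your own machine, so the XML files do not have to be downloaded from the bucket first. You can then upload the result to Vertex AI.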
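Before importing the file, it can help to sanity-check the generated rows. The sketch below assumes every row has 10 fields with `x_min`, `y_min` in columns 2 and 3 and `x_max`, `y_max` in columns 6 and 7 (counting from 0), which holds for both the two-vertex and the four-vertex layouts used in this thread. The function name `find_bad_rows` is hypothetical.

```python
import csv


def find_bad_rows(csv_path):
    """Return (line_number, reason) pairs for rows without a valid normalized box."""
    problems = []
    with open(csv_path, newline="") as f:
        for line_number, row in enumerate(csv.reader(f), start=1):
            if len(row) != 10:
                problems.append((line_number, "expected 10 fields"))
                continue
            try:
                # Columns 2, 3, 6, 7 hold x_min, y_min, x_max, y_max (assumed layout)
                xmin, ymin, xmax, ymax = (float(row[i]) for i in (2, 3, 6, 7))
            except ValueError:
                problems.append((line_number, "non-numeric coordinate"))
                continue
            if not all(0.0 <= v <= 1.0 for v in (xmin, ymin, xmax, ymax)):
                problems.append((line_number, "coordinate outside [0, 1]"))
            elif xmin >= xmax or ymin >= ymax:
                problems.append((line_number, "min is not smaller than max"))
    return problems
```

Calling `find_bad_rows('vertex_ai_annos.csv')` returns an empty list when every row passes these checks. Coordinates outside [0, 1] usually point to a `<size>` entry that does not match the actual image.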