I am currently working with the Amazon PersonPath22 dataset. Under
dataset/personpath22/raw_data/pathtrack/pathtrack_release/train
there is a set of folders, each containing images and their corresponding annotations. The annotations live (relative to the path above) in ./SOME_VIDEO_NAME/det, in a file named det_rcnn.txt. I am confused about the annotation data format.
The GitHub repository says they use the gluon-cv annotation format, but I believe that applies to a different image folder.
A detection line looks like:
1.0,-1.0,1053.8172607421875,198.0821990966797,106.62109375,275.42665100097656,0.9953873753547668,-1.0,-1.0,-1.0
My assumption is <frame> <idk> <bbox min x> <bbox min y> <bbox width> <bbox height> <confidence> <X> <Y> <Z>. But when I take these values and convert them to COCO JSON, the bounding boxes do not line up with the images.
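To make the assumption concrete, here is how I am parsing a single det_rcnn.txt line. The field names (especially <idk> and X/Y/Z) are my guesses, not something the repository documents:

```python
# Sketch of my assumed field layout for one det_rcnn.txt line.
# The names 'idk', 'X', 'Y', 'Z' are placeholders for fields I
# cannot identify; only the bbox/confidence fields are assumptions
# I am actually relying on.
line = ("1.0,-1.0,1053.8172607421875,198.0821990966797,"
        "106.62109375,275.42665100097656,0.9953873753547668,-1.0,-1.0,-1.0")

fields = ['frame', 'idk', 'x', 'y', 'w', 'h', 'confidence', 'X', 'Y', 'Z']
det = dict(zip(fields, map(float, line.split(','))))

# COCO also uses [x_min, y_min, width, height] for bboxes, so under
# my assumption the conversion should be a direct copy of four values.
coco_bbox = [det['x'], det['y'], det['w'], det['h']]
print(coco_bbox)
```

Since COCO bboxes share the [x_min, y_min, width, height] convention, I expected this to be a one-to-one copy, which is why the mismatch confuses me.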
Below is a simple script for parsing and visualizing the PersonPath22 videos and annotations:
import cv2
import pandas as pd

# File paths
annotation_file = '/home/juma/Downloads/person_path_22_data/person_path_22/person_path_22-test/uid_vid_00008.mp4/gt/gt.txt'
video_file = '/media/juma/data/mot-data/tracking-dataset/dataset/personpath22/raw_data/uid_vid_00008.mp4'

# Read annotations (MOT-style CSV without a header row)
annotations = pd.read_csv(annotation_file, header=None)
annotations.columns = ['frame', 'id', 'x', 'y', 'w',
                       'h', 'confidence', 'class', 'visibility', 'misc']
print("annotations", annotations['class'].value_counts())

# Open video and prepare the output writer
cap = cv2.VideoCapture(video_file)
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi', fourcc, 20.0,
                      (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
                       int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # After read(), CAP_PROP_POS_FRAMES equals the 1-based index of
    # the frame just decoded, matching MOT's 1-based frame numbers
    current_frame = int(cap.get(cv2.CAP_PROP_POS_FRAMES))

    # Get annotations for the current frame
    frame_annotations = annotations[annotations['frame'] == current_frame]

    # Draw bounding boxes with track id and class label
    for _, row in frame_annotations.iterrows():
        x, y, w, h = int(row['x']), int(row['y']), int(row['w']), int(row['h'])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f"ID:{int(row['id'])} cls:{row['class']}", (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Write the frame
    out.write(frame)

    # Optional: display the frame
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release everything
cap.release()
out.release()
cv2.destroyAllWindows()