ValueError: could not broadcast input array from shape (3024,3024,3) into shape (3024,3024)


I have this code, which works fine:

import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib import rcParams
import numpy as np
import os
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor


# Set directories for generation images and edit images
base_image_dir = os.path.join("IMG_4297.png")
mask_dir = os.path.join("masks")
edit_image_dir = os.path.join("03_edits")

# Point to your downloaded SAM model
sam_model_filepath = "../segment-anything/segment_anything/sam_vit_h_4b8939.pth"
#sam_model_filepath = "./sam_vit_h_4b8939.pth"

# Initiate SAM model
sam = sam_model_registry["default"](checkpoint=sam_model_filepath)

# Function to display mask using matplotlib
def show_mask(mask, ax):
    color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)


# Function to display where we've "clicked"
def show_points(coords, labels, ax, marker_size=375):
    pos_points = coords[labels == 1]
    neg_points = coords[labels == 0]
    ax.scatter(
        pos_points[:, 0],
        pos_points[:, 1],
        color="green",
        marker="*",
        s=marker_size,
        edgecolor="white",
        linewidth=1.25,
    )
    ax.scatter(
        neg_points[:, 0],
        neg_points[:, 1],
        color="red",
        marker="*",
        s=marker_size,
        edgecolor="white",
        linewidth=1.25,
    )


# Load chosen image using opencv
image = cv2.imread("./IMG_4297.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Display our chosen image
plt.figure(figsize=(10, 10))
plt.imshow(image)
plt.axis("on")
plt.show()

# Set the pixel coordinates for our "click" to assign masks
input_point = np.array([[525, 325]])
input_label = np.array([1])

# Display the point we've clicked on
plt.figure(figsize=(10, 10))
plt.imshow(image)
show_points(input_point, input_label, plt.gca())
plt.axis("on")
plt.show()

# Initiate predictor with Segment Anything model
predictor = SamPredictor(sam)
predictor.set_image(image)

# Use the predictor to gather masks for the point we clicked
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)

# Check the shape - should be three masks of the same dimensions as our image
masks.shape

# Display the possible masks we can select along with their confidence
for i, (mask, score) in enumerate(zip(masks, scores)):
    plt.figure(figsize=(10, 10))
    plt.imshow(image)
    show_mask(mask, plt.gca())
    show_points(input_point, input_label, plt.gca())
    plt.title(f"Mask {i+1}, Score: {score:.3f}", fontsize=18)
    plt.axis("off")
    plt.show()

# Choose which mask you'd like to use
chosen_mask = masks[1]

# We'll now reverse the mask so that it is clear and everything else is white
chosen_mask = chosen_mask.astype("uint8")
chosen_mask[chosen_mask != 0] = 255
chosen_mask[chosen_mask == 0] = 1
chosen_mask[chosen_mask == 255] = 0
chosen_mask[chosen_mask == 1] = 255

# create a base blank mask
width = 1512
height = 1512
mask = Image.new("RGBA", (width, height), (0, 0, 0, 1))  # create an opaque image mask

# Convert mask back to pixels to add our mask replacing the third dimension
pix = np.array(mask)
pix[:, :, 3] = chosen_mask

# Convert pixels back to an RGBA image and display
new_mask = Image.fromarray(pix, "RGBA")
new_mask

# We'll save this mask for re-use for our edit
new_mask.save(os.path.join(mask_dir, "new_mask.png"))

But I am trying to use the second half of it with a slightly different program/AI language model:

import numpy as np
from lang_sam.utils import draw_image
from PIL import Image
from lang_sam import LangSAM
from heic2png import HEIC2PNG

if __name__ == '__main__':
    heic_img = HEIC2PNG('/Users/Downloads/IMG_4316.heic', quality=70)  # Specify the quality of the converted image
    heic_img.save()  # The converted image will be saved as `test.png`

model = LangSAM()
image_pil = Image.open("/Users/Downloads/IMG_4316.png").convert("RGB")
text_prompt = "wall"
masks, boxes, phrases, logits = model.predict(image_pil, text_prompt)

masks.shape

labels = [f"{phrase} {logit:.2f}" for phrase, logit in zip(phrases, logits)]
image_array = np.asarray(image_pil)
image = draw_image(image_array, masks, boxes, labels)
image = Image.fromarray(np.uint8(image)).convert("RGB")
image.show()

chosen_mask = np.array(image).astype("uint8")
chosen_mask[chosen_mask != 0] = 255
chosen_mask[chosen_mask == 0] = 1
chosen_mask[chosen_mask == 255] = 0
chosen_mask[chosen_mask == 1] = 255

# create a base blank mask
width = 3024    
height = 3024
mask = Image.new("RGBA", (width, height), (0, 0, 0, 1))  # create an opaque image mask

# Convert mask back to pixels to add our mask replacing the third dimension
pix = np.array(mask)
pix[:, :, 3] = chosen_mask

# Convert pixels back to an RGBA image and display
new_mask = Image.fromarray(pix, "RGBA")
new_mask.show()
new_mask.save()

I believe the problem is the format of the converted image on this line:

pix[:, :, 3] = chosen_mask

Do I need to convert chosen_mask, or perform some other operation on it, for the image to work here?

The full error is:

Traceback (most recent call last):
  File "/Users/Desktop/code/lang-segment-anything/app.py", line 112, in <module>
    pix[:, :, 2] = chosen_mask
    ~~~^^^^^^^^^
ValueError: could not broadcast input array from shape (3024,3024,3) into shape (3024,3024)
python numpy python-imaging-library
1 Answer

When you do this:

width = 3024    
height = 3024
mask = Image.new("RGBA", (width, height), (0, 0, 0, 1))  # create an opaque image mask

# Convert mask back to pixels to add our mask replacing the third dimension
pix = np.array(mask)

you are creating a 3024x3024 image with 4 channels (i.e. RGBA), so your NumPy array pix will have the shape [3024, 3024, 4].
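
To check these shapes yourself, here is a minimal sketch. It uses a small 8x6 stand-in image instead of 3024x3024 (an assumption made only to keep it fast), and the variable names are illustrative.

```python
import numpy as np
from PIL import Image

# Small stand-in size; the question uses 3024 x 3024.
width, height = 8, 6

# Image.new takes (width, height); NumPy reports (height, width, channels).
rgba = np.array(Image.new("RGBA", (width, height), (0, 0, 0, 1)))
rgb = np.array(Image.new("RGB", (width, height)))
gray = np.array(Image.new("L", (width, height)))

rgba_shape = rgba.shape               # four channels: R, G, B, A
rgb_shape = rgb.shape                 # three channels: R, G, B
gray_shape = gray.shape               # single channel, so no third axis
alpha_plane_shape = rgba[:, :, 3].shape  # one 2-D plane of the RGBA array

print(rgba_shape, rgb_shape, gray_shape, alpha_plane_shape)
```

The alpha plane selected by `rgba[:, :, 3]` is two-dimensional, so whatever is assigned to it must also be two-dimensional (or broadcastable to that shape).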


When you do this:

image = Image.fromarray(np.uint8(image)).convert("RGB")
chosen_mask = np.array(image).astype("uint8")

you create an RGB image with 3 channels, so your NumPy array chosen_mask will have the shape [3024, 3024, 3].


So the problem is that when you do this:

pix[:, :, 3] = chosen_mask

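you are saying that you want to set the alpha channel at each pixel location in pix to the 3 RGB channels at that location in chosen_mask, and that cannot work. You can't put the R, G and B channels from chosen_mask into the alpha channel, because the alpha channel has room for only one value at each location.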
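
The mismatch can be reproduced with plain NumPy arrays. This is a small sketch using 6x8 stand-in arrays rather than the real image; only the shapes matter.

```python
import numpy as np

# Stand-ins for the arrays in the question (much smaller than 3024 x 3024).
pix = np.zeros((6, 8, 4), dtype=np.uint8)                # RGBA target
chosen_mask = np.full((6, 8, 3), 255, dtype=np.uint8)    # 3-channel source

try:
    # Target slice is (6, 8); source is (6, 8, 3), so this cannot broadcast.
    pix[:, :, 3] = chosen_mask
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Selecting a single plane from the source makes the shapes agree.
pix[:, :, 3] = chosen_mask[:, :, 0]
```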

So you either need to make chosen_mask a single-channel image by creating it in L mode:

image = Image.fromarray(np.uint8(image)).convert("L")
chosen_mask = np.array(image).astype("uint8")
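
As a quick check of the L-mode route, the sketch below converts a small synthetic RGB array (a hypothetical stand-in for the real image) and assigns the result to the alpha channel. Note that L mode stores a weighted luminance, so a pure red pixel does not become 255; if you need a strictly binary mask you would have to threshold afterwards.

```python
import numpy as np
from PIL import Image

height, width = 6, 8

# Synthetic RGB input: black background with a red rectangle.
rgb_array = np.zeros((height, width, 3), dtype=np.uint8)
rgb_array[2:4, 3:6] = (255, 0, 0)

# Converting to "L" collapses the three channels into one grayscale plane.
gray = np.array(Image.fromarray(rgb_array).convert("L"))

pix = np.array(Image.new("RGBA", (width, height), (0, 0, 0, 1)))
pix[:, :, 3] = gray  # shapes now match: (6, 8) into (6, 8)

print(gray.shape, gray[2, 3], gray[0, 0])
```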

Or you need to choose which of the RGB channels of chosen_mask you want to put in the A channel of pix, e.g. just put the green channel of chosen_mask into the A channel of pix:

pix[:, :, 3] = chosen_mask[..., 1]
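
A third possibility, not part of the answer above, is to collapse all three channels into one plane and invert it in a single step, which reproduces the net effect of the question's four in-place assignments (non-zero becomes 0, zero becomes 255). The helper name `inverted_alpha_mask` is hypothetical, and treating "any non-zero channel" as "selected" is an assumption about what the mask should mean.

```python
import numpy as np
from PIL import Image


def inverted_alpha_mask(rgb_array):
    """Build an RGBA image whose alpha is 0 where any channel of the
    input is non-zero and 255 everywhere else."""
    rgb_array = np.asarray(rgb_array)
    # Collapse the channel axis: True where R, G or B is non-zero.
    selected = rgb_array.any(axis=-1)
    height, width = selected.shape
    pix = np.zeros((height, width, 4), dtype=np.uint8)
    # Inverted mask: selected pixels become clear, the rest opaque.
    pix[:, :, 3] = np.where(selected, 0, 255).astype(np.uint8)
    return Image.fromarray(pix)  # a (H, W, 4) uint8 array is read as RGBA


# Usage with a tiny synthetic input.
demo = np.zeros((4, 5, 3), dtype=np.uint8)
demo[1, 2] = (0, 10, 0)
demo_mask = inverted_alpha_mask(demo)
demo_pixels = np.array(demo_mask)
print(demo_mask.mode, demo_mask.size)
```

One caveat: in the question, chosen_mask is built from the annotated picture returned by draw_image, so most of its pixels are probably non-zero. If that is the case, the binary masks returned by model.predict are likely a better input for this step than the drawn image.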