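I have a specific set of images from which I need to recognise handwritten digits. The problem is that they are very distorted and noisy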

Question

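As you can see from this image, digits are sometimes struck off, there are pencil marks, and some digits are even circled in pencil or green pen. Locating the digits in the image was very difficult in the first place, so I simply measured the distances between the borders and hard-coded them into my program to crop at specific coordinates. The problem is that the rectangles are inconsistent, i.e. their lengths vary. Another big hurdle is that these rectangles are usually 60×28, which suggests we cannot directly use a model trained on MNIST data (since it is trained on 28×28 images). Even so, I tried sklearn and other pretrained models trained on the MNIST dataset, and the accuracy was very poor, only around 20%.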

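Please tell me what I should do. My initial idea is to create my own dataset and build a model from scratch. If that idea is right, could anyone guide me on how to create a custom dataset and how to build the model? I have a good understanding of how deep learning works (I learned how DL works from Michael Nielsen's https://neuralnetworksanddeeplearning.com).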

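So I hope someone can help me and point me in the right direction.

I have tried Tesseract, which is designed for OCR, and other trained TensorFlow models, but almost all of the results were very unsatisfactory.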

Answer 1

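I think the first thing to consider is building a training set in which each sample is an image crop that you have labelled. Once you have a training set, you can train a model to predict the correct label for an image crop.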

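There are various ways to create the training set, and you can use ML algorithms to speed up the labelling process. One approach is to first cluster the original image by colour, then use the clusters to separate digit pixels from non-digit pixels.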

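The pixels kept by this filter correspond to the locations of the digits. The exact outlines are not clear, but the positions are good. You can then cluster this filtered image so that related pixels are grouped together.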

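We can average the coordinates within each coloured cluster to get the approximate centre of each cluster. With this information you can take crops from the original image, where the centre of each crop should now be aligned with a digit location. The crops used here extend 15 pixels on either side of each centre.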
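One practical detail when cropping 15 pixels on either side of a centre is that centres close to the image border produce truncated or empty slices. The following is a minimal sketch of one way to handle that, assuming edge padding is acceptable; the helper name `crop_around` and the placeholder array are illustrative and not part of the original answer.

```python
import numpy as np

def crop_around(image, center, half=15):
    """Return a (2*half, 2*half) crop centred on (row, col).

    The image is edge-padded first, so centres near the border
    still give a full-size crop instead of a truncated one.
    """
    row, col = center
    # Pad only the two spatial axes; leave any colour axis untouched
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")
    # Original pixel (row, col) now sits at (row + half, col + half)
    return padded[row:row + 2 * half, col:col + 2 * half]

# Placeholder image standing in for the real scan
image = np.arange(40 * 50 * 3).reshape(40, 50, 3)
corner = crop_around(image, (2, 3))    # centre near the top-left border
inner = crop_around(image, (20, 25))   # centre well inside the image
```

Because every crop has the same shape, the crops can later be stacked into a single array for training.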

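Now comes the manual part: you assign a label to each crop that you want to use for training. For example, there are a lot of digits that look like a "2", and they should be labelled 2. You end up with a training set made of crops and their corresponding labels. You will need more samples, though.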
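As a rough illustration of what the labelled training set might look like in memory, the sketch below stacks crops and hand-assigned labels into arrays. It assumes all crops are RGB and share the same size; the function name `build_dataset`, the random placeholder crops, and the example labels are all hypothetical.

```python
import numpy as np

def build_dataset(crops, labels):
    """Stack labelled RGB crops into a feature matrix X and a label vector y."""
    if len(crops) != len(labels):
        raise ValueError("each crop needs exactly one label")
    features = []
    for crop in crops:
        # Average the colour channels to get a grayscale crop
        gray = np.asarray(crop, dtype=float).mean(axis=-1)
        # Flatten to one row per crop and scale pixel values to [0, 1]
        features.append(gray.ravel() / 255.0)
    return np.stack(features), np.asarray(labels)

# Placeholder data standing in for real crops: three random 30x30 RGB patches
rng = np.random.default_rng(0)
crops = [rng.integers(0, 256, size=(30, 30, 3), dtype=np.uint8) for _ in range(3)]
labels = [2, 7, 2]  # hand-assigned digit labels (hypothetical)

X, y = build_dataset(crops, labels)
```

Each row of `X` is one flattened crop, which is the layout scikit-learn classifiers expect.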

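This data can be used to train a logistic regression model, which gives you a baseline against which other classification models can be compared.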

from PIL import Image
from matplotlib import pyplot as plt
import numpy as np

#
# Load data
#
image_pil = Image.open('image.jpg')
image = np.asarray(image_pil)

#
# Find where the digits are
#
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

#Normalise the 3D pixel data
samples = image.reshape(-1, 3)
scaler = StandardScaler().fit(samples)
samples = scaler.transform(samples)

#Identify the 4 main colours
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, n_init='auto', random_state=np.random.RandomState(0)).fit(samples)

#Show the label assigned to each pixel
samples_as_labels = kmeans.labels_
clustered_im = samples_as_labels.reshape(image.shape[:2])

# In this run, cluster 2 captured the digit colours best. Check the plots,
# since the cluster index depends on the image and on the KMeans seed.
digit_cluster = 2
cmap_k = plt.get_cmap('jet', n_clusters)
plt.figure()
plt.imshow(np.ma.masked_where(clustered_im != digit_cluster, clustered_im), cmap=cmap_k)
plt.figure()
plt.imshow(clustered_im[20:60, 90:210], cmap=cmap_k)

# Read out the (row, col) coordinates of the pixels belonging to digits
digit_coords = np.argwhere(clustered_im == digit_cluster)

#Get the average coordinate of each cloud of points
from sklearn.cluster import HDBSCAN
hdb = HDBSCAN().fit(digit_coords)

filt = hdb.labels_ > -1  # ignore outliers (HDBSCAN labels them -1)
plt.figure()
plt.scatter(digit_coords[filt, 1], digit_coords[filt, 0], c=hdb.labels_[filt], cmap='jet', marker='.', s=10)
plt.gca().invert_yaxis()

avg_coords = []
for lab in np.unique(hdb.labels_[filt]):  # skip the outlier label (-1)
    coords = digit_coords[hdb.labels_ == lab]
    avg_coords.append(coords.mean(axis=0).round().astype(int))

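# Show a crop of 15 pixels either side of each centre.
# The grid grows with the number of clusters instead of being fixed at 5x9.
n_cols = 9
n_rows = max(1, int(np.ceil(len(avg_coords) / n_cols)))
f, axs = plt.subplots(n_rows, n_cols, squeeze=False)
axs = axs.ravel()
for ax in axs:
    ax.set_xticks([])
    ax.set_yticks([])
for ax, (y, x) in zip(axs, avg_coords):
    # Clamp the start index so centres near the border do not wrap around
    ax.imshow(image[max(y - 15, 0):y + 15, max(x - 15, 0):x + 15, ...])
plt.show()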
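The listing stops at displaying the crops and does not include the logistic regression baseline mentioned in the answer. A minimal sketch of such a baseline with scikit-learn could look like the following. The data here is synthetic placeholder data with two artificial classes, standing in for real flattened crops and hand-assigned labels, so the accuracy it prints says nothing about the real task.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: 200 flattened 30x30 "crops" in two synthetic classes.
# Replace X and y with real flattened crops and their labels.
rng = np.random.default_rng(0)
n_per_class, n_pixels = 100, 30 * 30
X = np.vstack([
    rng.normal(0.3, 0.1, size=(n_per_class, n_pixels)),
    rng.normal(0.6, 0.1, size=(n_per_class, n_pixels)),
])
y = np.repeat([0, 1], n_per_class)

# Hold out a stratified test split so the score reflects unseen samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardise each pixel feature, then fit the logistic regression baseline
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)

accuracy = baseline.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The held-out accuracy of this baseline is the number to beat when trying other classifiers on the same split.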
