Converting HTML to Markdown: image links contain redundant text

Question

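I am using a script with pandoc to convert HTML files and their assets to Markdown. The script works, but there is one small problem: when I view the resulting Markdown notes in Obsidian, the links to image files in the assets folder contain redundant text: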


[![](./assets/3sIcgyHnp7HfaxQy.jpeg)](./assets/3sIcgyHnp7HfaxQy.jpeg)

The correct text should be:


![](./assets/3sIcgyHnp7HfaxQy.jpeg)

So pandoc seems to interpret the HTML structure as a link within a link, which produces the redundant output.

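I believe the code in my HTML files uses a standard <a> tag with an href attribute to link to the image, for example:


<a href="./assets/3sIcgyHnp7HfaxQy.jpeg">
    <img class="img-hide" src="./assets/3sIcgyHnp7HfaxQy.jpeg">
</a>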

Can anyone help me eliminate this redundancy?

My full script, including the pandoc line, is below:


#!/bin/bash
# Converts html files into md files (in same location as html files)

success=true

# Use the current working directory as the root directory
root_dir="."

# Recursively traverse the directory structure, suppressing output.
# The loop reads from a process substitution rather than a pipe or `sh -c`,
# so it runs in the current shell and success=false is visible to the check below.
while IFS= read -r -d '' html_file; do
    pandoc "$html_file" -t gfm-raw_html --wrap=none -o "${html_file%.html}.md" >/dev/null 2>&1 || success=false
done < <(find "$root_dir" -name '*.html' -type f -print0)

if $success; then
    echo "Successfully ran the script."
else
    echo "Some errors occurred during the process."
fi

Answer 1

The input HTML wraps the images (<img> tags) in links (<a> tags). You can either remove the links in the HTML, or write a pandoc Lua filter that removes them automatically.
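The Lua filter route mentioned above could look roughly like the following sketch. It assumes each unwanted link contains nothing but a single image. The file name `unwrap-image-links.lua` is a hypothetical choice, and the pandoc options are copied from the script in the question.

```shell
#!/bin/bash
# Sketch: write a small pandoc Lua filter, then run the conversion with it.
# The filter file name is a hypothetical choice, not something pandoc requires.
filter_file="unwrap-image-links.lua"

cat > "$filter_file" <<'EOF'
-- Replace a link whose only content is one image with the image itself.
function Link(el)
  if #el.content == 1 and el.content[1].t == "Image" then
    return el.content[1]
  end
  -- Returning nothing leaves every other link unchanged.
end
EOF

# Convert every HTML file below the current directory, applying the filter.
# Skipped when pandoc is not installed.
if command -v pandoc >/dev/null 2>&1; then
  find . -name '*.html' -type f -exec sh -c \
    'pandoc "$1" --lua-filter="$2" -t gfm-raw_html --wrap=none -o "${1%.html}.md"' \
    _ {} "$filter_file" \;
fi
```

Because the filter works on pandoc's parsed document rather than on the raw HTML text, the source files stay untouched and ordinary text hyperlinks are preserved.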

Answer 2

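Thank you.

I wrote a script that removes the tags around the image links, and it does that perfectly. The problem now is that it removes all links, including every other hyperlink.

How can I fix this?

My script: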

#!/bin/bash

# Use the current working directory as the root directory
root_dir="."

# Find and process HTML files
find "$root_dir" -name '*.html' -type f -print0 | while IFS= read -r -d $'\0' html_file; do
    # Use sed to remove <a> tags
    sed -i.bak 's/<a[^>]*>//g; s/<\/a>//g' "$html_file"
done

echo "Successfully removed <a> tags from HTML files."

Answer 3

Solved.

Here is my updated script, which fixes the image links.

Hope it helps others. Enjoy!

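#!/bin/bash

# Remove <a> tags that wrap <img> tags

# Find and edit HTML files in the current directory and subdirectories
find . -name '*.html' -print0 | while IFS= read -r -d '' file; do
    # A single substitution removes the opening and closing <a> tags together
    # and keeps the <img> tag intact, so other links in the file are untouched.
    # Assumes <a>, <img>, and </a> sit on one line with nothing between them.
    # Note: `sed -i ''` is the BSD/macOS form; with GNU sed, use `sed -i` alone.
    sed -i '' -E 's#<a href="\./assets[^>]*>(<img[^>]*>)</a>#\1#g' "$file"
done

echo "HTML files have been modified to remove image hyperlinks."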
