How to return the 3 characters on either side of a null byte in Elixir?

Question

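If I have a string such as

hello this isa<<0>>string.

how can I return the three characters on either side of the null byte, including the null byte itself? For example:

isa<<0>>str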

I'm trying something like this:

~r/(?<=.{0,2})(.{3}).*?<<0>>(.{3})(?=.{0,2})/
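For comparison with the regex attempt above, here is a minimal regex sketch, assuming the input contains a real null byte (`<<0>>`), not the five printable characters `<<0>>`. In the attempted pattern, `<<0>>` matches those five literal characters rather than a null byte, and `(?<=.{0,2})` is a variable-length lookbehind, which the PCRE versions behind Erlang's `:re` have historically rejected. The sketch escapes the null character as `\x00` and uses the `u` flag so that `.` matches whole UTF-8 characters. It only matches when exactly three characters are available on each side.

```elixir
# Assumes the input holds a real null byte, i.e. <<0>>.
str = "hello this isa" <> <<0>> <> "string."

# "\\x00" reaches PCRE as the escape \x00 (the null character).
# "u": `.` matches one UTF-8 character; "s": `.` also matches newlines.
regex = Regex.compile!(".{3}\\x00.{3}", "us")

Regex.run(regex, str)
# returns ["isa" <> <<0>> <> "str"] (iex displays it as ["isa\0str"])
```

Regex.run/2 returns `nil` when no null byte has three characters on both sides.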

Answer 1

In Elixir you don't need a regex for this. Recursion will work better and faster (and is more readable).

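```elixir
defmodule NullByte do
  # No qualifying null byte was found anywhere: return an empty string.
  def get_3_around(""),
    do: ""
  # 3 bytes, a null byte, then at least 3 more bytes: keep 3 on each side.
  def get_3_around(<<pre::binary-size(3), 0, post::binary-size(3), _::binary>>),
    do: pre <> <<0>> <> post
  # 3 bytes and a null byte close to the end: keep whatever follows (0-2 bytes).
  def get_3_around(<<pre::binary-size(3), 0, post::binary>>),
    do: pre <> <<0>> <> post
  # No match at this position: drop one byte and try again.
  def get_3_around(<<_::binary-size(1), rest::binary>>),
    do: get_3_around(rest)

  def test, do: get_3_around("hello this is a " <> <<0>> <> "string")
end
```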
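The same byte-based result can also be sketched without hand-written recursion by asking Erlang's `:binary.matches/2` for every null-byte position and slicing with `binary_part/3`. This is a swapped-in technique, not part of the answer, and the module name `NullByteSlice` is hypothetical. Like `NullByte` above, it counts bytes rather than UTF-8 characters, and it should return the same values: the first null byte with at least 3 bytes in front of it wins, and `""` means no such null byte exists.

```elixir
defmodule NullByteSlice do
  # Hypothetical module name. Byte-based: "characters" here means bytes.
  def get_3_around(str) when is_binary(str) do
    str
    # Every {position, length} pair where a null byte occurs.
    |> :binary.matches(<<0>>)
    # The first null byte that has at least 3 bytes in front of it.
    |> Enum.find(fn {pos, _len} -> pos >= 3 end)
    |> case do
      nil ->
        ""

      {pos, _len} ->
        # Up to 3 bytes after the null byte (fewer near the end).
        post_len = min(3, byte_size(str) - pos - 1)
        binary_part(str, pos - 3, 3 + 1 + post_len)
    end
  end
end
```

Usage: `NullByteSlice.get_3_around("hello this isa" <> <<0>> <> "string.")` returns `"isa" <> <<0>> <> "str"`.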

Answer 2

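Strings in Elixir are UTF-8 encoded, which means a single character can be longer than one byte, so it is best to design the function to handle UTF-8 characters. Three characters can be up to 12 bytes long, so rather than assuming each character is 1 byte long, you can use `var_name::utf8` to match a single UTF-8 character. Unfortunately, you cannot specify a size with the `utf8` type in a binary, so you cannot match several UTF-8 characters by simply writing `var_name::utf8-size(3)`. Instead you have to write out three separate segments explicitly (which is a real headache, and a language oversight that should be fixed), for example: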

<<char1::utf8, char2::utf8, char3::utf8, ...>>
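To see the difference between counting characters and counting bytes, here is a small sketch with a made-up sample string in which `é` and `ü` each take 2 bytes. Three `::utf8` segments consume three characters (5 bytes here), while `binary-size(3)` consumes exactly three bytes.

```elixir
# Hypothetical sample string: "é" and "ü" are 2 bytes each in UTF-8.
str = "héü!"

# Three ::utf8 segments consume three *characters*; each binds a codepoint.
<<c1::utf8, c2::utf8, c3::utf8, rest::binary>> = str

IO.inspect({c1, c2, c3})                      # {104, 233, 252}
IO.inspect(<<c1::utf8, c2::utf8, c3::utf8>>)  # "héü"
IO.inspect(rest)                              # "!"

# binary-size(3) consumes three *bytes*, which here is only "h" plus "é".
<<first3::binary-size(3), _::binary>> = str
IO.inspect(first3)                            # "hé"
```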

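Next, a null byte is a non-printing character, and Elixir does not print a null byte as `<<0>>`. You can, however, explicitly print the string "<<0>>", for example:

```
iex(7)> IO.puts "<<0>>"
<<0>>
```

But be aware that "<<0>>" is 5 bytes long, not 1 byte.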
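A quick sketch of that size difference, comparing the five-character string with an actual null byte:

```elixir
literal = "<<0>>"  # five printable characters: <, <, 0, >, >
null = <<0>>       # a single byte whose value is 0

IO.inspect(byte_size(literal))  # 5
IO.inspect(byte_size(null))     # 1
IO.inspect(literal == null)     # false
```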

In the following example, the binary syntax looks up the UTF-8 integer character code of each character between the double quotes:

```
iex(17)> str = <<"123"::utf8, 0::utf8, "456"::utf8>>
<<49, 50, 51, 0, 52, 53, 54>>
iex(13)> IO.puts str
123^@456   <-- the shell uses "caret notation" to display non-printing chars
:ok
iex(14)> IO.inspect str
<<49, 50, 51, 0, 52, 53, 54>>
<<49, 50, 51, 0, 52, 53, 54>>
```
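A few quick checks of the byte values involved, as a sketch. The `charlists: :as_lists` option only stops `IO.inspect` from displaying a list such as `[49, 50, 51]` as a charlist.

```elixir
# A string literal with ::utf8 contributes the UTF-8 bytes of its characters.
IO.inspect(<<"123"::utf8>> == "123")                                   # true
IO.inspect(:binary.bin_to_list("123"), charlists: :as_lists)           # [49, 50, 51]

# Codepoint 0 encoded as UTF-8 is the single byte 0.
IO.inspect(<<0::utf8>> == <<0>>)                                       # true

# A non-ASCII character encodes to more than one byte.
IO.inspect(:binary.bin_to_list(<<"é"::utf8>>), charlists: :as_lists)  # [195, 169]
```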

If a string/binary contains non-printing characters, Elixir does not output it in double-quoted form:

```
iex(2)> IO.inspect <<97,98>>
"ab"
"ab"
iex(3)> IO.inspect <<97, 0, 98>>
<<97, 0, 98>>
<<97, 0, 98>>
```
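If you need to check this in code, `String.printable?/1` reports whether Elixir considers a binary printable, and the `binaries: :as_strings` inspect option forces double-quoted output, with the null byte shown as the escape `\0`. A short sketch:

```elixir
IO.inspect(String.printable?(<<97, 98>>))         # true
IO.inspect(String.printable?(<<97, 0, 98>>))      # false

# Force string-style output even though the binary is not printable.
IO.inspect(<<97, 0, 98>>, binaries: :as_strings)  # prints "a\0b"
```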

Here's how to match UTF-8 characters in Elixir:

```elixir
defmodule My do
  # Look for a match starting at the beginning of the string:
  def grab_3_chars_either_side_of_null(<<char1::utf8, char2::utf8, char3::utf8,
                                         0::utf8,  # Tries to match a null byte
                                         char4::utf8, char5::utf8, char6::utf8,
                                         _rest::binary>>) do
    <<char1::utf8, char2::utf8, char3::utf8,
      "<<0>>",  # Your desired output, which is 5 bytes long.
                # Change to 0::utf8 if you only want one byte.
      char4::utf8, char5::utf8, char6::utf8>>
  end

  # If a match isn't found at the beginning of the string above,
  # then drop the first UTF-8 character, `_::utf8`, and look for a match at
  # the start of the rest of the string (the recursive function call):
  def grab_3_chars_either_side_of_null(<<_::utf8, rest::binary>>) do
    grab_3_chars_either_side_of_null(rest)
  end

  # If all the UTF-8 characters have been dropped off the front of the string,
  # then the string is empty and no match was found, so return the atom
  # `:no_match`:
  def grab_3_chars_either_side_of_null(<<>>), do: :no_match
end
```

I'll leave it as an exercise to define whatever other clauses of `grab_3_chars_either_side_of_null/1` you need.

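Notes:

1. Matching `char1::utf8` binds `char1` to an integer codepoint, so you need `<<char1::utf8>>` to turn it back into a string.
2. `rest::binary` is like the greedy `.*` in a regex: it matches zero or more bytes (everything that remains), and a size-less binary segment like this can only appear at the end of a binary pattern.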
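A small illustration of how a size-less `rest::binary` segment behaves, using made-up sample strings:

```elixir
# rest::binary (no size) takes all remaining bytes...
<<first::binary-size(2), rest::binary>> = "hello"
IO.inspect({first, rest})   # {"he", "llo"}

# ...including zero of them.
<<two::binary-size(2), empty::binary>> = "hi"
IO.inspect({two, empty})    # {"hi", ""}

# A size-less binary segment must come last in a match: a pattern such as
# <<rest::binary, "!">> does not compile.
```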

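Alternatively, you could use `String.split/3` to split on the null byte, and then use `String.split_at/2` on each piece to get the last three characters (-3) of the first piece and the first three characters (3) of the second piece.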
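The `String.split/3` approach just described could look roughly like this. It is a sketch, not code from the answer: the module name `NullByteSplit` is hypothetical, it only looks at the first null byte, and it returns fewer than three characters on a side when fewer are available. Because `String.split_at/2` counts graphemes, it handles multi-byte UTF-8 characters.

```elixir
defmodule NullByteSplit do
  # Hypothetical module name; returns nil when there is no null byte.
  def get_3_around(str) when is_binary(str) do
    # parts: 2 splits on the first null byte only.
    case String.split(str, <<0>>, parts: 2) do
      [left, right] ->
        # Last three characters before the null byte.
        {_, last3} = String.split_at(left, -3)
        # First three characters after the null byte.
        {first3, _} = String.split_at(right, 3)
        last3 <> <<0>> <> first3

      [_no_null] ->
        nil
    end
  end
end
```

For example, `NullByteSplit.get_3_around("añé" <> <<0>> <> "ßxy!")` returns `"añé" <> <<0>> <> "ßxy"` (7 characters, 11 bytes).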

Answer 3

```elixir
defmodule NullByte do
  def get_3_around(<<>>), do: nil
  def get_3_around(<<pre::binary-3, 0, post::binary-3, _::binary>>), do: {pre, post}
  def get_3_around(<<_::binary-1, rest::binary>>), do: get_3_around(rest)
end
```

Usage:

```
iex(1)> NullByte.get_3_around("aaa" <> <<0>> <> "bbb")
{"aaa", "bbb"}
iex(2)> NullByte.get_3_around("aaabbb" <> <<0>> <> "cccddd")
{"bbb", "ccc"}
iex(3)> NullByte.get_3_around("aaa" <> <<0>> <> "b")
nil
iex(4)> NullByte.get_3_around("foo")
nil
```
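This version works byte by byte. As a sketch of how it could be combined with the `::utf8` segments described in Answer 2, the hypothetical module below returns the same `{pre, post}` tuple but counts UTF-8 characters. It assumes the input is valid UTF-8: a byte sequence that is not valid UTF-8 makes every clause fail, raising a `FunctionClauseError`.

```elixir
defmodule NullByteUtf8 do
  # Hypothetical module name. Empty input: no match found.
  def get_3_around(<<>>), do: nil

  # Three UTF-8 characters, a null byte, then three more UTF-8 characters.
  def get_3_around(<<a::utf8, b::utf8, c::utf8, 0, d::utf8, e::utf8, f::utf8, _::binary>>),
    do: {<<a::utf8, b::utf8, c::utf8>>, <<d::utf8, e::utf8, f::utf8>>}

  # No match here: drop one whole UTF-8 character and try again.
  def get_3_around(<<_::utf8, rest::binary>>), do: get_3_around(rest)
end
```

For example, `NullByteUtf8.get_3_around("añé" <> <<0>> <> "ßxy!")` returns `{"añé", "ßxy"}`, where the byte-based version would split `ñ` and `é` in half.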
