Is there a way to swap bytes to read binary DEC format?


I have old binary files written in the so-called "DEC" format. To get the correct value of a 4-byte float from this format, I can do the following:

  1. Read the bytes
  2. Swap the last two bytes with the first two bytes (swap word 1 and word 2)
  3. Convert the bytes to a number with readBin()
  4. Divide this value by 4
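The four steps can be sketched in C for a single value. This is a minimal sketch, assuming the host `float` is IEEE 754 single precision; `dec_bytes_to_float` is a hypothetical name, and step 1 (reading the 4 bytes) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper following the four steps: one DEC value -> native float.
   Assumes the host float is IEEE 754 single precision. */
float dec_bytes_to_float(const unsigned char dec[4])
{
    assert(dec != NULL);

    /* Step 2: swap word 1 (bytes 0-1) and word 2 (bytes 2-3). */
    const unsigned char pc[4] = { dec[2], dec[3], dec[0], dec[1] };

    /* Step 3: read the reordered bytes as a little-endian IEEE 754 float.
       Assembling the 32-bit pattern by hand avoids relying on a little-endian host. */
    uint32_t bits = (uint32_t)pc[0]
                  | ((uint32_t)pc[1] << 8)
                  | ((uint32_t)pc[2] << 16)
                  | ((uint32_t)pc[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof f);

    /* Step 4: correct the exponent offset by dividing by 4. */
    return f / 4.0f;
}
```

For example, the bytes `80 40 00 00` decode to `1.0f`.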

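I assumed readBin() would have an endian option [c('little', 'big', 'swap')] that handles this, but it doesn't: `endian = "swap"` reverses all four bytes of each value, whereas DEC data needs the two 16-bit words exchanged. Here is an example, with some code showing the current workaround.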
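Why no `endian` setting helps can be seen by comparing the two byte orders in a small C sketch (the helper names are hypothetical): reading with the opposite endianness reverses all four bytes, while the DEC layout needs the two 16-bit words exchanged with the byte order inside each word kept.

```c
#include <assert.h>

/* Full byte reversal: what reading with the opposite endianness amounts to. */
void reverse_bytes4(const unsigned char in[4], unsigned char out[4])
{
    assert(in != out);   /* in-place use would overwrite bytes still needed */
    out[0] = in[3];
    out[1] = in[2];
    out[2] = in[1];
    out[3] = in[0];
}

/* Word swap: exchange the two 16-bit words, keeping the byte order inside each word. */
void swap_words4(const unsigned char in[4], unsigned char out[4])
{
    assert(in != out);
    out[0] = in[2];
    out[1] = in[3];
    out[2] = in[0];
    out[3] = in[1];
}
```

For the bytes `80 40 00 00`, the reversal gives `00 00 40 80` and the word swap gives `00 00 80 40`.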

```r
# Start with an actual value from the sample file:
# 4 bytes representing the target value of 1.290.
# In practice dec_bytes is read in with readBin(con, raw(), n = 4);
# the writeBin() call below is only a stand-in for that, and the printed
# results correspond to the bytes read from the sample file.
dec_bytes <- writeBin(1.290, raw(), size = 4)
# Rearrange the bytes, swapping the two 16-bit words
pc_bytes <- dec_bytes[c(3, 4, 1, 2)]
# Use readBin() to get the numeric value of the bytes
pc_float <- readBin(pc_bytes, numeric(), n = 1, size = 4)
pc_float
# [1] 0.5161456
# Divide by 4 to get the correct answer
pc_float <- pc_float / 4
pc_float
# [1] 0.1290364
```

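I could obviously write a function that performs the steps listed above, but the real question is: is there a simpler, more efficient way to do this? In some C code that I wrote (or found) about 30 years ago, I used the following function, which I can only assume actually worked: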

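```c
#include <string.h>

/* Convert 4 bytes in DEC order to a float (IEEE 754 float, little-endian host). */
float ConvertDecToFloat(const unsigned char bytes[4])
{
    unsigned char p[4];
    float f;

    /* swap word 1 and word 2 */
    p[0] = bytes[2];
    p[1] = bytes[3];
    p[2] = bytes[0];
    p[3] = bytes[1];
    if (p[0] || p[1] || p[2] || p[3])
        --p[3];          // adjust exponent (subtract 2, i.e. divide by 4)

    memcpy(&f, p, sizeof f);   /* instead of *(float*)p, which breaks aliasing and alignment rules */
    return f;
}
```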

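So `--p[3]` subtracts 1 from the last byte after the rearrangement, which gives the correct answer without dividing by 4: that byte holds the sign bit and the top 7 bits of the 8-bit exponent, so decrementing it lowers the exponent by 2, i.e. divides the value by 2^2 = 4. I'm not sure whether this can be done in R without converting to integers and back to bytes.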
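To check the exponent reasoning, the value can also be decoded straight from its bit fields. This sketch assumes the "DEC" format is the VAX F_floating single-precision layout described in the comments, which is consistent with the word swap and the factor of 4 above; `dec_fields_to_double` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cross-check that decodes a DEC value straight from its bit fields,
   assuming the VAX F_floating layout:
   - word 1 = bytes 0-1 (low byte first): sign in bit 15, 8-bit exponent with
     bias 128 in bits 14-7, top 7 fraction bits in bits 6-0;
   - word 2 = bytes 2-3 (low byte first): remaining 16 fraction bits;
   - value = (-1)^sign * 0.1f (binary, hidden bit) * 2^(exponent - 128). */
double dec_fields_to_double(const unsigned char dec[4])
{
    assert(dec != NULL);

    uint32_t word1 = (uint32_t)dec[0] | ((uint32_t)dec[1] << 8);
    uint32_t word2 = (uint32_t)dec[2] | ((uint32_t)dec[3] << 8);

    int sign = (int)((word1 >> 15) & 1u);
    int exponent = (int)((word1 >> 7) & 0xFFu);
    uint32_t fraction = ((word1 & 0x7Fu) << 16) | word2;   /* 23 stored bits */

    /* Exponent 0 means zero (with sign 1 it is a reserved operand, not handled here). */
    if (exponent == 0)
        return 0.0;

    /* Mantissa with hidden bit: (2^23 + fraction) / 2^24, so the value is
       (2^23 + fraction) * 2^(exponent - 128 - 24). Scaling by 2 is exact in double. */
    double value = (double)((UINT32_C(1) << 23) | fraction);
    for (int shift = exponent - 152; shift > 0; --shift)
        value *= 2.0;
    for (int shift = exponent - 152; shift < 0; ++shift)
        value /= 2.0;

    return sign ? -value : value;
}
```

In IEEE 754 single precision the exponent bias is 127 and the mantissa is 1.f, while in this layout it is 128 and 0.1f, so the same bits read as IEEE come out 2^2 = 4 times too large. That is where the division by 4 (or the `--p[3]`) comes from.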

Tags: r, binaryfiles

Answer

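Answered by a colleague (thanks to Michael Schwartz). The simple vectorized solution is to build an index vector that reorders the values of the byte vector. I have two working solutions: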

```r
# Test on a vector of 24 bytes: 6 values stored as 4-byte floats
values <- c(1, 12, 123, 1234, 12345, 123456)
pc_bytes0 <- writeBin(values, raw(), size = 4)

# Shuffle the byte order to reproduce the DEC order,
# using the same procedure that will later unshuffle it

# Swapping needed to convert from PC to DEC byte order:
# DEC byte 1 -> 3, 2 -> 4, 3 -> 1, 4 -> 2
byte_adjust <- rep(c(2, 2, -2, -2), 6)
# Original index order
pc_byte_index <- seq_len(24)
# New index order for DEC data storage: add the adjustment vector
dec_byte_index <- pc_byte_index + byte_adjust
# Reshuffle the original data with this index to get the DEC order
dec_bytes <- pc_bytes0[dec_byte_index]
# This is what readBin(con, raw(), n = 24) would return from a DEC file,
# so the actual process starts here.
# Note: to get a true DEC byte array we would also have to add 1
# to the 2nd byte of each 4-byte sequence (the DEC exponent is 2 larger).
```
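A C sketch of the reverse direction, float to genuine DEC bytes with the exponent adjustment included, can be used to produce realistic test data. It assumes IEEE 754 `float`; `float_to_dec_bytes` is hypothetical, and only zero or normal values below 2^127 in magnitude are accepted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder, the inverse of the word swap plus exponent fix:
   turns a native IEEE 754 float into 4 bytes in DEC order.
   Accepts zero or a normal float with magnitude below 2^127; subnormals,
   infinities and NaNs are rejected by the assert. */
void float_to_dec_bytes(float value, unsigned char dec[4])
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);

    if ((bits & 0x7FFFFFFFu) == 0) {      /* +0 and -0 both become DEC zero */
        memset(dec, 0, 4);
        return;
    }

    uint32_t exp_field = (bits >> 23) & 0xFFu;
    assert(exp_field >= 1u && exp_field <= 253u);   /* room for the +2 below */

    /* IEEE bytes in little-endian order. */
    unsigned char pc[4] = {
        (unsigned char)(bits & 0xFFu),
        (unsigned char)((bits >> 8) & 0xFFu),
        (unsigned char)((bits >> 16) & 0xFFu),
        (unsigned char)(bits >> 24)
    };

    /* The DEC exponent is 2 larger: adding 1 to the byte holding the sign
       and the top 7 exponent bits adds 2 to the exponent (multiplies by 4). */
    ++pc[3];

    /* Swap the two 16-bit words to obtain DEC byte order. */
    dec[0] = pc[2];
    dec[1] = pc[3];
    dec[2] = pc[0];
    dec[3] = pc[1];
}
```

For example, `1.0f` becomes `80 40 00 00` and `-1.0f` becomes `80 C0 00 00`; feeding those bytes to the question's `ConvertDecToFloat()` should give the original values back.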

```r
# Approach 1: make a long vector of the original byte order and another
# of offsets, and add them together.
# The data is in DEC sequence, so make a vector of the original order
dec_byte_index <- seq_len(24) # original byte index order
# These are the index offsets needed
byte_adjust <- rep(c(2, 2, -2, -2), 6)
# Offset the original order by adding
pc_byte_index <- dec_byte_index + byte_adjust
# Apply the PC byte order to the data
pc_bytes <- dec_bytes[pc_byte_index]
# Now the bytes can be read in the correct order
# (for real DEC data, also divide the result by 4)
pc_float <- readBin(pc_bytes, double(), n = 6, size = 4)
pc_float
#[1]      1     12    123   1234  12345 123456
```
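Approach 1 translates directly to C. In this sketch (hypothetical function names, IEEE 754 `float` assumed) the offsets become `{+2, +2, -2, -2}` on 0-based indices, and the exponent correction is left to the caller as a division by 4, as in the R code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C counterpart of "Approach 1": build the PC byte order for a whole
   buffer by adding a per-position offset {+2, +2, -2, -2} to each byte index. */
void dec_to_pc_order(const unsigned char *dec, unsigned char *pc, size_t n_bytes)
{
    static const int offset[4] = { 2, 2, -2, -2 };
    assert(n_bytes % 4 == 0);   /* whole 4-byte values only */
    for (size_t i = 0; i < n_bytes; ++i)
        pc[i] = dec[(ptrdiff_t)i + offset[i % 4]];
}

/* Read the v-th little-endian IEEE 754 float from a PC-ordered buffer. */
float read_pc_float(const unsigned char *pc, size_t v)
{
    const unsigned char *b = pc + 4 * v;
    uint32_t bits = (uint32_t)b[0] | ((uint32_t)b[1] << 8)
                  | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

As in R, the offset table is its own inverse, so the same function also turns PC order back into DEC order.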

```r
# Approach 2: use a single index vector, reshape to a matrix and apply
# the index representing the desired order of the 4 original bytes
byte_index <- c(3, 4, 1, 2)
# Convert the data to a 4 x 6 matrix (one column per value)
dec_byte_matrix <- matrix(dec_bytes, nrow = 4, ncol = 6)
# Use the index to swap rows
pc_bytes <- as.vector(dec_byte_matrix[byte_index, ])
# Now compute the floats
pc_float <- readBin(pc_bytes, double(), n = 6, size = 4)
pc_float
#[1]      1     12    123   1234  12345 123456
```
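Approach 2 in C: treat each 4-byte record as one matrix column and pick its bytes with a single index vector; R's `c(3, 4, 1, 2)` becomes `{2, 3, 0, 1}` with 0-based indexing. As a variation, this sketch applies the exponent fix from the question's `ConvertDecToFloat()` to the bytes instead of dividing by 4 afterwards. `dec_records_to_floats` is a hypothetical name, and IEEE 754 `float` is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C counterpart of "Approach 2" with the byte-level exponent fix.
   dec must hold 4 * n_values bytes; out receives n_values floats. */
void dec_records_to_floats(const unsigned char *dec, size_t n_values, float *out)
{
    static const size_t byte_index[4] = { 2, 3, 0, 1 };
    assert(dec != NULL && out != NULL);

    for (size_t v = 0; v < n_values; ++v) {
        const unsigned char *rec = dec + 4 * v;   /* one "column" of 4 bytes */
        unsigned char pc[4];
        for (size_t r = 0; r < 4; ++r)
            pc[r] = rec[byte_index[r]];

        /* Subtract 2 from the exponent (divide by 4) unless the value is zero. */
        if (pc[0] || pc[1] || pc[2] || pc[3])
            --pc[3];

        uint32_t bits = (uint32_t)pc[0] | ((uint32_t)pc[1] << 8)
                      | ((uint32_t)pc[2] << 16) | ((uint32_t)pc[3] << 24);
        memcpy(&out[v], &bits, sizeof out[v]);
    }
}
```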

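I tested both with microbenchmark and found no noticeable difference in processing time between them. Note that for original DEC data, pc_float still has to be divided by 4 to get the correct answer, unless the byte (exponent) adjustment has been applied.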

© www.soinside.com 2019 - 2024. All rights reserved.