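I've been reading a lot about reinterpret_cast recently, because I want to make sure I'm using it correctly and not accidentally invoking undefined behavior (UB). cppreference and this excellent article on strict aliasing have gotten me about 95% of the way there, but I'd like to clarify my understanding of what is and isn't UB.
Suppose I have a struct: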
struct __attribute__((packed)) SimpleStruct {
    uint32_t a = 0;
    uint8_t b = 1;
    int16_t c = 2;
    uint8_t d[5] = {0, 1, 2, 3, 4};
};
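I used the __attribute__((packed)) attribute so that no padding bytes are inserted, at the cost of performance/optimization. According to the standard, inspecting an object's byte representation through a reinterpret_cast to unsigned char * is allowed and is not UB: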
unsigned char *bytes_of_simple_struct = reinterpret_cast<unsigned char *>(&simple_struct);
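One way to inspect the bytes without holding on to a reinterpret_cast pointer is to copy the object representation into a separate buffer. The sketch below assumes C++17, and object_bytes is a hypothetical helper name, not something from the standard library:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: returns a copy of the object representation of obj.
template <typename T>
std::array<unsigned char, sizeof(T)> object_bytes(const T &obj)
{
    static_assert(std::is_trivially_copyable_v<T>,
                  "object_bytes requires a trivially copyable type");
    std::array<unsigned char, sizeof(T)> bytes{};
    // memcpy copies raw bytes and never interprets them as a T.
    std::memcpy(bytes.data(), &obj, sizeof(T));
    return bytes;
}
```

Called as object_bytes(simple_struct), this would give a std::array whose size equals sizeof(SimpleStruct), which can then be printed or compared byte by byte.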
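Now, this is the part I'd like clarified: I believe modifying the struct's bytes through this pointer is also allowed and is not UB (assuming you respect the size of the object):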
static_assert(sizeof(simple_struct) == 12);
bytes_of_simple_struct[0] = 0x1U;
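Now, I understand that the value of simple_struct.a depends on the system's endianness. But accessing simple_struct.a after the byte modification is still defined behavior, correct? As long as I haven't changed the bytes into an invalid representation for the type they make up, the behavior should still be defined.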
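To make the endianness dependence concrete, here is a small sketch that overwrites the lowest-addressed byte of a uint32_t with memcpy and compares the result against the byte order detected at run time. Both helper names are made up for illustration, and only little- and big-endian layouts are considered:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: replace the byte at the lowest address of value.
inline std::uint32_t with_first_byte(std::uint32_t value, unsigned char byte)
{
    std::memcpy(&value, &byte, 1);
    return value;
}

// Hypothetical helper: detect the byte order by looking at the first byte of 1.
inline bool is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian machine with_first_byte(0, 0x01) yields 0x00000001, while on a big-endian machine the same call yields 0x01000000. Either way the result is a valid uint32_t, because every bit pattern is.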
Conversely, if my struct had a bool instead:
struct __attribute__((packed)) SimpleStruct {
    bool a_bool = false;
    uint8_t b = 1;
    int16_t c = 2;
    uint8_t d[5] = {0, 1, 2, 3, 4};
};
and then did something like this:
bytes_of_simple_struct[0] = 0xFFU;
assert(simple_struct.a_bool == false);
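that would invoke UB, because I've now modified the underlying bytes of a_bool so that they no longer form a valid representation of type bool. Basically, as long as any byte modification still respects the rules for which bytes can represent each type, the behavior should be defined? For the basic numeric types you can modify the bytes to essentially anything (whether that is useful is another matter), because any byte value is a valid uint8_t, any two bytes are a valid uint16_t, and so on...
Is my understanding correct?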
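If the bytes come from an untrusted source, one option is to validate the raw byte before a bool object is ever created from it. This is only a sketch: decode_bool is a hypothetical name, it needs C++17 for std::optional, and it assumes (as the question does) an implementation where a bool occupies one byte and only 0x00 and 0x01 are valid representations:

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: turn a raw byte into a bool without ever storing an
// invalid bit pattern in a bool object.
// Assumption: 0x00 and 0x01 are the only valid bool representations.
inline std::optional<bool> decode_bool(unsigned char raw)
{
    if (raw == 0x00) return false;
    if (raw == 0x01) return true;
    return std::nullopt;  // reject every other pattern instead of storing it
}
```

A caller would check has_value() and handle the rejection, so a byte such as 0xFF is never read back through a bool.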
For anyone who might find this helpful, here is some commented code that summarizes what I was missing about undefined behavior:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <cassert>
struct __attribute__((packed)) SimpleStruct
{
    bool a_bool = false;
    uint8_t b = 1;
    int16_t c = 2;
    uint8_t d[6] = { 0, 1, 2, 3, 4, 5 };
};
int main()
{
    SimpleStruct simple_struct{};
    // Ensuring padding has indeed been removed from the struct with __attribute__((packed))
    static_assert(sizeof(simple_struct) == 10);
    // Defined behavior: casting to unsigned char * (or std::byte * since C++17) to view the
    // byte representation of an object is allowed
    unsigned char *bytes_of_simple_struct =
        reinterpret_cast<unsigned char *>(&simple_struct);
    for (size_t i = 0; i < sizeof(simple_struct); i++)
    {
        printf("Byte %zu of struct: %02X\n", i, bytes_of_simple_struct[i]);
    }
    // Defined behavior, using memcpy() to copy bytes into an object is allowed
    static_assert(sizeof(bool) == 1);
    uint8_t byte_array[sizeof(SimpleStruct)] = {
        0x00U, // Critical this is either 0x00U or 0x01U, the only two valid byte representations for type bool
        0x00U,
        0x00U,
        0x00U,
        0x00U,
        0x00U,
        0x00U,
        0x00U,
        0x00U,
        0x00U,
    };
    static_assert(sizeof(simple_struct) == sizeof(byte_array));
    memcpy(&simple_struct, byte_array, sizeof(simple_struct));
    assert(simple_struct.b == 0);
    // Contested! Strict aliasing itself permits access through unsigned char, but:
    // - bytes_of_simple_struct[1] - We've now indexed and de-referenced the unsigned char *,
    //   but the unsigned char * actually points at a SimpleStruct, not at an array of
    //   unsigned char, and the standard's wording does not clearly define that access.
    //   Compilers accept it in practice.
    bytes_of_simple_struct[1] = 0xFFU;
    assert(simple_struct.b == 0xFFU); // compare with ==, a single = would assign
    // A subtly different way to assign a single byte to bytes_of_simple_struct[1] that is defined.
    // While it looks similar, the entire reason this is "defined" is because memcpy is not
    // interpreting simple_struct as any type, it is simply copying bytes from one memory
    // location to another.
    unsigned char a_byte = 0xFFU;
    memcpy(bytes_of_simple_struct + 1, &a_byte, sizeof(a_byte));
    assert(simple_struct.b == 0xFFU);
    // However, extreme care must be taken to ensure that the byte representation of the
    // type being copied into is still valid post memcpy(). If the byte representation isn't
    // valid, undefined behavior still occurs. For example:
    a_byte = 0xFFU;
    memcpy(bytes_of_simple_struct, &a_byte, sizeof(a_byte));
    // A bool can only be represented by bytes 0x0 and 0x1, by copying 0xFF into a bool type
    // and referencing simple_struct.a_bool, undefined behavior is invoked.
    // For example, these assertions both pass compiled with GCC 13.2!
    assert(simple_struct.a_bool != false);
    assert(simple_struct.a_bool != true);
}
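The memcpy approach from the summary can be wrapped in a small bounds-checked helper that never performs arithmetic on a pointer into the object itself. This is a sketch under my own assumptions: set_byte is a hypothetical name, it needs C++17, and it round-trips the whole object through a local buffer, which costs two copies:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: overwrite one byte of obj's representation.
// Returns false, leaving obj untouched, if index is outside the object.
template <typename T>
bool set_byte(T &obj, std::size_t index, unsigned char value)
{
    static_assert(std::is_trivially_copyable_v<T>,
                  "set_byte requires a trivially copyable type");
    if (index >= sizeof(T))
        return false;
    unsigned char buffer[sizeof(T)];
    std::memcpy(buffer, &obj, sizeof(T));  // copy the representation out
    buffer[index] = value;                 // edit a real unsigned char array
    std::memcpy(&obj, buffer, sizeof(T));  // copy the representation back
    return true;
}
```

The helper only guards against writing past the end of the object. It cannot know whether the new bytes are a valid representation for the member they land in, so writing 0xFF over a bool is still the caller's problem.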