Difference between the mvhd box timescale and the mdhd box timescale in the ISOBMFF format

Question

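What is the difference between the mvhd box timescale and the mdhd box timescale in the ISOBMFF format?

I found the definitions in the official documentation.

The Movie Header Box (mvhd) timescale is: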

timescale is an integer that specifies the time-scale for the entire presentation; this is the number of 
time units that pass in one second. For example, a time coordinate system that measures time in 
sixtieths of a second has a time scale of 60

The Media Header Box (mdhd) timescale is:

timescale is an integer that specifies the number of time units that pass in one second for this media. 
For example, a time coordinate system that measures time in sixtieths of a second has a time scale 
of 60

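If the Movie Box timescale is 1000 and the fps is 24,
then the mdhd timescale value for the video track is 24000.
Is that correct?
(My thinking is that the video mdhd timescale is (fps * mvhd timescale) and
the audio mdhd timescale is the sample rate (48000 Hz, etc.).)

I am curious because some files have an mvhd timescale value of 30,
while for video fragment files some have a value of 90000.

The image below has an MDHD timescale of 30.
The image below has an MDHD timescale of 90000.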

Answer 1

MVHD = global/movie timescale

For movie time: the number of ticks (usually set to 1000) that represents 1 second of real-world clock time.

MDHD = media-specific timescale

For video time: the specified frequency should represent 1 second of real-world clock time.

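Video note: this is related to (and affected by) the FPS and the sample duration in the STTS atom/box.
Video example: if the FPS is 24 and the sample duration is 1000, then in mdhd we set 24000 ticks per 1 second.
We are saying that 1 sample (frame) should last 1/24 of a second in real clock time. 24 samples == 1 second.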

//## where FPS mode is Constant (not Variable)
//## eg: MDHD_timescale is 24000 and STTS_sample_duration is 1000
FPS = ( MDHD_timescale / STTS_sample_duration )
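
As a minimal sketch of that relationship, the frame rate of a constant-frame-rate track can be computed from the media (MDHD) timescale and the single STTS sample duration. The function and parameter names here are hypothetical and only for illustration; the values are assumed to have been read from the file already.

```python
from fractions import Fraction

def constant_fps(mdhd_timescale: int, stts_sample_delta: int) -> Fraction:
    """Frames per second for a constant-frame-rate track (one STTS entry)."""
    if mdhd_timescale <= 0 or stts_sample_delta <= 0:
        raise ValueError("timescale and sample delta must be positive")
    # Ticks per second divided by ticks per frame gives frames per second.
    return Fraction(mdhd_timescale, stts_sample_delta)

print(constant_fps(24000, 1000))                   # 24
print(round(float(constant_fps(24000, 1001)), 3))  # 23.976
```

Using Fraction keeps rates such as 24000/1001 exact instead of rounding them to a float early.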

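For audio time: the specified frequency should represent 1 second of real-world clock time.

Audio note: this is usually the rate of PCM samples per second (in hertz) of the audio data.
Audio example: 48 kHz is 48 000 PCM samples per second, so in mdhd we set 48000 ticks per 1 second.

In the example above, the total number of PCM audio samples expected for one second is 48000.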

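For example, imagine that we now divide those 48000 samples into 24 audio frames. How many PCM samples does each audio frame hold in this case?
It is 2000, because:

( 2000 samples x 24 frames ) = 48000 total samples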

In STTS, write the sample duration: 2000 PCM samples per audio frame
In MDHD, write the timescale: 48000 ticks per second
In MVHD, write the timescale: 1000 ticks per second

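In STTS, the sample duration is a count of the audio samples in a frame, not a count of audio time per frame. This holds because the audio MDHD timescale equals the sample rate, so one tick is one PCM sample.

At 24 audio frames per second, one audio frame holds 2000 samples, so it has 41.666 milliseconds of audio time.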

( 48000 samples / 24 frames ) == 2000 samples length per audio frame
( 1000 ticks per sec / 24 frames ) == 41.666 milliseconds per audio frame

So you can calculate:

( ( STTS_sample_duration * 1000 ) / MDHD_timescale ) = frame duration in milliseconds
( ( 2000 * 1000 ) / 48000 ) = 41.666 milliseconds per audio frame

//# total audio time per 24 frames:
( 41.666 msecs * 24 frames ) = 999.984 milliseconds
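
The same arithmetic can be sketched in code, assuming an audio MDHD timescale of 48000 and an STTS sample duration of 2000 as in the imagined example. The helper name is hypothetical.

```python
def frame_duration_ms(stts_sample_delta: int, mdhd_timescale: int) -> float:
    """Duration of one sample (frame) in milliseconds."""
    # Ticks per frame divided by ticks per second, scaled to milliseconds.
    return stts_sample_delta * 1000 / mdhd_timescale

per_frame = frame_duration_ms(2000, 48000)
print(round(per_frame, 3))       # 41.667
print(round(per_frame * 24, 3))  # 1000.0
```

The 999.984 figure above comes from truncating the frame duration to 41.666; without that truncation, 24 frames add up to exactly one second.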

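Inside MP4, the audio frame is typically an AAC frame.
It holds a different number of expected samples per frame.
For 44100 or 48000 Hz, one AAC frame holds 1024 samples (at 48000 Hz that is 21.333 milliseconds of sound/PCM data).

How many AAC frames (audio frames) of 1024 PCM samples each are needed to play the expected 48000 PCM audio samples in 1 second?

The answer is 46.875 frames. However, an audio decoder reads 47 AAC frames, and the 128 PCM audio samples left over from those 47 frames are carried into the next second of sound.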

( 48000 samples / 46.875 AAC frames ) == 1024 samples length per audio frame
( 1000 ticks per sec / 46.875 AAC frames ) == 21.333 milliseconds per audio frame
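
The AAC frame count and the carried-over samples can be checked with a short sketch. It assumes 1024 PCM samples per AAC frame, as stated above; the function name is hypothetical.

```python
import math
from fractions import Fraction

def aac_frames_per_second(sample_rate: int, samples_per_frame: int = 1024):
    """Return (exact frames per second, whole frames read, samples carried over)."""
    exact = Fraction(sample_rate, samples_per_frame)
    whole = math.ceil(exact)  # a decoder can only read whole frames
    carried = whole * samples_per_frame - sample_rate
    return exact, whole, carried

exact, whole, carried = aac_frames_per_second(48000)
print(float(exact), whole, carried)  # 46.875 47 128
```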

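(2) About the side questions...

"If the Movie Box timescale is 1000 and the fps is 24, then the MDHD timescale value for the video track is 24000. Is that correct?"

Your video must use a constant frame rate for that logic to work.
Your STTS must have only one entry, meaning that the same sample duration of 1000 applies to every video frame. Then you can set 24000 in MDHD_timescale and 1000 in MVHD_timescale.
Note that the two timescales are independent: 24000 comes from 24 FPS multiplied by the STTS sample duration of 1000, not from the MVHD timescale.

"My thinking is that the video mdhd timescale is (fps * mvhd timescale) and the audio mdhd timescale is the sample rate (48000 Hz, etc.)"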

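The MDHD timescale, for both audio and video, says how many "ticks" make up 1 second of media time. In STTS you state how much (what ratio) of the MDHD timescale the current frame represents.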
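
To make the ratio concrete, here is a small sketch that sums STTS entries into a media duration and rescales it into movie (MVHD) ticks. The STTS table is represented as a plain list of (sample_count, sample_delta) pairs, and both helper names are hypothetical; no real parser is involved.

```python
from fractions import Fraction

def media_duration_ticks(stts_entries):
    """Sum sample_count * sample_delta over (count, delta) STTS entries."""
    return sum(count * delta for count, delta in stts_entries)

def rescale(ticks: int, from_timescale: int, to_timescale: int) -> int:
    """Convert ticks from one timescale to another, rounding to a whole tick."""
    return round(Fraction(ticks * to_timescale, from_timescale))

video_stts = [(240, 1000)]                # 240 frames of 1000 ticks each
ticks = media_duration_ticks(video_stts)  # in MDHD units
print(ticks / 24000)                      # 10.0 (seconds)
print(rescale(ticks, 24000, 1000))        # 10000 (MVHD ticks)
```

The same 10 seconds is 240000 ticks in a 24000 media timescale and 10000 ticks in a 1000 movie timescale; only the unit changes.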

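For video:
MDHD is 24000 because each video sample (frame) in STTS has a duration of 1000 ticks.
STTS tells us that 24 video frames are needed to match the MDHD value.

For audio:
MDHD is 48000 because each audio frame in STTS contains 1024 ticks of PCM audio samples.
STTS tells us that 46.875 (in practice 47) audio frames are needed to match the MDHD value.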

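"I am curious because some files have an mvhd timescale value of 30, while for video fragment files some have a value of 90000."

These numbers are ratios that depend on whatever other numbers are used in MDHD and in the STTS entries.

90,000 is a good value for getting usable whole numbers from many frame rates: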

90000 / 3750 == 24.000 
90000 / 3600 == 25.000
90000 / 3003 == 29.970 
90000 / 3000 == 30.000
90000 / 1500 == 60.000

In the case of 23.976 FPS, you could use: ( 24000 / 1001 ) == 23.976
In the case of 59.940 FPS, you could use: ( 60000 / 1001 ) == 59.940
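
The divisions above can be verified with a short sketch that returns the per-frame tick count for a timescale and frame rate, or None when the result is not a whole number. The function name is hypothetical.

```python
from fractions import Fraction

def sample_delta(timescale: int, fps: Fraction):
    """Ticks per frame, or None when it is not a whole number."""
    delta = Fraction(timescale) / fps
    return delta.numerator if delta.denominator == 1 else None

ntsc_film = Fraction(24000, 1001)                  # 23.976 FPS
print(sample_delta(90000, Fraction(24)))           # 3750
print(sample_delta(90000, Fraction(30000, 1001)))  # 3003
print(sample_delta(90000, ntsc_film))              # None
print(sample_delta(24000, ntsc_film))              # 1001
```

This shows why 23.976 FPS is usually paired with a timescale of 24000 rather than 90000: 90000 would need 3753.75 ticks per frame.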
