FFMPEG视频转音频的结果是不同的长度

问题描述 投票:1回答:1

我试图将一个MP4文件覆盖到一个单声道WAV文件,采样频率为16,000 Hz。

当我运行以下代码时,持续时间从 00:09:59.99 (MP4)至 00:09:57.64 (WAV).它的原始,较长的版本从00:48:37.46(MP4)到00:48:23.38(WAV)。它的原始,更长的版本从00:48:37.46(MP4)到00:48:23.38(WAV)。

ffmpeg -i <FILE_NAME>.mp4 -ac 1 -ar 16000 <FILE_NAME>.wav

我也试过下面的代码,结果差很多,从00:48:37.46(MP4)变成了00:48:23.38(WAV)。结果差很多,从00:09:59.99(MP4)到00:12:56.29(AAC)。

ffmpeg -I <FILE_NAME>.mp4 -vn -acodec copy <FILE_NAME>.aac

附上日志。

Report written to "ffmpeg-20200610-093115.log"
Command line:
ffmpeg -i short.mp4 -ac 1 -ar 16000 short.wav -report
ffmpeg version 4.1.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with Apple LLVM version 10.0.0 (clang-1000.11.45.5)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.1.1 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags='-I/Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/openjdk-11.0.2.jdk/Contents/Home/include/darwin' --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librubberband --enable-libsnappy --enable-libtesseract --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-videotoolbox --disable-libjack --disable-indev=jack --enable-libaom --enable-libsoxr
  libavutil      56. 22.100 / 56. 22.100
  libavcodec     58. 35.100 / 58. 35.100
  libavformat    58. 20.100 / 58. 20.100
  libavdevice    58.  5.100 / 58.  5.100
  libavfilter     7. 40.101 /  7. 40.101
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  3.100 /  5.  3.100
  libswresample   3.  3.100 /  3.  3.100
  libpostproc    55.  3.100 / 55.  3.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument 'short.mp4'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '1'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '16000'.
Reading option 'short.wav' ... matched as output url.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url short.mp4.
Successfully parsed a group of options.
Opening an input file: short.mp4.
[NULL @ 0x7f98a3008200] Opening 'short.mp4' for reading
[file @ 0x7f98a2904440] Setting default whitelist 'file,crypto'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] ISO: File Type Major Brand: mp42
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Processing st: 0, edit list 0 - media time: 0, duration: 7679872
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Processing st: 1, edit list 0 - media time: 1024, duration: 26459559
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] drop a frame at curr_cts: 0 @ 0
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] Before avformat_find_stream_info() pos: 11213917 bytes read:318782 seeks:1 nb_streams:2
[h264 @ 0x7f98a3808800] nal_unit_type: 7(SPS), nal_ref_idc: 3
[h264 @ 0x7f98a3808800] nal_unit_type: 8(PPS), nal_ref_idc: 3
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] demuxer injecting skip 1024 / discard 0
[aac @ 0x7f98a1008c00] skip 1024 / discard 0 samples due to side data
[h264 @ 0x7f98a3808800] nal_unit_type: 6(SEI), nal_ref_idc: 0
[h264 @ 0x7f98a3808800] nal_unit_type: 5(IDR), nal_ref_idc: 3
[h264 @ 0x7f98a3808800] Format yuv420p chosen by get_format().
[h264 @ 0x7f98a3808800] Reinit context to 640x368, pix_fmt: yuv420p
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f98a3008200] After avformat_find_stream_info() pos: 21961 bytes read:351550 seeks:2 frames:46
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'short.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isommp41mp42
    creation_time   : 2020-06-10T16:12:17.000000Z
  Duration: 00:09:59.99, start: 0.000000, bitrate: 149 kb/s
    Stream #0:0(eng), 1, 1/12800: Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 47 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      creation_time   : 2020-06-10T16:12:17.000000Z
      handler_name    : Core Media Video
    Stream #0:1(eng), 45, 1/44100: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 98 kb/s (default)
    Metadata:
      creation_time   : 2020-06-10T16:12:17.000000Z
      handler_name    : Core Media Audio
Successfully opened the file.
Parsing a group of options: output url short.wav.
Applying option ac (set number of audio channels) with argument 1.
Applying option ar (set audio sampling rate (in Hz)) with argument 16000.
Successfully parsed a group of options.
Opening an output file: short.wav.
[file @ 0x7f98a0c1db40] Setting default whitelist 'file,crypto'
Successfully opened the file.
Stream mapping:
  Stream #0:1 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[aac @ 0x7f98a100de00] skip 1024 / discard 0 samples due to side data
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
detected 12 logical cores
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'time_base' to value '1/44100'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'sample_rate' to value '44100'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'sample_fmt' to value 'fltp'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] Setting 'channel_layout' to value '0x4'
[graph_0_in_0_1 @ 0x7f98a0e2c4c0] tb:1/44100 samplefmt:fltp samplerate:44100 chlayout:0x4
[format_out_0_0 @ 0x7f98a0e2cb80] Setting 'sample_fmts' to value 's16'
[format_out_0_0 @ 0x7f98a0e2cb80] Setting 'sample_rates' to value '16000'
[format_out_0_0 @ 0x7f98a0e2cb80] Setting 'channel_layouts' to value '0x4'
[format_out_0_0 @ 0x7f98a0e2cb80] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0'
[AVFilterGraph @ 0x7f98a0c16ac0] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
[auto_resampler_0 @ 0x7f98a0e2d540] [SWR @ 0x7f98a28e1000] Using fltp internally between filters
[auto_resampler_0 @ 0x7f98a0e2d540] ch:1 chl:mono fmt:fltp r:44100Hz -> ch:1 chl:mono fmt:s16 r:16000Hz
Output #0, wav, to 'short.wav':
  Metadata:
    major_brand     : mp42
    minor_version   : 1
    compatible_brands: isommp41mp42
    ISFT            : Lavf58.20.100
    Stream #0:0(eng), 0, 1/16000: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s (default)
    Metadata:
      creation_time   : 2020-06-10T16:12:17.000000Z
      handler_name    : Core Media Audio
      encoder         : Lavc58.35.100 pcm_s16le
size=   17152kB time=00:09:16.63 bitrate= 252.4kbits/s speed=1.11e+03x    
[out_0_0 @ 0x7f98a0e2c700] EOF on sink link out_0_0:default.
No more output streams to write to, finishing.
size=   18676kB time=00:09:59.99 bitrate= 255.0kbits/s speed=1.11e+03x    
video:0kB audio:18676kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000408%
Input file #0 (short.mp4):
  Input stream #0:0 (video): 1 packets read (3689 bytes); 
  Input stream #0:1 (audio): 25739 packets read (7375414 bytes); 25738 frames decoded (26355712 samples); 
  Total: 25740 packets (7379103 bytes) demuxed
Output file #0 (short.wav):
  Output stream #0:0 (audio): 25739 frames encoded (9562163 samples); 25739 packets muxed (19124326 bytes); 
  Total: 25739 packets (19124326 bytes) muxed
25738 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x7f98a0c1dc40] Statistics: 4 seeks, 76 writeouts
[AVIOContext @ 0x7f98a29045c0] Statistics: 10902846 bytes read, 29 seeks
audio video ffmpeg video-processing
1个回答
1
投票

容器,如MP4,MKV存储时间戳的数据包。它的一个副产品是它允许代表音频轨道中的沉默,只需调整时间戳的数据包之间的沉默。像WAV或原始AAC比特流的容器没有时间戳,所以任何 "沉默 "编码的方式丢失。

你的输入音频是44100 Hz。在这一行靠近日志的最后。

Input stream #0:1 (audio): 25739 packets read (7375414 bytes); 25738 frames decoded (26355712 samples); 

你看到输入流有 26355712 samples. 在44100赫兹时,就是 ~597.6351 seconds. 这就是你在WAV输出中得到的东西。

要插入静音,为了保留音源的持续时间,可以使用

ffmpeg -i <FILE_NAME>.mp4 -af aresample=async=1 -ac 1 -ar 16000 <FILE_NAME>.wav
© www.soinside.com 2019 - 2024. All rights reserved.