Scene change detection during encoding and key frame extraction code

In a previous post we explained how to generate thumbnails for each scene change using Edit Decision Lists from your non-linear editor of choice. Sometimes, though, you won't have an EDL along with your video assets (for instance, when you are using off-airs or videos edited by somebody else). In those situations we can still create a thumbnail for each scene change thanks to how frames are structured in digital compression.

According to Wikipedia, a GOP structure specifies the order in which intra- and inter-frames are arranged. A Group Of Pictures can contain the following frame types:

  • I-frame (intra coded picture) - reference picture, which represents a full image and which is independent of other picture types. Each GOP begins with this type of picture.
  • P-frame (predictive coded picture) - contains motion-compensated difference information from the preceding I- or P-frame.
  • B-frame (bidirectionally predictive coded picture) - contains difference information from the preceding and following I- or P-frame within a GOP.

The GOP structure is often referred to by two numbers, for example M=3, N=12, which gives the pattern IBBPBBPBBPBBI. The first number is the distance between two anchor frames (I or P) and the second is the distance between two I-frames (the GOP length).
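To make the M/N notation concrete, here is a minimal sketch that prints the frame pattern for a given pair of values. The helper name gop_pattern is hypothetical, and the sketch assumes a fixed, closed GOP in display order, which real encoders do not always produce.

```shell
# gop_pattern M N
# Prints the display-order frame types of one GOP followed by the next
# GOP's opening I-frame. M = anchor-frame distance, N = GOP length.
# (Hypothetical helper for illustration; not part of any encoder.)
gop_pattern() {
    m=$1
    n=$2
    i=0
    pattern=""
    while [ "$i" -lt "$n" ]; do
        if [ "$i" -eq 0 ]; then
            pattern="${pattern}I"      # every GOP opens with an I-frame
        elif [ $((i % m)) -eq 0 ]; then
            pattern="${pattern}P"      # an anchor frame every M positions
        else
            pattern="${pattern}B"      # everything in between is a B-frame
        fi
        i=$((i + 1))
    done
    printf '%sI\n' "$pattern"          # append the next GOP's I-frame
}

gop_pattern 3 12   # prints IBBPBBPBBPBBI
```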

In order to use key frames as a scene change detection method we need to encode our video with a flexible GOP structure, setting minimum (min-keyint) and maximum (keyint) key frame intervals.

For example, a minimum setting equal to the frame rate of the video will prevent the encoder from placing two key frames within a second of each other. Please note that if your video has scenes shorter than a second you won't be able to detect all scene changes unless you reduce this setting.

Similarly, a maximum setting ensures that a key frame is inserted at least every X frames. A recommended setting is 10 times the frame rate, which equates to at most 10 seconds of video between key frames. We can set this to infinite so that key frames are only inserted at scene cuts, although this might cause problems when seeking (if you skip to a part of the video without a key frame, there won't be any picture until the next key frame is reached).

In addition, we need to define the threshold for scenecut detection. The encoder calculates a metric for every frame to estimate how different it is from the previous frame. If the value is lower than the threshold, a scenecut is detected.
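As a rough sketch of the three settings together, the helper below derives them from the frame rate using the rules of thumb above. It assumes the encoder is x264, whose command line tool takes --min-keyint, --keyint and --scenecut. The post does not name the encoder, so treat that as an assumption. The helper name keyframe_opts is hypothetical, 40 is x264's default scenecut value, and the frame rate must be a whole number (round 29.97 to 30).

```shell
# keyframe_opts FPS [SCENECUT]
# Prints x264-style key frame options for a whole-number frame rate.
# (Hypothetical helper; the option names assume the x264 encoder.)
keyframe_opts() {
    fps=$1
    scenecut=${2:-40}                  # assumed default: x264's own default of 40
    min_keyint=$fps                    # at most one key frame per second
    keyint=$((fps * 10))               # at least one key frame every 10 seconds
    echo "--min-keyint $min_keyint --keyint $keyint --scenecut $scenecut"
}

keyframe_opts 25   # prints --min-keyint 25 --keyint 250 --scenecut 40
```

With x264 installed, the result could be passed straight to the encoder, for instance as x264 $(keyframe_opts 25) -o encoded.mp4 source.y4m, where both file names are placeholders.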

Once we have encoded the video we can run the following ffmpeg command (I've used 0.8-win64-static):

ffmpeg -vf select="eq(pict_type\,PICT_TYPE_I)" -i yourvideo.mp4 -vsync 2 -s 73x41 -f image2 thumbnails-%02d.jpeg \
-loglevel debug 2>&1 | grep "pict_type:I -> select:1" | cut -d " " -f 6 - > keyframe-timecodes.txt

What follows -vf in an ffmpeg command line is a filtergraph description. The select filter chooses which frames to pass to the output. Here its expression compares the variable “pict_type” with the constant “PICT_TYPE_I”, so only I-frames (our key frames) are passed to the output.

-vsync 2 prevents ffmpeg from generating more than one copy of each key frame.

-f image2 writes video frames to image files. The output filenames are specified by a pattern, which can be used to produce sequentially numbered series of files. The pattern may contain the string "%d" or "%0Nd".
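The "%0Nd" pattern follows the usual C-style printf convention, which the shell's own printf shares, so a quick sketch can show what file names to expect. The helper name thumbnail_name is hypothetical, and the sketch assumes numbering starts at 1, as in the thumbnails used later in this post.

```shell
# thumbnail_name N
# Prints the file name the pattern thumbnails-%02d.jpeg yields for frame N.
# (Hypothetical helper; it only mimics the naming, it does not call ffmpeg.)
thumbnail_name() {
    printf 'thumbnails-%02d.jpeg\n' "$1"
}

thumbnail_name 1     # prints thumbnails-01.jpeg
thumbnail_name 12    # prints thumbnails-12.jpeg
thumbnail_name 100   # prints thumbnails-100.jpeg
```

The 2 in %02d is a minimum width, so names grow to three digits past 99 and stop sorting alphabetically. For long videos a wider pattern such as %04d may be more convenient.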

-loglevel debug makes the select filter log one line for every frame it evaluates. Before filtering with grep and cut, that output looks like this:

[select @ 0000000001A88BE0] n:0 pts:0 t:0.000000 pos:1953 interlace_type:P key:0 pict_type:I -> select:1.000000
[select @ 0000000001A88BE0] n:1 pts:40000 t:0.040000 pos:4202 interlace_type:P key:0 pict_type:P -> select:0.000000

I use “2>&1 | grep "pict_type:I -> select:1" | cut -d " " -f 6 -” to keep only the time field of the selected frames, and “> keyframe-timecodes.txt” to save the result as something more readable:

t:0.000000
t:1.360000
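The filtering step can be tried without ffmpeg by feeding it the two sample log lines shown above. This is a minimal sketch: the function name extract_timecodes is hypothetical, and it relies on the time being the sixth space-separated field, which holds for the log format shown here but may not for other ffmpeg builds.

```shell
# extract_timecodes
# Reads select-filter debug lines on stdin and prints the t: field of
# every frame that was selected as an I-frame.
# (Hypothetical wrapper around the grep and cut stages used in this post.)
extract_timecodes() {
    grep "pict_type:I -> select:1" | cut -d " " -f 6
}

# The two sample log lines from this post: one I-frame, one P-frame.
extract_timecodes <<'EOF'
[select @ 0000000001A88BE0] n:0 pts:0 t:0.000000 pos:1953 interlace_type:P key:0 pict_type:I -> select:1.000000
[select @ 0000000001A88BE0] n:1 pts:40000 t:0.040000 pos:4202 interlace_type:P key:0 pict_type:P -> select:0.000000
EOF
# prints t:0.000000
```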

Finally, I can convert “keyframe-timecodes.txt” into a chapter navigation list and use the thumbnails to navigate the video:

<a onclick="jwplayer().seek(0); return false" href="#"><img src="thumbnails-01.jpeg"></a>
<a onclick="jwplayer().seek(1.36); return false" href="#"><img src="thumbnails-02.jpeg"></a>
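The conversion itself can be scripted. The sketch below is one possible way to do it with awk, not necessarily how the list above was produced. The function name timecodes_to_html is hypothetical, and it assumes the Nth line of the timecode file belongs to the Nth thumbnail, with numbering starting at 01.

```shell
# timecodes_to_html
# Reads lines such as "t:1.360000" on stdin and prints one thumbnail
# link per line, numbered in input order.
# (Hypothetical helper; assumes line N matches thumbnails-N.jpeg.)
timecodes_to_html() {
    awk -F: '{
        # $2 is the time in seconds, NR is the current line number
        printf "<a onclick=\"jwplayer().seek(%s); return false\" href=\"#\">", $2
        printf "<img src=\"thumbnails-%02d.jpeg\"></a>\n", NR
    }'
}

printf 't:0.000000\nt:1.360000\n' | timecodes_to_html
```

The times keep their trailing zeros (seek(1.360000) instead of seek(1.36)), which JavaScript reads as the same number. On a real file the call would be timecodes_to_html < keyframe-timecodes.txt > chapters.html, where chapters.html is a placeholder name.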

Demo

Is this up to date or have the commands changed?

I've just built ffmpeg from github src (using default options for ./configure),

running this:

ffmpeg -vf select="eq(pict_type\,PICT_TYPE_I)" -i GiantOfMetropolisversionUs_512kb.mp4 -vsync 2 -s 73x41 -f image2 img/thumbnails-%02d.jpeg -loglevel debug

Video was http://archive.org/download/TheGiantOfMetropolis1961/GiantOfMetropolisve...

It breaks out the images but there is no textual output; did the commandline options change? I found this page when trying to understand how to run http://ffmpeg.org/trac/ffmpeg/ticket/442d ...

TellyClub:vista danbri$ ffmpeg -vf select="eq(pict_type\,PICT_TYPE_I)" -i GiantOfMetropolisversionUs_512kb.mp4 -vsync 2 -s 73x41 -f image2 img/thumbnails-%02d.jpeg -loglevel debug
ffmpeg version N-41528-g4289b66 Copyright (c) 2000-2012 the FFmpeg developers
built on Jun 12 2012 12:11:56 with gcc 4.2.1 (Apple Inc. build 5666) (dot 3)
configuration:
libavutil 51. 58.100 / 51. 58.100
libavcodec 54. 25.100 / 54. 25.100
libavformat 54. 6.101 / 54. 6.101
libavdevice 54. 0.100 / 54. 0.100
libavfilter 2. 78.101 / 2. 78.101
libswscale 2. 1.100 / 2. 1.100
libswresample 0. 15.100 / 0. 15.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x10100e800] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x10100e800] ISO: File Type Major Brand: isom
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x10100e800] File position before avformat_find_stream_info() is 3372584
[h264 @ 0x10100ee00] Using externally provided dimensions
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x10100e800] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x10100e800] File position after avformat_find_stream_info() is 3375219
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'GiantOfMetropolisversionUs_512kb.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
creation_time : 1970-01-01 00:00:00
title : GIANT OF METROPOLIS \(Version US\) - http://www.archive.org/details/TheGiantOfMetropolis1961
encoder : Lavf52.38.0
Duration: 01:29:56.26, start: 0.000000, bitrate: 581 kb/s
Stream #0:0(und), 1, 1/2997: Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 320x240 [SAR 11:10 DAR 22:15], 50/2997, 512 kb/s, 29.97 fps, 29.97 tbr, 2997 tbn, 59.94 tbc
Metadata:
creation_time : 1970-01-01 00:00:00
handler_name : VideoHandler
Stream #0:1(und), 1, 1/48000: Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, s16, 64 kb/s
Metadata:
creation_time : 1970-01-01 00:00:00
handler_name : SoundHandler
[vbuffer source @ 0x100f14980] Setting entry with key 'video_size' to value '320x240'
[vbuffer source @ 0x100f14980] Setting entry with key 'pix_fmt' to value '0'
[vbuffer source @ 0x100f14980] Setting entry with key 'time_base' to value '1/2997'
[vbuffer source @ 0x100f14980] Setting entry with key 'pixel_aspect' to value '11/10'
[vbuffer source @ 0x100f14980] Setting entry with key 'sws_param' to value 'flags=2'
[vbuffer source @ 0x100f14980] Setting entry with key 'frame_rate' to value '2997/100'
[buffer @ 0x100f14a80] w:320 h:240 pixfmt:yuv420p tb:1/2997 fr:2997/100 sar:11/10 sws_param:flags=2
[ffmpeg_buffersink @ 0x100f15800] No opaque field provided
[scale @ 0x100f159a0] w:320 h:240 fmt:yuv420p sar:11/10 -> w:73 h:41 fmt:yuvj420p sar:902/1095 flags:0x4
[mjpeg @ 0x10104e200] intra_quant_bias = 96 inter_quant_bias = 0
[h264 @ 0x10100ee00] detected 8 logical cores
Output #0, image2, to 'img/thumbnails-%02d.jpeg':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
creation_time : 1970-01-01 00:00:00
title : GIANT OF METROPOLIS \(Version US\) - http://www.archive.org/details/TheGiantOfMetropolis1961
encoder : Lavf54.6.101
Stream #0:0(und), 0, 1/90000: Video: mjpeg, yuvj420p, 73x41 [SAR 902:1095 DAR 22:15], 100/2997, q=2-31, 200 kb/s, 90k tbn, 29.97 tbc
Metadata:
creation_time : 1970-01-01 00:00:00
handler_name : VideoHandler
Stream mapping:
Stream #0:0 -> #0:0 (h264 -> mjpeg)
Press [q] to stop, [?] for help
[h264 @ 0x1010a2000] Using externally provided dimensions
No more inputs to read from, finishing. 0kB time=01:29:47.42 bitrate= 0.0kbits/s
frame=161726 fps=1329 q=11.4 Lsize= 0kB time=01:29:56.26 bitrate= 0.0kbits/s
video:131635kB audio:0kB global headers:0kB muxing overhead -100.000000%

text output

Try adding this after "-loglevel debug":

2>&1 | grep "pict_type:I -> select:1" | cut -d " " -f 6 - > keyframe-timecodes.txt
