The FFmpeg media editing software is a valuable tool, but its documentation is only barely adequate. It certainly does not answer all the questions I have, as a user trying to understand why FFmpeg is not doing what I want it to do.

Fortunately, FFmpeg is open source, so when the documentation fails, one can read the source. I wanted to learn about presentation time stamps and time bases. The fps video filter source code, in file libavfilter/vf_fps.c, was an instructive read.

I took what I learned from reading that source and completely rewrote the fps filter documentation. The rewrite is longer than the original (as archived in April 2020; you can check whether the present documentation is any better), and I believe it is more complete and more accurate. I submitted it as a patch to the ffmpeg developers list, as a contribution to the FFmpeg project. Discussion continues, and I don’t know whether the patch will ultimately be accepted.

So, for the benefit of FFmpeg users who are web-searching for answers, here is my documentation of FFmpeg’s fps video filter.


fps

Make a new video from the frames and presentation time stamps (PTSs) of the input. The new video has a specified constant frame rate, and new PTSs. It generally keeps frames from the old video, but might repeat or drop some frames. You can choose the method for rounding from input PTS to output PTS. This affects which frames fps keeps, repeats, or drops.

It accepts the following parameters:

fps

The output frame rate, in frames per second. May be an integer, real, or rational number, or an abbreviation. The default is 25.

start_time

A time, in seconds from the start of the input stream, which fps converts to an input starting PTS and an output starting PTS. If set, fps drops input frames which have PTSs less than the input starting PTS. If not set, the input and output starting PTSs are zero, but fps drops no input frames based on PTS. (See details below.)

round

Rounding method to use when calculating output PTSs from input PTSs. If the calculated output PTS is not exactly an integer, then the value of round determines which neighbouring integer fps selects.

Possible values are:

  • zero
    • round towards 0
  • inf
    • round away from 0
  • down
    • round towards -infinity
  • up
    • round towards +infinity
  • near
    • round to nearest (midpoints round away from 0)

The default is near.
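The five rounding names can be illustrated with a small Python sketch. This is only a model of the descriptions in the list above, using exact rational arithmetic; the function name round_pts is made up for this illustration and is not part of FFmpeg.

```python
from fractions import Fraction
import math


def round_pts(value: Fraction, method: str = "near") -> int:
    """Round a rational PTS value to an integer.

    The method names mirror the values of the round option; the code
    itself is a hypothetical model, not FFmpeg's implementation.
    """
    if method == "zero":
        return math.trunc(value)            # towards 0
    if method == "inf":
        # away from 0: up for positive values, down for negative ones
        return math.ceil(value) if value >= 0 else math.floor(value)
    if method == "down":
        return math.floor(value)            # towards -infinity
    if method == "up":
        return math.ceil(value)             # towards +infinity
    if method == "near":
        # nearest integer; midpoints round away from 0
        magnitude = math.floor(abs(value) + Fraction(1, 2))
        return magnitude if value >= 0 else -magnitude
    raise ValueError(f"unknown rounding method: {method}")


for name in ("zero", "inf", "down", "up", "near"):
    print(name, round_pts(Fraction(-5, 2), name), round_pts(Fraction(7, 4), name))
```

The value -5/2 is a midpoint, so it shows the difference between near (which gives -3) and zero (which gives -2).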

eof_action

Action which fps takes with the final input frame. The input video passes in a final input PTS, which fps converts to an output PTS limit. fps drops any input frames with a PTS at or after this limit.

Possible values are:

  • round
    • Use same rounding method as for other frames.
  • pass
    • Round the ending input PTS using up. This might make fps include one last input frame. 

The default is round.

Alternatively, the options may be specified as a flat string: fps[:start_time[:round]].

fps makes an output video with consecutive integer PTSs, and with a time base set to the inverse of the given frame rate. fps keeps, repeats, or drops input frames, in sequence, to the output video. It does so according to their input PTSs, as converted to seconds (via the input time base), then rounded to output PTSs. 

fps sets output PTSs in terms of a timeline which starts at zero. For any output frame, the integer PTS multiplied by the time base gives a value in seconds on that timeline. If the start_time parameter is not set, or is zero, the first output frame’s PTS is zero. Otherwise, the first PTS is the output starting PTS calculated from the start_time parameter. 

fps interprets input PTSs in terms of the same timeline. It multiplies each input frame’s PTS by the input time base, to get a value in seconds on the timeline. It rounds that value to an integer output PTS. For example, if the input video has a frame rate of 30 fps, a time base of 1/30 seconds, and a first frame with a PTS of 300, then fps treats that first frame as occurring 10 seconds (= 300 * 1/30) after the start of the video, even though it is the first frame.
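The arithmetic in that example can be checked with a short Python sketch. It assumes start_time is not set (so PTS zero is the reference on both sides), an output rate of 25 frames per second, and the default near rounding; the variable names are chosen for this illustration only.

```python
from fractions import Fraction
import math

in_time_base = Fraction(1, 30)       # input time base, in seconds per PTS unit
out_rate = Fraction(25)              # requested output frame rate
out_time_base = 1 / out_rate         # output time base is the inverse of the rate


def to_output_pts(in_pts: int) -> int:
    """Convert an input PTS to seconds, then to a rounded output PTS."""
    seconds = in_pts * in_time_base
    exact = seconds / out_time_base
    # "near" rounding; this simple form is valid for non-negative values
    return math.floor(exact + Fraction(1, 2))


print(300 * in_time_base)    # 10 seconds on the timeline
print(to_output_pts(300))    # 250
print(to_output_pts(301))    # 251, because 301/30 s * 25 = 250.83...
```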

Setting a start_time value allows for padding/trimming at the start of the input. For example, you can set start_time to 0, to pad the beginning with repeats of the first frame if a video stream starts after the audio stream, or to trim any frames with a negative PTS. When start_time is not set, the fps filter does not pad or trim starting frames, as long as they contain PTSs.

See also the setpts and settb filters.

Details

fps outputs exactly one frame for each output PTS. If there is exactly one input frame with an input PTS which converts to the current output PTS, fps keeps (outputs) that frame. If there are multiple frames which convert to the same output PTS, fps outputs the final frame of that group, and drops the previous frames. 

If the input frame PTS converts to an output PTS later than the current output PTS, fps repeats the previously output frame as the current frame. When this happens for the first input frame, fps “pads” the output: it outputs repetitions of that first frame until the output PTS reaches the value converted from that first frame’s input PTS.
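The keep, drop, and repeat rules above can be modelled in a few lines of Python. This is a simplified sketch of the rules as described here, not a port of vf_fps.c. It assumes the input PTSs have already been converted to output PTSs, that they are in non-decreasing order, and that none is earlier than the starting output PTS. It stops once the last input frame has been output, so end-of-stream handling (eof_action) is not modelled. The function name resample is hypothetical.

```python
def resample(frames, start_out_pts=0):
    """Model which input frame fills each output PTS.

    frames: list of (label, converted_output_pts) pairs in input order.
    Returns a list of (output_pts, label) pairs, one per output PTS.
    """
    output = []
    if not frames:
        return output
    index = 0
    current_label = frames[0][0]     # the first frame is used for padding
    last_pts = frames[-1][1]
    out_pts = start_out_pts
    while out_pts <= last_pts:
        # Consume every input frame that maps to this output PTS.
        # Only the last one of the group survives; earlier ones are dropped.
        while index < len(frames) and frames[index][1] <= out_pts:
            current_label = frames[index][0]
            index += 1
        # If nothing new arrived, current_label is unchanged: a repeat.
        output.append((out_pts, current_label))
        out_pts += 1
    return output


# A starts late (padding), B and C collide (B is dropped), D follows a gap.
print(resample([("A", 2), ("B", 3), ("C", 3), ("D", 6)]))
```

In this run, output PTSs 0 and 1 are padding copies of A, PTS 3 holds C rather than B, and PTSs 4 and 5 repeat C until D arrives at PTS 6.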

fps always drops input frames which have no PTS set, regardless of the start_time parameter. 

The frame rate value must be zero or greater. It may be provided in a variety of forms. Each form is converted into a rational number, with an integer numerator and denominator. 

  • An integer number, e.g. 25. This converts to the rational number 25/1.
  • A real number, e.g. 3.1415926. This converts to a rational number, e.g. 954708/303893.
  • A rational number, whose numerator and denominator may each be an integer or a real number, e.g. 30/1.001 or -30000/-1001, which both convert to 30000/1001. The denominator must be non-zero.
  • An abbreviation, e.g. ntsc for 30000/1001, or ntsc-film for 24000/1001. See the complete list at the “Video rate” section in the ffmpeg-utils(1) manual.
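The conversions in this list can be reproduced approximately with Python's fractions module. The sketch below is an illustration only: parse_rate is a made-up name, the abbreviation table holds just four entries from the “Video rate” list, and Python keeps a decimal such as 1.001 exactly, whereas FFmpeg may approximate a real number with a different rational.

```python
from fractions import Fraction

# A partial table for illustration; see ffmpeg-utils(1) for the full list.
ABBREVIATIONS = {
    "ntsc": Fraction(30000, 1001),
    "ntsc-film": Fraction(24000, 1001),
    "film": Fraction(24),
    "pal": Fraction(25),
}


def parse_rate(text: str) -> Fraction:
    """Turn a frame rate string into a rational number (hypothetical helper)."""
    if text in ABBREVIATIONS:
        return ABBREVIATIONS[text]
    if "/" in text:
        numerator, denominator = text.split("/", 1)
        # Raises ZeroDivisionError when the denominator is zero.
        rate = Fraction(numerator) / Fraction(denominator)
    else:
        rate = Fraction(text)
    if rate < 0:
        raise ValueError("frame rate must be zero or greater")
    return rate


print(parse_rate("25"))            # 25
print(parse_rate("30/1.001"))      # 30000/1001
print(parse_rate("-30000/-1001"))  # 30000/1001
print(parse_rate("ntsc-film"))     # 24000/1001
```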

fps defines a sync point on the timeline, where one input PTS and one output PTS occur at the same moment. It calculates other PTSs as time offsets from this sync point. This affects the details of rounding. If start_time is set, then fps uses it to calculate input and output PTSs, and makes them the sync point. Otherwise, input and output PTS of zero are the sync point.

Note that fps does not assume that input frames are separated by exactly 1/frame_rate seconds. It takes the input PTSs literally. If the increment of PTS between frames varies along the video, fps treats those frames as happening at varying time intervals. 

An input video with PTSs starting past zero might yield unexpected results. Suppose the input PTSs start at 300, and say that converts to 10 seconds. Then fps repeats the first frame to fill the first 10 seconds of the output video. (However, ffmpeg may suppress those repeated frames, depending on the -vsync setting.) If you set start_time to 10 seconds, then fps sets the sync point to the PTSs converted from 10 seconds on the timeline. It no longer repeats the first frame. And, it starts the output PTS at a value corresponding to 10 seconds, instead of zero.
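The effect of the sync point on this example can be sketched in Python. The model below follows the description above: each output PTS is an output offset plus the rounded, rescaled distance of the input PTS from an input offset. It assumes an input time base of 1/30, an output rate of 25 frames per second, near rounding throughout, and that start_time is converted to each offset by rounding to nearest. That last point and all the names are assumptions made for this illustration.

```python
from fractions import Fraction
import math


def round_near(value):
    """Round to nearest integer; midpoints round away from 0."""
    magnitude = math.floor(abs(value) + Fraction(1, 2))
    return magnitude if value >= 0 else -magnitude


def make_converter(in_tb, out_tb, start_time=None):
    """Return (convert, first_output_pts) for a given sync point."""
    if start_time is None:
        in_offset = out_offset = 0          # sync point at PTS zero
    else:
        in_offset = round_near(Fraction(start_time) / in_tb)
        out_offset = round_near(Fraction(start_time) / out_tb)

    def convert(in_pts):
        return out_offset + round_near((in_pts - in_offset) * in_tb / out_tb)

    return convert, out_offset


in_tb, out_tb = Fraction(1, 30), Fraction(1, 25)

convert, first = make_converter(in_tb, out_tb)
print(first, convert(300))   # 0 250: output PTSs 0..249 must be padded

convert, first = make_converter(in_tb, out_tb, start_time=10)
print(first, convert(300))   # 250 250: the first frame lands on the first output PTS
```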

Examples

  • A typical usage to make a video with a frame rate of 25 frames per second, from an input video with any frame rate:
    • fps=fps=25
  • The output frames have PTSs of 0, 1, 2, etc. The frame rate is 25/1. The time base is 1/25.
  • Make a video with a frame rate of 24 frames per second, using an abbreviation, and rounding method to round to nearest:
    • fps=fps=film:round=near
  • Clean up a video with varying time between frames, and dropped frames. The input video is supposed to have an NTSC standard frame rate of 29.97 frames per second, and the time base is 3003/90000, but the PTSs increment variably at slightly more and less than that rate. The recorder dropped some frames, but the PTSs still reflect when the remaining frames were captured. 
    • fps=fps=30/1.001:round=near
  • The output PTSs are 0, 1, 2, etc. The time between frames is exact. The output frame rate is 30000/1001. The time base is 1001/30000. Where frames were dropped by the recorder, fps repeated frames to fill the gaps.
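To apply one of these filter strings from a script, the ffmpeg command line can be assembled in Python. This is a minimal sketch: the helper names and the file names input.mp4 and output.mp4 are placeholders, and the code only builds the argument list rather than running ffmpeg.

```python
def fps_filter(fps="25", start_time=None, round_method=None, eof_action=None):
    """Build an fps filter string from named options (hypothetical helper)."""
    options = [f"fps={fps}"]
    if start_time is not None:
        options.append(f"start_time={start_time}")
    if round_method is not None:
        options.append(f"round={round_method}")
    if eof_action is not None:
        options.append(f"eof_action={eof_action}")
    return "fps=" + ":".join(options)


def ffmpeg_command(input_path, output_path, filter_text):
    """Return an argument list that applies a video filter with -vf."""
    return ["ffmpeg", "-i", input_path, "-vf", filter_text, output_path]


command = ffmpeg_command(
    "input.mp4", "output.mp4",
    fps_filter("30/1.001", round_method="near"),
)
print(command)
# To run it: import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the colons and slashes in the filter string.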