The video buffer held 3 frames:
- the producer frame (for decoding)
- the pending frame (to exchange between the producer and consumer)
- the consumer frame (for rendering)
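Schematically, the old layout looked like this (a sketch with
illustrative field names, not the actual ones):

    #include <libavutil/frame.h>

    struct video_buffer {
        AVFrame *producer_frame; // written by the decoder
        AVFrame *pending_frame;  // swapped between producer and consumer
        AVFrame *consumer_frame; // read by the renderer
        // mutex and "new frame available" flag omitted
    };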
It worked well, but it prevented video buffers from being chained:
the consumer frame of the first video buffer would have to be the
same as the producer frame of the second one.
To solve this problem, make the decoder and the screen handle their own
frames, and keep only the pending frame in the video_buffer.
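Here is a minimal sketch of the resulting design, where the decoder
and the screen each own their frame and exchange it with the pending
frame by pointer swap (function and field names are illustrative, not
the actual API; pthread is used for the example):

    #include <stdbool.h>
    #include <pthread.h>
    #include <libavutil/frame.h>

    struct video_buffer {
        AVFrame *pending_frame; // the only frame owned by the buffer
        bool pending_valid;     // a new frame is available
        pthread_mutex_t mutex;
    };

    // Producer side: the decoder keeps its own frame and swaps it
    // with the pending frame (no copy).
    void
    video_buffer_producer_offer_frame(struct video_buffer *vb,
                                      AVFrame **pframe) {
        pthread_mutex_lock(&vb->mutex);
        AVFrame *tmp = vb->pending_frame;
        vb->pending_frame = *pframe;
        *pframe = tmp; // the decoder reuses the previous pending frame
        vb->pending_valid = true;
        pthread_mutex_unlock(&vb->mutex);
    }

    // Consumer side: the screen keeps its own frame and swaps it
    // with the pending frame if a new one is available.
    bool
    video_buffer_consumer_take_frame(struct video_buffer *vb,
                                     AVFrame **pframe) {
        pthread_mutex_lock(&vb->mutex);
        bool fresh = vb->pending_valid;
        if (fresh) {
            AVFrame *tmp = vb->pending_frame;
            vb->pending_frame = *pframe;
            *pframe = tmp;
            vb->pending_valid = false;
        }
        pthread_mutex_unlock(&vb->mutex);
        return fresh;
    }

Since the buffer no longer owns the producer and consumer frames,
components can be chained: the consumer frame of one stage can
directly serve as the producer frame of the next.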
This paves the way for supporting asynchronous swscale.