Video stream bframe support#12700
Conversation
…sample array-typed
…test - update_sample_durations: only clear the last sample's duration when the range extends to the actual end of the sample deque. Previously a sub-range update (from split/compacted chunk handling) would unconditionally clear a duration already computed by a wider pass. - Add #[expect(clippy::cast_sign_loss)] annotation on Arrow offset cast in read_pts_offsets_from_chunk for convention compliance. - Add video_stream_cache_bframe_incremental_buildup test that exercises the on_store_events path with B-frame offsets, verifying frame_nr, PTS offsets, and statistics stay consistent as frames arrive one at a time (with both compaction enabled and disabled).
There was a problem hiding this comment.
Hi! Thanks for opening this pull request.
Because this is your first time contributing to this repository, make sure you've read our Contributor Guide and Code of Conduct.
|
Thanks for wanting to contribute! This is a fairly complex and large piece that comes with quite some design choices & testing needs. Your description seems to be entirely generated, which can be fine but in the future make sure to add your own thoughts about the tradeoffs involved, what's good, what's bad, how this was tested, some screenshots & examples etc. Either way we don't have time to process that right now, so I'll put it on draft See also https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md#what-to-contribute |
To clarify, I don’t care that it’s generated but I deeply care about it being reviewable and indicating that thought went into the things where thought is required! Right now that's just the usual "agent did things" listing which doesn't help understanding the relevant pieces much |
|
@Wumpf Sorry for the generated description. I was in a hurry yesterday and thought it was more important to review the code with a concise description. But I took some time to describe the decisions/considerations that were made in the design. I chose to implement DTS on the timeline with PTS offset per sample. The timeline timestamp is the decoding timestamp, but an optional field (presentation_time_offset, which is an int64 with the timeline unit) was added that gives the delta PTS - DTS per sample, and when is absent, PTS == DTS enabling backward compatibility at no cost for no B-frames. Two alternatives were considered for the problem, the first being the one suggested in the issue itself, which would be to add the decoder_timeline, but I see that it solves another problem that can be addressed in the future (which would be if the user logs the samples out of decoding order), but the current implementation assumes that the samples arrive in DTS order on the timeline. The second alternative would be to separate the DTS and PTS fields, but I thought that wouldn't be efficient in the sense of duplicating the metadata, and the decoder doesn't need the DTS. Making I see some limitations in my implementation:
Finally, I tested using unit tests for the cases I found relevant, but I also did a manual e2e test using an H.264 file (re-encoded with libx264 with max_b_frames=3) and streaming it through the Python SDK and it seems to be working. I'm open to discuss further if changes are needed 😄 |
Related
What
Adds B-frame support for H.264/H.265 streams on the
VideoStreamarchetype.New types:
VideoPresentationTimestampOffsetdatatype (transparent i64) and matching component — represents the per-sample offset from decode timestamp (DTS) to presentation timestamp (PTS), in timeline units.Archetype changes (
VideoStream):samplefield is now array-typedVideoSample, allowing multiple frames to be batched into a single row in decode order.New optional
presentation_time_offsetfield `VideoPresentationTimestampOffset — when present, PTS for sample i = DTS_i + offset_i. When absent, PTS == DTS (the common no-B-frame case, fully backward compatible).Cache/decoder pipeline (
VideoStreamCache):Frame numbers are now assigned by presentation order (PTS-sorted) instead of decode order.
Sample durations are computed from PTS-sorted timestamps so scrubbing and seek work correctly.
read_pts_offsets_from_chunk extracts offset arrays from incoming chunks.
Incremental chunk additions properly update frame numbers and SamplesStatistics.
demux/mod.rs: video duration is now computed from actual min/max PTS to handle reordered frames.