Skip to content

Video stream bframe support#12700

Draft
caiopiccirillo wants to merge 12 commits into
rerun-io:mainfrom
caiopiccirillo:feat/video-stream-bframe-support
Draft

Video stream bframe support#12700
caiopiccirillo wants to merge 12 commits into
rerun-io:mainfrom
caiopiccirillo:feat/video-stream-bframe-support

Conversation

@caiopiccirillo
Copy link
Copy Markdown

Related

What

Adds B-frame support for H.264/H.265 streams on the VideoStream archetype.

New types:

VideoPresentationTimestampOffset datatype (transparent i64) and matching component — represents the per-sample offset from decode timestamp (DTS) to presentation timestamp (PTS), in timeline units.

Archetype changes (VideoStream):

sample field is now array-typed VideoSample, allowing multiple frames to be batched into a single row in decode order.

New optional presentation_time_offset field `VideoPresentationTimestampOffset — when present, PTS for sample i = DTS_i + offset_i. When absent, PTS == DTS (the common no-B-frame case, fully backward compatible).

Cache/decoder pipeline (VideoStreamCache):

Frame numbers are now assigned by presentation order (PTS-sorted) instead of decode order.
Sample durations are computed from PTS-sorted timestamps so scrubbing and seek work correctly.

read_pts_offsets_from_chunk extracts offset arrays from incoming chunks.

Incremental chunk additions properly update frame numbers and SamplesStatistics.
demux/mod.rs: video duration is now computed from actual min/max PTS to handle reordered frames.

caiopiccirillo and others added 12 commits March 19, 2026 15:44
…test

- update_sample_durations: only clear the last sample's duration when the
  range extends to the actual end of the sample deque. Previously a
  sub-range update (from split/compacted chunk handling) would
  unconditionally clear a duration already computed by a wider pass.
- Add #[expect(clippy::cast_sign_loss)] annotation on Arrow offset cast
  in read_pts_offsets_from_chunk for convention compliance.
- Add video_stream_cache_bframe_incremental_buildup test that exercises
  the on_store_events path with B-frame offsets, verifying frame_nr,
  PTS offsets, and statistics stay consistent as frames arrive one at a
  time (with both compaction enabled and disabled).
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for opening this pull request.

Because this is your first time contributing to this repository, make sure you've read our Contributor Guide and Code of Conduct.

@Wumpf
Copy link
Copy Markdown
Member

Wumpf commented Mar 20, 2026

Thanks for wanting to contribute! This is a fairly complex and large piece that comes with quite some design choices & testing needs. Your description seems to be entirely generated, which can be fine but in the future make sure to add your own thoughts about the tradeoffs involved, what's good, what's bad, how this was tested, some screenshots & examples etc.

Either way we don't have time to process that right now, so I'll put it on draft

See also https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md#what-to-contribute

@Wumpf Wumpf marked this pull request as draft March 20, 2026 12:20
@Wumpf
Copy link
Copy Markdown
Member

Wumpf commented Mar 20, 2026

Your description seems to be entirely generated, which can be fine

To clarify, I don’t care that it’s generated but I deeply care about it being reviewable and indicating that thought went into the things where thought is required! Right now that's just the usual "agent did things" listing which doesn't help understanding the relevant pieces much

@caiopiccirillo
Copy link
Copy Markdown
Author

@Wumpf Sorry for the generated description. I was in a hurry yesterday and thought it was more important to review the code with a concise description.

But I took some time to describe the decisions/considerations that were made in the design.

I chose to implement DTS on the timeline with PTS offset per sample. The timeline timestamp is the decoding timestamp, but an optional field (presentation_time_offset, which is an int64 with the timeline unit) was added that gives the delta PTS - DTS per sample, and when is absent, PTS == DTS enabling backward compatibility at no cost for no B-frames.

Two alternatives were considered for the problem, the first being the one suggested in the issue itself, which would be to add the decoder_timeline, but I see that it solves another problem that can be addressed in the future (which would be if the user logs the samples out of decoding order), but the current implementation assumes that the samples arrive in DTS order on the timeline. The second alternative would be to separate the DTS and PTS fields, but I thought that wouldn't be efficient in the sense of duplicating the metadata, and the decoder doesn't need the DTS.

Making sample an array was necessary because B-frame encoders naturally output batches, thus allowing multiple samples per row and keeping the sample stream in decoding order within a log call (which aligns with how encoder outputs data).

I see some limitations in my implementation:

  1. I haven't implemented the decoder_timeline field.
  2. I haven't tested for AV1/VP9.
  3. The frame number assignment is done by PTS order, which can be negligible for "typical" streams, but is worth noting for very large batches.

Finally, I tested using unit tests for the cases I found relevant, but I also did a manual e2e test using an H.264 file (re-encoded with libx264 with max_b_frames=3) and streaming it through the Python SDK and it seems to be working.

I'm open to discuss further if changes are needed 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support h.264/h.265 b-frames on VideoStream archetype

2 participants