Skip to content

Fix inflate operator: Reset max output volume. Use size in bytes, not elements.#6283

Merged
mzient merged 1 commit into
NVIDIA:mainfrom
mzient:fix_inflate_op
Apr 9, 2026
Merged

Fix inflate operator: Reset max output volume. Use size in bytes, not elements.#6283
mzient merged 1 commit into
NVIDIA:mainfrom
mzient:fix_inflate_op

Conversation

@mzient
Copy link
Copy Markdown
Contributor

@mzient mzient commented Apr 8, 2026

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

This PR fixes intermittent failure of inflate operator tests caused by uninitialized variable and improper element size assumptions.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

test_dali_stateless_operators.test_inflate_stateless

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

This PR fixes two bugs in the GPU LZ4 inflate operator: (1) max_output_sample_vol_ was never reset to zero inside ParseOutputShape, so it could carry a stale value from a previous iteration; (2) the two size arguments passed to nvcompBatchedLZ4DecompressGetTempSizeAsync were in elements rather than bytes, causing an under-sized temp buffer and potential decompression failures for any output type wider than uint8.

Confidence Score: 5/5

Safe to merge; both targeted bug fixes are correct and the only remaining finding is a P2 style hardening suggestion.

All root-cause bugs (uninitialized max volume and element-vs-byte mismatch) are correctly addressed. The single comment is a P2 latent edge-case that the operator already guards against in RunImpl, so it does not block merge.

dali/operators/decoder/inflate/inflate_params.h — the early-return path in ParseOutputShape still skips resetting the max-volume accumulators.

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
dali/operators/decoder/inflate/inflate_params.h Adds max_output_sample_vol_ = 0 reset alongside the existing max_output_vol_ = 0 reset, fixing a missing initialisation that could cause stale values on repeated calls; reset is still skipped on the num_samples == 0 early-return path.
dali/operators/decoder/inflate/inflate_gpu.cc Correctly converts element-count volumes to byte sizes before passing to nvcompBatchedLZ4DecompressGetTempSizeAsync, fixing incorrect temp-buffer sizing for output types wider than uint8.
dali/operators/decoder/inflate/inflate.h Introduces TypeInfo element_type_ member (initialised in constructor) to provide element byte-size to the GPU implementation; straightforward and correct.

Sequence Diagram

sequenceDiagram
    participant Inflate as "Inflate (GPU)"
    participant Impl as "InflateOpGpuLZ4Impl"
    participant Params as "ShapeParams"
    participant nvcomp as "nvcompBatchedLZ4"

    Inflate->>Impl: "SetupImpl(output_desc, ws)"
    Impl->>Params: "ProcessInputArgs(ws, batch_size)"
    Params->>Params: "ParseOutputShape() - reset max_output_vol_=0, max_output_sample_vol_=0"
    Params-->>Impl: "output shape ready"

    Inflate->>Impl: "RunImpl(ws)"
    Impl->>Params: "GetMaxOutChunkVol() elements"
    Impl->>Params: "GetMaxOutVol() elements"
    Note over Impl: "multiply by element_type_.size() for bytes"
    Impl->>nvcomp: "nvcompBatchedLZ4DecompressGetTempSizeAsync(num_chunks, max_chunk_bytes, params, tempSize, max_total_bytes)"
    nvcomp-->>Impl: "tempSize"
    Impl->>nvcomp: "nvcompBatchedLZ4DecompressAsync(...)"
    nvcomp-->>Impl: "decompressed output"
    Impl->>Impl: "FillTheTails(...)"
Loading

Comments Outside Diff (1)

  1. dali/operators/decoder/inflate/inflate_params.h, line 351-356 (link)

    P2 Stale max-volume on empty batch early-return

    When num_samples == 0 the function returns on line 352 before the two reset assignments, so both max_output_vol_ and max_output_sample_vol_ keep whatever values they held from the previous call. If a non-empty batch is processed first and then an empty batch follows, GetMaxOutChunkVol() / GetMaxOutVol() will return stale (oversized) values. RunImpl does guard against empty batches with an early return, so this doesn't cause a crash today, but it is a latent correctness issue.

Reviews (1): Last reviewed commit: "Reset max output volume. Use size in byt..." | Re-trigger Greptile

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48015281]: BUILD STARTED

@mzient mzient changed the title Reset max output volume. Use size in bytes, not elements. Fix inflate operator: Reset max output volume. Use size in bytes, not elements. Apr 8, 2026
@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48015281]: BUILD FAILED

@dali-automaton
Copy link
Copy Markdown
Collaborator

CI MESSAGE: [48015281]: BUILD PASSED

@stiepan stiepan requested review from stiepan April 9, 2026 10:04
@stiepan stiepan assigned stiepan and unassigned stiepan Apr 9, 2026
@jantonguirao jantonguirao self-assigned this Apr 9, 2026
@mzient mzient merged commit a33d8a7 into NVIDIA:main Apr 9, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants