Adding mutation batcher and row assembler in cdc data generator #3762
Open
shreyakhajanchi wants to merge 2 commits into GoogleCloudPlatform:main from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3762      +/-   ##
============================================
+ Coverage     53.16%   53.28%   +0.12%
+ Complexity     6490     6162     -328
============================================
  Files          1075     1079       +4
  Lines         65233    65490     +257
  Branches       7230     7289      +59
============================================
+ Hits          34680    34896     +216
- Misses        28223    28243      +20
- Partials       2330     2351      +21
📝 Description
This PR introduces the MutationBatcher and RowAssembler components into the v2/cdc-data-generator module, separating row synthesis from transient buffering logic. MutationBatcher accumulates mutations in worker memory and chunks them based on size limits, with buffers partitioned per table, shard, and operation type (INSERT, UPDATE, DELETE).

Additionally, it implements FailureRecord to uniformly format serialization/writer errors into JSON lines for dead-letter queue (DLQ) storage.

Changes
Core Production Classes
BufferKey.java [NEW]: AutoValue key structure for buffer group partitioning.
MutationBatcher.java [NEW]: Accumulates rows, enforces local size thresholds, and manages transactional flushes.
RowAssembler.java [NEW]: Pure-function helpers for assembling Beam Row objects for updates and deletes.
FailureRecord.java [NEW]: Uniform serialization wrapper for DLQ errors.
DataGeneratorUtils.java [MODIFY]: Added hex conversion helpers (canonicalizeValue, bytesToHex).

Unit Test Suites
MutationBatcherTest.java [NEW]: Validates batch thresholds, selective update flushes, and custom connection constraints.
RowAssemblerTest.java [NEW]: Asserts field schema mapping rules.
FailureRecordTest.java [NEW]: Verifies JSON conversion of attributes, including raw byte arrays and nulls.
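
For readers who want a picture of the buffering described above, here is a minimal sketch of per-key batching with a flush threshold. It is not the code in this PR: the class BatcherSketch, its nested Key and Op types, and the add/flushAll methods are hypothetical names chosen for illustration. It also assumes the size limit is a row count and uses plain strings in place of Beam Row objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Illustrative sketch only; all names here are hypothetical, not the PR's actual API. */
public class BatcherSketch {

  enum Op { INSERT, UPDATE, DELETE }

  /** Hypothetical stand-in for BufferKey: one buffer per (table, shard, operation). */
  static final class Key {
    final String table;
    final String shard;
    final Op op;

    Key(String table, String shard, Op op) {
      this.table = table;
      this.shard = shard;
      this.op = op;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key k = (Key) o;
      return table.equals(k.table) && shard.equals(k.shard) && op == k.op;
    }

    @Override
    public int hashCode() {
      return Objects.hash(table, shard, op);
    }
  }

  // Assumption: the limit is a row count; the real batcher may use another size measure.
  private final int maxBatchSize;
  private final Map<Key, List<String>> buffers = new HashMap<>();

  BatcherSketch(int maxBatchSize) {
    this.maxBatchSize = maxBatchSize;
  }

  /** Buffers a row; returns the full batch once the key's buffer hits the limit, else an empty list. */
  List<String> add(Key key, String row) {
    List<String> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
    buffer.add(row);
    if (buffer.size() < maxBatchSize) {
      return new ArrayList<>();
    }
    buffers.remove(key);
    return buffer;
  }

  /** Drains every partially filled buffer, for example when a worker finishes its input. */
  Map<Key, List<String>> flushAll() {
    Map<Key, List<String>> drained = new HashMap<>(buffers);
    buffers.clear();
    return drained;
  }

  public static void main(String[] args) {
    BatcherSketch batcher = new BatcherSketch(2);
    Key inserts = new Key("orders", "shard-0", Op.INSERT);
    Key deletes = new Key("orders", "shard-0", Op.DELETE);

    System.out.println(batcher.add(inserts, "row-1")); // []
    System.out.println(batcher.add(deletes, "row-2")); // []
    System.out.println(batcher.add(inserts, "row-3")); // [row-1, row-3]
    System.out.println(batcher.flushAll().size());     // 1
  }
}
```

Keying the buffers on the full (table, shard, operation) tuple means an INSERT batch for one shard fills and flushes independently of DELETE traffic for the same table, which is the partitioning the description attributes to BufferKey.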