Related Template(s)
gs://dataflow-templates-europe-west2/latest/GCS_Text_to_Cloud_Spanner
Template Version
Beam SDK Version: 2.72.0
What happened?
Description
In the GCS_Text_to_Cloud_Spanner template, the invalidOutputPath parameter in TextImportPipeline.java is currently set as optional with an empty default string.
If a user triggers the pipeline with malformed source data (e.g., a decimal value that exceeds Spanner's limits) and does not provide an invalidOutputPath, the Dead Letter Queue attempts to write the failed mutation to a null/empty GCS path (the log below shows "Finalizing for destination null").
Because there is no pre-flight validation to check that the error output location is set and exists, the worker crashes during the Finalize step. Dataflow then retries the failed bundle indefinitely, so the job hangs permanently at 100% completion instead of failing gracefully, which leads to runaway compute costs.
Steps to reproduce
- Launch the GCS_Text_to_Cloud_Spanner template.
- Provide a source CSV that contains a value that Spanner will reject (e.g., a NUMERIC field with 15 decimal places).
- Omit the invalidOutputPath parameter.
- Observe the worker getting stuck in an infinite retry loop during the Finalize step.
Expected behavior
The parameter invalidOutputPath should be strictly required. If it is omitted, the pipeline should throw a validation exception and fail immediately at initialization before spinning up workers.
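To illustrate the kind of fail-fast check described above, here is a minimal sketch in plain Java. The class name InvalidOutputPathCheck and the method requireInvalidOutputPath are hypothetical and do not exist in the template; the sketch also assumes the path should be a gs:// URI. It only checks that the value is present and well-formed, and does not verify that the bucket actually exists.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of TextImportPipeline.java.
class InvalidOutputPathCheck {

    // Assumption: the error output location must look like gs://bucket or gs://bucket/prefix.
    private static final Pattern GCS_PATH = Pattern.compile("^gs://[^/]+(/.*)?$");

    /**
     * Returns the trimmed path if it is present and looks like a GCS URI.
     * Throws IllegalArgumentException otherwise, so that a launcher calling this
     * before building the pipeline fails before any worker is started.
     */
    static String requireInvalidOutputPath(String invalidOutputPath) {
        if (invalidOutputPath == null || invalidOutputPath.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "invalidOutputPath is required: failed Spanner records need a destination.");
        }
        String trimmed = invalidOutputPath.trim();
        if (!GCS_PATH.matcher(trimmed).matches()) {
            throw new IllegalArgumentException(
                "invalidOutputPath must be a gs:// path, got: " + trimmed);
        }
        return trimmed;
    }
}
```

Calling requireInvalidOutputPath(...) on the option value at initialization would turn the silent "destination null" case into an immediate, readable error. Making the parameter itself required, as proposed in this issue, achieves the same effect for the omitted case without extra code.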
Additional context
I have already written a fix for this by setting optional = false and removing the empty default string on the parameter. I will open a Pull Request linking to this issue shortly!
Relevant log output
INFO 2026-04-21T08:52:37.287Z Finalizing 1 file results
INFO 2026-04-21T08:52:37.311Z Finalizing for destination null num shards 1.
INFO 2026-04-21T08:52:37.334Z Will copy temporary file FileResult{tempFilename=/.temp-beam-2b3a3eea-9e34-4e47-a2a2-43fa6afba2ed/9226b88e6be1a28c-336b-447b-b512-a4ee1140d53b, shard=0, window=GlobalWindow, paneInfo=PaneInfo.NO_FIRING} to final location /-00000-of-00001-7ab1bac4-7dd4-46ea-b816-30bf2fd991eb-spanner-depth-0.csv
{
"insertId": "7194243126373549299:152328:0:59730",
"jsonPayload": {
"work": "7202850336977575572",
"job": "2026-04-21_01_46_59-15019665261852597760",
"message": "Will copy temporary file FileResult{tempFilename=/.temp-beam-2b3a3eea-9e34-4e47-a2a2-43fa6afba2ed/9226b88e6be1a28c-336b-447b-b512-a4ee1140d53b, shard=0, window=GlobalWindow, paneInfo=PaneInfo.NO_FIRING} to final location /-00000-of-00001-7ab1bac4-7dd4-46ea-b816-30bf2fd991eb-spanner-depth-0.csv",
"worker": "option-hedge-2026-04-21-1-04210146-f7qs-harness-7csj",
"thread": "23",
"step": "TextImportTransform/Write failed Spanner records 0/WriteFiles/FinalizeTempFileBundles/Finalize/ParMultiDo(Finalize)",
"logger": "org.apache.beam.sdk.io.FileBasedSink",
"instruction": "process_bundle-7202850336977575572-96",
"portability_worker_id": "sdk-0-0"
},
"resource": {
"type": "dataflow_step",
"labels": {
"job_name": "option_hedge_2026-04-21_1776760795-processing-job",
"step_id": "TextImportTransform/Write failed Spanner records 0/WriteFiles/FinalizeTempFileBundles/Finalize/ParMultiDo(Finalize)",
"job_id": "2026-04-21_01_46_59-15019665261852597760",
"region": "europe-west2",
"project_id": "xo-attara-portal"
}
},
"timestamp": "2026-04-21T08:52:37.334Z",
"severity": "INFO",
"labels": {
"compute.googleapis.com/resource_name": "option-hedge-2026-04-21-1-04210146-f7qs-harness-7csj",
"dataflow.googleapis.com/job_name": "option_hedge_2026-04-21_1776760795-processing-job",
"dataflow.googleapis.com/job_id": "2026-04-21_01_46_59-15019665261852597760",
"dataflow.googleapis.com/region": "europe-west2",
"compute.googleapis.com/resource_type": "instance",
"compute.googleapis.com/resource_id": "7194243126373549299",
"dataflow.googleapis.com/log_type": "supportability"
},
"logName": "projects/xo-attara-portal/logs/dataflow.googleapis.com%2Fworker",
"receiveTimestamp": "2026-04-21T08:52:57.067702626Z"
}