[Bug]: GcsTextToCloudSpanner hangs in infinite loop when invalidOutputPath is omitted #3722

@alihelmy27

Related Template(s)

gs://dataflow-templates-europe-west2/latest/GCS_Text_to_Cloud_Spanner

Template Version

Beam SDK Version: 2.72.0

What happened?

Description
In the GCS_Text_to_Cloud_Spanner template, the invalidOutputPath parameter in TextImportPipeline.java is currently set as optional with an empty default string.

If a user triggers the pipeline with malformed source data (e.g., a decimal value that exceeds Spanner's limits) and does not provide an invalidOutputPath, the dead-letter step attempts to write the failed mutations to an empty GCS path. The worker logs report the destination as null.

Because there is no pre-flight validation that the error output path is set and points to an existing bucket, the worker crashes during the Finalize step. Dataflow then retries the failed bundle indefinitely, so the job hangs at 100% completion instead of failing gracefully, and compute costs keep accruing.

Steps to reproduce

  1. Launch the GCS_Text_to_Cloud_Spanner template.
  2. Provide a source CSV that contains a value that Spanner will reject (e.g., a NUMERIC field with 15 decimal places).
  3. Omit the invalidOutputPath parameter.
  4. Observe the worker getting stuck in an infinite retry loop during the Finalize step.
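For step 2, a quick local check can confirm that a test value really falls outside what Spanner accepts. The sketch below assumes the GoogleSQL NUMERIC limits (at most 9 fractional digits and 29 integer digits); the class and method names are hypothetical and are not part of the template.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of the template: checks whether a decimal
// string fits the assumed GoogleSQL NUMERIC limits.
final class SpannerNumericCheck {
  static final int MAX_SCALE = 9;            // assumed max digits after the decimal point
  static final int MAX_INTEGER_DIGITS = 29;  // assumed max digits before the decimal point

  private SpannerNumericCheck() {}

  static boolean fits(String text) {
    // Trailing zeros carry no information, so "1.500000000000000" counts as 1.5.
    BigDecimal value = new BigDecimal(text).stripTrailingZeros();
    // precision - scale is the number of digits left of the decimal point
    // (it can be zero or negative for values below 1, which is fine here).
    int integerDigits = value.precision() - value.scale();
    return value.scale() <= MAX_SCALE && integerDigits <= MAX_INTEGER_DIGITS;
  }
}
```

A value such as 1.123456789012345 (15 decimal places) fails this check, which makes it a suitable candidate for the reproduction CSV.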

Expected behavior
The parameter invalidOutputPath should be strictly required. If it is omitted, the pipeline should throw a validation exception and fail immediately at initialization before spinning up workers.
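As an illustration of the fail-fast behavior described above, here is a minimal sketch of a pre-flight check on the parameter value. It is plain Java with no Beam dependency; the class name, method name, and error messages are hypothetical, and this is not the code in TextImportPipeline.java.

```java
// Hypothetical pre-flight check, shown only to illustrate failing at
// initialization instead of during the Finalize step.
final class InvalidOutputPathCheck {
  private static final String GCS_PREFIX = "gs://";

  private InvalidOutputPathCheck() {}

  /** Returns the trimmed path if it looks like a GCS location, otherwise throws. */
  static String requireGcsPath(String invalidOutputPath) {
    if (invalidOutputPath == null || invalidOutputPath.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "invalidOutputPath is required: failed Spanner records need a destination");
    }
    String trimmed = invalidOutputPath.trim();
    // Reject values with no gs:// scheme, and a bare "gs://" with no bucket name.
    if (!trimmed.startsWith(GCS_PREFIX) || trimmed.length() == GCS_PREFIX.length()) {
      throw new IllegalArgumentException(
          "invalidOutputPath must be a gs:// path, got: " + trimmed);
    }
    return trimmed;
  }
}
```

A check of this kind only inspects the string and does not verify that the bucket exists. If the template receives the value as a runtime parameter, the check would have to run where the value is resolved, which is one reason marking the parameter as required is the simpler route.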

Additional context
I have already written a fix for this by setting optional = false and removing the empty default string on the parameter. I will open a Pull Request linking to this issue shortly!

Relevant log output

INFO 2026-04-21T08:52:37.287Z Finalizing 1 file results
INFO 2026-04-21T08:52:37.311Z Finalizing for destination null num shards 1.
INFO 2026-04-21T08:52:37.334Z Will copy temporary file FileResult{tempFilename=/.temp-beam-2b3a3eea-9e34-4e47-a2a2-43fa6afba2ed/9226b88e6be1a28c-336b-447b-b512-a4ee1140d53b, shard=0, window=GlobalWindow, paneInfo=PaneInfo.NO_FIRING} to final location /-00000-of-00001-7ab1bac4-7dd4-46ea-b816-30bf2fd991eb-spanner-depth-0.csv

  {
    "insertId": "7194243126373549299:152328:0:59730",
    "jsonPayload": {
      "work": "7202850336977575572",
      "job": "2026-04-21_01_46_59-15019665261852597760",
      "message": "Will copy temporary file FileResult{tempFilename=/.temp-beam-2b3a3eea-9e34-4e47-a2a2-43fa6afba2ed/9226b88e6be1a28c-336b-447b-b512-a4ee1140d53b, shard=0, window=GlobalWindow, paneInfo=PaneInfo.NO_FIRING} to final location /-00000-of-00001-7ab1bac4-7dd4-46ea-b816-30bf2fd991eb-spanner-depth-0.csv",
      "worker": "option-hedge-2026-04-21-1-04210146-f7qs-harness-7csj",
      "thread": "23",
      "step": "TextImportTransform/Write failed Spanner records 0/WriteFiles/FinalizeTempFileBundles/Finalize/ParMultiDo(Finalize)",
      "logger": "org.apache.beam.sdk.io.FileBasedSink",
      "instruction": "process_bundle-7202850336977575572-96",
      "portability_worker_id": "sdk-0-0"
    },
    "resource": {
      "type": "dataflow_step",
      "labels": {
        "job_name": "option_hedge_2026-04-21_1776760795-processing-job",
        "step_id": "TextImportTransform/Write failed Spanner records 0/WriteFiles/FinalizeTempFileBundles/Finalize/ParMultiDo(Finalize)",
        "job_id": "2026-04-21_01_46_59-15019665261852597760",
        "region": "europe-west2",
        "project_id": "xo-attara-portal"
      }
    },
    "timestamp": "2026-04-21T08:52:37.334Z",
    "severity": "INFO",
    "labels": {
      "compute.googleapis.com/resource_name": "option-hedge-2026-04-21-1-04210146-f7qs-harness-7csj",
      "dataflow.googleapis.com/job_name": "option_hedge_2026-04-21_1776760795-processing-job",
      "dataflow.googleapis.com/job_id": "2026-04-21_01_46_59-15019665261852597760",
      "dataflow.googleapis.com/region": "europe-west2",
      "compute.googleapis.com/resource_type": "instance",
      "compute.googleapis.com/resource_id": "7194243126373549299",
      "dataflow.googleapis.com/log_type": "supportability"
    },
    "logName": "projects/xo-attara-portal/logs/dataflow.googleapis.com%2Fworker",
    "receiveTimestamp": "2026-04-21T08:52:57.067702626Z"
  }

Labels

bug, p2