Skip to content

Fix UNION column alias aggregation#630

Merged
collerek merged 3 commits into
macbre:masterfrom
kimjune01:fix/union-column-alias-401
May 14, 2026
Merged

Fix UNION column alias aggregation#630
collerek merged 3 commits into
macbre:masterfrom
kimjune01:fix/union-column-alias-401

Conversation

@kimjune01
Copy link
Copy Markdown
Contributor

Summary

  • When UNION combines queries with the same column alias, columns_aliases now aggregates all source columns into a list instead of silently overwriting
  • Handles deduplication: repeated identical targets stay as a single value, different targets aggregate into a list

Fixes #401

Test plan

  • pytest test/ -v: 270 tests pass
  • New test test_union_column_aliases verifies {"M": ["tab1.A", "tab2.B"]} for UNION ALL with matching aliases
  • Existing UNION tests unaffected

When the same alias appears in multiple UNION branches, aggregate
all source columns into a list instead of overwriting with the last
occurrence.

Before:
  SELECT a.A as M FROM tab1 a UNION SELECT b.B as M FROM tab2 b
  columns_aliases = {"M": "tab2.B"}  # Lost tab1.A

After:
  columns_aliases = {"M": ["tab1.A", "tab2.B"]}

Implementation:
- Modified _Collector.add_alias to check for existing alias entries
- When duplicate found, convert single value to list or append to list
- Deduplicates identical sources (same alias pointing to same column)

Fixes macbre#401
@kimjune01 kimjune01 requested review from collerek and macbre as code owners May 13, 2026 04:13
Comment thread sql_metadata/column_extractor.py Outdated
Comment on lines +203 to +216
self.alias_map[name] = target
if name in self.alias_map:
# Alias already exists — aggregate targets into a list
existing = self.alias_map[name]
if isinstance(existing, list):
# Already a list — append new target
if target not in existing:
existing.append(target)
else:
# Single value — convert to list
if existing != target:
self.alias_map[name] = [existing, target]
else:
# First occurrence — store as-is
self.alias_map[name] = target
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks :)

Although we need some more fixes as this change will nest lists if the alias is referencing multiple columns, since this change changes the global resolution, not only the unions:

def test_union_alias_with_expression_targets():
      # Case A: scalar then list-target → produces a NESTED list
      q1 = """
      SELECT a AS x FROM t1
      UNION ALL
      SELECT b + c AS x FROM t2
      """
      p = Parser(q1)
      # PR currently returns {'x': ['a', ['b', 'c']]} — nested
      assert p.columns_aliases == {"x": ["a", "b", "c"]}

      # Case B: list then list-target → TypeError: unhashable type: 'UniqueList'
      q2 = """
      SELECT a + b AS x FROM t1
      UNION ALL
      SELECT c + d AS x FROM t2
      """
      p = Parser(q2)
      # PR currently raises TypeError: unhashable type: 'UniqueList.'
      assert p.columns_aliases == {"x": ["a", "b", "c", "d"]}

we need something like:

def add_alias(self, name: str, target: Any, clause: str) -> None:
      self.alias_names.append(name)
      if clause:
          self.alias_dict.setdefault(clause, UniqueList()).append(name)
      if target is None:
          return
      existing = self.alias_map.get(name, [])
      merged = UniqueList(existing if isinstance(existing, list) else [existing])
      merged.extend(target if isinstance(target, list) else [target])
      self.alias_map[name] = merged if len(merged) > 1 else merged[0]

Per @collerek review: the prior add_alias logic produced nested lists
when a scalar target was followed by a list target, and raised
TypeError when both targets were UniqueList (unhashable in 'not in').

Adopt the reviewer's suggested implementation: normalize existing into
a UniqueList, extend with target (list or scalar), collapse to scalar
when len==1. Add regression tests for both cases.
@kimjune01
Copy link
Copy Markdown
Contributor Author

Thanks for the careful review @collerek — both regressions reproduce exactly as you described:

  • Case A (SELECT a AS x ... UNION ALL SELECT b + c AS x ...): produced {'x': ['a', ['b', 'c']]} (nested)
  • Case B (SELECT a + b AS x ... UNION ALL SELECT c + d AS x ...): raised TypeError: cannot use 'sql_metadata.utils.UniqueList' as a set element from the target not in existing membership check

Pushed 4ca0cd6 adopting your suggested implementation verbatim. Both regression cases are now covered in test/test_unions.py::test_union_alias_with_expression_targets. Full suite: 271 passed.

@collerek collerek merged commit bd107e7 into macbre:master May 14, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Aliases with "Union"

2 participants