@tswast tswast commented Jun 4, 2025

This commit introduces the ST_LENGTH function for BigQuery DataFrames. ST_LENGTH computes the length of GEOGRAPHY objects in meters.

The implementation includes:

  • A new operation geo_st_length_op in bigframes.operations.geo_ops.
  • The user-facing function st_length in bigframes.bigquery._operations.geo.
  • Exposure of the new operation and function in relevant __init__.py files.
  • Comprehensive unit tests covering various geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection), empty geographies, and NULL inputs.

The function behaves as per the BigQuery ST_LENGTH documentation:

  • Returns 0 for POINT, MULTIPOINT, POLYGON, MULTIPOLYGON, and empty GEOGRAPHYs.
  • Returns the total length for LINESTRING and MULTILINESTRING.
  • For GEOMETRYCOLLECTION, sums the lengths of its constituent linestrings; returns 0 if the collection contains no lines.


@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jun 4, 2025
This commit adds a `length` property to the `GeoSeries` class.
Accessing this property will raise a `NotImplementedError`, guiding you to utilize the `bigframes.bigquery.st_length()` function instead.

This change includes:
- The `length` property in `bigframes/geopandas/geoseries.py`.
- A unit test in `tests/system/small/geopandas/test_geoseries.py` to verify that the correct error is raised with the specified message when `GeoSeries.length` is accessed.
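
The pattern described in this commit can be sketched as follows. This is a minimal illustration, not the bigframes source: the class `GeoSeriesSketch`, the helper `describe_length_access`, and the wording of the error message are all hypothetical.

```python
class GeoSeriesSketch:
    """Hypothetical stand-in for a GeoSeries-like class (not the bigframes class)."""

    @property
    def length(self):
        # Accessing the property always raises and points to the alternative.
        # The message wording is an assumption, not the library's actual text.
        raise NotImplementedError(
            "GeoSeries.length is not supported. "
            "Use bigframes.bigquery.st_length(series) instead."
        )


def describe_length_access(series) -> str:
    """Return the guidance message raised when `.length` is accessed."""
    try:
        series.length
    except NotImplementedError as exc:
        return str(exc)
    return "no error"
```

Raising from a property keeps the geopandas-style attribute discoverable while steering callers to the function that is actually implemented.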
This commit applies a user-provided patch that includes:
- Removing `st_length` from `bigframes/bigquery/_operations/__init__.py`.
- Adding an Ibis implementation for `geo_st_length_op` in `bigframes/core/compile/scalar_op_compiler.py`.
- Modifying `KMeans` in `bigframes/ml/cluster.py` to handle `init="k-means++"`.
- Updating geo tests in `tests/system/small/bigquery/test_geo.py` to use `to_pandas()` and `pd.testing.assert_series_equal`.

Note: System tests requiring Google Cloud authentication were not executed due to limitations in my current environment.
This commit introduces the `use_spheroid` parameter to the `ST_LENGTH`
geography function, aligning it more closely with the BigQuery
ST_LENGTH(geography_expression[, use_spheroid]) signature.

Key changes:
- `bigframes.operations.geo_ops.GeoStLengthOp` is now a dataclass
  that accepts `use_spheroid` (defaulting to `False`). A check is
  included to raise `NotImplementedError` if `use_spheroid` is `True`,
  as this is the current limitation in BigQuery.
- The Ibis compiler implementation for `geo_st_length_op` in
  `bigframes/core/compile/scalar_op_compiler.py` has been updated
  to accept the new `GeoStLengthOp` operator type.
- The user-facing `st_length` function in
  `bigframes/bigquery/_operations/geo.py` now includes the
  `use_spheroid` keyword argument.
- The docstring for `st_length` has been updated to match the
  official BigQuery documentation, clarifying that only lines contribute
  to the length (points and polygons result in 0 length), and
  detailing the `use_spheroid` parameter. Examples have been
  updated accordingly.
- Tests in `tests/system/small/bigquery/test_geo.py` have been
  updated to:
    - Reflect the correct behavior (0 length for polygons/points).
    - Test calls with both default `use_spheroid` and explicit
      `use_spheroid=False`.
    - Verify that `use_spheroid=True` raises a `NotImplementedError`.

Note: System tests requiring Google Cloud authentication were not
re-executed for this specific commit due to environment limitations
identified in previous steps. The changes primarily affect the operator
definition, function signature, and client-side validation, with the
core Ibis compilation logic for length remaining unchanged.
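
A rough sketch of the operator shape this commit describes, assuming a plain standard-library dataclass. The name `GeoStLengthOpSketch` and the error text are hypothetical; the real `GeoStLengthOp` derives from bigframes operator base classes that are not shown here.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class GeoStLengthOpSketch:
    """Hypothetical stand-in for the operator described in this commit."""

    use_spheroid: bool = False

    def __post_init__(self):
        # Client-side validation as described above: reject
        # use_spheroid=True before any query is sent.
        if self.use_spheroid:
            raise NotImplementedError("use_spheroid=True is not supported.")
```

Making the dataclass frozen is a choice of this sketch (it keeps the operator immutable and hashable); the commit message does not say whether the real class does the same.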
This commit refactors the ST_LENGTH implementation to correctly
pass the `use_spheroid` parameter to BigQuery by using Ibis's
`ibis_udf.scalar.builtin('ST_LENGTH', ...)` function.

Key changes:
- `bigframes.operations.geo_ops.GeoStLengthOp`: The client-side
  `NotImplementedError` for `use_spheroid=True` (raised in
  `__post_init__`) has been removed. BigQuery DataFrames will now
  pass this parameter directly to BigQuery.
- `bigframes.core.compile.scalar_op_compiler.geo_length_op_impl`:
  The implementation now always uses
  `ibis_udf.scalar.builtin('ST_LENGTH', x, op.use_spheroid)`
  instead of `x.length()`. This ensures the `use_spheroid`
  parameter is included in the SQL generated for BigQuery.
- `tests/system/small/bigquery/test_geo.py`:
    - The test expecting a client-side `NotImplementedError` for
      `use_spheroid=True` has been removed.
    - A new test `test_st_length_use_spheroid_true_errors_from_bq`
      has been added. This test calls `st_length` with
      `use_spheroid=True` and asserts that an exception is raised
      from BigQuery, as BigQuery itself currently only supports
      `use_spheroid=False` for the `ST_LENGTH` function.
    - Existing tests for `st_length` were already updated in a
      previous commit to reflect that only line geometries contribute
      to the length, and these continue to verify behavior with
      `use_spheroid=False`.

This change ensures that BigQuery DataFrames accurately reflects BigQuery's
`ST_LENGTH` capabilities concerning the `use_spheroid` parameter.
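
The effect of always forwarding the flag can be illustrated without Ibis. The helper `render_st_length` below is hypothetical and only builds a SQL string; the actual change goes through the Ibis built-in function mechanism named in this commit.

```python
def render_st_length(column_sql: str, use_spheroid: bool = False) -> str:
    """Render an ST_LENGTH call with no client-side check on use_spheroid."""
    # BigQuery boolean literals are TRUE / FALSE.
    flag = "TRUE" if use_spheroid else "FALSE"
    # The flag is always emitted, so the server decides whether it is supported.
    return f"ST_LENGTH({column_sql}, {flag})"
```

With this shape, `use_spheroid=True` reaches BigQuery unchanged, and any rejection comes from the server rather than from the client library.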
This commit refactors the ST_LENGTH geography operation to use an
Ibis UDF defined via `@ibis_udf.scalar.builtin`. This aligns with
the pattern exemplified by other built-in functions like ST_DISTANCE
when a direct Ibis method with all necessary parameters is not available.

Key changes:
- A new `st_length` function is defined in
  `bigframes/core/compile/scalar_op_compiler.py` using
  `@ibis_udf.scalar.builtin`. This UDF maps to BigQuery's
  `ST_LENGTH(geography, use_spheroid)` function.
- The `geo_length_op_impl` in the same file now calls this
  `st_length` Ibis UDF, replacing the previous use of
  `op_typing.ibis_function`.
- The `GeoStLengthOp` in `bigframes/operations/geo_ops.py` and
  the user-facing `st_length` function in
  `bigframes/bigquery/_operations/geo.py` remain unchanged from
  the previous version, as they correctly define the operation's
  interface and parameters.

This change provides a cleaner and more direct way to map the
BigQuery DataFrames operation to the specific BigQuery ST_LENGTH
SQL function signature, while maintaining the existing BigQuery DataFrames
operation structure. The behavior of the `st_length` function,
including its handling of the `use_spheroid` parameter and error
conditions from BigQuery, remains the same.
This commit refactors the system tests for the `st_length` geography
function in `tests/system/small/bigquery/test_geo.py`.

The numerous individual test cases for different geometry types
have been combined into a single, comprehensive test function
`test_st_length_various_geometries`. This new test uses a single
GeoSeries with a variety of inputs (Point, LineString, Polygon,
MultiPoint, MultiLineString, MultiPolygon, GeometryCollection,
None/Empty) and compares the output of `st_length` (with both
default and explicit `use_spheroid=False`) against a pandas Series
of expected lengths.

This consolidation improves the conciseness and maintainability of
the tests for `st_length`. The test for `use_spheroid=True`
(expecting an error from BigQuery) remains separate.
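
The consolidated, table-driven style can be sketched with a local stand-in for `st_length`. Everything below is an assumption made for illustration: `st_length_sketch` is not the bigframes function, geometries are encoded as plain `(type, payload)` tuples, and lengths use a haversine formula on a sphere with an assumed mean Earth radius, which may differ from BigQuery's results.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius for this sketch


def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def line_length_m(coords):
    """Sum the segment lengths of one line given as a list of points."""
    return sum(haversine_m(a, b) for a, b in zip(coords, coords[1:]))


def st_length_sketch(geom):
    """geom is None or a (type, payload) tuple; only line parts contribute."""
    if geom is None:
        return None
    kind, payload = geom
    if kind == "LineString":
        return line_length_m(payload)
    if kind == "MultiLineString":
        return sum(line_length_m(line) for line in payload)
    if kind == "GeometryCollection":
        return sum(st_length_sketch(member) for member in payload)
    return 0.0  # Point, MultiPoint, Polygon, MultiPolygon


ONE_DEG = math.radians(1) * EARTH_RADIUS_M  # one degree along the equator

# One table of (input, expected) pairs replaces many single-purpose tests.
CASES = [
    (("Point", (0, 0)), 0.0),
    (("LineString", [(0, 0), (1, 0)]), ONE_DEG),
    (("Polygon", [(0, 0), (1, 0), (1, 1), (0, 0)]), 0.0),
    (("MultiLineString", [[(0, 0), (1, 0)], [(1, 0), (2, 0)]]), 2 * ONE_DEG),
    (("GeometryCollection",
      [("Point", (0, 0)), ("LineString", [(0, 0), (1, 0)])]), ONE_DEG),
    (None, None),
]


def check_all_cases():
    """Compare every case against its expected value; return the case count."""
    for geom, expected in CASES:
        actual = st_length_sketch(geom)
        if expected is None:
            assert actual is None
        else:
            assert math.isclose(actual, expected, rel_tol=1e-9, abs_tol=1e-6)
    return len(CASES)
```

Keeping inputs and expected values side by side makes it easy to add a geometry type without writing another test function.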
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Jun 5, 2025
google-labs-jules bot and others added 5 commits June 5, 2025 17:30
This commit fixes an ImportError caused by an incorrect name being
used for the ST_LENGTH geography operator in `bigframes/operations/__init__.py`.

When `geo_st_length_op` (a variable) was replaced by the dataclass
`GeoStLengthOp`, the import and `__all__` list in this `__init__.py`
file were not updated. This commit changes the import from `.geo_ops`
to correctly import `GeoStLengthOp` and updates the `__all__` list
to export `GeoStLengthOp`.
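
A check along the following lines could surface a stale export after a rename. It is a sketch only: `missing_exports` and the module `ops_sketch` are hypothetical, and the snippet inspects `__all__` on a synthetic module rather than reproducing the ImportError from `bigframes/operations/__init__.py`.

```python
import types


def missing_exports(module) -> list:
    """Return names listed in module.__all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# Hypothetical package namespace that still exports the old variable name.
ops = types.ModuleType("ops_sketch")
ops.GeoStLengthOp = type("GeoStLengthOp", (), {})
ops.__all__ = ["geo_st_length_op"]  # stale entry left over from the rename

stale = missing_exports(ops)      # the stale name is reported
ops.__all__ = ["GeoStLengthOp"]   # the fix: export the dataclass name
fixed = missing_exports(ops)      # nothing is missing any more
```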
@tswast tswast marked this pull request as ready for review June 5, 2025 18:02
@tswast tswast requested review from a team as code owners June 5, 2025 18:02
@tswast tswast requested a review from chelsea-lin June 5, 2025 18:02

tswast commented Jun 5, 2025

@chelsea-lin This is mostly generated via jules.google.com/ but I patched some of the tests to make it work. I think this is now ready for review.

) -> bigframes.series.Series:
"""Returns the total length in meters of the lines in the input GEOGRAPHY.
If geography_expression is a point or a polygon, returns zero. If
Contributor

Update geography_expression to series.

bigframes.series.Series:
Series of floats representing the lengths in meters.
"""
series = series._apply_unary_op(ops.GeoStLengthOp(use_spheroid=use_spheroid))
Contributor

IIUC from the docstring, we don't support a True value for use_spheroid. Can you please throw a NotImplementedError here?

Contributor

btw, why can't we support a True value for use_spheroid here?

Collaborator Author

This is a server-side limitation: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_length

I'd rather avoid any client-side checks, as maybe someday the server side will implement this feature.

I'm not sure why they have the parameter at all if it's not implemented, to be honest.

@tswast tswast requested a review from chelsea-lin June 5, 2025 19:27
chelsea-lin
chelsea-lin previously approved these changes Jun 5, 2025

tswast commented Jun 6, 2025

Looks like this broke some other functions. I need to take a closer look:

__________________________________ test_geo_x __________________________________
[gw6] linux -- Python 3.9.20 /tmpfs/src/github/python-bigquery-dataframes/.nox/system-3-9/bin/python

urban_areas_dfs = (       urban_area_code                      name  \
geo_id                                             
0            ..., -83.74099 ...
84630            84630  ...  MULTIPOLYGON (((-79.13478 38.15216, -79.13484 ...

[22 rows x 13 columns])

    def test_geo_x(urban_areas_dfs):
        bf_ua, pd_ua = urban_areas_dfs
        bf_series: bigframes.geopandas.GeoSeries = bf_ua["internal_point_geom"].geo
        pd_series: geopandas.GeoSeries = geopandas.GeoSeries(pd_ua["internal_point_geom"])
>       bf_result = bf_series.x.to_pandas()

tests/system/small/geopandas/test_geoseries.py:48: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
bigframes/core/log_adapter.py:175: in wrapper
    return method(*args, **kwargs)
bigframes/series.py:577: in to_pandas
    df, query_job = self._block.to_pandas(
bigframes/core/blocks.py:531: in to_pandas
    df, query_job = self._materialize_local(
bigframes/core/blocks.py:619: in _materialize_local
    execute_result = self.session._executor.execute(
bigframes/session/bq_caching_executor.py:196: in execute
    return self._execute_plan(
bigframes/session/bq_caching_executor.py:661: in _execute_plan
    compiled = compile.compile_sql(
bigframes/core/compile/compiler.py:60: in compile_sql
    sql = compile_result_node(result_node)
bigframes/core/compile/compiler.py:87: in compile_result_node
    return compile_node(root.child).to_sql(
bigframes/core/compile/compiler.py:98: in compile_node
    return node.reduce_up(lambda node, children: _compile_node(node, *children))
bigframes/core/bigframe_node.py:376: in reduce_up
    result = reduction(node, child_results)
bigframes/core/compile/compiler.py:98: in <lambda>
    return node.reduce_up(lambda node, children: _compile_node(node, *children))
/usr/local/lib/python3.9/functools.py:888: in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
bigframes/core/compile/compiler.py:243: in compile_projection
    return child.projection(tuple(projections))
bigframes/core/compile/compiled.py:129: in projection
    new_values = [
bigframes/core/compile/compiled.py:130: in <listcomp>
    op_compiler.compile_expression(expression, bindings).name(id)
/usr/local/lib/python3.9/functools.py:938: in _method
    return method.__get__(obj, cls)(*args, **kwargs)
bigframes/core/compile/scalar_op_compiler.py:102: in _
    return self.compile_row_op(expression.op, inputs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <bigframes.core.compile.scalar_op_compiler.ScalarOpCompiler object at 0x1478cde8d580>
op = geo_x()
inputs = [r0 := SQLQueryResult
  query:
    SELECT `geo_id`, `internal_point_geom` FROM `bigframes- …
  schema:
    geo_id     ...ial:geography;4326
    urban_area_geom          geospatial:geography;4326

internal_point_geom: r0.internal_point_geom]

    def compile_row_op(
        self, op: ops.RowOp, inputs: typing.Sequence[ibis_types.Value]
    ) -> ibis_types.Value:
>       impl = self._registry[op.name]
E       KeyError: 'geo_x'

bigframes/core/compile/scalar_op_compiler.py:107: KeyError
_____________________________ test_geo_x_non_point _____________________________
[gw6] linux -- Python 3.9.20 /tmpfs/src/github/python-bigquery-dataframes/.nox/system-3-9/bin/python

urban_areas_dfs = (       urban_area_code                      name  \
geo_id                                             
0            ..., -83.74099 ...
84630            84630  ...  MULTIPOLYGON (((-79.13478 38.15216, -79.13484 ...

[22 rows x 13 columns])

    def test_geo_x_non_point(urban_areas_dfs):
        bf_ua, _ = urban_areas_dfs
        bf_series: bigframes.geopandas.GeoSeries = bf_ua["urban_area_geom"].geo
    
        with pytest.raises(google.api_core.exceptions.BadRequest, match="ST_X"):
>           bf_series.x.to_pandas()

tests/system/small/geopandas/test_geoseries.py:62:
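
The failing line in the traceback, `impl = self._registry[op.name]`, is a dictionary lookup keyed by operator name. The sketch below shows how such a lookup fails when no implementation is registered under a name. It is a simplified, hypothetical registry (`ScalarOpRegistrySketch`, string-based "compilation"), not the bigframes `ScalarOpCompiler`, and it does not claim to identify why `geo_x` was missing in this PR.

```python
class ScalarOpRegistrySketch:
    """Hypothetical, simplified registry keyed by operator name."""

    def __init__(self):
        self._registry = {}

    def register(self, name):
        """Decorator that stores an implementation under the given op name."""
        def decorator(impl):
            self._registry[name] = impl
            return impl
        return decorator

    def compile_row_op(self, name, *inputs):
        # Same kind of lookup as in the traceback: an unregistered
        # name raises KeyError before any implementation runs.
        impl = self._registry[name]
        return impl(*inputs)


compiler = ScalarOpRegistrySketch()


@compiler.register("geo_st_length")
def _st_length(column):
    return f"ST_LENGTH({column})"
```

In this sketch, `compiler.compile_row_op("geo_x", "geom")` raises `KeyError: 'geo_x'` because nothing was registered under that name, while `"geo_st_length"` resolves normally.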

@tswast tswast merged commit c5b7fda into main Jun 9, 2025
22 of 24 checks passed
@tswast tswast deleted the feat-st-length branch June 9, 2025 14:25