Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-76960: Fix urljoin() and urldefrag() for URIs with empty components #123273

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Aug 23, 2024

  • urljoin() with relative reference "?" sets empty query and removes fragment.
  • Preserve empty components (authority, params, query, fragment) in urljoin().
  • Preserve empty components (authority, params, query) in urldefrag().

Also refactor the code and get rid of double _coerce_args() and _coerce_result() calls in urljoin(), urldefrag(), urlparse() and urlunparse().

…ponents

* urljoin() with relative reference "?" sets empty query and removes fragment.
* Preserve empty components (authority, params, query, fragment) in urljoin().
* Preserve empty components (authority, params, query) in urldefrag().

Also refactor the code and get rid of double _coerce_args() and
_coerce_result() calls in urljoin(), urldefrag(), urlparse() and
urlunparse().
Comment on lines -529 to -530
self.checkJoin(SIMPLE_BASE, 'http:g','http://a/b/c/g')
self.checkJoin(SIMPLE_BASE, 'http:','http://a/b/c/d')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated below.

Comment on lines -551 to -552
self.checkJoin(SIMPLE_BASE, 'http:','http://a/b/c/d')
self.checkJoin(SIMPLE_BASE, 'http:?y','http://a/b/c/d?y')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated in new tests below.

v = SplitResult(scheme or '', netloc or '', url, query or '', fragment or '')
return _coerce_result(v)

def _urlsplit(url, scheme=None, allow_fragments=True):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring is also a preparation for #67041. These private functions distinguish undefined components (None) from empty components (empty string), but public functions remove this difference. With new options added for #67041 this will be optional.

Comment on lines +592 to +593
if fragment is None:
fragment = bfragment
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contradicts RFC 3986 according to which the target fragment is always set to the relative reference fragment. But this means that an empty string (which has no defined fragment) should remove fragment. I think this is an error in RFC 3986.


if scheme is None:
scheme = bscheme
if scheme != bscheme or scheme not in uses_relative:
return _coerce_result(url)
if scheme in uses_netloc:
if netloc:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to RFC 3986, this should be netlog is not None (undefined). But this gives results too different from Ruby and Go (they may be wrong) and from the current Python code. I leave this for a separate issue.

@serhiy-storchaka
Copy link
Member Author

@orsenthil, please take a look at this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant