-
-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-76960: Fix urljoin() and urldefrag() for URIs with empty components #123273
base: main
Are you sure you want to change the base?
Conversation
…ponents * urljoin() with relative reference "?" sets empty query and removes fragment. * Preserve empty components (authority, params, query, fragment) in urljoin(). * Preserve empty components (authority, params, query) in urldefrag(). Also refactor the code and get rid of double _coerce_args() and _coerce_result() calls in urljoin(), urldefrag(), urlparse() and urlunparse().
| self.checkJoin(SIMPLE_BASE, 'http:g','http://a/b/c/g') | ||
| self.checkJoin(SIMPLE_BASE, 'http:','http://a/b/c/d') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated below.
| self.checkJoin(SIMPLE_BASE, 'http:','http://a/b/c/d') | ||
| self.checkJoin(SIMPLE_BASE, 'http:?y','http://a/b/c/d?y') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated in new tests below.
| v = SplitResult(scheme or '', netloc or '', url, query or '', fragment or '') | ||
| return _coerce_result(v) | ||
|
|
||
| def _urlsplit(url, scheme=None, allow_fragments=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if fragment is None: | ||
| fragment = bfragment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This contradicts RFC 3986 according to which the target fragment is always set to the relative reference fragment. But this means that an empty string (which has no defined fragment) should remove fragment. I think this is an error in RFC 3986.
|
|
||
| if scheme is None: | ||
| scheme = bscheme | ||
| if scheme != bscheme or scheme not in uses_relative: | ||
| return _coerce_result(url) | ||
| if scheme in uses_netloc: | ||
| if netloc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to RFC 3986, this should be netlog is not None (undefined). But this gives results too different from Ruby and Go (they may be wrong) and from the current Python code. I leave this for a separate issue.
|
@orsenthil, please take a look at this PR. |
Also refactor the code and get rid of double _coerce_args() and _coerce_result() calls in urljoin(), urldefrag(), urlparse() and urlunparse().