From 7333458ee455c5ab0cd5fdf34b80b638c22a7268 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Thu, 19 Jun 2025 05:26:53 +0000
Subject: [PATCH 001/148] chore(deps): bump urllib3 from 2.4.0 to 2.5.0

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.4.0...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot]
---
 release-requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/release-requirements.txt b/release-requirements.txt
index 21eea04..a409656 100644
--- a/release-requirements.txt
+++ b/release-requirements.txt
@@ -34,6 +34,6 @@ setuptools==80.9.0
 six==1.17.0
 tqdm==4.67.1
 twine==6.1.0
-urllib3==2.4.0
+urllib3==2.5.0
 webencodings==0.5.1
 zipp==3.23.0

From fb8945fc094cb9087a23c2f81826b0fc5d521b2c Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 23 Jun 2025 14:34:26 +0000
Subject: [PATCH 002/148] chore(deps): bump the python-packages group with 5 updates

Bumps the python-packages group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [flake8](https://github.com/pycqa/flake8) | `7.2.0` | `7.3.0` |
| [pycodestyle](https://github.com/PyCQA/pycodestyle) | `2.13.0` | `2.14.0` |
| [pyflakes](https://github.com/PyCQA/pyflakes) | `3.3.2` | `3.4.0` |
| [pygments](https://github.com/pygments/pygments) | `2.19.1` | `2.19.2` |
| [urllib3](https://github.com/urllib3/urllib3) | `2.4.0` | `2.5.0` |

Updates `flake8` from 7.2.0 to 7.3.0
- [Commits](https://github.com/pycqa/flake8/compare/7.2.0...7.3.0)

Updates `pycodestyle` from 2.13.0 to 2.14.0
- [Release notes](https://github.com/PyCQA/pycodestyle/releases)
- [Changelog](https://github.com/PyCQA/pycodestyle/blob/main/CHANGES.txt)
- [Commits](https://github.com/PyCQA/pycodestyle/compare/2.13.0...2.14.0)

Updates `pyflakes` from 3.3.2 to 3.4.0
- [Changelog](https://github.com/PyCQA/pyflakes/blob/main/NEWS.rst)
- [Commits](https://github.com/PyCQA/pyflakes/compare/3.3.2...3.4.0)

Updates `pygments` from 2.19.1 to 2.19.2
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.19.1...2.19.2)

Updates `urllib3` from 2.4.0 to 2.5.0
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.4.0...2.5.0)

---
updated-dependencies:
- dependency-name: flake8
  dependency-version: 7.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pycodestyle
  dependency-version: 2.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pyflakes
  dependency-version: 3.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pygments
  dependency-version: 2.19.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot]
---
 release-requirements.txt | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/release-requirements.txt b/release-requirements.txt
index 21eea04..2f5a899 100644
--- a/release-requirements.txt
+++ b/release-requirements.txt
@@ -6,7 +6,7 @@ charset-normalizer==3.4.2
 click==8.1.8
 colorama==0.4.6
 docutils==0.21.2
-flake8==7.2.0
+flake8==7.3.0
 gitchangelog==3.0.4
 idna==3.10
 importlib-metadata==8.7.0
@@ -21,9 +21,9 @@ packaging==25.0
 pathspec==0.12.1
 pkginfo==1.12.1.2
 platformdirs==4.3.8
-pycodestyle==2.13.0
-pyflakes==3.3.2
-Pygments==2.19.1
+pycodestyle==2.14.0
+pyflakes==3.4.0
+Pygments==2.19.2
 readme-renderer==44.0
 requests==2.32.4
 requests-toolbelt==1.0.0
@@ -34,6 +34,6 @@ setuptools==80.9.0
 six==1.17.0
 tqdm==4.67.1
 twine==6.1.0
-urllib3==2.4.0
+urllib3==2.5.0
 webencodings==0.5.1
 zipp==3.23.0

From 175ac19be683d5aa8b614aa5da8d1a4912050ccc Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed, 9 Jul 2025 13:55:16 +0000
Subject: [PATCH 003/148] chore(deps): bump certifi in the python-packages group

Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi).

Updates `certifi` from 2025.6.15 to 2025.7.9
- [Commits](https://github.com/certifi/python-certifi/compare/2025.06.15...2025.07.09)

---
updated-dependencies:
- dependency-name: certifi
  dependency-version: 2025.7.9
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 2f5a899..1c766de 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,7 +1,7 @@ autopep8==2.3.2 black==25.1.0 bleach==6.2.0 -certifi==2025.6.15 +certifi==2025.7.9 charset-normalizer==3.4.2 click==8.1.8 colorama==0.4.6 From 1bad563e3f23d3d8b9f98721d857a660692f4847 Mon Sep 17 00:00:00 2001 From: Eric Wheeler Date: Sat, 19 Jul 2025 17:17:58 -0700 Subject: [PATCH 004/148] Add conditional check for git checkout in development path Only insert development path into sys.path when running from a git checkout (when ../.git exists). This makes the script more robust by only using the development tree when available and falling back to installed package otherwise. --- bin/github-backup | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/bin/github-backup b/bin/github-backup index b33d19f..c6116a1 100755 --- a/bin/github-backup +++ b/bin/github-backup @@ -4,6 +4,15 @@ import logging import os import sys + +# If we are running from a git-checkout, we can run against the development +# tree without installing. +if os.path.exists(os.path.join(os.path.dirname(__file__), "..", ".git")): + sys.path.insert( + 0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")) + ) + + from github_backup.github_backup import ( backup_account, backup_repositories, From d820dd994d931f8dbead5e63dceef5c5b49bafa3 Mon Sep 17 00:00:00 2001 From: Eric Wheeler Date: Sat, 19 Jul 2025 17:28:52 -0700 Subject: [PATCH 005/148] Fix -R flag to allow backups of repositories not owned by user Previously, using -R flag would show zero issues/PRs for repositories not owned by the primary user due to incorrect pagination parameters being added to single repository API calls. 
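
A minimal sketch of the URL handling this change is about, for illustration only. `build_request_url` and `resolve_repo_path` are hypothetical names, not functions in github_backup.py. The sketch assumes that paging parameters are simply left out for a single-repository request, and that an -R value containing a slash is taken as owner/repo rather than being prefixed with the -u user.

```python
from urllib.parse import urlencode


def build_request_url(template, per_page=None, page=None, query_args=None):
    """Append a query string only when there is something to put in it.

    Hypothetical helper: a single-repository request passes no paging
    values, so the bare template URL is returned unchanged.
    """
    params = {}
    if per_page:
        params["per_page"] = per_page
    if page:
        params["page"] = page
    if query_args:
        params.update(query_args)
    if not params:
        return template
    return template + "?" + urlencode(params)


def resolve_repo_path(user, repository):
    """Accept either 'repo-name' or 'owner/repo-name' for the -R value."""
    if "/" in repository:
        return repository
    return "{0}/{1}".format(user, repository)


if __name__ == "__main__":
    base = "https://api.github.com/repos/" + resolve_repo_path("alice", "bob/tool")
    print(build_request_url(base))
    print(build_request_url(base + "/issues", per_page=100, page=1))
```

Leaving out the "?" when no parameters are present keeps the logged URL and the requested URL identical, which makes a wrong request easier to spot in the log.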
- Remove pagination parameters for single repository requests - Support owner/repo format in -R flag (e.g., -R owner/repo-name) - Skip filtering when specific repository is requested - Fix URL construction for requests without query parameters This enables backing up any repository, not just those owned by the primary user specified in -u flag. --- github_backup/github_backup.py | 49 +++++++++++++++++++++++++--------- 1 file changed, 36 insertions(+), 13 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 29c9e58..4b2d790 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -578,10 +578,15 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False): page = 0 while True: - page = page + 1 + if single_request: + request_page, request_per_page = None, None + else: + page = page + 1 + request_page, request_per_page = page, per_page + request = _construct_request( - per_page, - page, + request_per_page, + request_page, query_args, template, auth, @@ -715,14 +720,22 @@ def _get_response(request, auth, template): def _construct_request( per_page, page, query_args, template, auth, as_app=None, fine=False ): - querystring = urlencode( - dict( - list({"per_page": per_page, "page": page}.items()) - + list(query_args.items()) - ) - ) + all_query_args = {} + if per_page: + all_query_args["per_page"] = per_page + if page: + all_query_args["page"] = page + if query_args: + all_query_args.update(query_args) + + request_url = template + if all_query_args: + querystring = urlencode(all_query_args) + request_url = template + "?" + querystring + else: + querystring = "" - request = Request(template + "?" 
+ querystring) + request = Request(request_url) if auth is not None: if not as_app: if fine: @@ -735,7 +748,11 @@ def _construct_request( request.add_header( "Accept", "application/vnd.github.machine-man-preview+json" ) - logger.info("Requesting {}?{}".format(template, querystring)) + + log_url = template + if querystring: + log_url += "?" + querystring + logger.info("Requesting {}".format(log_url)) return request @@ -885,9 +902,13 @@ def retrieve_repositories(args, authenticated_user): ) if args.repository: + if "/" in args.repository: + repo_path = args.repository + else: + repo_path = "{0}/{1}".format(args.user, args.repository) single_request = True - template = "https://{0}/repos/{1}/{2}".format( - get_github_api_host(args), args.user, args.repository + template = "https://{0}/repos/{1}".format( + get_github_api_host(args), repo_path ) repos = retrieve_data(args, template, single_request=single_request) @@ -928,6 +949,8 @@ def retrieve_repositories(args, authenticated_user): def filter_repositories(args, unfiltered_repositories): + if args.repository: + return unfiltered_repositories logger.info("Filtering repositories") repositories = [] From a4f15b06d94c0481861a3cd149f3ac5b10fbefa7 Mon Sep 17 00:00:00 2001 From: Eric Wheeler Date: Fri, 25 Jul 2025 11:47:08 -0700 Subject: [PATCH 006/148] Revert "Add conditional check for git checkout in development path" This reverts commit 1bad563e3f23d3d8b9f98721d857a660692f4847. --- bin/github-backup | 9 --------- 1 file changed, 9 deletions(-) diff --git a/bin/github-backup b/bin/github-backup index c6116a1..b33d19f 100755 --- a/bin/github-backup +++ b/bin/github-backup @@ -4,15 +4,6 @@ import logging import os import sys - -# If we are running from a git-checkout, we can run against the development -# tree without installing. 
-if os.path.exists(os.path.join(os.path.dirname(__file__), "..", ".git")): - sys.path.insert( - 0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")) - ) - - from github_backup.github_backup import ( backup_account, backup_repositories, From 82c1fc30864a23599af5a285a0a2fc1201d59f03 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 30 Jul 2025 13:49:49 +0000 Subject: [PATCH 007/148] chore(deps): bump the python-packages group across 1 directory with 3 updates Bumps the python-packages group with 3 updates in the / directory: [certifi](https://github.com/certifi/python-certifi), [docutils](https://github.com/rtfd/recommonmark) and [rich](https://github.com/Textualize/rich). Updates `certifi` from 2025.7.9 to 2025.7.14 - [Commits](https://github.com/certifi/python-certifi/compare/2025.07.09...2025.07.14) Updates `docutils` from 0.21.2 to 0.22 - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) - [Commits](https://github.com/rtfd/recommonmark/commits) Updates `rich` from 14.0.0 to 14.1.0 - [Release notes](https://github.com/Textualize/rich/releases) - [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md) - [Commits](https://github.com/Textualize/rich/compare/v14.0.0...v14.1.0) --- updated-dependencies: - dependency-name: certifi dependency-version: 2025.7.14 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages - dependency-name: docutils dependency-version: '0.22' dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages - dependency-name: rich dependency-version: 14.1.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index 1c766de..788fa95 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,11 +1,11 @@ autopep8==2.3.2 black==25.1.0 bleach==6.2.0 -certifi==2025.7.9 +certifi==2025.7.14 charset-normalizer==3.4.2 click==8.1.8 colorama==0.4.6 -docutils==0.21.2 +docutils==0.22 flake8==7.3.0 gitchangelog==3.0.4 idna==3.10 @@ -29,7 +29,7 @@ requests==2.32.4 requests-toolbelt==1.0.0 restructuredtext-lint==1.4.0 rfc3986==2.0.0 -rich==14.0.0 +rich==14.1.0 setuptools==80.9.0 six==1.17.0 tqdm==4.67.1 From 5f07157c9b417c538ead38a1902035e0ac45188f Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Fri, 8 Aug 2025 20:41:53 +0000 Subject: [PATCH 008/148] Release version 0.50.3 --- CHANGES.rst | 160 +++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 160 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 2fddca5..960977f 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,167 @@ Changelog ========= -0.50.2 (2025-06-16) +0.50.3 (2025-08-08) ------------------- ------------------------ +- Revert "Add conditional check for git checkout in development path" + [Eric Wheeler] + + This reverts commit 1bad563e3f23d3d8b9f98721d857a660692f4847. +- Fix -R flag to allow backups of repositories not owned by user. [Eric + Wheeler] + + Previously, using -R flag would show zero issues/PRs for repositories + not owned by the primary user due to incorrect pagination parameters + being added to single repository API calls. 
+ + - Remove pagination parameters for single repository requests + - Support owner/repo format in -R flag (e.g., -R owner/repo-name) + - Skip filtering when specific repository is requested + - Fix URL construction for requests without query parameters + + This enables backing up any repository, not just those owned by the + primary user specified in -u flag. +- Add conditional check for git checkout in development path. [Eric + Wheeler] + + Only insert development path into sys.path when running from a git checkout + (when ../.git exists). This makes the script more robust by only using the + development tree when available and falling back to installed package otherwise. +- Chore(deps): bump the python-packages group across 1 directory with 3 + updates. [dependabot[bot]] + + Bumps the python-packages group with 3 updates in the / directory: [certifi](https://github.com/certifi/python-certifi), [docutils](https://github.com/rtfd/recommonmark) and [rich](https://github.com/Textualize/rich). 
+ + + Updates `certifi` from 2025.7.9 to 2025.7.14 + - [Commits](https://github.com/certifi/python-certifi/compare/2025.07.09...2025.07.14) + + Updates `docutils` from 0.21.2 to 0.22 + - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) + - [Commits](https://github.com/rtfd/recommonmark/commits) + + Updates `rich` from 14.0.0 to 14.1.0 + - [Release notes](https://github.com/Textualize/rich/releases) + - [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md) + - [Commits](https://github.com/Textualize/rich/compare/v14.0.0...v14.1.0) + + --- + updated-dependencies: + - dependency-name: certifi + dependency-version: 2025.7.14 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + - dependency-name: docutils + dependency-version: '0.22' + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: rich + dependency-version: 14.1.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump certifi in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi). + + + Updates `certifi` from 2025.6.15 to 2025.7.9 + - [Commits](https://github.com/certifi/python-certifi/compare/2025.06.15...2025.07.09) + + --- + updated-dependencies: + - dependency-name: certifi + dependency-version: 2025.7.9 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump urllib3 from 2.4.0 to 2.5.0. [dependabot[bot]] + + Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.4.0 to 2.5.0. 
+ - [Release notes](https://github.com/urllib3/urllib3/releases) + - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) + - [Commits](https://github.com/urllib3/urllib3/compare/2.4.0...2.5.0) + + --- + updated-dependencies: + - dependency-name: urllib3 + dependency-version: 2.5.0 + dependency-type: direct:production + ... +- Chore(deps): bump the python-packages group with 5 updates. + [dependabot[bot]] + + Bumps the python-packages group with 5 updates: + + | Package | From | To | + | --- | --- | --- | + | [flake8](https://github.com/pycqa/flake8) | `7.2.0` | `7.3.0` | + | [pycodestyle](https://github.com/PyCQA/pycodestyle) | `2.13.0` | `2.14.0` | + | [pyflakes](https://github.com/PyCQA/pyflakes) | `3.3.2` | `3.4.0` | + | [pygments](https://github.com/pygments/pygments) | `2.19.1` | `2.19.2` | + | [urllib3](https://github.com/urllib3/urllib3) | `2.4.0` | `2.5.0` | + + + Updates `flake8` from 7.2.0 to 7.3.0 + - [Commits](https://github.com/pycqa/flake8/compare/7.2.0...7.3.0) + + Updates `pycodestyle` from 2.13.0 to 2.14.0 + - [Release notes](https://github.com/PyCQA/pycodestyle/releases) + - [Changelog](https://github.com/PyCQA/pycodestyle/blob/main/CHANGES.txt) + - [Commits](https://github.com/PyCQA/pycodestyle/compare/2.13.0...2.14.0) + + Updates `pyflakes` from 3.3.2 to 3.4.0 + - [Changelog](https://github.com/PyCQA/pyflakes/blob/main/NEWS.rst) + - [Commits](https://github.com/PyCQA/pyflakes/compare/3.3.2...3.4.0) + + Updates `pygments` from 2.19.1 to 2.19.2 + - [Release notes](https://github.com/pygments/pygments/releases) + - [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES) + - [Commits](https://github.com/pygments/pygments/compare/2.19.1...2.19.2) + + Updates `urllib3` from 2.4.0 to 2.5.0 + - [Release notes](https://github.com/urllib3/urllib3/releases) + - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) + - [Commits](https://github.com/urllib3/urllib3/compare/2.4.0...2.5.0) + + --- + 
updated-dependencies: + - dependency-name: flake8 + dependency-version: 7.3.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: pycodestyle + dependency-version: 2.14.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: pyflakes + dependency-version: 3.4.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: pygments + dependency-version: 2.19.2 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + - dependency-name: urllib3 + dependency-version: 2.5.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... + + +0.50.2 (2025-06-16) +------------------- - Chore(deps): bump certifi in the python-packages group. [dependabot[bot]] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 079baa7..e7d2f93 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.50.2" +__version__ = "0.50.3" From 338d5a956b4b61c3ee65517785433157a914d2c9 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 11 Aug 2025 20:51:37 +0000 Subject: [PATCH 009/148] chore(deps): bump the python-packages group with 2 updates Bumps the python-packages group with 2 updates: [certifi](https://github.com/certifi/python-certifi) and [charset-normalizer](https://github.com/jawah/charset_normalizer). 
Updates `certifi` from 2025.7.14 to 2025.8.3 - [Commits](https://github.com/certifi/python-certifi/compare/2025.07.14...2025.08.03) Updates `charset-normalizer` from 3.4.2 to 3.4.3 - [Release notes](https://github.com/jawah/charset_normalizer/releases) - [Changelog](https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md) - [Commits](https://github.com/jawah/charset_normalizer/compare/3.4.2...3.4.3) --- updated-dependencies: - dependency-name: certifi dependency-version: 2025.8.3 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages - dependency-name: charset-normalizer dependency-version: 3.4.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index 788fa95..1769460 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,8 +1,8 @@ autopep8==2.3.2 black==25.1.0 bleach==6.2.0 -certifi==2025.7.14 -charset-normalizer==3.4.2 +certifi==2025.8.3 +charset-normalizer==3.4.3 click==8.1.8 colorama==0.4.6 docutils==0.22 From f027760ac5b701ec7edffe72e265223821f9371b Mon Sep 17 00:00:00 2001 From: Mateusz Hajder <6783135+mhajder@users.noreply.github.com> Date: Tue, 12 Aug 2025 10:18:52 +0200 Subject: [PATCH 010/148] chore: update Dockerfile to use Python 3.12 and improve dependency installation --- .dockerignore | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++ .gitignore | 4 ++- Dockerfile | 42 ++++++++++++++++++++++------- 3 files changed, 110 insertions(+), 11 deletions(-) create mode 100644 .dockerignore diff --git a/.dockerignore b/.dockerignore new file mode 100644 index 0000000..07a3ea4 --- /dev/null +++ b/.dockerignore @@ -0,0 +1,75 @@ +# Docker ignore file to reduce build context size + +# Temp files +*~ +~* +.*~ +\#* +.#* +*# +dist + 
+# Build files +build +dist +pkg +*.egg +*.egg-info + +# Debian Files +debian/files +debian/python-github-backup* + +# Sphinx build +doc/_build + +# Generated man page +doc/github_backup.1 + +# Annoying macOS files +.DS_Store +._* + +# IDE configuration files +.vscode +.atom +.idea +*.code-workspace + +# RSA +id_rsa +id_rsa.pub + +# Virtual env +venv +.venv + +# Git +.git +.gitignore +.gitchangelog.rc +.github + +# Documentation +*.md +!README.md + +# Environment variables files +.env +.env.* +!.env.example +*.log + +# Cache files +**/__pycache__/ +*.py[cod] + +# Docker files +docker-compose.yml +Dockerfile* + +# Other files +release +*.tar +*.zip +*.gzip diff --git a/.gitignore b/.gitignore index f0ed9db..652f035 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,4 @@ -*.py[oc] +*.py[cod] # Temp files *~ @@ -33,6 +33,7 @@ doc/github_backup.1 # IDE configuration files .vscode .atom +.idea README @@ -42,3 +43,4 @@ id_rsa.pub # Virtual env venv +.venv diff --git a/Dockerfile b/Dockerfile index 6217594..2c28829 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,16 +1,38 @@ -FROM python:3.9.18-slim +FROM python:3.12-alpine3.22 AS builder -RUN --mount=type=cache,target=/var/cache/apt \ - apt-get update && apt-get install -y git git-lfs +RUN pip install --no-cache-dir --upgrade pip \ + && pip install --no-cache-dir uv -WORKDIR /usr/src/app +WORKDIR /app -COPY release-requirements.txt . -RUN --mount=type=cache,target=/root/.cache/pip \ - pip install -r release-requirements.txt +RUN --mount=type=cache,target=/root/.cache/uv \ + --mount=type=bind,source=requirements.txt,target=requirements.txt \ + --mount=type=bind,source=release-requirements.txt,target=release-requirements.txt \ + uv venv \ + && uv pip install -r release-requirements.txt COPY . . -RUN --mount=type=cache,target=/root/.cache/pip \ - pip install . -ENTRYPOINT [ "github-backup" ] +RUN --mount=type=cache,target=/root/.cache/uv \ + uv pip install . 
+ + +FROM python:3.12-alpine3.22 +ENV PYTHONUNBUFFERED=1 + +RUN apk add --no-cache \ + ca-certificates \ + git \ + git-lfs \ + && addgroup -g 1000 appuser \ + && adduser -D -u 1000 -G appuser appuser + +COPY --from=builder --chown=appuser:appuser /app /app + +WORKDIR /app + +USER appuser + +ENV PATH="/app/.venv/bin:$PATH" + +ENTRYPOINT ["github-backup"] From 65749bfde4d7e5910763d77f6b89719687e96969 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 18 Aug 2025 06:33:46 +0000 Subject: [PATCH 011/148] chore(deps): bump actions/checkout from 4 to 5 Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] --- .github/workflows/automatic-release.yml | 2 +- .github/workflows/docker.yml | 2 +- .github/workflows/lint.yml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/automatic-release.yml b/.github/workflows/automatic-release.yml index 4c2150e..c6eb48b 100644 --- a/.github/workflows/automatic-release.yml +++ b/.github/workflows/automatic-release.yml @@ -18,7 +18,7 @@ jobs: runs-on: ubuntu-24.04 steps: - name: Checkout repository - uses: actions/checkout@v4 + uses: actions/checkout@v5 with: fetch-depth: 0 ssh-key: ${{ secrets.DEPLOY_PRIVATE_KEY }} diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index b0607f7..2c7cb38 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -38,7 +38,7 @@ jobs: steps: - name: Checkout repository - uses: actions/checkout@v4 + uses: actions/checkout@v5 with: persist-credentials: false diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index 541242d..03686f4 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -18,7 +18,7 @@ jobs: steps: - name: Checkout repository - uses: actions/checkout@v4 + uses: actions/checkout@v5 with: fetch-depth: 0 - name: Setup Python From d3b67f884a21a0542a8f2e65f3233241a2e76706 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 19 Aug 2025 20:54:47 +0000 Subject: [PATCH 012/148] chore(deps): bump requests in the python-packages group Bumps the python-packages group with 1 update: [requests](https://github.com/psf/requests). 

Updates `requests` from 2.32.4 to 2.32.5
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.4...v2.32.5)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot]
---
 release-requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/release-requirements.txt b/release-requirements.txt
index 1769460..2e16603 100644
--- a/release-requirements.txt
+++ b/release-requirements.txt
@@ -25,7 +25,7 @@ pycodestyle==2.14.0
 pyflakes==3.4.0
 Pygments==2.19.2
 readme-renderer==44.0
-requests==2.32.4
+requests==2.32.5
 requests-toolbelt==1.0.0
 restructuredtext-lint==1.4.0
 rfc3986==2.0.0

From 8bfad9b5b71f2ca988db56a3300fef039c4ac691 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed, 27 Aug 2025 20:52:18 +0000
Subject: [PATCH 013/148] chore(deps): bump platformdirs in the python-packages group

Bumps the python-packages group with 1 update: [platformdirs](https://github.com/tox-dev/platformdirs).

Updates `platformdirs` from 4.3.8 to 4.4.0
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.3.8...4.4.0)

---
updated-dependencies:
- dependency-name: platformdirs
  dependency-version: 4.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot]
---
 release-requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/release-requirements.txt b/release-requirements.txt
index 2e16603..e02238f 100644
--- a/release-requirements.txt
+++ b/release-requirements.txt
@@ -20,7 +20,7 @@ mypy-extensions==1.1.0
 packaging==25.0
 pathspec==0.12.1
 pkginfo==1.12.1.2
-platformdirs==4.3.8
+platformdirs==4.4.0
 pycodestyle==2.14.0
 pyflakes==3.4.0
 Pygments==2.19.2

From 1c465f4d35f777f4d601e0fcf32131fbf6e000bd Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed, 3 Sep 2025 23:43:31 +0000
Subject: [PATCH 014/148] chore(deps): bump more-itertools in the python-packages group

Bumps the python-packages group with 1 update: [more-itertools](https://github.com/more-itertools/more-itertools).

Updates `more-itertools` from 10.7.0 to 10.8.0
- [Release notes](https://github.com/more-itertools/more-itertools/releases)
- [Commits](https://github.com/more-itertools/more-itertools/compare/v10.7.0...v10.8.0)

---
updated-dependencies:
- dependency-name: more-itertools
  dependency-version: 10.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot]
---
 release-requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/release-requirements.txt b/release-requirements.txt
index e02238f..82e6645 100644
--- a/release-requirements.txt
+++ b/release-requirements.txt
@@ -15,7 +15,7 @@ keyring==25.6.0
 markdown-it-py==3.0.0
 mccabe==0.7.0
 mdurl==0.1.2
-more-itertools==10.7.0
+more-itertools==10.8.0
 mypy-extensions==1.1.0
 packaging==25.0
 pathspec==0.12.1

From 268a989b09b96f575e058d3c12fe6a71580c1214 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 5 Sep 2025 13:09:08 +0000
Subject: [PATCH 015/148] chore(deps): bump twine from 6.1.0 to 6.2.0 in the python-packages group

Bumps the python-packages group with 1 update: [twine](https://github.com/pypa/twine).

Updates `twine` from 6.1.0 to 6.2.0
- [Release notes](https://github.com/pypa/twine/releases)
- [Changelog](https://github.com/pypa/twine/blob/main/docs/changelog.rst)
- [Commits](https://github.com/pypa/twine/compare/6.1.0...6.2.0)

---
updated-dependencies:
- dependency-name: twine
  dependency-version: 6.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
...

Signed-off-by: dependabot[bot]
---
 release-requirements.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/release-requirements.txt b/release-requirements.txt
index 82e6645..68d6bd9 100644
--- a/release-requirements.txt
+++ b/release-requirements.txt
@@ -33,7 +33,7 @@ rich==14.1.0
 setuptools==80.9.0
 six==1.17.0
 tqdm==4.67.1
-twine==6.1.0
+twine==6.2.0
 urllib3==2.5.0
 webencodings==0.5.1
 zipp==3.23.0

From d3079bfb74ec4be5a8f49b28e228dc1cbb4dcc44 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 8 Sep 2025 04:10:35 +0000
Subject: [PATCH 016/148] chore(deps): bump actions/setup-python from 5 to 6

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](https://github.com/actions/setup-python/compare/v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] --- .github/workflows/automatic-release.yml | 2 +- .github/workflows/lint.yml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/automatic-release.yml b/.github/workflows/automatic-release.yml index c6eb48b..2160206 100644 --- a/.github/workflows/automatic-release.yml +++ b/.github/workflows/automatic-release.yml @@ -27,7 +27,7 @@ jobs: git config --local user.email "action@github.com" git config --local user.name "GitHub Action" - name: Setup Python - uses: actions/setup-python@v5 + uses: actions/setup-python@v6 with: python-version: '3.12' - name: Install prerequisites diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index 03686f4..e0036e2 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -22,7 +22,7 @@ jobs: with: fetch-depth: 0 - name: Setup Python - uses: actions/setup-python@v5 + uses: actions/setup-python@v6 with: python-version: "3.12" cache: "pip" From 12ac519e9c1f19a42c25e7cc7aa1ba5bc508509b Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:26:53 -0400 Subject: [PATCH 017/148] chore: Rename ISSUE_TEMPLATE.md to .github/ISSUE_TEMPLATE.md --- ISSUE_TEMPLATE.md => .github/ISSUE_TEMPLATE.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename ISSUE_TEMPLATE.md => .github/ISSUE_TEMPLATE.md (100%) diff --git a/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md similarity index 100% rename from ISSUE_TEMPLATE.md rename to .github/ISSUE_TEMPLATE.md From 39848e650cc15809631b31adf9df4b1fa54712e2 Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:27:23 -0400 Subject: [PATCH 018/148] chore: Rename PULL_REQUEST.md to .github/PULL_REQUEST.md --- PULL_REQUEST.md => .github/PULL_REQUEST.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename PULL_REQUEST.md => .github/PULL_REQUEST.md (100%) diff --git a/PULL_REQUEST.md b/.github/PULL_REQUEST.md similarity index 100% rename from PULL_REQUEST.md rename to 
.github/PULL_REQUEST.md From 03c660724d39b92af100454629685ec442aeb521 Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:30:10 -0400 Subject: [PATCH 019/148] chore: create bug template --- .github/ISSUE_TEMPLATE/bug.md | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 .github/ISSUE_TEMPLATE/bug.md diff --git a/.github/ISSUE_TEMPLATE/bug.md b/.github/ISSUE_TEMPLATE/bug.md new file mode 100644 index 0000000..0d0fee5 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug.md @@ -0,0 +1,28 @@ +--- +name: Bug Report +description: File a bug report. +body: + - type: markdown + attributes: + value: | + # Important notice regarding filed issues + + This project already fills my needs, and as such I have no real reason to continue its development. This project is otherwise provided as is, and no support is given. + + If pull requests implementing bug fixes or enhancements are pushed, I am happy to review and merge them (time permitting). + + If you wish to have a bug fixed, you have a few options: + + - Fix it yourself and file a pull request. + - File a bug and hope someone else fixes it for you. + - Pay me to fix it (my rate is $200 an hour, minimum 1 hour, contact me via my [github email address](https://github.com/josegonzalez) if you want to go this route). + + In all cases, feel free to file an issue, they may be of help to others in the future. + - type: textarea + id: what-happened + attributes: + label: What happened? + description: Also tell us, what did you expect to happen? + placeholder: Tell us what you see! 
+ validations: + required: true From df4d751be27252c2d2c1bf272d3e62cb55a2da61 Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:30:46 -0400 Subject: [PATCH 020/148] Rename bug.md to bug.yaml --- .github/ISSUE_TEMPLATE/{bug.md => bug.yaml} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename .github/ISSUE_TEMPLATE/{bug.md => bug.yaml} (100%) diff --git a/.github/ISSUE_TEMPLATE/bug.md b/.github/ISSUE_TEMPLATE/bug.yaml similarity index 100% rename from .github/ISSUE_TEMPLATE/bug.md rename to .github/ISSUE_TEMPLATE/bug.yaml From 85ab54e5147ddddb0ce4dbb7dc2c144b9db18acf Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:31:38 -0400 Subject: [PATCH 021/148] Update issue templates --- .github/ISSUE_TEMPLATE/bug_report.md | 38 ++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 .github/ISSUE_TEMPLATE/bug_report.md diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 0000000..dd84ea7 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,38 @@ +--- +name: Bug report +about: Create a report to help us improve +title: '' +labels: '' +assignees: '' + +--- + +**Describe the bug** +A clear and concise description of what the bug is. + +**To Reproduce** +Steps to reproduce the behavior: +1. Go to '...' +2. Click on '....' +3. Scroll down to '....' +4. See error + +**Expected behavior** +A clear and concise description of what you expected to happen. + +**Screenshots** +If applicable, add screenshots to help explain your problem. + +**Desktop (please complete the following information):** + - OS: [e.g. iOS] + - Browser [e.g. chrome, safari] + - Version [e.g. 22] + +**Smartphone (please complete the following information):** + - Device: [e.g. iPhone6] + - OS: [e.g. iOS8.1] + - Browser [e.g. stock browser, safari] + - Version [e.g. 22] + +**Additional context** +Add any other context about the problem here. 
From d6bf031bf7ae0cd5bce311a725d36fe3214a1ec8 Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:32:32 -0400 Subject: [PATCH 022/148] Delete .github/ISSUE_TEMPLATE/bug_report.md --- .github/ISSUE_TEMPLATE/bug_report.md | 38 ---------------------------- 1 file changed, 38 deletions(-) delete mode 100644 .github/ISSUE_TEMPLATE/bug_report.md diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md deleted file mode 100644 index dd84ea7..0000000 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ /dev/null @@ -1,38 +0,0 @@ ---- -name: Bug report -about: Create a report to help us improve -title: '' -labels: '' -assignees: '' - ---- - -**Describe the bug** -A clear and concise description of what the bug is. - -**To Reproduce** -Steps to reproduce the behavior: -1. Go to '...' -2. Click on '....' -3. Scroll down to '....' -4. See error - -**Expected behavior** -A clear and concise description of what you expected to happen. - -**Screenshots** -If applicable, add screenshots to help explain your problem. - -**Desktop (please complete the following information):** - - OS: [e.g. iOS] - - Browser [e.g. chrome, safari] - - Version [e.g. 22] - -**Smartphone (please complete the following information):** - - Device: [e.g. iPhone6] - - OS: [e.g. iOS8.1] - - Browser [e.g. stock browser, safari] - - Version [e.g. 22] - -**Additional context** -Add any other context about the problem here. 
From 3d5f61aa2279c9cef3b3e9f8e8770768362afd73 Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:33:49 -0400 Subject: [PATCH 023/148] Create feature.yaml --- .github/ISSUE_TEMPLATE/feature.yaml | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) create mode 100644 .github/ISSUE_TEMPLATE/feature.yaml diff --git a/.github/ISSUE_TEMPLATE/feature.yaml b/.github/ISSUE_TEMPLATE/feature.yaml new file mode 100644 index 0000000..dbfd2c5 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature.yaml @@ -0,0 +1,27 @@ +--- +name: Feature Request +description: File a feature request. +body: + - type: markdown + attributes: + value: | + # Important notice regarding filed issues + + This project already fills my needs, and as such I have no real reason to continue its development. This project is otherwise provided as is, and no support is given. + + If pull requests implementing bug fixes or enhancements are pushed, I am happy to review and merge them (time permitting). + + If you wish to have a bug fixed, you have a few options: + + - Fix it yourself and file a pull request. + - File a bug and hope someone else fixes it for you. + - Pay me to fix it (my rate is $200 an hour, minimum 1 hour, contact me via my [github email address](https://github.com/josegonzalez) if you want to go this route). + + In all cases, feel free to file an issue, they may be of help to others in the future. + - type: textarea + id: what-would-you-like-to-happen + attributes: + label: What would you like to happen? + description: Please describe in detail how the new functionality should work as well as any issues with existing functionality. 
+ validations: + required: true From eb756d665c425fd30ae266b82809a229a7cf1d41 Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:34:18 -0400 Subject: [PATCH 024/148] Delete .github/ISSUE_TEMPLATE.md --- .github/ISSUE_TEMPLATE.md | 13 ------------- 1 file changed, 13 deletions(-) delete mode 100644 .github/ISSUE_TEMPLATE.md diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md deleted file mode 100644 index 734420b..0000000 --- a/.github/ISSUE_TEMPLATE.md +++ /dev/null @@ -1,13 +0,0 @@ -# Important notice regarding filed issues - -This project already fills my needs, and as such I have no real reason to continue it's development. This project is otherwise provided as is, and no support is given. - -If pull requests implementing bug fixes or enhancements are pushed, I am happy to review and merge them (time permitting). - -If you wish to have a bug fixed, you have a few options: - -- Fix it yourself and file a pull request. -- File a bug and hope someone else fixes it for you. -- Pay me to fix it (my rate is $200 an hour, minimum 1 hour, contact me via my [github email address](https://github.com/josegonzalez) if you want to go this route). - -In all cases, feel free to file an issue, they may be of help to others in the future. From 9d28d9c2b041aab387fc950846794ca7a374d9d9 Mon Sep 17 00:00:00 2001 From: Jose Diaz-Gonzalez Date: Thu, 11 Sep 2025 16:34:50 -0400 Subject: [PATCH 025/148] Update feature.yaml --- .github/ISSUE_TEMPLATE/feature.yaml | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/feature.yaml b/.github/ISSUE_TEMPLATE/feature.yaml index dbfd2c5..4b1f408 100644 --- a/.github/ISSUE_TEMPLATE/feature.yaml +++ b/.github/ISSUE_TEMPLATE/feature.yaml @@ -11,11 +11,11 @@ body: If pull requests implementing bug fixes or enhancements are pushed, I am happy to review and merge them (time permitting). 
- If you wish to have a bug fixed, you have a few options: + If you wish to have a feature implemented, you have a few options: - - Fix it yourself and file a pull request. - - File a bug and hope someone else fixes it for you. - - Pay me to fix it (my rate is $200 an hour, minimum 1 hour, contact me via my [github email address](https://github.com/josegonzalez) if you want to go this route). + - Implement it yourself and file a pull request. + - File an issue and hope someone else implements it for you. + - Pay me to implement it (my rate is $200 an hour, minimum 1 hour, contact me via my [github email address](https://github.com/josegonzalez) if you want to go this route). In all cases, feel free to file an issue, they may be of help to others in the future. - type: textarea From 5bedaf825f2a161617d41e002f8ddc0af1dfee60 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri, 19 Sep 2025 13:09:40 +0000 Subject: [PATCH 026/148] chore(deps): bump the python-packages group across 1 directory with 2 updates Bumps the python-packages group with 2 updates in the / directory: [black](https://github.com/psf/black) and [docutils](https://github.com/rtfd/recommonmark). Updates `black` from 25.1.0 to 25.9.0 - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/25.1.0...25.9.0) Updates `docutils` from 0.22 to 0.22.1 - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) - [Commits](https://github.com/rtfd/recommonmark/commits) --- updated-dependencies: - dependency-name: black dependency-version: 25.9.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages - dependency-name: docutils dependency-version: 0.22.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index 68d6bd9..76d8fd0 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,11 +1,11 @@ autopep8==2.3.2 -black==25.1.0 +black==25.9.0 bleach==6.2.0 certifi==2025.8.3 charset-normalizer==3.4.3 click==8.1.8 colorama==0.4.6 -docutils==0.22 +docutils==0.22.1 flake8==7.3.0 gitchangelog==3.0.4 idna==3.10 From 64b5667a1690a04eb39b96305569f2a41a0e8d41 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 22 Sep 2025 13:12:10 +0000 Subject: [PATCH 027/148] chore(deps): bump docutils in the python-packages group Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark). Updates `docutils` from 0.22.1 to 0.22.2 - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) - [Commits](https://github.com/rtfd/recommonmark/commits) --- updated-dependencies: - dependency-name: docutils dependency-version: 0.22.2 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 76d8fd0..1df8412 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -5,7 +5,7 @@ certifi==2025.8.3 charset-normalizer==3.4.3 click==8.1.8 colorama==0.4.6 -docutils==0.22.1 +docutils==0.22.2 flake8==7.3.0 gitchangelog==3.0.4 idna==3.10 From 963ed3e6f605c40d83e194f7f1ad9d0594f77bd3 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 6 Oct 2025 13:53:31 +0000 Subject: [PATCH 028/148] chore(deps): bump the python-packages group with 3 updates Bumps the python-packages group with 3 updates: [certifi](https://github.com/certifi/python-certifi), [click](https://github.com/pallets/click) and [markdown-it-py](https://github.com/executablebooks/markdown-it-py). Updates `certifi` from 2025.8.3 to 2025.10.5 - [Commits](https://github.com/certifi/python-certifi/compare/2025.08.03...2025.10.05) Updates `click` from 8.1.8 to 8.3.0 - [Release notes](https://github.com/pallets/click/releases) - [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/click/compare/8.1.8...8.3.0) Updates `markdown-it-py` from 3.0.0 to 4.0.0 - [Release notes](https://github.com/executablebooks/markdown-it-py/releases) - [Changelog](https://github.com/executablebooks/markdown-it-py/blob/master/CHANGELOG.md) - [Commits](https://github.com/executablebooks/markdown-it-py/compare/v3.0.0...v4.0.0) --- updated-dependencies: - dependency-name: certifi dependency-version: 2025.10.5 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages - dependency-name: click dependency-version: 8.3.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages - dependency-name: markdown-it-py dependency-version: 
4.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index 1df8412..b5c3b26 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,9 +1,9 @@ autopep8==2.3.2 black==25.9.0 bleach==6.2.0 -certifi==2025.8.3 +certifi==2025.10.5 charset-normalizer==3.4.3 -click==8.1.8 +click==8.3.0 colorama==0.4.6 docutils==0.22.2 flake8==7.3.0 @@ -12,7 +12,7 @@ idna==3.10 importlib-metadata==8.7.0 jaraco.classes==3.4.0 keyring==25.6.0 -markdown-it-py==3.0.0 +markdown-it-py==4.0.0 mccabe==0.7.0 mdurl==0.1.2 more-itertools==10.8.0 From 90396d2bdfc0bc9e54ddf00bd6cf3435f20a7516 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri, 10 Oct 2025 13:09:42 +0000 Subject: [PATCH 029/148] chore(deps): bump the python-packages group across 1 directory with 2 updates Bumps the python-packages group with 2 updates in the / directory: [platformdirs](https://github.com/tox-dev/platformdirs) and [rich](https://github.com/Textualize/rich). 
Updates `platformdirs` from 4.4.0 to 4.5.0 - [Release notes](https://github.com/tox-dev/platformdirs/releases) - [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst) - [Commits](https://github.com/tox-dev/platformdirs/compare/4.4.0...4.5.0) Updates `rich` from 14.1.0 to 14.2.0 - [Release notes](https://github.com/Textualize/rich/releases) - [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md) - [Commits](https://github.com/Textualize/rich/compare/v14.1.0...v14.2.0) --- updated-dependencies: - dependency-name: platformdirs dependency-version: 4.5.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages - dependency-name: rich dependency-version: 14.2.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index b5c3b26..f5bcdb4 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -20,7 +20,7 @@ mypy-extensions==1.1.0 packaging==25.0 pathspec==0.12.1 pkginfo==1.12.1.2 -platformdirs==4.4.0 +platformdirs==4.5.0 pycodestyle==2.14.0 pyflakes==3.4.0 Pygments==2.19.2 @@ -29,7 +29,7 @@ requests==2.32.5 requests-toolbelt==1.0.0 restructuredtext-lint==1.4.0 rfc3986==2.0.0 -rich==14.1.0 +rich==14.2.0 setuptools==80.9.0 six==1.17.0 tqdm==4.67.1 From 38b4a2c1066f90327278f85fda1792a26d5510fc Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 13 Oct 2025 13:42:50 +0000 Subject: [PATCH 030/148] chore(deps): bump idna from 3.10 to 3.11 in the python-packages group Bumps the python-packages group with 1 update: [idna](https://github.com/kjd/idna). 
Updates `idna` from 3.10 to 3.11 - [Release notes](https://github.com/kjd/idna/releases) - [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst) - [Commits](https://github.com/kjd/idna/compare/v3.10...v3.11) --- updated-dependencies: - dependency-name: idna dependency-version: '3.11' dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index f5bcdb4..895083f 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -8,7 +8,7 @@ colorama==0.4.6 docutils==0.22.2 flake8==7.3.0 gitchangelog==3.0.4 -idna==3.10 +idna==3.11 importlib-metadata==8.7.0 jaraco.classes==3.4.0 keyring==25.6.0 From 759ec58beb24e55539f401fccfb68f83a72ffe7d Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 14 Oct 2025 13:10:22 +0000 Subject: [PATCH 031/148] chore(deps): bump charset-normalizer in the python-packages group Bumps the python-packages group with 1 update: [charset-normalizer](https://github.com/jawah/charset_normalizer). Updates `charset-normalizer` from 3.4.3 to 3.4.4 - [Release notes](https://github.com/jawah/charset_normalizer/releases) - [Changelog](https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md) - [Commits](https://github.com/jawah/charset_normalizer/compare/3.4.3...3.4.4) --- updated-dependencies: - dependency-name: charset-normalizer dependency-version: 3.4.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 895083f..6f1b161 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -2,7 +2,7 @@ autopep8==2.3.2 black==25.9.0 bleach==6.2.0 certifi==2025.10.5 -charset-normalizer==3.4.3 +charset-normalizer==3.4.4 click==8.3.0 colorama==0.4.6 docutils==0.22.2 From 4dae43c58e0f907e050c498c225ea5d40b970fd0 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 28 Oct 2025 13:11:27 +0000 Subject: [PATCH 032/148] chore(deps): bump bleach in the python-packages group Bumps the python-packages group with 1 update: [bleach](https://github.com/mozilla/bleach). Updates `bleach` from 6.2.0 to 6.3.0 - [Changelog](https://github.com/mozilla/bleach/blob/main/CHANGES) - [Commits](https://github.com/mozilla/bleach/compare/v6.2.0...v6.3.0) --- updated-dependencies: - dependency-name: bleach dependency-version: 6.3.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 6f1b161..bd9ebf2 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,6 +1,6 @@ autopep8==2.3.2 black==25.9.0 -bleach==6.2.0 +bleach==6.3.0 certifi==2025.10.5 charset-normalizer==3.4.4 click==8.3.0 From cd23dd1a16558b40ebdfae72f233db42e5b485f9 Mon Sep 17 00:00:00 2001 From: Rodos Date: Tue, 4 Nov 2025 10:07:22 +1100 Subject: [PATCH 033/148] feat: Enforce Python 3.8+ requirement and add multi-version CI testing - Add python_requires=">=3.8" to setup.py to enforce minimum version at install time - Update README to explicitly document Python 3.8+ requirement - Add CI matrix to test lint/build on Python 3.8-3.14 (7 versions) - Aligns with actual usage patterns (~99% of downloads on Python 3.8+) - Prevents future PRs from inadvertently using incompatible syntax This change protects users by preventing installation on unsupported Python versions and ensures contributors can see version requirements clearly. 
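As a rough illustration of what ``python_requires=">=3.8"`` enforces at install time, the sketch below compares an interpreter version against a minimum. This is not how pip implements the check (pip evaluates the specifier from the package metadata); the function name ``meets_minimum`` and the ``MINIMUM`` constant are assumptions made for this example only.

```python
import sys

# Hypothetical constant for illustration: mirrors python_requires=">=3.8".
MINIMUM = (3, 8)


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    if version_info is None:
        # Default to the interpreter that is running this code.
        version_info = sys.version_info
    # Compare integer tuples so that 3.10 sorts after 3.9.
    return tuple(version_info[:2]) >= tuple(minimum)
```

Calling ``meets_minimum()`` with no arguments checks the running interpreter; passing a tuple such as ``(3, 7, 9)`` checks a hypothetical one.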
--- .github/workflows/lint.yml | 5 ++++- README.rst | 2 +- setup.py | 1 + 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index e0036e2..cf74eb7 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -15,6 +15,9 @@ jobs: lint: name: lint runs-on: ubuntu-24.04 + strategy: + matrix: + python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"] steps: - name: Checkout repository @@ -24,7 +27,7 @@ jobs: - name: Setup Python uses: actions/setup-python@v6 with: - python-version: "3.12" + python-version: ${{ matrix.python-version }} cache: "pip" - run: pip install -r release-requirements.txt && pip install wheel - run: flake8 --ignore=E501,E203,W503 diff --git a/README.rst b/README.rst index 5dcef95..c5fafa3 100644 --- a/README.rst +++ b/README.rst @@ -9,8 +9,8 @@ The package can be used to backup an *entire* `Github `_ or Requirements ============ +- Python 3.8 or higher - GIT 1.9+ -- Python Installation ============ diff --git a/setup.py b/setup.py index c4b8cf1..6ef7551 100644 --- a/setup.py +++ b/setup.py @@ -50,5 +50,6 @@ def open_file(fname): long_description=open_file("README.rst").read(), long_description_content_type="text/x-rst", install_requires=open_file("requirements.txt").readlines(), + python_requires=">=3.8", zip_safe=True, ) From 73dc75ab952300213d4930bc93cb76067b7f87e0 Mon Sep 17 00:00:00 2001 From: Rodos Date: Tue, 4 Nov 2025 13:30:42 +1100 Subject: [PATCH 034/148] fix: Remove Python 3.8 and 3.9 from CI matrix 3.8 and 3.9 are failing because the pinned dependencies don't support them: - autopep8==2.3.2 needs Python 3.9+ - bleach==6.3.0 needs Python 3.10+ Both are EOL now anyway (3.8 in Oct 2024, 3.9 in Oct 2025). Just fixing CI to test 3.10-3.14 for now. Will do a separate PR to formally drop 3.8/3.9 support with python_requires and README updates. 
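The failure described above can be reasoned about with a small filter: given the minimum Python version each pinned package needs (the two floors below are taken from this commit message), keep only the matrix entries that every package supports. The helper names are hypothetical. Versions are parsed into integer tuples because comparing "3.10" and "3.9" as strings would order them incorrectly.

```python
# Minimum Python versions as stated in the commit message above.
DEPENDENCY_FLOORS = {
    "autopep8==2.3.2": "3.9",
    "bleach==6.3.0": "3.10",
}


def parse_version(text):
    """Turn "3.10" into (3, 10) so comparisons are numeric, not lexical."""
    return tuple(int(part) for part in text.split("."))


def supported_matrix(matrix, floors=DEPENDENCY_FLOORS):
    """Keep the matrix entries that satisfy every dependency floor."""
    # The strictest (highest) floor decides which entries survive.
    highest_floor = max(parse_version(v) for v in floors.values())
    return [entry for entry in matrix if parse_version(entry) >= highest_floor]
```

Applied to the original seven-entry matrix, this keeps the same five versions the commit leaves in ``lint.yml``.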
--- .github/workflows/lint.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index cf74eb7..02ad174 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -17,7 +17,7 @@ jobs: runs-on: ubuntu-24.04 strategy: matrix: - python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14"] + python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"] steps: - name: Checkout repository From 875e31819afe3ed4cd2e77cdb8b3a1f4c626a29b Mon Sep 17 00:00:00 2001 From: Rodos Date: Tue, 4 Nov 2025 13:53:41 +1100 Subject: [PATCH 035/148] feat: Drop support for Python 3.8 and 3.9 (EOL) Both Python 3.8 and 3.9 have reached end-of-life: - Python 3.8: EOL October 7, 2024 - Python 3.9: EOL October 31, 2025 Changes: - Add python_requires=">=3.10" to setup.py - Remove Python 3.8 and 3.9 from classifiers - Add Python 3.13 and 3.14 to classifiers - Update README to document Python 3.10+ requirement --- README.rst | 2 +- setup.py | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index 5dcef95..e435951 100644 --- a/README.rst +++ b/README.rst @@ -9,8 +9,8 @@ The package can be used to backup an *entire* `Github `_ or Requirements ============ +- Python 3.10 or higher - GIT 1.9+ -- Python Installation ============ diff --git a/setup.py b/setup.py index c4b8cf1..374e6ec 100644 --- a/setup.py +++ b/setup.py @@ -40,15 +40,16 @@ def open_file(fname): "Development Status :: 5 - Production/Stable", "Topic :: System :: Archiving :: Backup", "License :: OSI Approved :: MIT License", - "Programming Language :: Python :: 3.8", - "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", + "Programming Language :: Python :: 3.14", ], description="backup a github user or organization", 
long_description=open_file("README.rst").read(), long_description_content_type="text/x-rst", install_requires=open_file("requirements.txt").readlines(), + python_requires=">=3.10", zip_safe=True, ) From a194fa48cead59dda7f491ab6c4aeffb8a0d4c7f Mon Sep 17 00:00:00 2001 From: Rodos Date: Mon, 3 Nov 2025 13:36:15 +1100 Subject: [PATCH 036/148] feat: Add attachment download support for issues and pull requests Adds new --attachments flag that downloads user-uploaded files from issue and PR bodies and comments. Key features: - Determines attachment URLs - Tracks downloads in manifest.json with metadata - Supports --skip-existing to avoid re-downloading - Handles filename collisions with counter suffix - Smart retry logic for transient vs permanent failures - Uses Content-Disposition for correct file extensions --- README.rst | 30 +- github_backup/github_backup.py | 610 ++++++++++++++++++++++++++++++++- 2 files changed, 637 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index e435951..69d5524 100644 --- a/README.rst +++ b/README.rst @@ -50,7 +50,7 @@ CLI Help output:: [--keychain-name OSX_KEYCHAIN_ITEM_NAME] [--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT] [--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES] - [--skip-prerelease] [--assets] + [--skip-prerelease] [--assets] [--attachments] [--exclude [REPOSITORY [REPOSITORY ...]] [--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE] USER @@ -133,6 +133,9 @@ CLI Help output:: --skip-prerelease skip prerelease and draft versions; only applies if including releases --assets include assets alongside release information; only applies if including releases + --attachments download user-attachments from issues and pull requests + to issues/attachments/{issue_number}/ and + pulls/attachments/{pull_number}/ directories --exclude [REPOSITORY [REPOSITORY ...]] names of repositories to exclude from backup. 
--throttle-limit THROTTLE_LIMIT @@ -213,6 +216,29 @@ When you use the ``--lfs`` option, you will need to make sure you have Git LFS i Instructions on how to do this can be found on https://git-lfs.github.com. +About Attachments +----------------- + +When you use the ``--attachments`` option with ``--issues`` or ``--pulls``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue and pull request descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently. + +Attachments are saved to ``issues/attachments/{issue_number}/`` and ``pulls/attachments/{pull_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains: + +- The downloaded attachment files (named by their GitHub identifier with appropriate file extensions) +- If multiple attachments have the same filename, conflicts are resolved with numeric suffixes (e.g., ``report.pdf``, ``report_1.pdf``, ``report_2.pdf``) +- A ``manifest.json`` file documenting all downloads, including URLs, file metadata, and download status + +The tool automatically extracts file extensions from HTTP headers to ensure files can be more easily opened by your operating system. + +**Supported URL formats:** + +- Modern: ``github.com/user-attachments/{assets,files}/*`` +- Legacy: ``user-images.githubusercontent.com/*`` and ``private-user-images.githubusercontent.com/*`` +- Repo files: ``github.com/{owner}/{repo}/files/*`` (filtered to current repository) +- Repo assets: ``github.com/{owner}/{repo}/assets/*`` (filtered to current repository) + +**Repository filtering** for repo files/assets handles renamed and transferred repositories gracefully. 
URLs are included if they either match the current repository name directly, or redirect to it (e.g., ``willmcgugan/rich`` redirects to ``Textualize/rich`` after transfer). + + Run in Docker container ----------------------- @@ -303,7 +329,7 @@ Quietly and incrementally backup useful Github user data (public and private rep export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN GH_USER=YOUR-GITHUB-USER - github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER + github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. 
:: diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 4b2d790..e8d9ae0 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -420,6 +420,12 @@ def parse_args(args=None): dest="include_assets", help="include assets alongside release information; only applies if including releases", ) + parser.add_argument( + "--attachments", + action="store_true", + dest="include_attachments", + help="download user-attachments from issues and pull requests", + ) parser.add_argument( "--throttle-limit", dest="throttle_limit", @@ -814,7 +820,9 @@ def redirect_request(self, req, fp, code, msg, headers, newurl): request = super(S3HTTPRedirectHandler, self).redirect_request( req, fp, code, msg, headers, newurl ) - del request.headers["Authorization"] + # Only delete Authorization header if it exists (attachments may not have it) + if "Authorization" in request.headers: + del request.headers["Authorization"] return request @@ -867,6 +875,598 @@ def download_file(url, path, auth, as_app=False, fine=False): ) +def download_attachment_file(url, path, auth, as_app=False, fine=False): + """Download attachment file directly (not via GitHub API). + + Similar to download_file() but for direct file URLs, not API endpoints. + Attachment URLs (user-images, user-attachments) are direct downloads, + not API endpoints, so we skip _construct_request() which adds API params. 
+ + URL Format Support & Authentication Requirements: + + | URL Format | Auth Required | Notes | + |----------------------------------------------|---------------|--------------------------| + | github.com/user-attachments/assets/* | Private only | Modern format (2024+) | + | github.com/user-attachments/files/* | Private only | Modern format (2024+) | + | user-images.githubusercontent.com/* | No (public) | Legacy CDN, all eras | + | private-user-images.githubusercontent.com/* | JWT in URL | Legacy private (5min) | + | github.com/{owner}/{repo}/files/* | Repo filter | Old repo files | + + - Modern user-attachments: Requires GitHub token auth for private repos + - Legacy public CDN: No auth needed/accepted (returns 400 with auth header) + - Legacy private CDN: Uses JWT token embedded in URL, no GitHub token needed + - Repo files: Filtered to current repository only during extraction + + Returns dict with metadata: + - success: bool + - http_status: int (200, 404, etc.) + - content_type: str or None + - original_filename: str or None (from Content-Disposition) + - size_bytes: int or None + - error: str or None + """ + import re + from datetime import datetime, timezone + + metadata = { + "url": url, + "success": False, + "http_status": None, + "content_type": None, + "original_filename": None, + "size_bytes": None, + "downloaded_at": datetime.now(timezone.utc).isoformat(), + "error": None, + } + + if os.path.exists(path): + metadata["success"] = True + metadata["http_status"] = 200 # Assume success if already exists + metadata["size_bytes"] = os.path.getsize(path) + return metadata + + # Create simple request (no API query params) + request = Request(url) + request.add_header("Accept", "application/octet-stream") + + # Add authentication header only for modern github.com/user-attachments URLs + # Legacy CDN URLs (user-images.githubusercontent.com) are public and don't need/accept auth + # Private CDN URLs (private-user-images) use JWT tokens embedded in the URL + if 
auth is not None and "github.com/user-attachments/" in url: + if not as_app: + if fine: + # Fine-grained token: plain token with "token " prefix + request.add_header("Authorization", "token " + auth) + else: + # Classic token: base64-encoded with "Basic " prefix + request.add_header("Authorization", "Basic ".encode("ascii") + auth) + else: + # App authentication + auth = auth.encode("ascii") + request.add_header("Authorization", "token ".encode("ascii") + auth) + + # Reuse S3HTTPRedirectHandler from download_file() + opener = build_opener(S3HTTPRedirectHandler) + + try: + response = opener.open(request) + metadata["http_status"] = response.getcode() + + # Extract Content-Type + content_type = response.headers.get("Content-Type", "").split(";")[0].strip() + if content_type: + metadata["content_type"] = content_type + + # Extract original filename from Content-Disposition header + # Format: attachment; filename=example.mov or attachment;filename="example.mov" + content_disposition = response.headers.get("Content-Disposition", "") + if content_disposition: + # Match: filename=something or filename="something" or filename*=UTF-8''something + match = re.search(r'filename\*?=["\']?([^";\r\n]+)', content_disposition) + if match: + original_filename = match.group(1).strip().strip("'") + # Handle RFC 5987 encoding: filename*=UTF-8''example.mov + if "UTF-8''" in original_filename: + original_filename = original_filename.split("UTF-8''")[1] + metadata["original_filename"] = original_filename + + # Fallback: Extract filename from final URL after redirects + # This handles user-attachments/assets URLs which redirect to S3 with filename.ext + if not metadata["original_filename"]: + from urllib.parse import urlparse, unquote + + final_url = response.geturl() + parsed = urlparse(final_url) + # Get filename from path (last component before query string) + path_parts = parsed.path.split("/") + if path_parts: + # URL might be encoded, decode it + filename_from_url = unquote(path_parts[-1]) + 
# Only use if it has an extension + if "." in filename_from_url: + metadata["original_filename"] = filename_from_url + + # Download file + chunk_size = 16 * 1024 + bytes_downloaded = 0 + with open(path, "wb") as f: + while True: + chunk = response.read(chunk_size) + if not chunk: + break + f.write(chunk) + bytes_downloaded += len(chunk) + + metadata["size_bytes"] = bytes_downloaded + metadata["success"] = True + + except HTTPError as exc: + metadata["http_status"] = exc.code + metadata["error"] = str(exc.reason) + logger.warning( + "Skipping download of attachment {0} due to HTTPError: {1}".format( + url, exc.reason + ) + ) + except URLError as e: + metadata["error"] = str(e.reason) + logger.warning( + "Skipping download of attachment {0} due to URLError: {1}".format( + url, e.reason + ) + ) + except socket.error as e: + metadata["error"] = str(e.strerror) if hasattr(e, "strerror") else str(e) + logger.warning( + "Skipping download of attachment {0} due to socket error: {1}".format( + url, e.strerror if hasattr(e, "strerror") else str(e) + ) + ) + except Exception as e: + metadata["error"] = str(e) + logger.warning( + "Skipping download of attachment {0} due to error: {1}".format(url, str(e)) + ) + + return metadata + + +def extract_attachment_urls(item_data, issue_number=None, repository_full_name=None): + """Extract GitHub-hosted attachment URLs from issue/PR body and comments. + + What qualifies as an attachment? + There is no "attachment" concept in the GitHub API - it's a user behavior pattern + we've identified through analysis of real-world repositories. We define attachments as: + + - User-uploaded files hosted on GitHub's CDN domains + - Found outside of code blocks (not examples/documentation) + - Matches known GitHub attachment URL patterns + + This intentionally captures bare URLs pasted by users, not just markdown/HTML syntax. 
+ Some false positives (example URLs in documentation) may occur - these fail gracefully + with HTTP 404 and are logged in the manifest. + + Supported URL formats: + - Modern: github.com/user-attachments/{assets,files}/* + - Legacy: user-images.githubusercontent.com/* (including private-user-images) + - Repo files: github.com/{owner}/{repo}/files/* (filtered to current repo) + - Repo assets: github.com/{owner}/{repo}/assets/* (filtered to current repo) + + Repository filtering (repo files/assets only): + - Direct match: URL is for current repository → included + - Redirect match: URL redirects to current repository → included (handles renames/transfers) + - Different repo: URL is for different repository → excluded + + Code block filtering: + - Removes fenced code blocks (```) and inline code (`) before extraction + - Prevents extracting URLs from code examples and documentation snippets + + Args: + item_data: Issue or PR data dict + issue_number: Issue/PR number for logging + repository_full_name: Full repository name (owner/repo) for filtering repo-scoped URLs + """ + import re + + urls = [] + + # Define all GitHub attachment patterns + # Stop at markdown punctuation: whitespace, ), `, ", >, < + # Trailing sentence punctuation (. ! ? 
, ; : ' ") is stripped in post-processing + patterns = [ + r'https://github\.com/user-attachments/(?:assets|files)/[^\s\)`"<>]+', # Modern + r'https://(?:private-)?user-images\.githubusercontent\.com/[^\s\)`"<>]+', # Legacy CDN + ] + + # Add repo-scoped patterns (will be filtered by repository later) + # These patterns match ANY repo, then we filter to current repo with redirect checking + repo_files_pattern = r'https://github\.com/[^/]+/[^/]+/files/\d+/[^\s\)`"<>]+' + repo_assets_pattern = r'https://github\.com/[^/]+/[^/]+/assets/\d+/[^\s\)`"<>]+' + patterns.append(repo_files_pattern) + patterns.append(repo_assets_pattern) + + def clean_url(url): + """Remove trailing sentence and markdown punctuation that's not part of the URL.""" + return url.rstrip(".!?,;:'\")") + + def remove_code_blocks(text): + """Remove markdown code blocks (fenced and inline) from text. + + This prevents extracting URLs from code examples like: + - Fenced code blocks: ```code``` + - Inline code: `code` + """ + # Remove fenced code blocks first (```...```) + # DOTALL flag makes . match newlines + text = re.sub(r"```.*?```", "", text, flags=re.DOTALL) + + # Remove inline code (`...`) + # Non-greedy match between backticks + text = re.sub(r"`[^`]*`", "", text) + + return text + + def is_repo_scoped_url(url): + """Check if URL is a repo-scoped attachment (files or assets).""" + return bool( + re.match(r"https://github\.com/[^/]+/[^/]+/(?:files|assets)/\d+/", url) + ) + + def check_redirect_to_current_repo(url, current_repo): + """Check if URL redirects to current repository. + + Returns True if: + - URL is already for current repo + - URL redirects (301/302) to current repo (handles renames/transfers) + + Returns False otherwise (URL is for a different repo). 
+ """ + # Extract owner/repo from URL + match = re.match(r"https://github\.com/([^/]+)/([^/]+)/", url) + if not match: + return False + + url_owner, url_repo = match.groups() + url_repo_full = f"{url_owner}/{url_repo}" + + # Direct match - no need to check redirect + if url_repo_full.lower() == current_repo.lower(): + return True + + # Different repo - check if it redirects to current repo + # This handles repository transfers and renames + try: + import urllib.request + import urllib.error + + # Make HEAD request with redirect following disabled + # We need to manually handle redirects to see the Location header + request = urllib.request.Request(url, method="HEAD") + request.add_header("User-Agent", "python-github-backup") + + # Create opener that does NOT follow redirects + class NoRedirectHandler(urllib.request.HTTPRedirectHandler): + def redirect_request(self, req, fp, code, msg, headers, newurl): + return None # Don't follow redirects + + opener = urllib.request.build_opener(NoRedirectHandler) + + try: + _ = opener.open(request, timeout=10) + # Got 200 - URL works as-is but for different repo + return False + except urllib.error.HTTPError as e: + # Check if it's a redirect (301, 302, 307, 308) + if e.code in (301, 302, 307, 308): + location = e.headers.get("Location", "") + # Check if redirect points to current repo + if location: + redirect_match = re.match( + r"https://github\.com/([^/]+)/([^/]+)/", location + ) + if redirect_match: + redirect_owner, redirect_repo = redirect_match.groups() + redirect_repo_full = f"{redirect_owner}/{redirect_repo}" + return redirect_repo_full.lower() == current_repo.lower() + return False + except Exception: + # On any error (timeout, network issue, etc.), be conservative + # and exclude the URL to avoid downloading from wrong repos + return False + + # Extract from body + body = item_data.get("body") or "" + # Remove code blocks before searching for URLs + body_cleaned = remove_code_blocks(body) + for pattern in patterns: + 
found_urls = re.findall(pattern, body_cleaned) + urls.extend([clean_url(url) for url in found_urls]) + + # Extract from issue comments + if "comment_data" in item_data: + for comment in item_data["comment_data"]: + comment_body = comment.get("body") or "" + # Remove code blocks before searching for URLs + comment_cleaned = remove_code_blocks(comment_body) + for pattern in patterns: + found_urls = re.findall(pattern, comment_cleaned) + urls.extend([clean_url(url) for url in found_urls]) + + # Extract from PR regular comments + if "comment_regular_data" in item_data: + for comment in item_data["comment_regular_data"]: + comment_body = comment.get("body") or "" + # Remove code blocks before searching for URLs + comment_cleaned = remove_code_blocks(comment_body) + for pattern in patterns: + found_urls = re.findall(pattern, comment_cleaned) + urls.extend([clean_url(url) for url in found_urls]) + + regex_urls = list(set(urls)) # dedupe + + # Filter repo-scoped URLs to current repository only + # This handles repository transfers/renames via redirect checking + if repository_full_name: + filtered_urls = [] + for url in regex_urls: + if is_repo_scoped_url(url): + # Check if URL belongs to current repo (or redirects to it) + if check_redirect_to_current_repo(url, repository_full_name): + filtered_urls.append(url) + # else: skip URLs from other repositories + else: + # Non-repo-scoped URLs (user-attachments, CDN) - always include + filtered_urls.append(url) + regex_urls = filtered_urls + + return regex_urls + + +def extract_and_apply_extension(filepath, original_filename): + """Extract extension from original filename and rename file if needed. 
+ + Args: + filepath: Current file path (may have no extension) + original_filename: Original filename from Content-Disposition (has extension) + + Returns: + Final filepath with extension applied + """ + if not original_filename or not os.path.exists(filepath): + return filepath + + # Get extension from original filename + original_ext = os.path.splitext(original_filename)[1] + if not original_ext: + return filepath + + # Check if current file already has this extension + current_ext = os.path.splitext(filepath)[1] + if current_ext == original_ext: + return filepath + + # Rename file to add extension + new_filepath = filepath + original_ext + try: + os.rename(filepath, new_filepath) + logger.debug("Renamed {0} to {1}".format(filepath, new_filepath)) + return new_filepath + except Exception as e: + logger.warning("Could not rename {0}: {1}".format(filepath, str(e))) + return filepath + + +def get_attachment_filename(url): + """Get filename from attachment URL, handling all GitHub formats. + + Formats: + - github.com/user-attachments/assets/{uuid} → uuid (add extension later) + - github.com/user-attachments/files/{id}/{filename} → filename + - github.com/{owner}/{repo}/files/{id}/{filename} → filename + - user-images.githubusercontent.com/{user}/{hash}.{ext} → hash.ext + - private-user-images.githubusercontent.com/...?jwt=... 
→ extract from path + """ + from urllib.parse import urlparse + + parsed = urlparse(url) + path_parts = parsed.path.split("/") + + # Modern: /user-attachments/files/{id}/{filename} + if "user-attachments/files" in parsed.path: + return path_parts[-1] + + # Modern: /user-attachments/assets/{uuid} + elif "user-attachments/assets" in parsed.path: + return path_parts[-1] # extension added later via extract_and_apply_extension + + # Repo files: /{owner}/{repo}/files/{id}/{filename} + elif "/files/" in parsed.path and len(path_parts) >= 2: + return path_parts[-1] + + # Legacy: user-images.githubusercontent.com/{user}/{hash-with-ext} + elif "githubusercontent.com" in parsed.netloc: + return path_parts[-1] # Already has extension usually + + # Fallback: use last path component + return path_parts[-1] if path_parts[-1] else "unknown_attachment" + + +def resolve_filename_collision(filepath): + """Resolve filename collisions using counter suffix pattern. + + If filepath exists, returns a new filepath with counter suffix. + Pattern: report.pdf → report_1.pdf → report_2.pdf + + Also protects against manifest.json collisions by treating it as reserved. 
+ + Args: + filepath: Full path to file that might exist + + Returns: + filepath that doesn't collide (may be same as input if no collision) + """ + directory = os.path.dirname(filepath) + filename = os.path.basename(filepath) + + # Protect manifest.json - it's a reserved filename + if filename == "manifest.json": + name, ext = os.path.splitext(filename) + counter = 1 + while True: + new_filename = f"{name}_{counter}{ext}" + new_filepath = os.path.join(directory, new_filename) + if not os.path.exists(new_filepath): + return new_filepath + counter += 1 + + if not os.path.exists(filepath): + return filepath + + name, ext = os.path.splitext(filename) + + counter = 1 + while True: + new_filename = f"{name}_{counter}{ext}" + new_filepath = os.path.join(directory, new_filename) + if not os.path.exists(new_filepath): + return new_filepath + counter += 1 + + +def download_attachments(args, item_cwd, item_data, number, repository, item_type="issue"): + """Download user-attachments from issue/PR body and comments with manifest. 
+ + Args: + args: Command line arguments + item_cwd: Working directory (issue_cwd or pulls_cwd) + item_data: Issue or PR data dict + number: Issue or PR number + repository: Repository dict + item_type: "issue" or "pull" for logging/manifest + """ + import json + from datetime import datetime, timezone + + item_type_display = "issue" if item_type == "issue" else "pull request" + + urls = extract_attachment_urls( + item_data, issue_number=number, repository_full_name=repository["full_name"] + ) + if not urls: + return + + attachments_dir = os.path.join(item_cwd, "attachments", str(number)) + manifest_path = os.path.join(attachments_dir, "manifest.json") + + # Load existing manifest if skip_existing is enabled + existing_urls = set() + existing_metadata = [] + if args.skip_existing and os.path.exists(manifest_path): + try: + with open(manifest_path, "r") as f: + existing_manifest = json.load(f) + all_metadata = existing_manifest.get("attachments", []) + # Only skip URLs that were successfully downloaded OR failed with permanent errors + # Retry transient failures (5xx, timeouts, network errors) + for item in all_metadata: + if item.get("success"): + existing_urls.add(item["url"]) + else: + # Check if this is a permanent failure (don't retry) or transient (retry) + http_status = item.get("http_status") + if http_status in [404, 410, 451]: + # Permanent failures - don't retry + existing_urls.add(item["url"]) + # Transient failures (5xx, auth errors, timeouts) will be retried + existing_metadata = all_metadata + except (json.JSONDecodeError, IOError): + # If manifest is corrupted, re-download everything + logger.warning( + "Corrupted manifest for {0} #{1}, will re-download".format( + item_type_display, number + ) + ) + existing_urls = set() + existing_metadata = [] + + # Filter to only new URLs + new_urls = [url for url in urls if url not in existing_urls] + + if not new_urls and existing_urls: + logger.debug( + "Skipping attachments for {0} #{1} (all {2} already 
downloaded)".format( + item_type_display, number, len(urls) + ) + ) + return + + if new_urls: + logger.info( + "Downloading {0} new attachment(s) for {1} #{2}".format( + len(new_urls), item_type_display, number + ) + ) + + mkdir_p(item_cwd, attachments_dir) + + # Collect metadata for manifest (start with existing) + attachment_metadata_list = existing_metadata[:] + + for url in new_urls: + filename = get_attachment_filename(url) + filepath = os.path.join(attachments_dir, filename) + + # Check for collision BEFORE downloading + filepath = resolve_filename_collision(filepath) + + # Download and get metadata + metadata = download_attachment_file( + url, + filepath, + get_auth(args, encode=not args.as_app), + as_app=args.as_app, + fine=args.token_fine is not None, + ) + + # Apply extension from Content-Disposition if available + if metadata["success"] and metadata.get("original_filename"): + final_filepath = extract_and_apply_extension( + filepath, metadata["original_filename"] + ) + # Check for collision again ONLY if filename changed (extension was added) + if final_filepath != filepath: + final_filepath = resolve_filename_collision(final_filepath) + # Update saved_as to reflect actual filename + metadata["saved_as"] = os.path.basename(final_filepath) + else: + metadata["saved_as"] = ( + os.path.basename(filepath) if metadata["success"] else None + ) + + attachment_metadata_list.append(metadata) + + # Write manifest + if attachment_metadata_list: + manifest = { + "issue_number": number, + "issue_type": item_type, + "repository": f"{args.user}/{args.repository}" + if hasattr(args, "repository") and args.repository + else args.user, + "manifest_updated_at": datetime.now(timezone.utc).isoformat(), + "attachments": attachment_metadata_list, + } + + manifest_path = os.path.join(attachments_dir, "manifest.json") + with open(manifest_path, "w") as f: + json.dump(manifest, f, indent=2) + logger.debug( + "Wrote manifest for {0} #{1}: {2} attachments".format( + 
item_type_display, number, len(attachment_metadata_list) + ) + ) + + def get_authenticated_user(args): template = "https://{0}/user".format(get_github_api_host(args)) data = retrieve_data(args, template, single_request=True) @@ -1157,6 +1757,10 @@ def backup_issues(args, repo_cwd, repository, repos_template): if args.include_issue_events or args.include_everything: template = events_template.format(number) issues[number]["event_data"] = retrieve_data(args, template) + if args.include_attachments: + download_attachments( + args, issue_cwd, issues[number], number, repository, item_type="issue" + ) with codecs.open(issue_file + ".temp", "w", encoding="utf-8") as f: json_dump(issue, f) @@ -1228,6 +1832,10 @@ def backup_pulls(args, repo_cwd, repository, repos_template): if args.include_pull_commits or args.include_everything: template = commits_template.format(number) pulls[number]["commit_data"] = retrieve_data(args, template) + if args.include_attachments: + download_attachments( + args, pulls_cwd, pulls[number], number, repository, item_type="pull" + ) with codecs.open(pull_file + ".temp", "w", encoding="utf-8") as f: json_dump(pull, f) From 1ed3d66777a848c37a4b5897357693290fa5b374 Mon Sep 17 00:00:00 2001 From: Rodos Date: Tue, 4 Nov 2025 09:10:22 +1100 Subject: [PATCH 037/148] refactor: Add atomic writes for attachment files and manifests --- github_backup/github_backup.py | 94 ++++++++++++++++------------------ 1 file changed, 45 insertions(+), 49 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index e8d9ae0..b0c2aef 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -948,6 +948,8 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False): # Reuse S3HTTPRedirectHandler from download_file() opener = build_opener(S3HTTPRedirectHandler) + temp_path = path + ".temp" + try: response = opener.open(request) metadata["http_status"] = response.getcode() @@ -986,10 +988,10 @@ def 
download_attachment_file(url, path, auth, as_app=False, fine=False): if "." in filename_from_url: metadata["original_filename"] = filename_from_url - # Download file + # Download file to temporary location chunk_size = 16 * 1024 bytes_downloaded = 0 - with open(path, "wb") as f: + with open(temp_path, "wb") as f: while True: chunk = response.read(chunk_size) if not chunk: @@ -997,6 +999,9 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False): f.write(chunk) bytes_downloaded += len(chunk) + # Atomic rename to final location + os.rename(temp_path, path) + metadata["size_bytes"] = bytes_downloaded metadata["success"] = True @@ -1027,6 +1032,12 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False): logger.warning( "Skipping download of attachment {0} due to error: {1}".format(url, str(e)) ) + # Clean up temp file if it was partially created + if os.path.exists(temp_path): + try: + os.remove(temp_path) + except Exception: + pass return metadata @@ -1222,40 +1233,6 @@ def redirect_request(self, req, fp, code, msg, headers, newurl): return regex_urls -def extract_and_apply_extension(filepath, original_filename): - """Extract extension from original filename and rename file if needed. 
- - Args: - filepath: Current file path (may have no extension) - original_filename: Original filename from Content-Disposition (has extension) - - Returns: - Final filepath with extension applied - """ - if not original_filename or not os.path.exists(filepath): - return filepath - - # Get extension from original filename - original_ext = os.path.splitext(original_filename)[1] - if not original_ext: - return filepath - - # Check if current file already has this extension - current_ext = os.path.splitext(filepath)[1] - if current_ext == original_ext: - return filepath - - # Rename file to add extension - new_filepath = filepath + original_ext - try: - os.rename(filepath, new_filepath) - logger.debug("Renamed {0} to {1}".format(filepath, new_filepath)) - return new_filepath - except Exception as e: - logger.warning("Could not rename {0}: {1}".format(filepath, str(e))) - return filepath - - def get_attachment_filename(url): """Get filename from attachment URL, handling all GitHub formats. @@ -1333,7 +1310,9 @@ def resolve_filename_collision(filepath): counter += 1 -def download_attachments(args, item_cwd, item_data, number, repository, item_type="issue"): +def download_attachments( + args, item_cwd, item_data, number, repository, item_type="issue" +): """Download user-attachments from issue/PR body and comments with manifest. 
Args: @@ -1428,20 +1407,36 @@ def download_attachments(args, item_cwd, item_data, number, repository, item_typ fine=args.token_fine is not None, ) - # Apply extension from Content-Disposition if available + # If download succeeded but we got an extension from Content-Disposition, + # we may need to rename the file to add the extension if metadata["success"] and metadata.get("original_filename"): - final_filepath = extract_and_apply_extension( - filepath, metadata["original_filename"] - ) - # Check for collision again ONLY if filename changed (extension was added) - if final_filepath != filepath: + original_ext = os.path.splitext(metadata["original_filename"])[1] + current_ext = os.path.splitext(filepath)[1] + + # Add extension if not present + if original_ext and current_ext != original_ext: + final_filepath = filepath + original_ext + # Check for collision again with new extension final_filepath = resolve_filename_collision(final_filepath) - # Update saved_as to reflect actual filename - metadata["saved_as"] = os.path.basename(final_filepath) + logger.debug( + "Adding extension {0} to {1}".format(original_ext, filepath) + ) + + # Rename to add extension (already atomic from download) + try: + os.rename(filepath, final_filepath) + metadata["saved_as"] = os.path.basename(final_filepath) + except Exception as e: + logger.warning( + "Could not add extension to {0}: {1}".format(filepath, str(e)) + ) + metadata["saved_as"] = os.path.basename(filepath) + else: + metadata["saved_as"] = os.path.basename(filepath) + elif metadata["success"]: + metadata["saved_as"] = os.path.basename(filepath) else: - metadata["saved_as"] = ( - os.path.basename(filepath) if metadata["success"] else None - ) + metadata["saved_as"] = None attachment_metadata_list.append(metadata) @@ -1458,8 +1453,9 @@ def download_attachments(args, item_cwd, item_data, number, repository, item_typ } manifest_path = os.path.join(attachments_dir, "manifest.json") - with open(manifest_path, "w") as f: + with 
open(manifest_path + ".temp", "w") as f: json.dump(manifest, f, indent=2) + os.rename(manifest_path + ".temp", manifest_path) # Atomic write logger.debug( "Wrote manifest for {0} #{1}: {2} attachments".format( item_type_display, number, len(attachment_metadata_list) From e7880bb056307159e8c31ac7a3d917884cbcc9bc Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Thu, 6 Nov 2025 02:11:08 +0000 Subject: [PATCH 038/148] Release version 0.51.0 --- CHANGES.rst | 366 +++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 366 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 960977f..50cbd09 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,373 @@ Changelog ========= -0.50.3 (2025-08-08) +0.51.0 (2025-11-06) ------------------- ------------------------ + +Fix +~~~ +- Remove Python 3.8 and 3.9 from CI matrix. [Rodos] + + 3.8 and 3.9 are failing because the pinned dependencies don't support them: + - autopep8==2.3.2 needs Python 3.9+ + - bleach==6.3.0 needs Python 3.10+ + + Both are EOL now anyway (3.8 in Oct 2024, 3.9 in Oct 2025). + + Just fixing CI to test 3.10-3.14 for now. Will do a separate PR to formally + drop 3.8/3.9 support with python_requires and README updates. + +Other +~~~~~ +- Refactor: Add atomic writes for attachment files and manifests. + [Rodos] +- Feat: Add attachment download support for issues and pull requests. + [Rodos] + + Adds new --attachments flag that downloads user-uploaded files from + issue and PR bodies and comments. 
Key features: + + - Determines attachment URLs + - Tracks downloads in manifest.json with metadata + - Supports --skip-existing to avoid re-downloading + - Handles filename collisions with counter suffix + - Smart retry logic for transient vs permanent failures + - Uses Content-Disposition for correct file extensions +- Feat: Drop support for Python 3.8 and 3.9 (EOL) [Rodos] + + Both Python 3.8 and 3.9 have reached end-of-life: + - Python 3.8: EOL October 7, 2024 + - Python 3.9: EOL October 31, 2025 + + Changes: + - Add python_requires=">=3.10" to setup.py + - Remove Python 3.8 and 3.9 from classifiers + - Add Python 3.13 and 3.14 to classifiers + - Update README to document Python 3.10+ requirement +- Feat: Enforce Python 3.8+ requirement and add multi-version CI + testing. [Rodos] + + - Add python_requires=">=3.8" to setup.py to enforce minimum version at install time + - Update README to explicitly document Python 3.8+ requirement + - Add CI matrix to test lint/build on Python 3.8-3.14 (7 versions) + - Aligns with actual usage patterns (~99% of downloads on Python 3.8+) + - Prevents future PRs from inadvertently using incompatible syntax + + This change protects users by preventing installation on unsupported Python + versions and ensures contributors can see version requirements clearly. +- Chore(deps): bump bleach in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [bleach](https://github.com/mozilla/bleach). + + + Updates `bleach` from 6.2.0 to 6.3.0 + - [Changelog](https://github.com/mozilla/bleach/blob/main/CHANGES) + - [Commits](https://github.com/mozilla/bleach/compare/v6.2.0...v6.3.0) + + --- + updated-dependencies: + - dependency-name: bleach + dependency-version: 6.3.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump charset-normalizer in the python-packages group. 
+ [dependabot[bot]] + + Bumps the python-packages group with 1 update: [charset-normalizer](https://github.com/jawah/charset_normalizer). + + + Updates `charset-normalizer` from 3.4.3 to 3.4.4 + - [Release notes](https://github.com/jawah/charset_normalizer/releases) + - [Changelog](https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md) + - [Commits](https://github.com/jawah/charset_normalizer/compare/3.4.3...3.4.4) + + --- + updated-dependencies: + - dependency-name: charset-normalizer + dependency-version: 3.4.4 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... +- Chore(deps): bump idna from 3.10 to 3.11 in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [idna](https://github.com/kjd/idna). + + + Updates `idna` from 3.10 to 3.11 + - [Release notes](https://github.com/kjd/idna/releases) + - [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst) + - [Commits](https://github.com/kjd/idna/compare/v3.10...v3.11) + + --- + updated-dependencies: + - dependency-name: idna + dependency-version: '3.11' + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump the python-packages group across 1 directory with 2 + updates. [dependabot[bot]] + + Bumps the python-packages group with 2 updates in the / directory: [platformdirs](https://github.com/tox-dev/platformdirs) and [rich](https://github.com/Textualize/rich). 
+ + + Updates `platformdirs` from 4.4.0 to 4.5.0 + - [Release notes](https://github.com/tox-dev/platformdirs/releases) + - [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst) + - [Commits](https://github.com/tox-dev/platformdirs/compare/4.4.0...4.5.0) + + Updates `rich` from 14.1.0 to 14.2.0 + - [Release notes](https://github.com/Textualize/rich/releases) + - [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md) + - [Commits](https://github.com/Textualize/rich/compare/v14.1.0...v14.2.0) + + --- + updated-dependencies: + - dependency-name: platformdirs + dependency-version: 4.5.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: rich + dependency-version: 14.2.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump the python-packages group with 3 updates. + [dependabot[bot]] + + Bumps the python-packages group with 3 updates: [certifi](https://github.com/certifi/python-certifi), [click](https://github.com/pallets/click) and [markdown-it-py](https://github.com/executablebooks/markdown-it-py). 
+ + + Updates `certifi` from 2025.8.3 to 2025.10.5 + - [Commits](https://github.com/certifi/python-certifi/compare/2025.08.03...2025.10.05) + + Updates `click` from 8.1.8 to 8.3.0 + - [Release notes](https://github.com/pallets/click/releases) + - [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst) + - [Commits](https://github.com/pallets/click/compare/8.1.8...8.3.0) + + Updates `markdown-it-py` from 3.0.0 to 4.0.0 + - [Release notes](https://github.com/executablebooks/markdown-it-py/releases) + - [Changelog](https://github.com/executablebooks/markdown-it-py/blob/master/CHANGELOG.md) + - [Commits](https://github.com/executablebooks/markdown-it-py/compare/v3.0.0...v4.0.0) + + --- + updated-dependencies: + - dependency-name: certifi + dependency-version: 2025.10.5 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: click + dependency-version: 8.3.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: markdown-it-py + dependency-version: 4.0.0 + dependency-type: direct:production + update-type: version-update:semver-major + dependency-group: python-packages + ... +- Chore(deps): bump docutils in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark). + + + Updates `docutils` from 0.22.1 to 0.22.2 + - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) + - [Commits](https://github.com/rtfd/recommonmark/commits) + + --- + updated-dependencies: + - dependency-name: docutils + dependency-version: 0.22.2 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... +- Chore(deps): bump the python-packages group across 1 directory with 2 + updates. 
[dependabot[bot]] + + Bumps the python-packages group with 2 updates in the / directory: [black](https://github.com/psf/black) and [docutils](https://github.com/rtfd/recommonmark). + + + Updates `black` from 25.1.0 to 25.9.0 + - [Release notes](https://github.com/psf/black/releases) + - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) + - [Commits](https://github.com/psf/black/compare/25.1.0...25.9.0) + + Updates `docutils` from 0.22 to 0.22.1 + - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) + - [Commits](https://github.com/rtfd/recommonmark/commits) + + --- + updated-dependencies: + - dependency-name: black + dependency-version: 25.9.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: docutils + dependency-version: 0.22.1 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... +- Delete .github/ISSUE_TEMPLATE.md. [Jose Diaz-Gonzalez] +- Create feature.yaml. [Jose Diaz-Gonzalez] +- Delete .github/ISSUE_TEMPLATE/bug_report.md. [Jose Diaz-Gonzalez] +- Rename bug.md to bug.yaml. [Jose Diaz-Gonzalez] +- Chore: create bug template. [Jose Diaz-Gonzalez] +- Chore: Rename PULL_REQUEST.md to .github/PULL_REQUEST.md. [Jose Diaz- + Gonzalez] +- Chore: Rename ISSUE_TEMPLATE.md to .github/ISSUE_TEMPLATE.md. [Jose + Diaz-Gonzalez] +- Chore(deps): bump actions/setup-python from 5 to 6. [dependabot[bot]] + + Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. + - [Release notes](https://github.com/actions/setup-python/releases) + - [Commits](https://github.com/actions/setup-python/compare/v5...v6) + + --- + updated-dependencies: + - dependency-name: actions/setup-python + dependency-version: '6' + dependency-type: direct:production + update-type: version-update:semver-major + ... 
+- Chore(deps): bump twine from 6.1.0 to 6.2.0 in the python-packages + group. [dependabot[bot]] + + Bumps the python-packages group with 1 update: [twine](https://github.com/pypa/twine). + + + Updates `twine` from 6.1.0 to 6.2.0 + - [Release notes](https://github.com/pypa/twine/releases) + - [Changelog](https://github.com/pypa/twine/blob/main/docs/changelog.rst) + - [Commits](https://github.com/pypa/twine/compare/6.1.0...6.2.0) + + --- + updated-dependencies: + - dependency-name: twine + dependency-version: 6.2.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump more-itertools in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [more-itertools](https://github.com/more-itertools/more-itertools). + + + Updates `more-itertools` from 10.7.0 to 10.8.0 + - [Release notes](https://github.com/more-itertools/more-itertools/releases) + - [Commits](https://github.com/more-itertools/more-itertools/compare/v10.7.0...v10.8.0) + + --- + updated-dependencies: + - dependency-name: more-itertools + dependency-version: 10.8.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump platformdirs in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [platformdirs](https://github.com/tox-dev/platformdirs). + + + Updates `platformdirs` from 4.3.8 to 4.4.0 + - [Release notes](https://github.com/tox-dev/platformdirs/releases) + - [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst) + - [Commits](https://github.com/tox-dev/platformdirs/compare/4.3.8...4.4.0) + + --- + updated-dependencies: + - dependency-name: platformdirs + dependency-version: 4.4.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... 
+- Chore(deps): bump actions/checkout from 4 to 5. [dependabot[bot]] + + Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. + - [Release notes](https://github.com/actions/checkout/releases) + - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) + - [Commits](https://github.com/actions/checkout/compare/v4...v5) + + --- + updated-dependencies: + - dependency-name: actions/checkout + dependency-version: '5' + dependency-type: direct:production + update-type: version-update:semver-major + ... +- Chore(deps): bump requests in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [requests](https://github.com/psf/requests). + + + Updates `requests` from 2.32.4 to 2.32.5 + - [Release notes](https://github.com/psf/requests/releases) + - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) + - [Commits](https://github.com/psf/requests/compare/v2.32.4...v2.32.5) + + --- + updated-dependencies: + - dependency-name: requests + dependency-version: 2.32.5 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... +- Chore: update Dockerfile to use Python 3.12 and improve dependency + installation. [Mateusz Hajder] +- Chore(deps): bump the python-packages group with 2 updates. + [dependabot[bot]] + + Bumps the python-packages group with 2 updates: [certifi](https://github.com/certifi/python-certifi) and [charset-normalizer](https://github.com/jawah/charset_normalizer). 
+ + + Updates `certifi` from 2025.7.14 to 2025.8.3 + - [Commits](https://github.com/certifi/python-certifi/compare/2025.07.14...2025.08.03) + + Updates `charset-normalizer` from 3.4.2 to 3.4.3 + - [Release notes](https://github.com/jawah/charset_normalizer/releases) + - [Changelog](https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md) + - [Commits](https://github.com/jawah/charset_normalizer/compare/3.4.2...3.4.3) + + --- + updated-dependencies: + - dependency-name: certifi + dependency-version: 2025.8.3 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: charset-normalizer + dependency-version: 3.4.3 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... + + +0.50.3 (2025-08-08) +------------------- - Revert "Add conditional check for git checkout in development path" [Eric Wheeler] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index e7d2f93..d942e9e 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.50.3" +__version__ = "0.51.0" From c8c585cbb5634ebd4db7c85a4fca1742d48537b2 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 6 Nov 2025 13:09:51 +0000 Subject: [PATCH 039/148] chore(deps): bump docutils in the python-packages group Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark). Updates `docutils` from 0.22.2 to 0.22.3 - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) - [Commits](https://github.com/rtfd/recommonmark/commits) --- updated-dependencies: - dependency-name: docutils dependency-version: 0.22.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index bd9ebf2..8e05be0 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -5,7 +5,7 @@ certifi==2025.10.5 charset-normalizer==3.4.4 click==8.3.0 colorama==0.4.6 -docutils==0.22.2 +docutils==0.22.3 flake8==7.3.0 gitchangelog==3.0.4 idna==3.11 From 56db3ff0e81a63324e31935f1d669e4bfd3d5426 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 10 Nov 2025 13:59:47 +0000 Subject: [PATCH 040/148] chore(deps): bump black in the python-packages group Bumps the python-packages group with 1 update: [black](https://github.com/psf/black). Updates `black` from 25.9.0 to 25.11.0 - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0) --- updated-dependencies: - dependency-name: black dependency-version: 25.11.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 8e05be0..b3e9f19 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,5 +1,5 @@ autopep8==2.3.2 -black==25.9.0 +black==25.11.0 bleach==6.3.0 certifi==2025.10.5 charset-normalizer==3.4.4 From a98ff7f23df8bb6356ec30a4c7e22bc39d9ee771 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 12 Nov 2025 13:11:06 +0000 Subject: [PATCH 041/148] chore(deps): bump certifi in the python-packages group Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi). 
Updates `certifi` from 2025.10.5 to 2025.11.12 - [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12) --- updated-dependencies: - dependency-name: certifi dependency-version: 2025.11.12 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index b3e9f19..0a695b3 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,7 +1,7 @@ autopep8==2.3.2 black==25.11.0 bleach==6.3.0 -certifi==2025.10.5 +certifi==2025.11.12 charset-normalizer==3.4.4 click==8.3.0 colorama==0.4.6 From 7a9455db88884571faef1f17044003c4e6460836 Mon Sep 17 00:00:00 2001 From: Rodos Date: Fri, 14 Nov 2025 10:17:08 +1100 Subject: [PATCH 042/148] fix: Prevent duplicate attachment downloads Fixes bug where attachments were downloaded multiple times with incremented filenames (file.mov, file_1.mov, file_2.mov) when running backups without --skip-existing flag. I should not have used the --skip-existing flag for attachments, it did not do what I thought it did. The correct approach is to always use the manifest to guide what has already been downloaded and what now needs to be done. 
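The manifest-driven approach described above can be sketched roughly as follows. This is an illustration, not the code in github_backup.py: the helper names `urls_to_skip` and `pending_urls` are hypothetical, the manifest layout (an "attachments" list of entries with "url", "success" and "http_status") is taken from the tests added later in this series, and treating 404 and 410 as permanent failures is an assumption based on those same tests.

```python
import json
import os

# Assumption: these statuses are permanent, so retrying them is pointless.
PERMANENT_FAILURE_STATUSES = {404, 410}


def urls_to_skip(manifest_path):
    """Return the URLs that the manifest says need no further attempt."""
    if not os.path.exists(manifest_path):
        return set()
    try:
        with open(manifest_path, "r") as f:
            manifest = json.load(f)
    except (OSError, json.JSONDecodeError):
        # An unreadable manifest is treated as empty, so everything is retried.
        return set()
    if not isinstance(manifest, dict):
        return set()

    skip = set()
    for entry in manifest.get("attachments", []):
        url = entry.get("url")
        if not url:
            continue
        already_downloaded = bool(entry.get("success"))
        permanently_failed = entry.get("http_status") in PERMANENT_FAILURE_STATUSES
        if already_downloaded or permanently_failed:
            skip.add(url)
    return skip


def pending_urls(candidate_urls, manifest_path):
    """Keep only the URLs that still need a download attempt, in order."""
    skip = urls_to_skip(manifest_path)
    return [url for url in candidate_urls if url not in skip]
```

Because the decision depends only on the manifest, it is the same with or without --skip-existing, and a URL that was already saved never reaches the filename collision logic.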
--- github_backup/github_backup.py | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index b0c2aef..d1828d0 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -919,12 +919,6 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False): "error": None, } - if os.path.exists(path): - metadata["success"] = True - metadata["http_status"] = 200 # Assume success if already exists - metadata["size_bytes"] = os.path.getsize(path) - return metadata - # Create simple request (no API query params) request = Request(url) request.add_header("Accept", "application/octet-stream") @@ -1337,10 +1331,10 @@ def download_attachments( attachments_dir = os.path.join(item_cwd, "attachments", str(number)) manifest_path = os.path.join(attachments_dir, "manifest.json") - # Load existing manifest if skip_existing is enabled + # Load existing manifest to prevent duplicate downloads existing_urls = set() existing_metadata = [] - if args.skip_existing and os.path.exists(manifest_path): + if os.path.exists(manifest_path): try: with open(manifest_path, "r") as f: existing_manifest = json.load(f) @@ -1395,9 +1389,6 @@ def download_attachments( filename = get_attachment_filename(url) filepath = os.path.join(attachments_dir, filename) - # Check for collision BEFORE downloading - filepath = resolve_filename_collision(filepath) - # Download and get metadata metadata = download_attachment_file( url, From e4d1c789937fe1ccf7934613ccfbc63fd8b8ab9b Mon Sep 17 00:00:00 2001 From: Rodos Date: Fri, 14 Nov 2025 10:23:29 +1100 Subject: [PATCH 043/148] test: Add pytest infrastructure and attachment tests In making my last fix to attachments, I found it challenging not having tests to ensure there was no regression. Added pytest with minimal setup and isolated configuration. Created a separate test workflow to keep tests isolated from linting. 
Tests cover the key elements of the attachment logic: - URL extraction from issue bodies - Filename extraction from different URL types - Filename collision resolution - Manifest duplicate prevention --- .github/workflows/test.yml | 33 ++++ pytest.ini | 6 + release-requirements.txt | 1 + tests/__init__.py | 1 + tests/test_attachments.py | 353 +++++++++++++++++++++++++++++++++++++ 5 files changed, 394 insertions(+) create mode 100644 .github/workflows/test.yml create mode 100644 pytest.ini create mode 100644 tests/__init__.py create mode 100644 tests/test_attachments.py diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml new file mode 100644 index 0000000..fb43350 --- /dev/null +++ b/.github/workflows/test.yml @@ -0,0 +1,33 @@ +--- +name: "test" + +# yamllint disable-line rule:truthy +on: + pull_request: + branches: + - "*" + push: + branches: + - "main" + - "master" + +jobs: + test: + name: test + runs-on: ubuntu-24.04 + strategy: + matrix: + python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"] + + steps: + - name: Checkout repository + uses: actions/checkout@v5 + with: + fetch-depth: 0 + - name: Setup Python + uses: actions/setup-python@v6 + with: + python-version: ${{ matrix.python-version }} + cache: "pip" + - run: pip install -r release-requirements.txt + - run: pytest tests/ -v diff --git a/pytest.ini b/pytest.ini new file mode 100644 index 0000000..a1edb37 --- /dev/null +++ b/pytest.ini @@ -0,0 +1,6 @@ +[pytest] +testpaths = tests +python_files = test_*.py +python_classes = Test* +python_functions = test_* +addopts = -v diff --git a/release-requirements.txt b/release-requirements.txt index b3e9f19..2a9b2ba 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -8,6 +8,7 @@ colorama==0.4.6 docutils==0.22.3 flake8==7.3.0 gitchangelog==3.0.4 +pytest==8.3.3 idna==3.11 importlib-metadata==8.7.0 jaraco.classes==3.4.0 diff --git a/tests/__init__.py b/tests/__init__.py new file mode 100644 index 0000000..5675dbd --- /dev/null +++ 
b/tests/__init__.py @@ -0,0 +1 @@ +"""Tests for python-github-backup.""" diff --git a/tests/test_attachments.py b/tests/test_attachments.py new file mode 100644 index 0000000..07c1b33 --- /dev/null +++ b/tests/test_attachments.py @@ -0,0 +1,353 @@ +"""Behavioral tests for attachment functionality.""" + +import json +import os +import tempfile +from pathlib import Path +from unittest.mock import Mock + +import pytest + +from github_backup import github_backup + + +@pytest.fixture +def attachment_test_setup(tmp_path): + """Fixture providing setup and helper for attachment download tests.""" + from unittest.mock import patch + + issue_cwd = tmp_path / "issues" + issue_cwd.mkdir() + + # Mock args + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = None + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.user = "testuser" + args.repository = "testrepo" + + repository = {"full_name": "testuser/testrepo"} + + def call_download(issue_data, issue_number=123): + """Call download_attachments with mocked HTTP downloads. + + Returns list of URLs that were actually downloaded. + """ + downloaded_urls = [] + + def mock_download(url, path, auth, as_app, fine): + downloaded_urls.append(url) + return { + "success": True, + "saved_as": os.path.basename(path), + "url": url, + } + + with patch( + "github_backup.github_backup.download_attachment_file", + side_effect=mock_download, + ): + github_backup.download_attachments( + args, str(issue_cwd), issue_data, issue_number, repository + ) + + return downloaded_urls + + return { + "issue_cwd": str(issue_cwd), + "args": args, + "repository": repository, + "call_download": call_download, + } + + +class TestURLExtraction: + """Test URL extraction with realistic issue content.""" + + def test_mixed_urls(self): + issue_data = { + "body": """ + ## Bug Report + + When uploading files, I see this error. 
Here's a screenshot: + https://github.com/user-attachments/assets/abc123def456 + + The logs show: https://github.com/user-attachments/files/789/error-log.txt + + This is similar to https://github.com/someorg/somerepo/issues/42 but different. + + You can also see the video at https://user-images.githubusercontent.com/12345/video-demo.mov + + Here's how to reproduce: + ```bash + # Don't extract this example URL: + curl https://github.com/user-attachments/assets/example999 + ``` + + More info at https://docs.example.com/guide + + Also see this inline code `https://github.com/user-attachments/files/111/inline.pdf` should not extract. + + Final attachment: https://github.com/user-attachments/files/222/report.pdf. + """, + "comment_data": [ + { + "body": "Here's another attachment: https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123" + }, + { + "body": """ + Example code: + ```python + url = "https://github.com/user-attachments/assets/code-example" + ``` + But this is real: https://github.com/user-attachments/files/333/actual.zip + """ + }, + ], + } + + # Extract URLs + urls = github_backup.extract_attachment_urls(issue_data) + + expected_urls = [ + "https://github.com/user-attachments/assets/abc123def456", + "https://github.com/user-attachments/files/789/error-log.txt", + "https://user-images.githubusercontent.com/12345/video-demo.mov", + "https://github.com/user-attachments/files/222/report.pdf", + "https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123", + "https://github.com/user-attachments/files/333/actual.zip", + ] + + assert set(urls) == set(expected_urls) + + def test_trailing_punctuation_stripped(self): + """URLs with trailing punctuation should have punctuation stripped.""" + issue_data = { + "body": """ + See this file: https://github.com/user-attachments/files/1/doc.pdf. + And this one (https://github.com/user-attachments/files/2/image.png). + Check it out! 
https://github.com/user-attachments/files/3/data.csv! + """ + } + + urls = github_backup.extract_attachment_urls(issue_data) + + expected = [ + "https://github.com/user-attachments/files/1/doc.pdf", + "https://github.com/user-attachments/files/2/image.png", + "https://github.com/user-attachments/files/3/data.csv", + ] + assert set(urls) == set(expected) + + def test_deduplication_across_body_and_comments(self): + """Same URL in body and comments should only appear once.""" + duplicate_url = "https://github.com/user-attachments/assets/abc123" + + issue_data = { + "body": f"First mention: {duplicate_url}", + "comment_data": [ + {"body": f"Second mention: {duplicate_url}"}, + {"body": f"Third mention: {duplicate_url}"}, + ], + } + + urls = github_backup.extract_attachment_urls(issue_data) + + assert set(urls) == {duplicate_url} + + +class TestFilenameExtraction: + """Test filename extraction from different URL types.""" + + def test_modern_assets_url(self): + """Modern assets URL returns UUID.""" + url = "https://github.com/user-attachments/assets/abc123def456" + filename = github_backup.get_attachment_filename(url) + assert filename == "abc123def456" + + def test_modern_files_url(self): + """Modern files URL returns filename.""" + url = "https://github.com/user-attachments/files/12345/report.pdf" + filename = github_backup.get_attachment_filename(url) + assert filename == "report.pdf" + + def test_legacy_cdn_url(self): + """Legacy CDN URL returns filename with extension.""" + url = "https://user-images.githubusercontent.com/123456/abc-def.png" + filename = github_backup.get_attachment_filename(url) + assert filename == "abc-def.png" + + def test_private_cdn_url(self): + """Private CDN URL returns filename.""" + url = "https://private-user-images.githubusercontent.com/98765/secret.png?jwt=token123" + filename = github_backup.get_attachment_filename(url) + assert filename == "secret.png" + + def test_repo_files_url(self): + """Repo-scoped files URL returns filename.""" 
+ url = "https://github.com/owner/repo/files/789/document.txt" + filename = github_backup.get_attachment_filename(url) + assert filename == "document.txt" + + +class TestFilenameCollision: + """Test filename collision resolution.""" + + def test_collision_behavior(self): + """Test filename collision resolution with real files.""" + with tempfile.TemporaryDirectory() as tmpdir: + # No collision - file doesn't exist + result = github_backup.resolve_filename_collision( + os.path.join(tmpdir, "report.pdf") + ) + assert result == os.path.join(tmpdir, "report.pdf") + + # Create the file, now collision exists + Path(os.path.join(tmpdir, "report.pdf")).touch() + result = github_backup.resolve_filename_collision( + os.path.join(tmpdir, "report.pdf") + ) + assert result == os.path.join(tmpdir, "report_1.pdf") + + # Create report_1.pdf too + Path(os.path.join(tmpdir, "report_1.pdf")).touch() + result = github_backup.resolve_filename_collision( + os.path.join(tmpdir, "report.pdf") + ) + assert result == os.path.join(tmpdir, "report_2.pdf") + + def test_manifest_reserved(self): + """manifest.json is always treated as reserved.""" + with tempfile.TemporaryDirectory() as tmpdir: + # Even if manifest.json doesn't exist, should get manifest_1.json + result = github_backup.resolve_filename_collision( + os.path.join(tmpdir, "manifest.json") + ) + assert result == os.path.join(tmpdir, "manifest_1.json") + + +class TestManifestDuplicatePrevention: + """Test that manifest prevents duplicate downloads (the bug fix).""" + + def test_manifest_filters_existing_urls(self, attachment_test_setup): + """URLs in manifest are not re-downloaded.""" + setup = attachment_test_setup + + # Create manifest with existing URLs + attachments_dir = os.path.join(setup["issue_cwd"], "attachments", "123") + os.makedirs(attachments_dir) + manifest_path = os.path.join(attachments_dir, "manifest.json") + + manifest = { + "attachments": [ + { + "url": "https://github.com/user-attachments/assets/old1", + 
"success": True, + "saved_as": "old1.pdf", + }, + { + "url": "https://github.com/user-attachments/assets/old2", + "success": True, + "saved_as": "old2.pdf", + }, + ] + } + with open(manifest_path, "w") as f: + json.dump(manifest, f) + + # Issue data with 2 old URLs and 1 new URL + issue_data = { + "body": """ + Old: https://github.com/user-attachments/assets/old1 + Old: https://github.com/user-attachments/assets/old2 + New: https://github.com/user-attachments/assets/new1 + """ + } + + downloaded_urls = setup["call_download"](issue_data) + + # Should only download the NEW URL (old ones filtered by manifest) + assert len(downloaded_urls) == 1 + assert downloaded_urls[0] == "https://github.com/user-attachments/assets/new1" + + def test_no_manifest_downloads_all(self, attachment_test_setup): + """Without manifest, all URLs should be downloaded.""" + setup = attachment_test_setup + + # Issue data with 2 URLs + issue_data = { + "body": """ + https://github.com/user-attachments/assets/url1 + https://github.com/user-attachments/assets/url2 + """ + } + + downloaded_urls = setup["call_download"](issue_data) + + # Should download ALL URLs (no manifest to filter) + assert len(downloaded_urls) == 2 + assert set(downloaded_urls) == { + "https://github.com/user-attachments/assets/url1", + "https://github.com/user-attachments/assets/url2", + } + + def test_manifest_skips_permanent_failures(self, attachment_test_setup): + """Manifest skips permanent failures (404, 410) but retries transient (503).""" + setup = attachment_test_setup + + # Create manifest with different failure types + attachments_dir = os.path.join(setup["issue_cwd"], "attachments", "123") + os.makedirs(attachments_dir) + manifest_path = os.path.join(attachments_dir, "manifest.json") + + manifest = { + "attachments": [ + { + "url": "https://github.com/user-attachments/assets/success", + "success": True, + "saved_as": "success.pdf", + }, + { + "url": "https://github.com/user-attachments/assets/notfound", + "success": 
False, + "http_status": 404, + }, + { + "url": "https://github.com/user-attachments/assets/gone", + "success": False, + "http_status": 410, + }, + { + "url": "https://github.com/user-attachments/assets/unavailable", + "success": False, + "http_status": 503, + }, + ] + } + with open(manifest_path, "w") as f: + json.dump(manifest, f) + + # Issue data has all 4 URLs + issue_data = { + "body": """ + https://github.com/user-attachments/assets/success + https://github.com/user-attachments/assets/notfound + https://github.com/user-attachments/assets/gone + https://github.com/user-attachments/assets/unavailable + """ + } + + downloaded_urls = setup["call_download"](issue_data) + + # Should only retry 503 (transient failure) + # Success, 404, and 410 should be skipped + assert len(downloaded_urls) == 1 + assert ( + downloaded_urls[0] + == "https://github.com/user-attachments/assets/unavailable" + ) From 1ec0820936c420b52e77eaefdf903098e2f2cb8d Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Sun, 16 Nov 2025 02:01:39 +0000 Subject: [PATCH 044/148] Release version 0.51.1 --- CHANGES.rst | 90 ++++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 90 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 50cbd09..269a77b 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,10 +1,98 @@ Changelog ========= -0.51.0 (2025-11-06) +0.51.1 (2025-11-16) ------------------- ------------------------ +Fix +~~~ +- Prevent duplicate attachment downloads. [Rodos] + + Fixes bug where attachments were downloaded multiple times with + incremented filenames (file.mov, file_1.mov, file_2.mov) when + running backups without --skip-existing flag. + + I should not have used the --skip-existing flag for attachments, + it did not do what I thought it did. + + The correct approach is to always use the manifest to guide what + has already been downloaded and what now needs to be done. 
+ +Other +~~~~~ +- Chore(deps): bump certifi in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [certifi](https://github.com/certifi/python-certifi). + + + Updates `certifi` from 2025.10.5 to 2025.11.12 + - [Commits](https://github.com/certifi/python-certifi/compare/2025.10.05...2025.11.12) + + --- + updated-dependencies: + - dependency-name: certifi + dependency-version: 2025.11.12 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Test: Add pytest infrastructure and attachment tests. [Rodos] + + In making my last fix to attachments, I found it challenging not + having tests to ensure there was no regression. + + Added pytest with minimal setup and isolated configuration. Created + a separate test workflow to keep tests isolated from linting. + + Tests cover the key elements of the attachment logic: + - URL extraction from issue bodies + - Filename extraction from different URL types + - Filename collision resolution + - Manifest duplicate prevention +- Chore(deps): bump black in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [black](https://github.com/psf/black). + + + Updates `black` from 25.9.0 to 25.11.0 + - [Release notes](https://github.com/psf/black/releases) + - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) + - [Commits](https://github.com/psf/black/compare/25.9.0...25.11.0) + + --- + updated-dependencies: + - dependency-name: black + dependency-version: 25.11.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... +- Chore(deps): bump docutils in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [docutils](https://github.com/rtfd/recommonmark). 
+ + + Updates `docutils` from 0.22.2 to 0.22.3 + - [Changelog](https://github.com/readthedocs/recommonmark/blob/master/CHANGELOG.md) + - [Commits](https://github.com/rtfd/recommonmark/commits) + + --- + updated-dependencies: + - dependency-name: docutils + dependency-version: 0.22.3 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... + + +0.51.0 (2025-11-06) +------------------- + Fix ~~~ - Remove Python 3.8 and 3.9 from CI matrix. [Rodos] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index d942e9e..d280604 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.51.0" +__version__ = "0.51.1" From 90ba839c7d7e121ac5bc3865e2f9f3e02a9774ec Mon Sep 17 00:00:00 2001 From: Rodos Date: Thu, 13 Nov 2025 15:46:06 +1100 Subject: [PATCH 045/148] fix: Improve CA certificate detection with fallback chain The previous implementation incorrectly assumed empty get_ca_certs() meant broken SSL, causing false failures in GitHub Codespaces and other directory-based cert systems where certificates exist but aren't pre-loaded. It would then attempt to import certifi as a workaround, but certifi wasn't listed in requirements.txt, causing the fallback to fail with ImportError even though the system certificates would have worked fine. This commit replaces the naive check with a layered fallback approach that checks multiple certificate sources. First it checks for pre-loaded system certs (file-based systems). Then it verifies system cert paths exist (directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts to use certifi as an optional fallback only if needed. This approach eliminates hard dependencies (certifi is now optional), works in GitHub Codespaces without any setup, and fails gracefully with clear hints for resolution when SSL is actually broken rather than failing with ModuleNotFoundError. 
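The three layers described above can be sketched as a standalone function. This is a minimal sketch under stated assumptions: the name `build_https_context` and the returned layer label are hypothetical and exist only to make the chosen branch visible, and it raises RuntimeError where the real change may report the failure differently.

```python
import os
import ssl


def build_https_context():
    """Return (ssl_context, layer_name) using a layered CA lookup."""
    ctx = ssl.create_default_context()

    # Layer 1: certificates already loaded into the context (file-based stores).
    if ctx.get_ca_certs():
        return ctx, "preloaded"

    # Layer 2: the default cafile/capath exists on disk. Certificates in a
    # capath directory are only loaded on demand, so get_ca_certs() can be
    # empty even though verification will work.
    paths = ssl.get_default_verify_paths()
    if (paths.cafile and os.path.exists(paths.cafile)) or (
        paths.capath and os.path.exists(paths.capath)
    ):
        return ctx, "system-paths"

    # Layer 3: certifi, used only if it happens to be installed.
    try:
        import certifi
    except ImportError:
        raise RuntimeError(
            "No CA certificates found. Install certifi or the system "
            "ca-certificates package."
        ) from None
    return ssl.create_default_context(cafile=certifi.where()), "certifi"
```

An empty get_ca_certs() result is therefore treated as "not loaded yet" rather than "missing", which is the case that previously produced the false failure in Codespaces.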
Fixes #444 --- github_backup/github_backup.py | 41 +++++++++++++++++++++------------- requirements.txt | 1 - 2 files changed, 26 insertions(+), 16 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index b0c2aef..b69ba4a 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -37,22 +37,33 @@ FILE_URI_PREFIX = "file://" logger = logging.getLogger(__name__) +# Setup SSL context with fallback chain https_ctx = ssl.create_default_context() -if not https_ctx.get_ca_certs(): - import warnings - - warnings.warn( - "\n\nYOUR DEFAULT CA CERTS ARE EMPTY.\n" - + "PLEASE POPULATE ANY OF:" - + "".join( - ["\n - " + x for x in ssl.get_default_verify_paths() if type(x) is str] - ) - + "\n", - stacklevel=2, - ) - import certifi - - https_ctx = ssl.create_default_context(cafile=certifi.where()) +if https_ctx.get_ca_certs(): + # Layer 1: Certificates pre-loaded from system (file-based) + pass +else: + paths = ssl.get_default_verify_paths() + if (paths.cafile and os.path.exists(paths.cafile)) or ( + paths.capath and os.path.exists(paths.capath) + ): + # Layer 2: Cert paths exist, will be lazy-loaded on first use (directory-based) + pass + else: + # Layer 3: Try certifi package as optional fallback + try: + import certifi + + https_ctx = ssl.create_default_context(cafile=certifi.where()) + except ImportError: + # All layers failed - no certificates available anywhere + sys.exit( + "\nERROR: No CA certificates found. Cannot connect to GitHub over SSL.\n\n" + "Solutions you can explore:\n" + " 1. pip install certifi\n" + " 2. Alpine: apk add ca-certificates\n" + " 3. 
Debian/Ubuntu: apt-get install ca-certificates\n\n" + ) def logging_subprocess( diff --git a/requirements.txt b/requirements.txt index 8b13789..e69de29 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1 +0,0 @@ - From 72d35a9b94a22b4a3fe4589749d6f9b4fc8d3970 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Sun, 16 Nov 2025 23:55:36 +0000 Subject: [PATCH 046/148] Release version 0.51.2 --- CHANGES.rst | 30 +++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 269a77b..ce23331 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,10 +1,38 @@ Changelog ========= -0.51.1 (2025-11-16) +0.51.2 (2025-11-16) ------------------- ------------------------ +Fix +~~~ +- Improve CA certificate detection with fallback chain. [Rodos] + + The previous implementation incorrectly assumed empty get_ca_certs() + meant broken SSL, causing false failures in GitHub Codespaces and other + directory-based cert systems where certificates exist but aren't pre-loaded. + It would then attempt to import certifi as a workaround, but certifi wasn't + listed in requirements.txt, causing the fallback to fail with ImportError + even though the system certificates would have worked fine. + + This commit replaces the naive check with a layered fallback approach that + checks multiple certificate sources. First it checks for pre-loaded system + certs (file-based systems). Then it verifies system cert paths exist + (directory-based systems like Ubuntu/Debian/Codespaces). Finally it attempts + to use certifi as an optional fallback only if needed. + + This approach eliminates hard dependencies (certifi is now optional), works + in GitHub Codespaces without any setup, and fails gracefully with clear hints + for resolution when SSL is actually broken rather than failing with + ModuleNotFoundError. 
+ + Fixes #444 + + +0.51.1 (2025-11-16) +------------------- + Fix ~~~ - Prevent duplicate attachment downloads. [Rodos] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index d280604..210a2d0 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.51.1" +__version__ = "0.51.2" From 755182967749cfdd482bb311812bc97442265941 Mon Sep 17 00:00:00 2001 From: Helio Machado <0x2b3bfa0+git@googlemail.com> Date: Mon, 17 Nov 2025 02:09:29 +0100 Subject: [PATCH 047/148] Use cursor based pagination --- github_backup/github_backup.py | 69 ++++++++++++++++++++-------------- 1 file changed, 40 insertions(+), 29 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 8abca62..14f0ed8 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -592,27 +592,26 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False): auth = get_auth(args, encode=not args.as_app) query_args = get_query_args(query_args) per_page = 100 - page = 0 + next_url = None while True: if single_request: - request_page, request_per_page = None, None + request_per_page = None else: - page = page + 1 - request_page, request_per_page = page, per_page + request_per_page = per_page request = _construct_request( request_per_page, - request_page, query_args, - template, + next_url or template, auth, as_app=args.as_app, fine=True if args.token_fine is not None else False, ) # noqa - r, errors = _get_response(request, auth, template) + r, errors = _get_response(request, auth, next_url or template) status_code = int(r.getcode()) + # Check if we got correct data try: response = json.loads(r.read().decode("utf-8")) @@ -644,15 +643,14 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False): retries += 1 time.sleep(5) request = _construct_request( - per_page, - page, + request_per_page, query_args, - template, + next_url or template, auth, 
as_app=args.as_app, fine=True if args.token_fine is not None else False, ) # noqa - r, errors = _get_response(request, auth, template) + r, errors = _get_response(request, auth, next_url or template) status_code = int(r.getcode()) try: @@ -682,7 +680,16 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False): if type(response) is list: for resp in response: yield resp - if len(response) < per_page: + # Parse Link header for next page URL (cursor-based pagination) + link_header = r.headers.get("Link", "") + next_url = None + if link_header: + # Parse Link header: ; rel="next" + for link in link_header.split(","): + if 'rel="next"' in link: + next_url = link[link.find("<") + 1:link.find(">")] + break + if not next_url: break elif type(response) is dict and single_request: yield response @@ -735,22 +742,27 @@ def _get_response(request, auth, template): def _construct_request( - per_page, page, query_args, template, auth, as_app=None, fine=False + per_page, query_args, template, auth, as_app=None, fine=False ): - all_query_args = {} - if per_page: - all_query_args["per_page"] = per_page - if page: - all_query_args["page"] = page - if query_args: - all_query_args.update(query_args) - - request_url = template - if all_query_args: - querystring = urlencode(all_query_args) - request_url = template + "?" + querystring + # If template is already a full URL with query params (from Link header), use it directly + if "?" in template and template.startswith("http"): + request_url = template + # Extract query string for logging + querystring = template.split("?", 1)[1] else: - querystring = "" + # Build URL with query parameters + all_query_args = {} + if per_page: + all_query_args["per_page"] = per_page + if query_args: + all_query_args.update(query_args) + + request_url = template + if all_query_args: + querystring = urlencode(all_query_args) + request_url = template + "?" 
+ querystring + else: + querystring = "" request = Request(request_url) if auth is not None: @@ -766,7 +778,7 @@ def _construct_request( "Accept", "application/vnd.github.machine-man-preview+json" ) - log_url = template + log_url = template if "?" not in template else template.split("?")[0] if querystring: log_url += "?" + querystring logger.info("Requesting {}".format(log_url)) @@ -843,8 +855,7 @@ def download_file(url, path, auth, as_app=False, fine=False): return request = _construct_request( - per_page=100, - page=1, + per_page=None, query_args={}, template=url, auth=auth, From 5af522a34841bf7d56221449bac2a7dc3c8d97b1 Mon Sep 17 00:00:00 2001 From: Rodos Date: Mon, 17 Nov 2025 17:14:29 +1100 Subject: [PATCH 048/148] test: Add pagination tests for cursor and page-based Link headers --- tests/test_pagination.py | 153 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 tests/test_pagination.py diff --git a/tests/test_pagination.py b/tests/test_pagination.py new file mode 100644 index 0000000..0d5bd82 --- /dev/null +++ b/tests/test_pagination.py @@ -0,0 +1,153 @@ +"""Tests for Link header pagination handling.""" + +import json +from unittest.mock import Mock, patch + +import pytest + +from github_backup import github_backup + + +class MockHTTPResponse: + """Mock HTTP response for paginated API calls.""" + + def __init__(self, data, link_header=None): + self._content = json.dumps(data).encode("utf-8") + self._link_header = link_header + self._read = False + self.reason = "OK" + + def getcode(self): + return 200 + + def read(self): + if self._read: + return b"" + self._read = True + return self._content + + def get_header(self, name, default=None): + """Mock method for headers.get().""" + return self.headers.get(name, default) + + @property + def headers(self): + headers = {"x-ratelimit-remaining": "5000"} + if self._link_header: + headers["Link"] = self._link_header + return headers + + +@pytest.fixture +def mock_args(): + 
"""Mock args for retrieve_data_gen.""" + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = "fake_token" + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + return args + + +def test_cursor_based_pagination(mock_args): + """Link header with 'after' cursor parameter works correctly.""" + + # Simulate issues endpoint behavior: returns cursor in Link header + responses = [ + # Issues endpoint returns 'after' cursor parameter (not 'page') + MockHTTPResponse( + data=[{"issue": i} for i in range(1, 101)], # Page 1 contents + link_header='; rel="next"', + ), + MockHTTPResponse( + data=[{"issue": i} for i in range(101, 151)], # Page 2 contents + link_header=None, # No Link header - signals end of pagination + ), + ] + requests_made = [] + + def mock_urlopen(request, *args, **kwargs): + url = request.get_full_url() + requests_made.append(url) + return responses[len(requests_made) - 1] + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + results = list( + github_backup.retrieve_data_gen( + mock_args, "https://api.github.com/repos/owner/repo/issues" + ) + ) + + # Verify all items retrieved and cursor was used in second request + assert len(results) == 150 + assert len(requests_made) == 2 + assert "after=ABC123" in requests_made[1] + + +def test_page_based_pagination(mock_args): + """Link header with 'page' parameter works correctly.""" + + # Simulate pulls/repos endpoint behavior: returns page numbers in Link header + responses = [ + # Pulls endpoint uses traditional 'page' parameter (not cursor) + MockHTTPResponse( + data=[{"pull": i} for i in range(1, 101)], # Page 1 contents + link_header='; rel="next"', + ), + MockHTTPResponse( + data=[{"pull": i} for i in range(101, 181)], # Page 2 contents + link_header=None, # No Link header - signals end of pagination + ), + ] + requests_made = [] 
+ + def mock_urlopen(request, *args, **kwargs): + url = request.get_full_url() + requests_made.append(url) + return responses[len(requests_made) - 1] + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + results = list( + github_backup.retrieve_data_gen( + mock_args, "https://api.github.com/repos/owner/repo/pulls" + ) + ) + + # Verify all items retrieved and page parameter was used (not cursor) + assert len(results) == 180 + assert len(requests_made) == 2 + assert "page=2" in requests_made[1] + assert "after" not in requests_made[1] + + +def test_no_link_header_stops_pagination(mock_args): + """Pagination stops when Link header is absent.""" + + # Simulate endpoint with results that fit in a single page + responses = [ + MockHTTPResponse( + data=[{"label": i} for i in range(1, 51)], # Page contents + link_header=None, # No Link header - signals end of pagination + ) + ] + requests_made = [] + + def mock_urlopen(request, *args, **kwargs): + requests_made.append(request.get_full_url()) + return responses[len(requests_made) - 1] + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + results = list( + github_backup.retrieve_data_gen( + mock_args, "https://api.github.com/repos/owner/repo/labels" + ) + ) + + # Verify pagination stopped after first request + assert len(results) == 50 + assert len(requests_made) == 1 From 9ef496efada55c9e8eced5183037e1a1935db140 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Tue, 18 Nov 2025 06:55:36 +0000 Subject: [PATCH 049/148] Release version 0.51.3 --- CHANGES.rst | 9 ++++++++- github_backup/__init__.py | 2 +- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index ce23331..3c7c16f 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,16 @@ Changelog ========= -0.51.2 (2025-11-16) +0.51.3 (2025-11-18) ------------------- ------------------------ +- Test: Add pagination tests for cursor and page-based Link headers. 
+ [Rodos] +- Use cursor based pagination. [Helio Machado] + + +0.51.2 (2025-11-16) +------------------- Fix ~~~ diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 210a2d0..378947a 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.51.2" +__version__ = "0.51.3" From d3edef06227521169bf20bbd98fc8e28788ae57a Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 18 Nov 2025 13:24:06 +0000 Subject: [PATCH 050/148] chore(deps): bump the python-packages group with 3 updates Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring). Updates `click` from 8.3.0 to 8.3.1 - [Release notes](https://github.com/pallets/click/releases) - [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1) Updates `pytest` from 8.3.3 to 9.0.1 - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1) Updates `keyring` from 25.6.0 to 25.7.0 - [Release notes](https://github.com/jaraco/keyring/releases) - [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst) - [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0) --- updated-dependencies: - dependency-name: click dependency-version: 8.3.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages - dependency-name: pytest dependency-version: 9.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: python-packages - dependency-name: keyring dependency-version: 25.7.0 dependency-type: direct:production update-type: version-update:semver-minor 
dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index 3a1d550..aedbf64 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -3,16 +3,16 @@ black==25.11.0 bleach==6.3.0 certifi==2025.11.12 charset-normalizer==3.4.4 -click==8.3.0 +click==8.3.1 colorama==0.4.6 docutils==0.22.3 flake8==7.3.0 gitchangelog==3.0.4 -pytest==8.3.3 +pytest==9.0.1 idna==3.11 importlib-metadata==8.7.0 jaraco.classes==3.4.0 -keyring==25.6.0 +keyring==25.7.0 markdown-it-py==4.0.0 mccabe==0.7.0 mdurl==0.1.2 From c3855a94f1bf5866f41f84b15b2e50c53f9717be Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 24 Nov 2025 04:09:25 +0000 Subject: [PATCH 051/148] chore(deps): bump actions/checkout from 5 to 6 Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v5...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] --- .github/workflows/automatic-release.yml | 2 +- .github/workflows/docker.yml | 2 +- .github/workflows/lint.yml | 2 +- .github/workflows/test.yml | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/automatic-release.yml b/.github/workflows/automatic-release.yml index 2160206..60c0b41 100644 --- a/.github/workflows/automatic-release.yml +++ b/.github/workflows/automatic-release.yml @@ -18,7 +18,7 @@ jobs: runs-on: ubuntu-24.04 steps: - name: Checkout repository - uses: actions/checkout@v5 + uses: actions/checkout@v6 with: fetch-depth: 0 ssh-key: ${{ secrets.DEPLOY_PRIVATE_KEY }} diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index 2c7cb38..f367b99 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -38,7 +38,7 @@ jobs: steps: - name: Checkout repository - uses: actions/checkout@v5 + uses: actions/checkout@v6 with: persist-credentials: false diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index 02ad174..0ca0aa2 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -21,7 +21,7 @@ jobs: steps: - name: Checkout repository - uses: actions/checkout@v5 + uses: actions/checkout@v6 with: fetch-depth: 0 - name: Setup Python diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index fb43350..0c8b3af 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -21,7 +21,7 @@ jobs: steps: - name: Checkout repository - uses: actions/checkout@v5 + uses: actions/checkout@v6 with: fetch-depth: 0 - name: Setup Python From 9f6b401171afa2614aa1c9ea8e8756f8e0c8c257 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 24 Nov 2025 14:58:52 +0000 Subject: [PATCH 052/148] chore(deps): bump restructuredtext-lint in the python-packages group Bumps the python-packages group with 1 update: 
[restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint). Updates `restructuredtext-lint` from 1.4.0 to 2.0.2 - [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst) - [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2) --- updated-dependencies: - dependency-name: restructuredtext-lint dependency-version: 2.0.2 dependency-type: direct:production update-type: version-update:semver-major dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index aedbf64..76df516 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -28,7 +28,7 @@ Pygments==2.19.2 readme-renderer==44.0 requests==2.32.5 requests-toolbelt==1.0.0 -restructuredtext-lint==1.4.0 +restructuredtext-lint==2.0.2 rfc3986==2.0.0 rich==14.2.0 setuptools==80.9.0 From 7840528fe25f95b7ed4f0aacab602288f1f73c74 Mon Sep 17 00:00:00 2001 From: Rodos Date: Sat, 29 Nov 2025 09:19:23 +1100 Subject: [PATCH 053/148] Skip DMCA'd repos which return a 451 response Log a warning and the link to the DMCA notice. Continue backing up other repositories instead of crashing. 
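The skip-and-continue behaviour can be sketched in isolation as below. This is a simplified illustration under stated assumptions: `backup_all` and the `backup_one` callable are hypothetical stand-ins for the per-repository work inside `backup_repositories`, and only the exception class matches the one added by this patch.

```python
import logging

logger = logging.getLogger("dmca_sketch")


class RepositoryUnavailableError(Exception):
    """Raised when a repository is blocked for legal reasons (HTTP 451)."""

    def __init__(self, message, dmca_url=None):
        super().__init__(message)
        self.dmca_url = dmca_url


def backup_all(repositories, backup_one):
    """Back up each repository, skipping blocked ones (hypothetical helper).

    backup_one is an assumed callable that raises
    RepositoryUnavailableError when the API answers with HTTP 451.
    """
    completed = []
    for repo in repositories:
        try:
            backup_one(repo)
        except RepositoryUnavailableError as e:
            logger.warning("Repository %s is unavailable (HTTP 451)", repo)
            if e.dmca_url:
                logger.warning("DMCA notice: %s", e.dmca_url)
            continue  # move on to the next repository instead of crashing
        completed.append(repo)
    return completed
```

Catching only the dedicated exception keeps other HTTP failures (404, 500 and so on) on their existing error path.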
Closes #163 --- github_backup/github_backup.py | 87 +++++++++++++------- tests/test_http_451.py | 143 +++++++++++++++++++++++++++++++++ 2 files changed, 201 insertions(+), 29 deletions(-) create mode 100644 tests/test_http_451.py diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 14f0ed8..dcf79e8 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -37,6 +37,15 @@ FILE_URI_PREFIX = "file://" logger = logging.getLogger(__name__) + +class RepositoryUnavailableError(Exception): + """Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown).""" + + def __init__(self, message, dmca_url=None): + super().__init__(message) + self.dmca_url = dmca_url + + # Setup SSL context with fallback chain https_ctx = ssl.create_default_context() if https_ctx.get_ca_certs(): @@ -612,6 +621,19 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False): status_code = int(r.getcode()) + # Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository + if status_code == 451: + dmca_url = None + try: + response_data = json.loads(r.read().decode("utf-8")) + dmca_url = response_data.get("block", {}).get("html_url") + except Exception: + pass + raise RepositoryUnavailableError( + "Repository unavailable due to legal reasons (HTTP 451)", + dmca_url=dmca_url + ) + # Check if we got correct data try: response = json.loads(r.read().decode("utf-8")) @@ -1668,40 +1690,47 @@ def backup_repositories(args, output_directory, repositories): continue # don't try to back anything else for a gist; it doesn't exist - download_wiki = args.include_wiki or args.include_everything - if repository["has_wiki"] and download_wiki: - fetch_repository( - repository["name"], - repo_url.replace(".git", ".wiki.git"), - os.path.join(repo_cwd, "wiki"), - skip_existing=args.skip_existing, - bare_clone=args.bare_clone, - lfs_clone=args.lfs_clone, - no_prune=args.no_prune, - ) - if args.include_issues or 
args.include_everything: - backup_issues(args, repo_cwd, repository, repos_template) + try: + download_wiki = args.include_wiki or args.include_everything + if repository["has_wiki"] and download_wiki: + fetch_repository( + repository["name"], + repo_url.replace(".git", ".wiki.git"), + os.path.join(repo_cwd, "wiki"), + skip_existing=args.skip_existing, + bare_clone=args.bare_clone, + lfs_clone=args.lfs_clone, + no_prune=args.no_prune, + ) + if args.include_issues or args.include_everything: + backup_issues(args, repo_cwd, repository, repos_template) - if args.include_pulls or args.include_everything: - backup_pulls(args, repo_cwd, repository, repos_template) + if args.include_pulls or args.include_everything: + backup_pulls(args, repo_cwd, repository, repos_template) - if args.include_milestones or args.include_everything: - backup_milestones(args, repo_cwd, repository, repos_template) + if args.include_milestones or args.include_everything: + backup_milestones(args, repo_cwd, repository, repos_template) - if args.include_labels or args.include_everything: - backup_labels(args, repo_cwd, repository, repos_template) + if args.include_labels or args.include_everything: + backup_labels(args, repo_cwd, repository, repos_template) - if args.include_hooks or args.include_everything: - backup_hooks(args, repo_cwd, repository, repos_template) + if args.include_hooks or args.include_everything: + backup_hooks(args, repo_cwd, repository, repos_template) - if args.include_releases or args.include_everything: - backup_releases( - args, - repo_cwd, - repository, - repos_template, - include_assets=args.include_assets or args.include_everything, - ) + if args.include_releases or args.include_everything: + backup_releases( + args, + repo_cwd, + repository, + repos_template, + include_assets=args.include_assets or args.include_everything, + ) + except RepositoryUnavailableError as e: + logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)") + if 
e.dmca_url: + logger.warning(f"DMCA notice: {e.dmca_url}") + logger.info(f"Skipping remaining resources for {repository['full_name']}") + continue if args.incremental: if last_update == "0000-00-00T00:00:00Z": diff --git a/tests/test_http_451.py b/tests/test_http_451.py new file mode 100644 index 0000000..7feca1d --- /dev/null +++ b/tests/test_http_451.py @@ -0,0 +1,143 @@ +"""Tests for HTTP 451 (DMCA takedown) handling.""" + +import json +from unittest.mock import Mock, patch + +import pytest + +from github_backup import github_backup + + +class TestHTTP451Exception: + """Test suite for HTTP 451 DMCA takedown exception handling.""" + + def test_repository_unavailable_error_raised(self): + """HTTP 451 should raise RepositoryUnavailableError with DMCA URL.""" + # Create mock args + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = None + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + + # Mock HTTPError 451 response + mock_response = Mock() + mock_response.getcode.return_value = 451 + + dmca_data = { + "message": "Repository access blocked", + "block": { + "reason": "dmca", + "created_at": "2024-11-12T14:38:04Z", + "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md" + } + } + mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8") + mock_response.headers = {"x-ratelimit-remaining": "5000"} + mock_response.reason = "Unavailable For Legal Reasons" + + def mock_get_response(request, auth, template): + return mock_response, [] + + with patch("github_backup.github_backup._get_response", side_effect=mock_get_response): + with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: + list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues")) + + # Check exception has DMCA URL + assert exc_info.value.dmca_url == 
"https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md" + assert "451" in str(exc_info.value) + + def test_repository_unavailable_error_without_dmca_url(self): + """HTTP 451 without DMCA details should still raise exception.""" + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = None + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + + mock_response = Mock() + mock_response.getcode.return_value = 451 + mock_response.read.return_value = b'{"message": "Blocked"}' + mock_response.headers = {"x-ratelimit-remaining": "5000"} + mock_response.reason = "Unavailable For Legal Reasons" + + def mock_get_response(request, auth, template): + return mock_response, [] + + with patch("github_backup.github_backup._get_response", side_effect=mock_get_response): + with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: + list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues")) + + # Exception raised even without DMCA URL + assert exc_info.value.dmca_url is None + assert "451" in str(exc_info.value) + + def test_repository_unavailable_error_with_malformed_json(self): + """HTTP 451 with malformed JSON should still raise exception.""" + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = None + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + + mock_response = Mock() + mock_response.getcode.return_value = 451 + mock_response.read.return_value = b"invalid json {" + mock_response.headers = {"x-ratelimit-remaining": "5000"} + mock_response.reason = "Unavailable For Legal Reasons" + + def mock_get_response(request, auth, template): + return mock_response, [] + + with 
patch("github_backup.github_backup._get_response", side_effect=mock_get_response): + with pytest.raises(github_backup.RepositoryUnavailableError): + list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues")) + + def test_other_http_errors_unchanged(self): + """Other HTTP errors should still raise generic Exception.""" + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = None + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + + mock_response = Mock() + mock_response.getcode.return_value = 404 + mock_response.read.return_value = b'{"message": "Not Found"}' + mock_response.headers = {"x-ratelimit-remaining": "5000"} + mock_response.reason = "Not Found" + + def mock_get_response(request, auth, template): + return mock_response, [] + + with patch("github_backup.github_backup._get_response", side_effect=mock_get_response): + # Should raise generic Exception, not RepositoryUnavailableError + with pytest.raises(Exception) as exc_info: + list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues")) + + assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError) + assert "404" in str(exc_info.value) + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) From 8b7512c8d845ab3e845b807cdf9baa6357571af4 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Fri, 28 Nov 2025 23:39:09 +0000 Subject: [PATCH 054/148] Release version 0.52.0 --- CHANGES.rst | 83 ++++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 83 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 3c7c16f..396dfe8 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,90 @@ Changelog ========= -0.51.3 (2025-11-18) +0.52.0 (2025-11-28) ------------------- ------------------------ +- Skip DMCA'd repos which 
return a 451 response. [Rodos] + + Log a warning and the link to the DMCA notice. Continue backing up + other repositories instead of crashing. + + Closes #163 +- Chore(deps): bump restructuredtext-lint in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [restructuredtext-lint](https://github.com/twolfson/restructuredtext-lint). + + + Updates `restructuredtext-lint` from 1.4.0 to 2.0.2 + - [Changelog](https://github.com/twolfson/restructuredtext-lint/blob/master/CHANGELOG.rst) + - [Commits](https://github.com/twolfson/restructuredtext-lint/compare/1.4.0...2.0.2) + + --- + updated-dependencies: + - dependency-name: restructuredtext-lint + dependency-version: 2.0.2 + dependency-type: direct:production + update-type: version-update:semver-major + dependency-group: python-packages + ... +- Chore(deps): bump actions/checkout from 5 to 6. [dependabot[bot]] + + Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6. + - [Release notes](https://github.com/actions/checkout/releases) + - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) + - [Commits](https://github.com/actions/checkout/compare/v5...v6) + + --- + updated-dependencies: + - dependency-name: actions/checkout + dependency-version: '6' + dependency-type: direct:production + update-type: version-update:semver-major + ... +- Chore(deps): bump the python-packages group with 3 updates. + [dependabot[bot]] + + Bumps the python-packages group with 3 updates: [click](https://github.com/pallets/click), [pytest](https://github.com/pytest-dev/pytest) and [keyring](https://github.com/jaraco/keyring). 
+ + + Updates `click` from 8.3.0 to 8.3.1 + - [Release notes](https://github.com/pallets/click/releases) + - [Changelog](https://github.com/pallets/click/blob/main/CHANGES.rst) + - [Commits](https://github.com/pallets/click/compare/8.3.0...8.3.1) + + Updates `pytest` from 8.3.3 to 9.0.1 + - [Release notes](https://github.com/pytest-dev/pytest/releases) + - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) + - [Commits](https://github.com/pytest-dev/pytest/compare/8.3.3...9.0.1) + + Updates `keyring` from 25.6.0 to 25.7.0 + - [Release notes](https://github.com/jaraco/keyring/releases) + - [Changelog](https://github.com/jaraco/keyring/blob/main/NEWS.rst) + - [Commits](https://github.com/jaraco/keyring/compare/v25.6.0...v25.7.0) + + --- + updated-dependencies: + - dependency-name: click + dependency-version: 8.3.1 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + - dependency-name: pytest + dependency-version: 9.0.1 + dependency-type: direct:production + update-type: version-update:semver-major + dependency-group: python-packages + - dependency-name: keyring + dependency-version: 25.7.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... + + +0.51.3 (2025-11-18) +------------------- - Test: Add pagination tests for cursor and page-based Link headers. [Rodos] - Use cursor based pagination. 
[Helio Machado] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 378947a..aa21288 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.51.3" +__version__ = "0.52.0" From 5739ac074551171b22e74bb32705b6a10ca5ce39 Mon Sep 17 00:00:00 2001 From: Rodos Date: Sat, 29 Nov 2025 16:50:53 +1100 Subject: [PATCH 055/148] Avoid rewriting unchanged JSON files for labels, milestones, releases, hooks, followers, and following This change reduces unnecessary writes when backing up metadata that changes infrequently. The implementation compares existing file content before writing and skips the write if the content is identical, preserving file timestamps. Key changes: - Added json_dump_if_changed() helper that compares content before writing - Uses atomic writes (temp file + rename) for all metadata files - NOT applied to issues/pulls (they use incremental_by_files logic) - Made log messages consistent and past tense ("Saved" instead of "Saving") - Added informative logging showing skip counts Fixes #133 --- github_backup/github_backup.py | 96 ++++++++++++-- tests/test_json_dump_if_changed.py | 198 +++++++++++++++++++++++++++++ 2 files changed, 283 insertions(+), 11 deletions(-) create mode 100644 tests/test_json_dump_if_changed.py diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index dcf79e8..9d39a64 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1898,11 +1898,21 @@ def backup_milestones(args, repo_cwd, repository, repos_template): for milestone in _milestones: milestones[milestone["number"]] = milestone - logger.info("Saving {0} milestones to disk".format(len(list(milestones.keys())))) + written_count = 0 for number, milestone in list(milestones.items()): milestone_file = "{0}/{1}.json".format(milestone_cwd, number) - with codecs.open(milestone_file, "w", encoding="utf-8") as f: - json_dump(milestone, f) + if json_dump_if_changed(milestone, 
milestone_file): + written_count += 1 + + total = len(milestones) + if written_count == total: + logger.info("Saved {0} milestones to disk".format(total)) + elif written_count == 0: + logger.info("{0} milestones unchanged, skipped write".format(total)) + else: + logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format( + written_count, total, total - written_count + )) def backup_labels(args, repo_cwd, repository, repos_template): @@ -1955,19 +1965,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F reverse=True, ) releases = releases[: args.number_of_latest_releases] - logger.info("Saving the latest {0} releases to disk".format(len(releases))) - else: - logger.info("Saving {0} releases to disk".format(len(releases))) # for each release, store it + written_count = 0 for release in releases: release_name = release["tag_name"] release_name_safe = release_name.replace("/", "__") output_filepath = os.path.join( release_cwd, "{0}.json".format(release_name_safe) ) - with codecs.open(output_filepath, "w+", encoding="utf-8") as f: - json_dump(release, f) + if json_dump_if_changed(release, output_filepath): + written_count += 1 if include_assets: assets = retrieve_data(args, release["assets_url"]) @@ -1984,6 +1992,17 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F fine=True if args.token_fine is not None else False, ) + # Log the results + total = len(releases) + if written_count == total: + logger.info("Saved {0} releases to disk".format(total)) + elif written_count == 0: + logger.info("{0} releases unchanged, skipped write".format(total)) + else: + logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format( + written_count, total, total - written_count + )) + def fetch_repository( name, @@ -2108,9 +2127,10 @@ def _backup_data(args, name, template, output_file, output_directory): mkdir_p(output_directory) data = retrieve_data(args, template) - logger.info("Writing {0} {1} to 
disk".format(len(data), name)) - with codecs.open(output_file, "w", encoding="utf-8") as f: - json_dump(data, f) + if json_dump_if_changed(data, output_file): + logger.info("Saved {0} {1} to disk".format(len(data), name)) + else: + logger.info("{0} {1} unchanged, skipped write".format(len(data), name)) def json_dump(data, output_file): @@ -2122,3 +2142,57 @@ def json_dump(data, output_file): indent=4, separators=(",", ": "), ) + + +def json_dump_if_changed(data, output_file_path): + """ + Write JSON data to file only if content has changed. + + Compares the serialized JSON data with the existing file content + and only writes if different. This prevents unnecessary file + modification timestamp updates and disk writes. + + Uses atomic writes (temp file + rename) to prevent corruption + if the process is interrupted during the write. + + Args: + data: The data to serialize as JSON + output_file_path: The path to the output file + + Returns: + True if file was written (content changed or new file) + False if write was skipped (content unchanged) + """ + # Serialize new data with consistent formatting matching json_dump() + new_content = json.dumps( + data, + ensure_ascii=False, + sort_keys=True, + indent=4, + separators=(",", ": "), + ) + + # Check if file exists and compare content + if os.path.exists(output_file_path): + try: + with codecs.open(output_file_path, "r", encoding="utf-8") as f: + existing_content = f.read() + if existing_content == new_content: + logger.debug( + "Content unchanged, skipping write: {0}".format(output_file_path) + ) + return False + except (OSError, UnicodeDecodeError) as e: + # If we can't read the existing file, write the new one + logger.debug( + "Error reading existing file {0}, will overwrite: {1}".format( + output_file_path, e + ) + ) + + # Write the file atomically using temp file + rename + temp_file = output_file_path + ".temp" + with codecs.open(temp_file, "w", encoding="utf-8") as f: + f.write(new_content) + 
os.rename(temp_file, output_file_path) # Atomic on POSIX systems + return True diff --git a/tests/test_json_dump_if_changed.py b/tests/test_json_dump_if_changed.py new file mode 100644 index 0000000..426baee --- /dev/null +++ b/tests/test_json_dump_if_changed.py @@ -0,0 +1,198 @@ +"""Tests for json_dump_if_changed functionality.""" + +import codecs +import json +import os +import tempfile + +import pytest + +from github_backup import github_backup + + +class TestJsonDumpIfChanged: + """Test suite for json_dump_if_changed function.""" + + def test_writes_new_file(self): + """Should write file when it doesn't exist.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data = {"key": "value", "number": 42} + + result = github_backup.json_dump_if_changed(test_data, output_file) + + assert result is True + assert os.path.exists(output_file) + + # Verify content matches expected format + with codecs.open(output_file, "r", encoding="utf-8") as f: + content = f.read() + loaded = json.loads(content) + assert loaded == test_data + + def test_skips_unchanged_file(self): + """Should skip write when content is identical.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data = {"key": "value", "number": 42} + + # First write + result1 = github_backup.json_dump_if_changed(test_data, output_file) + assert result1 is True + + # Get the initial mtime + mtime1 = os.path.getmtime(output_file) + + # Second write with same data + result2 = github_backup.json_dump_if_changed(test_data, output_file) + assert result2 is False + + # File should not have been modified + mtime2 = os.path.getmtime(output_file) + assert mtime1 == mtime2 + + def test_writes_when_content_changed(self): + """Should write file when content has changed.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data1 = {"key": "value1"} + test_data2 = 
{"key": "value2"} + + # First write + result1 = github_backup.json_dump_if_changed(test_data1, output_file) + assert result1 is True + + # Second write with different data + result2 = github_backup.json_dump_if_changed(test_data2, output_file) + assert result2 is True + + # Verify new content + with codecs.open(output_file, "r", encoding="utf-8") as f: + loaded = json.load(f) + assert loaded == test_data2 + + def test_uses_consistent_formatting(self): + """Should use same JSON formatting as json_dump.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data = {"z": "last", "a": "first", "m": "middle"} + + github_backup.json_dump_if_changed(test_data, output_file) + + with codecs.open(output_file, "r", encoding="utf-8") as f: + content = f.read() + + # Check for consistent formatting: + # - sorted keys + # - 4-space indent + # - comma-colon-space separator + expected = json.dumps( + test_data, + ensure_ascii=False, + sort_keys=True, + indent=4, + separators=(",", ": "), + ) + assert content == expected + + def test_atomic_write_always_used(self): + """Should always use temp file and rename for atomic writes.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data = {"key": "value"} + + result = github_backup.json_dump_if_changed(test_data, output_file) + + assert result is True + assert os.path.exists(output_file) + + # Temp file should not exist after atomic write + temp_file = output_file + ".temp" + assert not os.path.exists(temp_file) + + # Verify content + with codecs.open(output_file, "r", encoding="utf-8") as f: + loaded = json.load(f) + assert loaded == test_data + + def test_handles_unicode_content(self): + """Should correctly handle Unicode content.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data = { + "emoji": "🚀", + "chinese": "你好", + "arabic": "مرحبا", + "cyrillic": "Привет", + 
} + + result = github_backup.json_dump_if_changed(test_data, output_file) + assert result is True + + # Verify Unicode is preserved + with codecs.open(output_file, "r", encoding="utf-8") as f: + loaded = json.load(f) + assert loaded == test_data + + # Second write should skip + result2 = github_backup.json_dump_if_changed(test_data, output_file) + assert result2 is False + + def test_handles_complex_nested_data(self): + """Should handle complex nested data structures.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data = { + "users": [ + {"id": 1, "name": "Alice", "tags": ["admin", "user"]}, + {"id": 2, "name": "Bob", "tags": ["user"]}, + ], + "metadata": {"version": "1.0", "nested": {"deep": {"value": 42}}}, + } + + result = github_backup.json_dump_if_changed(test_data, output_file) + assert result is True + + # Verify structure is preserved + with codecs.open(output_file, "r", encoding="utf-8") as f: + loaded = json.load(f) + assert loaded == test_data + + def test_overwrites_on_unicode_decode_error(self): + """Should overwrite if existing file has invalid UTF-8.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + test_data = {"key": "value"} + + # Write invalid UTF-8 bytes + with open(output_file, "wb") as f: + f.write(b"\xff\xfe invalid utf-8") + + # Should catch UnicodeDecodeError and overwrite + result = github_backup.json_dump_if_changed(test_data, output_file) + assert result is True + + # Verify new content was written + with codecs.open(output_file, "r", encoding="utf-8") as f: + loaded = json.load(f) + assert loaded == test_data + + def test_key_order_independence(self): + """Should treat differently-ordered dicts as same if keys/values match.""" + with tempfile.TemporaryDirectory() as tmpdir: + output_file = os.path.join(tmpdir, "test.json") + + # Write first dict + data1 = {"z": 1, "a": 2, "m": 3} + github_backup.json_dump_if_changed(data1, 
output_file) + + # Try to write same data but different order + data2 = {"a": 2, "m": 3, "z": 1} + result = github_backup.json_dump_if_changed(data2, output_file) + + # Should skip because content is the same (keys are sorted) + assert result is False + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) From 6ad1959d437afc8349f605f5f5d816ebdf0ab8e2 Mon Sep 17 00:00:00 2001 From: Rodos Date: Sat, 29 Nov 2025 21:16:22 +1100 Subject: [PATCH 056/148] fix: case-sensitive username filtering causing silent backup failures GitHub's API accepts usernames in any case but returns canonical case. The case-sensitive comparison in filter_repositories() filtered out all repositories when user-provided case didn't match GitHub's canonical case. Changed to case-insensitive comparison. Fixes #198 --- github_backup/github_backup.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index dcf79e8..a54e299 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1587,7 +1587,9 @@ def filter_repositories(args, unfiltered_repositories): repositories = [] for r in unfiltered_repositories: # gists can be anonymous, so need to safely check owner - if r.get("owner", {}).get("login") == args.user or r.get("is_starred"): + # Use case-insensitive comparison to match GitHub's case-insensitive username behavior + owner_login = r.get("owner", {}).get("login", "") + if owner_login.lower() == args.user.lower() or r.get("is_starred"): repositories.append(r) name_regex = None From ff2681e1960f0176f176bb22b0c4682d74d89b6f Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Sun, 30 Nov 2025 04:30:48 +0000 Subject: [PATCH 057/148] Release version 0.53.0 --- CHANGES.rst | 37 ++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 396dfe8..b84d655 100644 --- a/CHANGES.rst +++ 
b/CHANGES.rst @@ -1,9 +1,44 @@ Changelog ========= -0.52.0 (2025-11-28) +0.53.0 (2025-11-30) ------------------- ------------------------ + +Fix +~~~ +- Case-sensitive username filtering causing silent backup failures. + [Rodos] + + GitHub's API accepts usernames in any case but returns canonical case. + The case-sensitive comparison in filter_repositories() filtered out all + repositories when user-provided case didn't match GitHub's canonical case. + + Changed to case-insensitive comparison. + + Fixes #198 + +Other +~~~~~ +- Avoid rewriting unchanged JSON files for labels, milestones, releases, + hooks, followers, and following. [Rodos] + + This change reduces unnecessary writes when backing up metadata that changes + infrequently. The implementation compares existing file content before writing + and skips the write if the content is identical, preserving file timestamps. + + Key changes: + - Added json_dump_if_changed() helper that compares content before writing + - Uses atomic writes (temp file + rename) for all metadata files + - NOT applied to issues/pulls (they use incremental_by_files logic) + - Made log messages consistent and past tense ("Saved" instead of "Saving") + - Added informative logging showing skip counts + + Fixes #133 + + +0.52.0 (2025-11-28) +------------------- - Skip DMCA'd repos which return a 451 response. [Rodos] Log a warning and the link to the DMCA notice. 
Continue backing up diff --git a/github_backup/__init__.py b/github_backup/__init__.py index aa21288..3c5da5f 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.52.0" +__version__ = "0.53.0" From bf28b46954395a1e5e27c766743735dee6c73033 Mon Sep 17 00:00:00 2001 From: Rodos Date: Mon, 1 Dec 2025 15:53:26 +1100 Subject: [PATCH 058/148] docs: update README testing section and add fetch vs pull explanation --- README.rst | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 69d5524..9836107 100644 --- a/README.rst +++ b/README.rst @@ -308,6 +308,25 @@ Skip existing on incomplete backups The ``--skip-existing`` argument will skip a backup if the directory already exists, even if the backup in that directory failed (perhaps due to a blocking error). This may result in unexpected missing data in a regular backup. +Updates use fetch, not pull +--------------------------- + +When updating an existing repository backup, ``github-backup`` uses ``git fetch`` rather than ``git pull``. This is intentional - a backup tool should reliably download data without risk of failure. Using ``git pull`` would require handling merge conflicts, which adds complexity and could cause backups to fail unexpectedly. + +With fetch, **all branches and commits are downloaded** safely into remote-tracking branches. The working directory files won't change, but your backup is complete. + +If you look at files directly (e.g., ``cat README.md``), you'll see the old content. The new data is in the remote-tracking branches (confusingly named "remote" but stored locally). To view or use the latest files:: + + git show origin/main:README.md # view a file + git merge origin/main # update working directory + +All branches are backed up as remote refs (``origin/main``, ``origin/feature-branch``, etc.). 
+ +If you want to browse files directly without merging, consider using ``--bare`` which skips the working directory entirely - the backup is just the git data. + +See `#269 `_ for more discussion. + + Github Backup Examples ====================== @@ -357,7 +376,12 @@ A huge thanks to all the contibuters! Testing ------- -This project currently contains no unit tests. To run linting:: +To run the test suite:: + + pip install pytest + pytest + +To run linting:: pip install flake8 flake8 --ignore=E501 From 12802103c470402c0ceccbbb1d8b767bd4ffcc82 Mon Sep 17 00:00:00 2001 From: Rodos Date: Mon, 1 Dec 2025 16:11:11 +1100 Subject: [PATCH 059/148] fix: send INFO/DEBUG to stdout, WARNING/ERROR to stderr Fixes #182 --- bin/github-backup | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/bin/github-backup b/bin/github-backup index b33d19f..d685bc9 100755 --- a/bin/github-backup +++ b/bin/github-backup @@ -16,12 +16,23 @@ from github_backup.github_backup import ( retrieve_repositories, ) -logging.basicConfig( - format="%(asctime)s.%(msecs)03d: %(message)s", +# INFO and DEBUG go to stdout, WARNING and above go to stderr +log_format = logging.Formatter( + fmt="%(asctime)s.%(msecs)03d: %(message)s", datefmt="%Y-%m-%dT%H:%M:%S", - level=logging.INFO, ) +stdout_handler = logging.StreamHandler(sys.stdout) +stdout_handler.setLevel(logging.DEBUG) +stdout_handler.addFilter(lambda r: r.levelno < logging.WARNING) +stdout_handler.setFormatter(log_format) + +stderr_handler = logging.StreamHandler(sys.stderr) +stderr_handler.setLevel(logging.WARNING) +stderr_handler.setFormatter(log_format) + +logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler]) + def main(): args = parse_args() From 2a9d86a6bf2f1de3989e6a411b5a7dc326546e79 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Wed, 3 Dec 2025 02:17:59 +0000 Subject: [PATCH 060/148] Release version 0.54.0 --- CHANGES.rst | 17 ++++++++++++++++- github_backup/__init__.py | 2 +- 2 
files changed, 17 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index b84d655..1b02e0d 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,10 +1,25 @@ Changelog ========= -0.53.0 (2025-11-30) +0.54.0 (2025-12-03) ------------------- ------------------------ +Fix +~~~ +- Send INFO/DEBUG to stdout, WARNING/ERROR to stderr. [Rodos] + + Fixes #182 + +Other +~~~~~ +- Docs: update README testing section and add fetch vs pull explanation. + [Rodos] + + +0.53.0 (2025-11-30) +------------------- + Fix ~~~ - Case-sensitive username filtering causing silent backup failures. diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 3c5da5f..450ee12 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.53.0" +__version__ = "0.54.0" From 899ab5fdc286bd4064b78411e15a8cf44be4568c Mon Sep 17 00:00:00 2001 From: Rodos Date: Thu, 4 Dec 2025 10:07:43 +1100 Subject: [PATCH 061/148] fix: warn and skip when --starred-gists used for different user GitHub's API only allows retrieving starred gists for the authenticated user. Previously, using --starred-gists when backing up a different user would silently return no relevant data. Now warns and skips the retrieval entirely when the target user differs from the authenticated user. Uses case-insensitive comparison to match GitHub's username handling. Fixes #93 --- README.rst | 2 ++ github_backup/github_backup.py | 26 ++++++++++++++++---------- 2 files changed, 18 insertions(+), 10 deletions(-) diff --git a/README.rst b/README.rst index 9836107..a33db61 100644 --- a/README.rst +++ b/README.rst @@ -301,6 +301,8 @@ Starred gists vs starred repo behaviour The starred normal repo cloning (``--all-starred``) argument stores starred repos separately to the users own repositories. However, using ``--starred-gists`` will store starred gists within the same directory as the users own gists ``--gists``. Also, all gist repo directory names are IDs not the gist's name. 
+Note: ``--starred-gists`` only retrieves starred gists for the authenticated user, not the target user, due to a GitHub API limitation. + Skip existing on incomplete backups ----------------------------------- diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 0ad55d1..cdb536d 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1565,16 +1565,22 @@ def retrieve_repositories(args, authenticated_user): repos.extend(gists) if args.include_starred_gists: - starred_gists_template = "https://{0}/gists/starred".format( - get_github_api_host(args) - ) - starred_gists = retrieve_data( - args, starred_gists_template, single_request=False - ) - # flag each repo as a starred gist for downstream processing - for item in starred_gists: - item.update({"is_gist": True, "is_starred": True}) - repos.extend(starred_gists) + if not authenticated_user.get("login") or args.user.lower() != authenticated_user["login"].lower(): + logger.warning( + "Cannot retrieve starred gists for '%s'. GitHub only allows access to the authenticated user's starred gists.", + args.user, + ) + else: + starred_gists_template = "https://{0}/gists/starred".format( + get_github_api_host(args) + ) + starred_gists = retrieve_data( + args, starred_gists_template, single_request=False + ) + # flag each repo as a starred gist for downstream processing + for item in starred_gists: + item.update({"is_gist": True, "is_starred": True}) + repos.extend(starred_gists) return repos From fdfaaec1ba072b0a98d1981b55de5ccb213e9625 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Sat, 6 Dec 2025 04:51:42 +0000 Subject: [PATCH 062/148] chore(deps): bump urllib3 from 2.5.0 to 2.6.0 Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0. 
- [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0) --- updated-dependencies: - dependency-name: urllib3 dependency-version: 2.6.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 76df516..b1323a0 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -35,6 +35,6 @@ setuptools==80.9.0 six==1.17.0 tqdm==4.67.1 twine==6.2.0 -urllib3==2.5.0 +urllib3==2.6.0 webencodings==0.5.1 zipp==3.23.0 From aba048a3e983074b2a0fba0d3e304c00cd090d79 Mon Sep 17 00:00:00 2001 From: Rodos Date: Sun, 7 Dec 2025 21:20:54 +1100 Subject: [PATCH 063/148] fix: warn when --private used without authentication --- bin/github-backup | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/bin/github-backup b/bin/github-backup index d685bc9..dcac622 100755 --- a/bin/github-backup +++ b/bin/github-backup @@ -9,6 +9,7 @@ from github_backup.github_backup import ( backup_repositories, check_git_lfs_install, filter_repositories, + get_auth, get_authenticated_user, logger, mkdir_p, @@ -37,6 +38,12 @@ logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler def main(): args = parse_args() + if args.private and not get_auth(args): + logger.warning( + "The --private flag has no effect without authentication. " + "Use -t/--token, -f/--token-fine, or -u/--username to authenticate." 
+ ) + if args.quiet: logger.setLevel(logging.WARNING) From 6e2a7e521ca1e9b8aae58bbe4eaebbb107d828bb Mon Sep 17 00:00:00 2001 From: Rodos Date: Sun, 7 Dec 2025 21:21:14 +1100 Subject: [PATCH 064/148] fix: --all-starred now clones repos without --repositories --- github_backup/github_backup.py | 14 ++- tests/test_all_starred.py | 161 +++++++++++++++++++++++++++++++++ 2 files changed, 167 insertions(+), 8 deletions(-) create mode 100644 tests/test_all_starred.py diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index cdb536d..bbacdae 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -561,7 +561,7 @@ def get_github_host(args): def read_file_contents(file_uri): - return open(file_uri[len(FILE_URI_PREFIX) :], "rt").readline().strip() + return open(file_uri[len(FILE_URI_PREFIX):], "rt").readline().strip() def get_github_repo_url(args, repository): @@ -1672,9 +1672,10 @@ def backup_repositories(args, output_directory, repositories): repo_url = get_github_repo_url(args, repository) include_gists = args.include_gists or args.include_starred_gists + include_starred = args.all_starred and repository.get("is_starred") if (args.include_repository or args.include_everything) or ( include_gists and repository.get("is_gist") - ): + ) or include_starred: repo_name = ( repository.get("name") if not repository.get("is_gist") @@ -2023,12 +2024,9 @@ def fetch_repository( ): if bare_clone: if os.path.exists(local_dir): - clone_exists = ( - subprocess.check_output( - ["git", "rev-parse", "--is-bare-repository"], cwd=local_dir - ) - == b"true\n" - ) + clone_exists = subprocess.check_output( + ["git", "rev-parse", "--is-bare-repository"], cwd=local_dir + ) == b"true\n" else: clone_exists = False else: diff --git a/tests/test_all_starred.py b/tests/test_all_starred.py new file mode 100644 index 0000000..f59a67e --- /dev/null +++ b/tests/test_all_starred.py @@ -0,0 +1,161 @@ +"""Tests for --all-starred flag behavior (issue #225).""" 
+ +import pytest +from unittest.mock import Mock, patch + +from github_backup import github_backup + + +class TestAllStarredCloning: + """Test suite for --all-starred repository cloning behavior. + + Issue #225: --all-starred should clone starred repos without requiring --repositories. + """ + + def _create_mock_args(self, **overrides): + """Create a mock args object with sensible defaults.""" + args = Mock() + args.user = "testuser" + args.output_directory = "/tmp/backup" + args.include_repository = False + args.include_everything = False + args.include_gists = False + args.include_starred_gists = False + args.all_starred = False + args.skip_existing = False + args.bare_clone = False + args.lfs_clone = False + args.no_prune = False + args.include_wiki = False + args.include_issues = False + args.include_issue_comments = False + args.include_issue_events = False + args.include_pulls = False + args.include_pull_comments = False + args.include_pull_commits = False + args.include_pull_details = False + args.include_labels = False + args.include_hooks = False + args.include_milestones = False + args.include_releases = False + args.include_assets = False + args.include_attachments = False + args.incremental = False + args.incremental_by_files = False + args.github_host = None + args.prefer_ssh = False + args.token_classic = None + args.token_fine = None + args.username = None + args.password = None + args.as_app = False + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + + for key, value in overrides.items(): + setattr(args, key, value) + + return args + + @patch('github_backup.github_backup.fetch_repository') + @patch('github_backup.github_backup.get_github_repo_url') + def test_all_starred_clones_without_repositories_flag(self, mock_get_url, mock_fetch): + """--all-starred should clone starred repos without --repositories flag. + + This is the core fix for issue #225. 
+ """ + args = self._create_mock_args(all_starred=True) + mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git" + + # A starred repository (is_starred flag set by retrieve_repositories) + starred_repo = { + "name": "awesome-project", + "full_name": "otheruser/awesome-project", + "owner": {"login": "otheruser"}, + "private": False, + "fork": False, + "has_wiki": False, + "is_starred": True, # This flag is set for starred repos + } + + with patch('github_backup.github_backup.mkdir_p'): + github_backup.backup_repositories(args, "/tmp/backup", [starred_repo]) + + # fetch_repository should be called for the starred repo + assert mock_fetch.called, "--all-starred should trigger repository cloning" + mock_fetch.assert_called_once() + call_args = mock_fetch.call_args + assert call_args[0][0] == "awesome-project" # repo name + + @patch('github_backup.github_backup.fetch_repository') + @patch('github_backup.github_backup.get_github_repo_url') + def test_starred_repo_not_cloned_without_all_starred_flag(self, mock_get_url, mock_fetch): + """Starred repos should NOT be cloned if --all-starred is not set.""" + args = self._create_mock_args(all_starred=False) + mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git" + + starred_repo = { + "name": "awesome-project", + "full_name": "otheruser/awesome-project", + "owner": {"login": "otheruser"}, + "private": False, + "fork": False, + "has_wiki": False, + "is_starred": True, + } + + with patch('github_backup.github_backup.mkdir_p'): + github_backup.backup_repositories(args, "/tmp/backup", [starred_repo]) + + # fetch_repository should NOT be called + assert not mock_fetch.called, "Starred repos should not be cloned without --all-starred" + + @patch('github_backup.github_backup.fetch_repository') + @patch('github_backup.github_backup.get_github_repo_url') + def test_non_starred_repo_not_cloned_with_only_all_starred(self, mock_get_url, mock_fetch): + """Non-starred repos should NOT be 
cloned when only --all-starred is set.""" + args = self._create_mock_args(all_starred=True) + mock_get_url.return_value = "https://github.com/testuser/my-project.git" + + # A regular (non-starred) repository + regular_repo = { + "name": "my-project", + "full_name": "testuser/my-project", + "owner": {"login": "testuser"}, + "private": False, + "fork": False, + "has_wiki": False, + # No is_starred flag + } + + with patch('github_backup.github_backup.mkdir_p'): + github_backup.backup_repositories(args, "/tmp/backup", [regular_repo]) + + # fetch_repository should NOT be called for non-starred repos + assert not mock_fetch.called, "Non-starred repos should not be cloned with only --all-starred" + + @patch('github_backup.github_backup.fetch_repository') + @patch('github_backup.github_backup.get_github_repo_url') + def test_repositories_flag_still_works(self, mock_get_url, mock_fetch): + """--repositories flag should still clone repos as before.""" + args = self._create_mock_args(include_repository=True) + mock_get_url.return_value = "https://github.com/testuser/my-project.git" + + regular_repo = { + "name": "my-project", + "full_name": "testuser/my-project", + "owner": {"login": "testuser"}, + "private": False, + "fork": False, + "has_wiki": False, + } + + with patch('github_backup.github_backup.mkdir_p'): + github_backup.backup_repositories(args, "/tmp/backup", [regular_repo]) + + # fetch_repository should be called + assert mock_fetch.called, "--repositories should trigger repository cloning" + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) From 58ad1c2378691802dbdf9e23d2137ea73bcc4690 Mon Sep 17 00:00:00 2001 From: Rodos Date: Sun, 7 Dec 2025 21:21:26 +1100 Subject: [PATCH 065/148] docs: fix RST formatting in Known blocking errors section --- README.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index a33db61..9fd35fd 100644 --- a/README.rst +++ b/README.rst @@ -281,11 +281,11 @@ If the incremental 
argument is used, this will result in the next backup only re It's therefore recommended to only use the incremental argument if the output/result is being actively monitored, or complimented with periodic full non-incremental runs, to avoid unexpected missing data in a regular backup runs. -1. **Starred public repo hooks blocking** +**Starred public repo hooks blocking** - Since the ``--all`` argument includes ``--hooks``, if you use ``--all`` and ``--all-starred`` together to clone a users starred public repositories, the backup will likely error and block the backup continuing. +Since the ``--all`` argument includes ``--hooks``, if you use ``--all`` and ``--all-starred`` together to clone a users starred public repositories, the backup will likely error and block the backup continuing. - This is due to needing the correct permission for ``--hooks`` on public repos. +This is due to needing the correct permission for ``--hooks`` on public repos. "bare" is actually "mirror" From b80049e96e5d57e869203e09dc9db1e39329c68c Mon Sep 17 00:00:00 2001 From: Rodos Date: Sun, 7 Dec 2025 21:21:37 +1100 Subject: [PATCH 066/148] test: add missing test coverage for case sensitivity fix --- tests/test_case_sensitivity.py | 112 +++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) create mode 100644 tests/test_case_sensitivity.py diff --git a/tests/test_case_sensitivity.py b/tests/test_case_sensitivity.py new file mode 100644 index 0000000..1398d0d --- /dev/null +++ b/tests/test_case_sensitivity.py @@ -0,0 +1,112 @@ +"""Tests for case-insensitive username/organization filtering.""" + +import pytest +from unittest.mock import Mock + +from github_backup import github_backup + + +class TestCaseSensitivity: + """Test suite for case-insensitive username matching in filter_repositories.""" + + def test_filter_repositories_case_insensitive_user(self): + """Should filter repositories case-insensitively for usernames. 
+ + Reproduces issue #198 where typing 'iamrodos' fails to match + repositories with owner.login='Iamrodos' (the canonical case from GitHub API). + """ + # Simulate user typing lowercase username + args = Mock() + args.user = "iamrodos" # lowercase (what user typed) + args.repository = None + args.name_regex = None + args.languages = None + args.exclude = None + args.fork = False + args.private = False + args.public = False + args.all = True + + # Simulate GitHub API returning canonical case + repos = [ + { + "name": "repo1", + "owner": {"login": "Iamrodos"}, # Capital I (canonical from API) + "private": False, + "fork": False, + }, + { + "name": "repo2", + "owner": {"login": "Iamrodos"}, + "private": False, + "fork": False, + }, + ] + + filtered = github_backup.filter_repositories(args, repos) + + # Should match despite case difference + assert len(filtered) == 2 + assert filtered[0]["name"] == "repo1" + assert filtered[1]["name"] == "repo2" + + def test_filter_repositories_case_insensitive_org(self): + """Should filter repositories case-insensitively for organizations. + + Tests the example from issue #198 where 'prai-org' doesn't match 'PRAI-Org'. 
+ """ + args = Mock() + args.user = "prai-org" # lowercase (what user typed) + args.repository = None + args.name_regex = None + args.languages = None + args.exclude = None + args.fork = False + args.private = False + args.public = False + args.all = True + + repos = [ + { + "name": "repo1", + "owner": {"login": "PRAI-Org"}, # Different case (canonical from API) + "private": False, + "fork": False, + }, + ] + + filtered = github_backup.filter_repositories(args, repos) + + # Should match despite case difference + assert len(filtered) == 1 + assert filtered[0]["name"] == "repo1" + + def test_filter_repositories_case_variations(self): + """Should handle various case combinations correctly.""" + args = Mock() + args.user = "TeSt-UsEr" # Mixed case + args.repository = None + args.name_regex = None + args.languages = None + args.exclude = None + args.fork = False + args.private = False + args.public = False + args.all = True + + repos = [ + {"name": "repo1", "owner": {"login": "test-user"}, "private": False, "fork": False}, + {"name": "repo2", "owner": {"login": "TEST-USER"}, "private": False, "fork": False}, + {"name": "repo3", "owner": {"login": "TeSt-UsEr"}, "private": False, "fork": False}, + {"name": "repo4", "owner": {"login": "other-user"}, "private": False, "fork": False}, + ] + + filtered = github_backup.filter_repositories(args, repos) + + # Should match first 3 (all case variations of same user) + assert len(filtered) == 3 + assert set(r["name"] for r in filtered) == {"repo1", "repo2", "repo3"} + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) From 1d6d474408968f728b11aa50c55ec9bb7ddf068e Mon Sep 17 00:00:00 2001 From: Rodos Date: Sun, 7 Dec 2025 21:50:49 +1100 Subject: [PATCH 067/148] fix: improve error messages for inaccessible repos and empty wikis --- github_backup/github_backup.py | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 
bbacdae..0282809 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -2041,11 +2041,14 @@ def fetch_repository( "git ls-remote " + remote_url, stdout=FNULL, stderr=FNULL, shell=True ) if initialized == 128: - logger.info( - "Skipping {0} ({1}) since it's not initialized".format( - name, masked_remote_url + if ".wiki.git" in remote_url: + logger.info( + "Skipping {0} wiki (wiki is enabled but has no content)".format(name) + ) + else: + logger.info( + "Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format(name) ) - ) return if clone_exists: From eb5779ac23ba68dbe05981d1ded2a72500767504 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Sun, 7 Dec 2025 13:59:35 +0000 Subject: [PATCH 068/148] Release version 0.55.0 --- CHANGES.rst | 41 ++++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 41 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 1b02e0d..f15dd59 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,10 +1,49 @@ Changelog ========= -0.54.0 (2025-12-03) +0.55.0 (2025-12-07) ------------------- ------------------------ +Fix +~~~ +- Improve error messages for inaccessible repos and empty wikis. [Rodos] +- --all-starred now clones repos without --repositories. [Rodos] +- Warn when --private used without authentication. [Rodos] +- Warn and skip when --starred-gists used for different user. [Rodos] + + GitHub's API only allows retrieving starred gists for the authenticated + user. Previously, using --starred-gists when backing up a different user + would silently return no relevant data. + + Now warns and skips the retrieval entirely when the target user differs + from the authenticated user. Uses case-insensitive comparison to match + GitHub's username handling. + + Fixes #93 + +Other +~~~~~ +- Test: add missing test coverage for case sensitivity fix. [Rodos] +- Docs: fix RST formatting in Known blocking errors section. 
[Rodos] +- Chore(deps): bump urllib3 from 2.5.0 to 2.6.0. [dependabot[bot]] + + Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.0. + - [Release notes](https://github.com/urllib3/urllib3/releases) + - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) + - [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.0) + + --- + updated-dependencies: + - dependency-name: urllib3 + dependency-version: 2.6.0 + dependency-type: direct:production + ... + + +0.54.0 (2025-12-03) +------------------- + Fix ~~~ - Send INFO/DEBUG to stdout, WARNING/ERROR to stderr. [Rodos] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 450ee12..8b19221 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.54.0" +__version__ = "0.55.0" From 2fbe8d272c2230d20e6a4d1ed13a40f47c53857a Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 8 Dec 2025 13:09:32 +0000 Subject: [PATCH 069/148] chore(deps): bump the python-packages group with 3 updates Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs). 
Updates `black` from 25.11.0 to 25.12.0
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)

Updates `pytest` from 9.0.1 to 9.0.2
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2)

Updates `platformdirs` from 4.5.0 to 4.5.1
- [Release notes](https://github.com/tox-dev/platformdirs/releases)
- [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst)
- [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python-packages
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
- dependency-name: platformdirs
  dependency-version: 4.5.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...
Signed-off-by: dependabot[bot]
---
 release-requirements.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/release-requirements.txt b/release-requirements.txt
index b1323a0..d6e9b8e 100644
--- a/release-requirements.txt
+++ b/release-requirements.txt
@@ -1,5 +1,5 @@
 autopep8==2.3.2
-black==25.11.0
+black==25.12.0
 bleach==6.3.0
 certifi==2025.11.12
 charset-normalizer==3.4.4
@@ -8,7 +8,7 @@ colorama==0.4.6
 docutils==0.22.3
 flake8==7.3.0
 gitchangelog==3.0.4
-pytest==9.0.1
+pytest==9.0.2
 idna==3.11
 importlib-metadata==8.7.0
 jaraco.classes==3.4.0
@@ -21,7 +21,7 @@ mypy-extensions==1.1.0
 packaging==25.0
 pathspec==0.12.1
 pkginfo==1.12.1.2
-platformdirs==4.5.0
+platformdirs==4.5.1
 pycodestyle==2.14.0
 pyflakes==3.4.0
 Pygments==2.19.2

From 6d74af9126829b698a83cbe244093c9831b64f79 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Tue, 9 Dec 2025 13:10:12 +0000
Subject: [PATCH 070/148] chore(deps): bump urllib3 in the python-packages group

Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3).

Updates `urllib3` from 2.6.0 to 2.6.1
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python-packages
...
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index d6e9b8e..5ca68cb 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -35,6 +35,6 @@ setuptools==80.9.0 six==1.17.0 tqdm==4.67.1 twine==6.2.0 -urllib3==2.6.0 +urllib3==2.6.1 webencodings==0.5.1 zipp==3.23.0 From 75e6f56773c0afc2d1bd1f8976603e673b6d1378 Mon Sep 17 00:00:00 2001 From: Rodos Date: Thu, 11 Dec 2025 20:27:03 +1100 Subject: [PATCH 071/148] docs: add "Restoring from Backup" section to README Clarifies that this tool is backup-only with no inbuilt restore. Documents that git repos can be pushed back, but issues/PRs have GitHub API limitations affecting all backup tools. Closes #246 --- README.rst | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/README.rst b/README.rst index 9fd35fd..f7bd30b 100644 --- a/README.rst +++ b/README.rst @@ -360,6 +360,25 @@ Debug an error/block or incomplete backup into a temporary directory. Omit "incr github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER +Restoring from Backup +===================== + +This tool creates backups only, there is no inbuilt restore command. + +**Git repositories, wikis, and gists** can be restored by pushing them back to GitHub as you would any git repository. For example, to restore a bare repository backup:: + + cd /tmp/white-house/repositories/petitions/repository + git push --mirror git@github.com:WhiteHouse/petitions.git + +**Issues, pull requests, comments, and other metadata** are saved as JSON files for archival purposes. 
The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations: + +- New issue/PR numbers are assigned (original numbers cannot be set) +- Timestamps reflect creation time (original dates cannot be set) +- The API caller becomes the author (original authors cannot be set) +- Cross-references between issues and PRs will break + +These are GitHub API limitations that affect all backup and migration tools, not just this one. Recreating issues with these limitations via the GitHub API is an exercise for the reader. The JSON backups remain useful for searching, auditing, or manual reference. + Development =========== From e745b557557b808e19509df49352742af25c6201 Mon Sep 17 00:00:00 2001 From: Rodos Date: Thu, 11 Dec 2025 20:55:24 +1100 Subject: [PATCH 072/148] fix: replace deprecated git lfs clone with git clone + git lfs fetch --all git lfs clone is deprecated - modern git clone handles LFS automatically. Using git lfs fetch --all ensures all LFS objects across all refs are backed up, matching the existing bare clone behavior and providing complete LFS backups. Closes #379 --- README.rst | 2 ++ github_backup/github_backup.py | 10 ++++++---- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 9fd35fd..5630681 100644 --- a/README.rst +++ b/README.rst @@ -215,6 +215,8 @@ When you use the ``--lfs`` option, you will need to make sure you have Git LFS i Instructions on how to do this can be found on https://git-lfs.github.com. +LFS objects are fetched for all refs, not just the current checkout, ensuring a complete backup of all LFS content across all branches and history. 
+ About Attachments ----------------- diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 0282809..f706741 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -2090,11 +2090,13 @@ def fetch_repository( git_command.pop() logging_subprocess(git_command, cwd=local_dir) else: - if lfs_clone: - git_command = ["git", "lfs", "clone", remote_url, local_dir] - else: - git_command = ["git", "clone", remote_url, local_dir] + git_command = ["git", "clone", remote_url, local_dir] logging_subprocess(git_command) + if lfs_clone: + git_command = ["git", "lfs", "fetch", "--all", "--prune"] + if no_prune: + git_command.pop() + logging_subprocess(git_command, cwd=local_dir) def backup_account(args, output_directory): From 3684756eaa8e7dfa799d12d01d4d2e65115345a3 Mon Sep 17 00:00:00 2001 From: Rodos Date: Thu, 11 Dec 2025 21:18:23 +1100 Subject: [PATCH 073/148] fix: add Windows support with entry_points and os.replace - Replace os.rename() with os.replace() for atomic file operations on Windows (os.rename fails if destination exists on Windows) - Add entry_points console_scripts for proper .exe generation on Windows - Create github_backup/cli.py with main() entry point - Add github_backup/__main__.py for python -m github_backup support - Keep bin/github-backup as thin wrapper for backwards compatibility Closes #112 --- bin/github-backup | 78 +++++--------------------------- github_backup/__main__.py | 13 ++++++ github_backup/cli.py | 82 ++++++++++++++++++++++++++++++++++ github_backup/github_backup.py | 12 ++--- setup.py | 6 ++- 5 files changed, 116 insertions(+), 75 deletions(-) create mode 100644 github_backup/__main__.py create mode 100644 github_backup/cli.py diff --git a/bin/github-backup b/bin/github-backup index dcac622..c922888 100755 --- a/bin/github-backup +++ b/bin/github-backup @@ -1,76 +1,18 @@ #!/usr/bin/env python +""" +Backwards-compatible wrapper script. 
-import logging -import os -import sys - -from github_backup.github_backup import ( - backup_account, - backup_repositories, - check_git_lfs_install, - filter_repositories, - get_auth, - get_authenticated_user, - logger, - mkdir_p, - parse_args, - retrieve_repositories, -) - -# INFO and DEBUG go to stdout, WARNING and above go to stderr -log_format = logging.Formatter( - fmt="%(asctime)s.%(msecs)03d: %(message)s", - datefmt="%Y-%m-%dT%H:%M:%S", -) - -stdout_handler = logging.StreamHandler(sys.stdout) -stdout_handler.setLevel(logging.DEBUG) -stdout_handler.addFilter(lambda r: r.levelno < logging.WARNING) -stdout_handler.setFormatter(log_format) - -stderr_handler = logging.StreamHandler(sys.stderr) -stderr_handler.setLevel(logging.WARNING) -stderr_handler.setFormatter(log_format) - -logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler]) - +The recommended way to run github-backup is via the installed command +(pip install github-backup) or python -m github_backup. -def main(): - args = parse_args() +This script is kept for backwards compatibility with existing installations +that may reference this path directly. +""" - if args.private and not get_auth(args): - logger.warning( - "The --private flag has no effect without authentication. " - "Use -t/--token, -f/--token-fine, or -u/--username to authenticate." 
- ) - - if args.quiet: - logger.setLevel(logging.WARNING) - - output_directory = os.path.realpath(args.output_directory) - if not os.path.isdir(output_directory): - logger.info("Create output directory {0}".format(output_directory)) - mkdir_p(output_directory) - - if args.lfs_clone: - check_git_lfs_install() - - if args.log_level: - log_level = logging.getLevelName(args.log_level.upper()) - if isinstance(log_level, int): - logger.root.setLevel(log_level) - - if not args.as_app: - logger.info("Backing up user {0} to {1}".format(args.user, output_directory)) - authenticated_user = get_authenticated_user(args) - else: - authenticated_user = {"login": None} - - repositories = retrieve_repositories(args, authenticated_user) - repositories = filter_repositories(args, repositories) - backup_repositories(args, output_directory, repositories) - backup_account(args, output_directory) +import sys +from github_backup.cli import main +from github_backup.github_backup import logger if __name__ == "__main__": try: diff --git a/github_backup/__main__.py b/github_backup/__main__.py new file mode 100644 index 0000000..0b4a7c3 --- /dev/null +++ b/github_backup/__main__.py @@ -0,0 +1,13 @@ +"""Allow running as: python -m github_backup""" + +import sys + +from github_backup.cli import main +from github_backup.github_backup import logger + +if __name__ == "__main__": + try: + main() + except Exception as e: + logger.error(str(e)) + sys.exit(1) diff --git a/github_backup/cli.py b/github_backup/cli.py new file mode 100644 index 0000000..98f8d4a --- /dev/null +++ b/github_backup/cli.py @@ -0,0 +1,82 @@ +#!/usr/bin/env python +"""Command-line interface for github-backup.""" + +import logging +import os +import sys + +from github_backup.github_backup import ( + backup_account, + backup_repositories, + check_git_lfs_install, + filter_repositories, + get_auth, + get_authenticated_user, + logger, + mkdir_p, + parse_args, + retrieve_repositories, +) + +# INFO and DEBUG go to stdout, WARNING and 
above go to stderr +log_format = logging.Formatter( + fmt="%(asctime)s.%(msecs)03d: %(message)s", + datefmt="%Y-%m-%dT%H:%M:%S", +) + +stdout_handler = logging.StreamHandler(sys.stdout) +stdout_handler.setLevel(logging.DEBUG) +stdout_handler.addFilter(lambda r: r.levelno < logging.WARNING) +stdout_handler.setFormatter(log_format) + +stderr_handler = logging.StreamHandler(sys.stderr) +stderr_handler.setLevel(logging.WARNING) +stderr_handler.setFormatter(log_format) + +logging.basicConfig(level=logging.INFO, handlers=[stdout_handler, stderr_handler]) + + +def main(): + """Main entry point for github-backup CLI.""" + args = parse_args() + + if args.private and not get_auth(args): + logger.warning( + "The --private flag has no effect without authentication. " + "Use -t/--token, -f/--token-fine, or -u/--username to authenticate." + ) + + if args.quiet: + logger.setLevel(logging.WARNING) + + output_directory = os.path.realpath(args.output_directory) + if not os.path.isdir(output_directory): + logger.info("Create output directory {0}".format(output_directory)) + mkdir_p(output_directory) + + if args.lfs_clone: + check_git_lfs_install() + + if args.log_level: + log_level = logging.getLevelName(args.log_level.upper()) + if isinstance(log_level, int): + logger.root.setLevel(log_level) + + if not args.as_app: + logger.info("Backing up user {0} to {1}".format(args.user, output_directory)) + authenticated_user = get_authenticated_user(args) + else: + authenticated_user = {"login": None} + + repositories = retrieve_repositories(args, authenticated_user) + repositories = filter_repositories(args, repositories) + backup_repositories(args, output_directory, repositories) + backup_account(args, output_directory) + + +if __name__ == "__main__": + try: + main() + except Exception as e: + logger.error(str(e)) + sys.exit(1) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 0282809..14dd167 100644 --- a/github_backup/github_backup.py +++ 
b/github_backup/github_backup.py @@ -1038,7 +1038,7 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False): bytes_downloaded += len(chunk) # Atomic rename to final location - os.rename(temp_path, path) + os.replace(temp_path, path) metadata["size_bytes"] = bytes_downloaded metadata["success"] = True @@ -1459,7 +1459,7 @@ def download_attachments( # Rename to add extension (already atomic from download) try: - os.rename(filepath, final_filepath) + os.replace(filepath, final_filepath) metadata["saved_as"] = os.path.basename(final_filepath) except Exception as e: logger.warning( @@ -1490,7 +1490,7 @@ def download_attachments( manifest_path = os.path.join(attachments_dir, "manifest.json") with open(manifest_path + ".temp", "w") as f: json.dump(manifest, f, indent=2) - os.rename(manifest_path + ".temp", manifest_path) # Atomic write + os.replace(manifest_path + ".temp", manifest_path) # Atomic write logger.debug( "Wrote manifest for {0} #{1}: {2} attachments".format( item_type_display, number, len(attachment_metadata_list) @@ -1811,7 +1811,7 @@ def backup_issues(args, repo_cwd, repository, repos_template): with codecs.open(issue_file + ".temp", "w", encoding="utf-8") as f: json_dump(issue, f) - os.rename(issue_file + ".temp", issue_file) # Unlike json_dump, this is atomic + os.replace(issue_file + ".temp", issue_file) # Atomic write def backup_pulls(args, repo_cwd, repository, repos_template): @@ -1886,7 +1886,7 @@ def backup_pulls(args, repo_cwd, repository, repos_template): with codecs.open(pull_file + ".temp", "w", encoding="utf-8") as f: json_dump(pull, f) - os.rename(pull_file + ".temp", pull_file) # Unlike json_dump, this is atomic + os.replace(pull_file + ".temp", pull_file) # Atomic write def backup_milestones(args, repo_cwd, repository, repos_template): @@ -2203,5 +2203,5 @@ def json_dump_if_changed(data, output_file_path): temp_file = output_file_path + ".temp" with codecs.open(temp_file, "w", encoding="utf-8") as f: f.write(new_content) - 
os.rename(temp_file, output_file_path) # Atomic on POSIX systems + os.replace(temp_file, output_file_path) # Atomic write return True diff --git a/setup.py b/setup.py index 374e6ec..7835a32 100644 --- a/setup.py +++ b/setup.py @@ -33,7 +33,11 @@ def open_file(fname): author="Jose Diaz-Gonzalez", author_email="github-backup@josediazgonzalez.com", packages=["github_backup"], - scripts=["bin/github-backup"], + entry_points={ + "console_scripts": [ + "github-backup=github_backup.cli:main", + ], + }, url="http://github.com/josegonzalez/python-github-backup", license="MIT", classifiers=[ From 2bb83d6d8b710dee274521b23cbc003e0c0240df Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Thu, 11 Dec 2025 16:50:28 +0000 Subject: [PATCH 074/148] Release version 0.56.0 --- CHANGES.rst | 96 ++++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 96 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index f15dd59..37bdefc 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,10 +1,104 @@ Changelog ========= -0.55.0 (2025-12-07) +0.56.0 (2025-12-11) ------------------- ------------------------ +Fix +~~~ +- Replace deprecated git lfs clone with git clone + git lfs fetch --all. + [Rodos] + + git lfs clone is deprecated - modern git clone handles LFS automatically. + Using git lfs fetch --all ensures all LFS objects across all refs are + backed up, matching the existing bare clone behavior and providing + complete LFS backups. + + Closes #379 +- Add Windows support with entry_points and os.replace. 
[Rodos] + + - Replace os.rename() with os.replace() for atomic file operations + on Windows (os.rename fails if destination exists on Windows) + - Add entry_points console_scripts for proper .exe generation on Windows + - Create github_backup/cli.py with main() entry point + - Add github_backup/__main__.py for python -m github_backup support + - Keep bin/github-backup as thin wrapper for backwards compatibility + + Closes #112 + +Other +~~~~~ +- Docs: add "Restoring from Backup" section to README. [Rodos] + + Clarifies that this tool is backup-only with no inbuilt restore. + Documents that git repos can be pushed back, but issues/PRs have + GitHub API limitations affecting all backup tools. + + Closes #246 +- Chore(deps): bump urllib3 in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3). + + + Updates `urllib3` from 2.6.0 to 2.6.1 + - [Release notes](https://github.com/urllib3/urllib3/releases) + - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) + - [Commits](https://github.com/urllib3/urllib3/compare/2.6.0...2.6.1) + + --- + updated-dependencies: + - dependency-name: urllib3 + dependency-version: 2.6.1 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... +- Chore(deps): bump the python-packages group with 3 updates. + [dependabot[bot]] + + Bumps the python-packages group with 3 updates: [black](https://github.com/psf/black), [pytest](https://github.com/pytest-dev/pytest) and [platformdirs](https://github.com/tox-dev/platformdirs). 
+ + + Updates `black` from 25.11.0 to 25.12.0 + - [Release notes](https://github.com/psf/black/releases) + - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) + - [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0) + + Updates `pytest` from 9.0.1 to 9.0.2 + - [Release notes](https://github.com/pytest-dev/pytest/releases) + - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) + - [Commits](https://github.com/pytest-dev/pytest/compare/9.0.1...9.0.2) + + Updates `platformdirs` from 4.5.0 to 4.5.1 + - [Release notes](https://github.com/tox-dev/platformdirs/releases) + - [Changelog](https://github.com/tox-dev/platformdirs/blob/main/CHANGES.rst) + - [Commits](https://github.com/tox-dev/platformdirs/compare/4.5.0...4.5.1) + + --- + updated-dependencies: + - dependency-name: black + dependency-version: 25.12.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: pytest + dependency-version: 9.0.2 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + - dependency-name: platformdirs + dependency-version: 4.5.1 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... + + +0.55.0 (2025-12-07) +------------------- + Fix ~~~ - Improve error messages for inaccessible repos and empty wikis. 
[Rodos] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 8b19221..9dc8116 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.55.0" +__version__ = "0.56.0" From 3a513b6646e37e1c40ed066956b66079261e1b2e Mon Sep 17 00:00:00 2001 From: Rodos Date: Fri, 12 Dec 2025 09:55:13 +1100 Subject: [PATCH 075/148] docs: add stdin token example to README Add example showing how to pipe a token from stdin using file:///dev/stdin to avoid storing tokens in environment variables or command history. Closes #187 --- README.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.rst b/README.rst index 9fd35fd..55e21c8 100644 --- a/README.rst +++ b/README.rst @@ -359,6 +359,9 @@ Debug an error/block or incomplete backup into a temporary directory. Omit "incr github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER +Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only):: + + my-secret-manager get github-token | github-backup user -t file:///dev/stdin -o /backup --repositories Development From ef990483e2bcc76257776b02fbcf239943d09897 Mon Sep 17 00:00:00 2001 From: Rodos Date: Fri, 12 Dec 2025 10:25:49 +1100 Subject: [PATCH 076/148] Add GitHub Apps documentation and remove outdated header - Add GitHub Apps authentication section with setup steps and CI/CD workflow example using actions/create-github-app-token - Remove outdated machine-man-preview header (graduated 2020) Closes #189 --- README.rst | 31 +++++++++++++++++++++++++++++++ github_backup/github_backup.py | 3 --- 2 files changed, 31 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index 55e21c8..272b606 100644 --- 
a/README.rst +++ b/README.rst @@ -174,6 +174,37 @@ Customise the permissions for your use case, but for a personal account full bac **Repository permissions**: Read access to contents, issues, metadata, pull requests, and webhooks. +GitHub Apps +~~~~~~~~~~~ + +GitHub Apps are ideal for organization backups in CI/CD. Tokens are scoped to specific repositories and expire after 1 hour. + +**One-time setup:** + +1. Create a GitHub App at *Settings -> Developer Settings -> GitHub Apps -> New GitHub App* +2. Set a name and homepage URL (can be any URL) +3. Uncheck "Webhook > Active" (not needed for backups) +4. Set permissions (same as fine-grained tokens above) +5. Click "Create GitHub App", then note the **App ID** shown on the next page +6. Under "Private keys", click "Generate a private key" and save the downloaded file +7. Go to *Install App* in your app's settings +8. Select the account/organization and which repositories to back up + +**CI/CD usage with GitHub Actions:** + +Store the App ID as a repository variable and the private key contents as a secret, then use ``actions/create-github-app-token``:: + + - uses: actions/create-github-app-token@v1 + id: app-token + with: + app-id: ${{ vars.APP_ID }} + private-key: ${{ secrets.APP_PRIVATE_KEY }} + + - run: github-backup myorg -t ${{ steps.app-token.outputs.token }} --as-app -o ./backup --all + +Note: Installation tokens expire after 1 hour. For long-running backups, use a fine-grained personal access token instead. + + Prefer SSH ~~~~~~~~~~ diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 0282809..21daa20 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -796,9 +796,6 @@ def _construct_request( else: auth = auth.encode("ascii") request.add_header("Authorization", "token ".encode("ascii") + auth) - request.add_header( - "Accept", "application/vnd.github.machine-man-preview+json" - ) log_url = template if "?" 
not in template else template.split("?")[0] if querystring: From f6e2f40b0986260a20eed20e29fe124c53d32941 Mon Sep 17 00:00:00 2001 From: Rodos Date: Fri, 12 Dec 2025 16:14:47 +1100 Subject: [PATCH 077/148] Add --skip-assets-on flag to skip release asset downloads (#135) Allow users to skip downloading release assets for specific repositories while still backing up release metadata. Useful for starred repos with large assets (e.g. syncthing with 27GB+). Usage: --skip-assets-on repo1 repo2 owner/repo3 Features: - Space-separated repos (consistent with --exclude) - Case-insensitive matching - Supports both repo name and owner/repo format --- README.rst | 7 +- github_backup/github_backup.py | 102 +++++++---- tests/test_skip_assets_on.py | 320 +++++++++++++++++++++++++++++++++ 3 files changed, 397 insertions(+), 32 deletions(-) create mode 100644 tests/test_skip_assets_on.py diff --git a/README.rst b/README.rst index f292c87..506b67b 100644 --- a/README.rst +++ b/README.rst @@ -50,8 +50,8 @@ CLI Help output:: [--keychain-name OSX_KEYCHAIN_ITEM_NAME] [--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT] [--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES] - [--skip-prerelease] [--assets] [--attachments] - [--exclude [REPOSITORY [REPOSITORY ...]] + [--skip-prerelease] [--assets] [--skip-assets-on [REPO ...]] + [--attachments] [--exclude [REPOSITORY [REPOSITORY ...]] [--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE] USER @@ -133,6 +133,9 @@ CLI Help output:: --skip-prerelease skip prerelease and draft versions; only applies if including releases --assets include assets alongside release information; only applies if including releases + --skip-assets-on [REPO ...] + skip asset downloads for these repositories (e.g. 
+ --skip-assets-on repo1 owner/repo2) --attachments download user-attachments from issues and pull requests to issues/attachments/{issue_number}/ and pulls/attachments/{pull_number}/ directories diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 0782514..b9c23a7 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -440,6 +440,12 @@ def parse_args(args=None): dest="include_assets", help="include assets alongside release information; only applies if including releases", ) + parser.add_argument( + "--skip-assets-on", + dest="skip_assets_on", + nargs="*", + help="skip asset downloads for these repositories", + ) parser.add_argument( "--attachments", action="store_true", @@ -561,7 +567,7 @@ def get_github_host(args): def read_file_contents(file_uri): - return open(file_uri[len(FILE_URI_PREFIX):], "rt").readline().strip() + return open(file_uri[len(FILE_URI_PREFIX) :], "rt").readline().strip() def get_github_repo_url(args, repository): @@ -631,7 +637,7 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False): pass raise RepositoryUnavailableError( "Repository unavailable due to legal reasons (HTTP 451)", - dmca_url=dmca_url + dmca_url=dmca_url, ) # Check if we got correct data @@ -709,7 +715,7 @@ def retrieve_data_gen(args, template, query_args=None, single_request=False): # Parse Link header: ; rel="next" for link in link_header.split(","): if 'rel="next"' in link: - next_url = link[link.find("<") + 1:link.find(">")] + next_url = link[link.find("<") + 1 : link.find(">")] break if not next_url: break @@ -763,9 +769,7 @@ def _get_response(request, auth, template): return r, errors -def _construct_request( - per_page, query_args, template, auth, as_app=None, fine=False -): +def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False): # If template is already a full URL with query params (from Link header), use it directly if "?" 
in template and template.startswith("http"): request_url = template @@ -1480,9 +1484,11 @@ def download_attachments( manifest = { "issue_number": number, "issue_type": item_type, - "repository": f"{args.user}/{args.repository}" - if hasattr(args, "repository") and args.repository - else args.user, + "repository": ( + f"{args.user}/{args.repository}" + if hasattr(args, "repository") and args.repository + else args.user + ), "manifest_updated_at": datetime.now(timezone.utc).isoformat(), "attachments": attachment_metadata_list, } @@ -1538,9 +1544,7 @@ def retrieve_repositories(args, authenticated_user): else: repo_path = "{0}/{1}".format(args.user, args.repository) single_request = True - template = "https://{0}/repos/{1}".format( - get_github_api_host(args), repo_path - ) + template = "https://{0}/repos/{1}".format(get_github_api_host(args), repo_path) repos = retrieve_data(args, template, single_request=single_request) @@ -1565,7 +1569,10 @@ def retrieve_repositories(args, authenticated_user): repos.extend(gists) if args.include_starred_gists: - if not authenticated_user.get("login") or args.user.lower() != authenticated_user["login"].lower(): + if ( + not authenticated_user.get("login") + or args.user.lower() != authenticated_user["login"].lower() + ): logger.warning( "Cannot retrieve starred gists for '%s'. 
GitHub only allows access to the authenticated user's starred gists.", args.user, @@ -1673,9 +1680,11 @@ def backup_repositories(args, output_directory, repositories): include_gists = args.include_gists or args.include_starred_gists include_starred = args.all_starred and repository.get("is_starred") - if (args.include_repository or args.include_everything) or ( - include_gists and repository.get("is_gist") - ) or include_starred: + if ( + (args.include_repository or args.include_everything) + or (include_gists and repository.get("is_gist")) + or include_starred + ): repo_name = ( repository.get("name") if not repository.get("is_gist") @@ -1735,7 +1744,9 @@ def backup_repositories(args, output_directory, repositories): include_assets=args.include_assets or args.include_everything, ) except RepositoryUnavailableError as e: - logger.warning(f"Repository {repository['full_name']} is unavailable (HTTP 451)") + logger.warning( + f"Repository {repository['full_name']} is unavailable (HTTP 451)" + ) if e.dmca_url: logger.warning(f"DMCA notice: {e.dmca_url}") logger.info(f"Skipping remaining resources for {repository['full_name']}") @@ -1795,7 +1806,11 @@ def backup_issues(args, repo_cwd, repository, repos_template): modified = os.path.getmtime(issue_file) modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ") if modified > issue["updated_at"]: - logger.info("Skipping issue {0} because it wasn't modified since last backup".format(number)) + logger.info( + "Skipping issue {0} because it wasn't modified since last backup".format( + number + ) + ) continue if args.include_issue_comments or args.include_everything: @@ -1869,7 +1884,11 @@ def backup_pulls(args, repo_cwd, repository, repos_template): modified = os.path.getmtime(pull_file) modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ") if modified > pull["updated_at"]: - logger.info("Skipping pull request {0} because it wasn't modified since last backup".format(number)) + 
logger.info( + "Skipping pull request {0} because it wasn't modified since last backup".format( + number + ) + ) continue if args.include_pull_comments or args.include_everything: template = comments_regular_template.format(number) @@ -1919,9 +1938,11 @@ def backup_milestones(args, repo_cwd, repository, repos_template): elif written_count == 0: logger.info("{0} milestones unchanged, skipped write".format(total)) else: - logger.info("Saved {0} of {1} milestones to disk ({2} unchanged)".format( - written_count, total, total - written_count - )) + logger.info( + "Saved {0} of {1} milestones to disk ({2} unchanged)".format( + written_count, total, total - written_count + ) + ) def backup_labels(args, repo_cwd, repository, repos_template): @@ -1975,6 +1996,20 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F ) releases = releases[: args.number_of_latest_releases] + # Check if this repo should skip asset downloads (case-insensitive) + skip_assets = False + if include_assets: + repo_name = repository.get("name", "").lower() + repo_full_name = repository.get("full_name", "").lower() + skip_repos = [r.lower() for r in (args.skip_assets_on or [])] + skip_assets = repo_name in skip_repos or repo_full_name in skip_repos + if skip_assets: + logger.info( + "Skipping assets for {0} ({1} releases) due to --skip-assets-on".format( + repository.get("name"), len(releases) + ) + ) + # for each release, store it written_count = 0 for release in releases: @@ -1986,7 +2021,7 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F if json_dump_if_changed(release, output_filepath): written_count += 1 - if include_assets: + if include_assets and not skip_assets: assets = retrieve_data(args, release["assets_url"]) if len(assets) > 0: # give release asset files somewhere to live & download them (not including source archives) @@ -2008,9 +2043,11 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F 
elif written_count == 0: logger.info("{0} releases unchanged, skipped write".format(total)) else: - logger.info("Saved {0} of {1} releases to disk ({2} unchanged)".format( - written_count, total, total - written_count - )) + logger.info( + "Saved {0} of {1} releases to disk ({2} unchanged)".format( + written_count, total, total - written_count + ) + ) def fetch_repository( @@ -2024,9 +2061,12 @@ def fetch_repository( ): if bare_clone: if os.path.exists(local_dir): - clone_exists = subprocess.check_output( - ["git", "rev-parse", "--is-bare-repository"], cwd=local_dir - ) == b"true\n" + clone_exists = ( + subprocess.check_output( + ["git", "rev-parse", "--is-bare-repository"], cwd=local_dir + ) + == b"true\n" + ) else: clone_exists = False else: @@ -2047,7 +2087,9 @@ def fetch_repository( ) else: logger.info( - "Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format(name) + "Skipping {0} (repository not accessible - may be empty, private, or credentials invalid)".format( + name + ) ) return diff --git a/tests/test_skip_assets_on.py b/tests/test_skip_assets_on.py new file mode 100644 index 0000000..2437e05 --- /dev/null +++ b/tests/test_skip_assets_on.py @@ -0,0 +1,320 @@ +"""Tests for --skip-assets-on flag behavior (issue #135).""" + +import pytest +from unittest.mock import Mock, patch + +from github_backup import github_backup + + +class TestSkipAssetsOn: + """Test suite for --skip-assets-on flag. + + Issue #135: Allow skipping asset downloads for specific repositories + while still backing up release metadata. 
+ """ + + def _create_mock_args(self, **overrides): + """Create a mock args object with sensible defaults.""" + args = Mock() + args.user = "testuser" + args.output_directory = "/tmp/backup" + args.include_repository = False + args.include_everything = False + args.include_gists = False + args.include_starred_gists = False + args.all_starred = False + args.skip_existing = False + args.bare_clone = False + args.lfs_clone = False + args.no_prune = False + args.include_wiki = False + args.include_issues = False + args.include_issue_comments = False + args.include_issue_events = False + args.include_pulls = False + args.include_pull_comments = False + args.include_pull_commits = False + args.include_pull_details = False + args.include_labels = False + args.include_hooks = False + args.include_milestones = False + args.include_releases = True + args.include_assets = True + args.skip_assets_on = [] + args.include_attachments = False + args.incremental = False + args.incremental_by_files = False + args.github_host = None + args.prefer_ssh = False + args.token_classic = "test-token" + args.token_fine = None + args.username = None + args.password = None + args.as_app = False + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.skip_prerelease = False + args.number_of_latest_releases = None + + for key, value in overrides.items(): + setattr(args, key, value) + + return args + + def _create_mock_repository(self, name="test-repo", owner="testuser"): + """Create a mock repository object.""" + return { + "name": name, + "full_name": f"{owner}/{name}", + "owner": {"login": owner}, + "private": False, + "fork": False, + "has_wiki": False, + } + + def _create_mock_release(self, tag="v1.0.0"): + """Create a mock release object.""" + return { + "tag_name": tag, + "name": tag, + "prerelease": False, + "draft": False, + "assets_url": f"https://api.github.com/repos/testuser/test-repo/releases/{tag}/assets", + } + + def _create_mock_asset(self, 
name="asset.zip"): + """Create a mock asset object.""" + return { + "name": name, + "url": f"https://api.github.com/repos/testuser/test-repo/releases/assets/{name}", + } + + +class TestSkipAssetsOnArgumentParsing(TestSkipAssetsOn): + """Tests for --skip-assets-on argument parsing.""" + + def test_skip_assets_on_not_set_defaults_to_none(self): + """When --skip-assets-on is not specified, it should default to None.""" + args = github_backup.parse_args(["testuser"]) + assert args.skip_assets_on is None + + def test_skip_assets_on_single_repo(self): + """Single --skip-assets-on should create list with one item.""" + args = github_backup.parse_args(["testuser", "--skip-assets-on", "big-repo"]) + assert args.skip_assets_on == ["big-repo"] + + def test_skip_assets_on_multiple_repos(self): + """Multiple repos can be specified space-separated (like --exclude).""" + args = github_backup.parse_args( + [ + "testuser", + "--skip-assets-on", + "big-repo", + "another-repo", + "owner/third-repo", + ] + ) + assert args.skip_assets_on == ["big-repo", "another-repo", "owner/third-repo"] + + +class TestSkipAssetsOnBehavior(TestSkipAssetsOn): + """Tests for --skip-assets-on behavior in backup_releases.""" + + @patch("github_backup.github_backup.download_file") + @patch("github_backup.github_backup.retrieve_data") + @patch("github_backup.github_backup.mkdir_p") + @patch("github_backup.github_backup.json_dump_if_changed") + def test_assets_downloaded_when_not_skipped( + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + ): + """Assets should be downloaded when repo is not in skip list.""" + args = self._create_mock_args(skip_assets_on=[]) + repository = self._create_mock_repository(name="normal-repo") + release = self._create_mock_release() + asset = self._create_mock_asset() + + mock_json_dump.return_value = True + mock_retrieve.side_effect = [ + [release], # First call: get releases + [asset], # Second call: get assets + ] + + with patch("os.path.join", side_effect=lambda 
*args: "/".join(args)): + github_backup.backup_releases( + args, + "/tmp/backup/repositories/normal-repo", + repository, + "https://api.github.com/repos/{owner}/{repo}", + include_assets=True, + ) + + # download_file should have been called for the asset + mock_download.assert_called_once() + + @patch("github_backup.github_backup.download_file") + @patch("github_backup.github_backup.retrieve_data") + @patch("github_backup.github_backup.mkdir_p") + @patch("github_backup.github_backup.json_dump_if_changed") + def test_assets_skipped_when_repo_name_matches( + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + ): + """Assets should be skipped when repo name is in skip list.""" + args = self._create_mock_args(skip_assets_on=["big-repo"]) + repository = self._create_mock_repository(name="big-repo") + release = self._create_mock_release() + + mock_json_dump.return_value = True + mock_retrieve.return_value = [release] + + github_backup.backup_releases( + args, + "/tmp/backup/repositories/big-repo", + repository, + "https://api.github.com/repos/{owner}/{repo}", + include_assets=True, + ) + + # download_file should NOT have been called + mock_download.assert_not_called() + + @patch("github_backup.github_backup.download_file") + @patch("github_backup.github_backup.retrieve_data") + @patch("github_backup.github_backup.mkdir_p") + @patch("github_backup.github_backup.json_dump_if_changed") + def test_assets_skipped_when_full_name_matches( + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + ): + """Assets should be skipped when owner/repo format matches.""" + args = self._create_mock_args(skip_assets_on=["otheruser/big-repo"]) + repository = self._create_mock_repository(name="big-repo", owner="otheruser") + release = self._create_mock_release() + + mock_json_dump.return_value = True + mock_retrieve.return_value = [release] + + github_backup.backup_releases( + args, + "/tmp/backup/repositories/big-repo", + repository, + 
"https://api.github.com/repos/{owner}/{repo}", + include_assets=True, + ) + + # download_file should NOT have been called + mock_download.assert_not_called() + + @patch("github_backup.github_backup.download_file") + @patch("github_backup.github_backup.retrieve_data") + @patch("github_backup.github_backup.mkdir_p") + @patch("github_backup.github_backup.json_dump_if_changed") + def test_case_insensitive_matching( + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + ): + """Skip matching should be case-insensitive.""" + # User types uppercase, repo name is lowercase + args = self._create_mock_args(skip_assets_on=["BIG-REPO"]) + repository = self._create_mock_repository(name="big-repo") + release = self._create_mock_release() + + mock_json_dump.return_value = True + mock_retrieve.return_value = [release] + + github_backup.backup_releases( + args, + "/tmp/backup/repositories/big-repo", + repository, + "https://api.github.com/repos/{owner}/{repo}", + include_assets=True, + ) + + # download_file should NOT have been called (case-insensitive match) + assert not mock_download.called + + @patch("github_backup.github_backup.download_file") + @patch("github_backup.github_backup.retrieve_data") + @patch("github_backup.github_backup.mkdir_p") + @patch("github_backup.github_backup.json_dump_if_changed") + def test_multiple_skip_repos( + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + ): + """Multiple repos in skip list should all be skipped.""" + args = self._create_mock_args(skip_assets_on=["repo1", "repo2", "repo3"]) + repository = self._create_mock_repository(name="repo2") + release = self._create_mock_release() + + mock_json_dump.return_value = True + mock_retrieve.return_value = [release] + + github_backup.backup_releases( + args, + "/tmp/backup/repositories/repo2", + repository, + "https://api.github.com/repos/{owner}/{repo}", + include_assets=True, + ) + + # download_file should NOT have been called + mock_download.assert_not_called() + + 
@patch("github_backup.github_backup.download_file") + @patch("github_backup.github_backup.retrieve_data") + @patch("github_backup.github_backup.mkdir_p") + @patch("github_backup.github_backup.json_dump_if_changed") + def test_release_metadata_still_saved_when_assets_skipped( + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + ): + """Release JSON should still be saved even when assets are skipped.""" + args = self._create_mock_args(skip_assets_on=["big-repo"]) + repository = self._create_mock_repository(name="big-repo") + release = self._create_mock_release() + + mock_json_dump.return_value = True + mock_retrieve.return_value = [release] + + github_backup.backup_releases( + args, + "/tmp/backup/repositories/big-repo", + repository, + "https://api.github.com/repos/{owner}/{repo}", + include_assets=True, + ) + + # json_dump_if_changed should have been called for release metadata + mock_json_dump.assert_called_once() + # But download_file should NOT have been called + mock_download.assert_not_called() + + @patch("github_backup.github_backup.download_file") + @patch("github_backup.github_backup.retrieve_data") + @patch("github_backup.github_backup.mkdir_p") + @patch("github_backup.github_backup.json_dump_if_changed") + def test_non_matching_repo_still_downloads_assets( + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + ): + """Repos not in skip list should still download assets.""" + args = self._create_mock_args(skip_assets_on=["other-repo"]) + repository = self._create_mock_repository(name="normal-repo") + release = self._create_mock_release() + asset = self._create_mock_asset() + + mock_json_dump.return_value = True + mock_retrieve.side_effect = [ + [release], # First call: get releases + [asset], # Second call: get assets + ] + + with patch("os.path.join", side_effect=lambda *args: "/".join(args)): + github_backup.backup_releases( + args, + "/tmp/backup/repositories/normal-repo", + repository, + 
"https://api.github.com/repos/{owner}/{repo}", + include_assets=True, + ) + + # download_file SHOULD have been called + mock_download.assert_called_once() + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) From ba852b58307cbb1a44f8d383fe0dbfd54fc41c5b Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Fri, 12 Dec 2025 11:07:14 +0000 Subject: [PATCH 078/148] Release version 0.57.0 --- CHANGES.rst | 33 ++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 33 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 37bdefc..1a8809e 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,40 @@ Changelog ========= -0.56.0 (2025-12-11) +0.57.0 (2025-12-12) ------------------- ------------------------ +- Add GitHub Apps documentation and remove outdated header. [Rodos] + + - Add GitHub Apps authentication section with setup steps + and CI/CD workflow example using actions/create-github-app-token + - Remove outdated machine-man-preview header (graduated 2020) + + Closes #189 +- Docs: add stdin token example to README. [Rodos] + + Add example showing how to pipe a token from stdin using + file:///dev/stdin to avoid storing tokens in environment + variables or command history. + + Closes #187 +- Add --skip-assets-on flag to skip release asset downloads (#135) + [Rodos] + + Allow users to skip downloading release assets for specific repositories + while still backing up release metadata. Useful for starred repos with + large assets (e.g. syncthing with 27GB+). 
+ + Usage: --skip-assets-on repo1 repo2 owner/repo3 + + Features: + - Space-separated repos (consistent with --exclude) + - Case-insensitive matching + - Supports both repo name and owner/repo format + + +0.56.0 (2025-12-11) +------------------- Fix ~~~ diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 9dc8116..6e6e624 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.56.0" +__version__ = "0.57.0" From 59a70ff11aaa0c60c10d0116e6962118d70f46e5 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri, 12 Dec 2025 13:09:29 +0000 Subject: [PATCH 079/148] chore(deps): bump urllib3 in the python-packages group Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3). Updates `urllib3` from 2.6.1 to 2.6.2 - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2) --- updated-dependencies: - dependency-name: urllib3 dependency-version: 2.6.2 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 5ca68cb..7a478f8 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -35,6 +35,6 @@ setuptools==80.9.0 six==1.17.0 tqdm==4.67.1 twine==6.2.0 -urllib3==2.6.1 +urllib3==2.6.2 webencodings==0.5.1 zipp==3.23.0 From 241949137deead07b8d4e0c7a4a1a28b7cedbf61 Mon Sep 17 00:00:00 2001 From: Rodos Date: Sat, 13 Dec 2025 11:22:53 +1100 Subject: [PATCH 080/148] chore: remove transitive deps from release-requirements.txt --- release-requirements.txt | 45 +++++++++------------------------------- 1 file changed, 10 insertions(+), 35 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index 7a478f8..dd2d73f 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,40 +1,15 @@ +# Linting & Formatting autopep8==2.3.2 black==25.12.0 -bleach==6.3.0 -certifi==2025.11.12 -charset-normalizer==3.4.4 -click==8.3.1 -colorama==0.4.6 -docutils==0.22.3 flake8==7.3.0 -gitchangelog==3.0.4 + +# Testing pytest==9.0.2 -idna==3.11 -importlib-metadata==8.7.0 -jaraco.classes==3.4.0 -keyring==25.7.0 -markdown-it-py==4.0.0 -mccabe==0.7.0 -mdurl==0.1.2 -more-itertools==10.8.0 -mypy-extensions==1.1.0 -packaging==25.0 -pathspec==0.12.1 -pkginfo==1.12.1.2 -platformdirs==4.5.1 -pycodestyle==2.14.0 -pyflakes==3.4.0 -Pygments==2.19.2 -readme-renderer==44.0 -requests==2.32.5 -requests-toolbelt==1.0.0 -restructuredtext-lint==2.0.2 -rfc3986==2.0.0 -rich==14.2.0 -setuptools==80.9.0 -six==1.17.0 -tqdm==4.67.1 + +# Release & Publishing twine==6.2.0 -urllib3==2.6.2 -webencodings==0.5.1 -zipp==3.23.0 +gitchangelog==3.0.4 +setuptools==80.9.0 + +# Documentation +restructuredtext-lint==2.0.2 From 46140b0ff13dd512960f42365b35d5ebd011aff6 Mon Sep 17 00:00:00 2001 From: Rodos Date: Tue, 16 Dec 2025 21:44:16 +1100 Subject: [PATCH 081/148] Fix retry logic for HTTP 5xx errors and 
network failures Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29. Fixes #140, #110, #138 --- github_backup/github_backup.py | 369 +++++++++++++++------------------ tests/test_http_451.py | 55 +---- tests/test_pagination.py | 20 +- tests/test_retrieve_data.py | 365 ++++++++++++++++++++++++++++++++ 4 files changed, 545 insertions(+), 264 deletions(-) create mode 100644 tests/test_retrieve_data.py diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 4bd38ce..34d529a 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -12,6 +12,7 @@ import logging import os import platform +import random import re import select import socket @@ -19,6 +20,7 @@ import subprocess import sys import time +from collections.abc import Generator from datetime import datetime from http.client import IncompleteRead from urllib.error import HTTPError, URLError @@ -74,6 +76,9 @@ def __init__(self, message, dmca_url=None): " 3. 
Debian/Ubuntu: apt-get install ca-certificates\n\n" ) +# Retry configuration +MAX_RETRIES = 5 + def logging_subprocess( popenargs, stdout_log_level=logging.DEBUG, stderr_log_level=logging.ERROR, **kwargs @@ -603,170 +608,178 @@ def get_github_repo_url(args, repository): return repo_url -def retrieve_data_gen(args, template, query_args=None, single_request=False): - auth = get_auth(args, encode=not args.as_app) - query_args = get_query_args(query_args) - per_page = 100 - next_url = None +def calculate_retry_delay(attempt, headers): + """Calculate delay before next retry with exponential backoff.""" + # Respect retry-after header if present + if retry_after := headers.get("retry-after"): + return int(retry_after) - while True: - if single_request: - request_per_page = None - else: - request_per_page = per_page + # Respect rate limit reset time + if int(headers.get("x-ratelimit-remaining", 1)) < 1: + reset_time = int(headers.get("x-ratelimit-reset", 0)) + return max(10, reset_time - calendar.timegm(time.gmtime())) - request = _construct_request( - request_per_page, - query_args, - next_url or template, - auth, - as_app=args.as_app, - fine=True if args.token_fine is not None else False, - ) # noqa - r, errors = _get_response(request, auth, next_url or template) + # Exponential backoff with jitter for server errors (1s base, 120s max) + delay = min(1.0 * (2**attempt), 120.0) + return delay + random.uniform(0, delay * 0.1) - status_code = int(r.getcode()) - # Handle DMCA takedown (HTTP 451) - raise exception to skip entire repository - if status_code == 451: - dmca_url = None - try: - response_data = json.loads(r.read().decode("utf-8")) - dmca_url = response_data.get("block", {}).get("html_url") - except Exception: - pass - raise RepositoryUnavailableError( - "Repository unavailable due to legal reasons (HTTP 451)", - dmca_url=dmca_url, - ) +def retrieve_data(args, template, query_args=None, paginated=True): + """ + Fetch the data from GitHub API. 
- # Check if we got correct data - try: - response = json.loads(r.read().decode("utf-8")) - except IncompleteRead: - logger.warning("Incomplete read error detected") - read_error = True - except json.decoder.JSONDecodeError: - logger.warning("JSON decode error detected") - read_error = True - except TimeoutError: - logger.warning("Tiemout error detected") - read_error = True - else: - read_error = False + Handle both single requests and pagination with yield of individual dicts. + Handles throttling, retries, read errors, and DMCA takedowns. + """ + query_args = query_args or {} + auth = get_auth(args, encode=not args.as_app) + per_page = 100 - # be gentle with API request limit and throttle requests if remaining requests getting low - limit_remaining = int(r.headers.get("x-ratelimit-remaining", 0)) - if args.throttle_limit and limit_remaining <= args.throttle_limit: - logger.info( - "API request limit hit: {} requests left, pausing further requests for {}s".format( - limit_remaining, args.throttle_pause + def _extract_next_page_url(link_header): + for link in link_header.split(","): + if 'rel="next"' in link: + return link[link.find("<") + 1:link.find(">")] + return None + + def fetch_all() -> Generator[dict, None, None]: + next_url = None + + while True: + # FIRST: Fetch response + + for attempt in range(MAX_RETRIES): + request = _construct_request( + per_page=per_page if paginated else None, + query_args=query_args, + template=next_url or template, + auth=auth, + as_app=args.as_app, + fine=args.token_fine is not None, ) - ) - time.sleep(args.throttle_pause) - - retries = 0 - while retries < 3 and (status_code == 502 or read_error): - logger.warning("API request failed. 
Retrying in 5 seconds") - retries += 1 - time.sleep(5) - request = _construct_request( - request_per_page, - query_args, - next_url or template, - auth, - as_app=args.as_app, - fine=True if args.token_fine is not None else False, - ) # noqa - r, errors = _get_response(request, auth, next_url or template) - - status_code = int(r.getcode()) - try: - response = json.loads(r.read().decode("utf-8")) - read_error = False - except IncompleteRead: - logger.warning("Incomplete read error detected") - read_error = True - except json.decoder.JSONDecodeError: - logger.warning("JSON decode error detected") - read_error = True - except TimeoutError: - logger.warning("Tiemout error detected") - read_error = True - - if status_code != 200: - template = "API request returned HTTP {0}: {1}" - errors.append(template.format(status_code, r.reason)) - raise Exception(", ".join(errors)) - - if read_error: - template = "API request problem reading response for {0}" - errors.append(template.format(request)) - raise Exception(", ".join(errors)) - - if len(errors) == 0: - if type(response) is list: - for resp in response: - yield resp - # Parse Link header for next page URL (cursor-based pagination) - link_header = r.headers.get("Link", "") - next_url = None - if link_header: - # Parse Link header: ; rel="next" - for link in link_header.split(","): - if 'rel="next"' in link: - next_url = link[link.find("<") + 1 : link.find(">")] - break - if not next_url: - break - elif type(response) is dict and single_request: - yield response + http_response = make_request_with_retry(request, auth) + + match http_response.getcode(): + case 200: + # Success - Parse JSON response + try: + response = json.loads(http_response.read().decode("utf-8")) + break # Exit retry loop and handle the data returned + except ( + IncompleteRead, + json.decoder.JSONDecodeError, + TimeoutError, + ) as e: + logger.warning(f"{type(e).__name__} reading response") + if attempt < MAX_RETRIES - 1: + delay = 
calculate_retry_delay(attempt, {}) + logger.warning( + f"Retrying in {delay:.1f}s (attempt {attempt + 1}/{MAX_RETRIES})" + ) + time.sleep(delay) + continue # Next retry attempt + + case 451: + # DMCA takedown - extract URL if available, then raise + dmca_url = None + try: + response_data = json.loads( + http_response.read().decode("utf-8") + ) + dmca_url = response_data.get("block", {}).get("html_url") + except Exception: + pass + raise RepositoryUnavailableError( + "Repository unavailable due to legal reasons (HTTP 451)", + dmca_url=dmca_url, + ) + + case _: + raise Exception( + f"API request returned HTTP {http_response.getcode()}: {http_response.reason}" + ) + else: + logger.error( + f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}" + ) + raise Exception( + f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}" + ) + + # SECOND: Process and paginate - if len(errors) > 0: - raise Exception(", ".join(errors)) + # Pause before next request if rate limit is low + if ( + remaining := int(http_response.headers.get("x-ratelimit-remaining", 0)) + ) <= (args.throttle_limit or 0): + if args.throttle_limit: + logger.info( + f"Throttling: {remaining} requests left, pausing {args.throttle_pause}s" + ) + time.sleep(args.throttle_pause) - if single_request: - break + # Yield results + if isinstance(response, list): + yield from response + elif isinstance(response, dict): + yield response + # Check for more pages + if not paginated or not ( + next_url := _extract_next_page_url( + http_response.headers.get("Link", "") + ) + ): + break # No more data -def retrieve_data(args, template, query_args=None, single_request=False): - return list(retrieve_data_gen(args, template, query_args, single_request)) + return list(fetch_all()) -def get_query_args(query_args=None): - if not query_args: - query_args = {} - return query_args +def make_request_with_retry(request, auth): + """Make HTTP request with automatic retry for 
transient errors.""" + def is_retryable_status(status_code, headers): + # Server errors are always retryable + if status_code in (500, 502, 503, 504): + return True + # Rate limit (403/429) is retryable if limit exhausted + if status_code in (403, 429): + return int(headers.get("x-ratelimit-remaining", 1)) < 1 + return False -def _get_response(request, auth, template): - retry_timeout = 3 - errors = [] - # We'll make requests in a loop so we can - # delay and retry in the case of rate-limiting - while True: - should_continue = False + for attempt in range(MAX_RETRIES): try: - r = urlopen(request, context=https_ctx) + return urlopen(request, context=https_ctx) + except HTTPError as exc: - errors, should_continue = _request_http_error(exc, auth, errors) # noqa - r = exc - except URLError as e: - logger.warning(e.reason) - should_continue, retry_timeout = _request_url_error(template, retry_timeout) - if not should_continue: - raise - except socket.error as e: - logger.warning(e.strerror) - should_continue, retry_timeout = _request_url_error(template, retry_timeout) - if not should_continue: + # HTTPError can be used as a response-like object + if not is_retryable_status(exc.code, exc.headers): + raise # Non-retryable error + + if attempt >= MAX_RETRIES - 1: + logger.error(f"HTTP {exc.code} failed after {MAX_RETRIES} attempts") raise - if should_continue: - continue + delay = calculate_retry_delay(attempt, exc.headers) + logger.warning( + f"HTTP {exc.code}, retrying in {delay:.1f}s " + f"(attempt {attempt + 1}/{MAX_RETRIES})" + ) + if auth is None and exc.code in (403, 429): + logger.info("Hint: Authenticate to raise your GitHub rate limit") + time.sleep(delay) - break - return r, errors + except (URLError, socket.error) as e: + if attempt >= MAX_RETRIES - 1: + logger.error(f"Connection error failed after {MAX_RETRIES} attempts: {e}") + raise + delay = calculate_retry_delay(attempt, {}) + logger.warning( + f"Connection error: {e}, retrying in {delay:.1f}s " + 
f"(attempt {attempt + 1}/{MAX_RETRIES})" + ) + time.sleep(delay) + + raise Exception(f"Request failed after {MAX_RETRIES} attempts") # pragma: no cover def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False): @@ -808,52 +821,6 @@ def _construct_request(per_page, query_args, template, auth, as_app=None, fine=F return request -def _request_http_error(exc, auth, errors): - # HTTPError behaves like a Response so we can - # check the status code and headers to see exactly - # what failed. - - should_continue = False - headers = exc.headers - limit_remaining = int(headers.get("x-ratelimit-remaining", 0)) - - if exc.code == 403 and limit_remaining < 1: - # The X-RateLimit-Reset header includes a - # timestamp telling us when the limit will reset - # so we can calculate how long to wait rather - # than inefficiently polling: - gm_now = calendar.timegm(time.gmtime()) - reset = int(headers.get("x-ratelimit-reset", 0)) or gm_now - # We'll never sleep for less than 10 seconds: - delta = max(10, reset - gm_now) - - limit = headers.get("x-ratelimit-limit") - logger.warning( - "Exceeded rate limit of {} requests; waiting {} seconds to reset".format( - limit, delta - ) - ) # noqa - - if auth is None: - logger.info("Hint: Authenticate to raise your GitHub rate limit") - - time.sleep(delta) - should_continue = True - return errors, should_continue - - -def _request_url_error(template, retry_timeout): - # In case of a connection timing out, we can retry a few time - # But we won't crash and not back-up the rest now - logger.info("'{}' timed out".format(template)) - retry_timeout -= 1 - - if retry_timeout >= 0: - return True, retry_timeout - - raise Exception("'{}' timed out to much, skipping!".format(template)) - - class S3HTTPRedirectHandler(HTTPRedirectHandler): """ A subclassed redirect handler for downloading Github assets from S3. 
@@ -1503,7 +1470,7 @@ def download_attachments( def get_authenticated_user(args): template = "https://{0}/user".format(get_github_api_host(args)) - data = retrieve_data(args, template, single_request=True) + data = retrieve_data(args, template, paginated=False) return data[0] @@ -1517,7 +1484,7 @@ def check_git_lfs_install(): def retrieve_repositories(args, authenticated_user): logger.info("Retrieving repositories") - single_request = False + paginated = True if args.user == authenticated_user["login"]: # we must use the /user/repos API to be able to access private repos template = "https://{0}/user/repos".format(get_github_api_host(args)) @@ -1540,16 +1507,16 @@ def retrieve_repositories(args, authenticated_user): repo_path = args.repository else: repo_path = "{0}/{1}".format(args.user, args.repository) - single_request = True + paginated = False template = "https://{0}/repos/{1}".format(get_github_api_host(args), repo_path) - repos = retrieve_data(args, template, single_request=single_request) + repos = retrieve_data(args, template, paginated=paginated) if args.all_starred: starred_template = "https://{0}/users/{1}/starred".format( get_github_api_host(args), args.user ) - starred_repos = retrieve_data(args, starred_template, single_request=False) + starred_repos = retrieve_data(args, starred_template) # flag each repo as starred for downstream processing for item in starred_repos: item.update({"is_starred": True}) @@ -1559,7 +1526,7 @@ def retrieve_repositories(args, authenticated_user): gists_template = "https://{0}/users/{1}/gists".format( get_github_api_host(args), args.user ) - gists = retrieve_data(args, gists_template, single_request=False) + gists = retrieve_data(args, gists_template) # flag each repo as a gist for downstream processing for item in gists: item.update({"is_gist": True}) @@ -1578,9 +1545,7 @@ def retrieve_repositories(args, authenticated_user): starred_gists_template = "https://{0}/gists/starred".format( get_github_api_host(args) ) - 
starred_gists = retrieve_data( - args, starred_gists_template, single_request=False - ) + starred_gists = retrieve_data(args, starred_gists_template) # flag each repo as a starred gist for downstream processing for item in starred_gists: item.update({"is_gist": True, "is_starred": True}) @@ -1849,14 +1814,14 @@ def backup_pulls(args, repo_cwd, repository, repos_template): pull_states = ["open", "closed"] for pull_state in pull_states: query_args["state"] = pull_state - _pulls = retrieve_data_gen(args, _pulls_template, query_args=query_args) + _pulls = retrieve_data(args, _pulls_template, query_args=query_args) for pull in _pulls: if args.since and pull["updated_at"] < args.since: break if not args.since or pull["updated_at"] >= args.since: pulls[pull["number"]] = pull else: - _pulls = retrieve_data_gen(args, _pulls_template, query_args=query_args) + _pulls = retrieve_data(args, _pulls_template, query_args=query_args) for pull in _pulls: if args.since and pull["updated_at"] < args.since: break @@ -1864,7 +1829,7 @@ def backup_pulls(args, repo_cwd, repository, repos_template): pulls[pull["number"]] = retrieve_data( args, _pulls_template + "/{}".format(pull["number"]), - single_request=True, + paginated=False, )[0] logger.info("Saving {0} pull requests to disk".format(len(list(pulls.keys())))) diff --git a/tests/test_http_451.py b/tests/test_http_451.py index 7feca1d..51218d2 100644 --- a/tests/test_http_451.py +++ b/tests/test_http_451.py @@ -13,7 +13,6 @@ class TestHTTP451Exception: def test_repository_unavailable_error_raised(self): """HTTP 451 should raise RepositoryUnavailableError with DMCA URL.""" - # Create mock args args = Mock() args.as_app = False args.token_fine = None @@ -25,7 +24,6 @@ def test_repository_unavailable_error_raised(self): args.throttle_limit = None args.throttle_pause = 0 - # Mock HTTPError 451 response mock_response = Mock() mock_response.getcode.return_value = 451 @@ -41,14 +39,10 @@ def test_repository_unavailable_error_raised(self): 
mock_response.headers = {"x-ratelimit-remaining": "5000"} mock_response.reason = "Unavailable For Legal Reasons" - def mock_get_response(request, auth, template): - return mock_response, [] - - with patch("github_backup.github_backup._get_response", side_effect=mock_get_response): + with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: - list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues")) + github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues") - # Check exception has DMCA URL assert exc_info.value.dmca_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md" assert "451" in str(exc_info.value) @@ -71,14 +65,10 @@ def test_repository_unavailable_error_without_dmca_url(self): mock_response.headers = {"x-ratelimit-remaining": "5000"} mock_response.reason = "Unavailable For Legal Reasons" - def mock_get_response(request, auth, template): - return mock_response, [] - - with patch("github_backup.github_backup._get_response", side_effect=mock_get_response): + with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: - list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues")) + github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues") - # Exception raised even without DMCA URL assert exc_info.value.dmca_url is None assert "451" in str(exc_info.value) @@ -101,42 +91,9 @@ def test_repository_unavailable_error_with_malformed_json(self): mock_response.headers = {"x-ratelimit-remaining": "5000"} mock_response.reason = "Unavailable For Legal Reasons" - def mock_get_response(request, auth, template): - return mock_response, [] - - with patch("github_backup.github_backup._get_response", 
side_effect=mock_get_response): + with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): with pytest.raises(github_backup.RepositoryUnavailableError): - list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/dmca/issues")) - - def test_other_http_errors_unchanged(self): - """Other HTTP errors should still raise generic Exception.""" - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = None - args.username = None - args.password = None - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - - mock_response = Mock() - mock_response.getcode.return_value = 404 - mock_response.read.return_value = b'{"message": "Not Found"}' - mock_response.headers = {"x-ratelimit-remaining": "5000"} - mock_response.reason = "Not Found" - - def mock_get_response(request, auth, template): - return mock_response, [] - - with patch("github_backup.github_backup._get_response", side_effect=mock_get_response): - # Should raise generic Exception, not RepositoryUnavailableError - with pytest.raises(Exception) as exc_info: - list(github_backup.retrieve_data_gen(args, "https://api.github.com/repos/test/notfound/issues")) - - assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError) - assert "404" in str(exc_info.value) + github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues") if __name__ == "__main__": diff --git a/tests/test_pagination.py b/tests/test_pagination.py index 0d5bd82..75dfccd 100644 --- a/tests/test_pagination.py +++ b/tests/test_pagination.py @@ -40,7 +40,7 @@ def headers(self): @pytest.fixture def mock_args(): - """Mock args for retrieve_data_gen.""" + """Mock args for retrieve_data.""" args = Mock() args.as_app = False args.token_fine = None @@ -77,10 +77,8 @@ def mock_urlopen(request, *args, **kwargs): return responses[len(requests_made) - 1] with 
patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - results = list( - github_backup.retrieve_data_gen( - mock_args, "https://api.github.com/repos/owner/repo/issues" - ) + results = github_backup.retrieve_data( + mock_args, "https://api.github.com/repos/owner/repo/issues" ) # Verify all items retrieved and cursor was used in second request @@ -112,10 +110,8 @@ def mock_urlopen(request, *args, **kwargs): return responses[len(requests_made) - 1] with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - results = list( - github_backup.retrieve_data_gen( - mock_args, "https://api.github.com/repos/owner/repo/pulls" - ) + results = github_backup.retrieve_data( + mock_args, "https://api.github.com/repos/owner/repo/pulls" ) # Verify all items retrieved and page parameter was used (not cursor) @@ -142,10 +138,8 @@ def mock_urlopen(request, *args, **kwargs): return responses[len(requests_made) - 1] with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - results = list( - github_backup.retrieve_data_gen( - mock_args, "https://api.github.com/repos/owner/repo/labels" - ) + results = github_backup.retrieve_data( + mock_args, "https://api.github.com/repos/owner/repo/labels" ) # Verify pagination stopped after first request diff --git a/tests/test_retrieve_data.py b/tests/test_retrieve_data.py new file mode 100644 index 0000000..c358ff0 --- /dev/null +++ b/tests/test_retrieve_data.py @@ -0,0 +1,365 @@ +"""Tests for retrieve_data function.""" + +import json +import socket +from unittest.mock import Mock, patch +from urllib.error import HTTPError, URLError + +import pytest + +from github_backup import github_backup +from github_backup.github_backup import ( + MAX_RETRIES, + calculate_retry_delay, + make_request_with_retry, +) + + +class TestCalculateRetryDelay: + def test_respects_retry_after_header(self): + headers = {'retry-after': '30'} + assert calculate_retry_delay(0, headers) == 30 + + def 
test_respects_rate_limit_reset(self): + import time + import calendar + # Set reset time 60 seconds in the future + future_reset = calendar.timegm(time.gmtime()) + 60 + headers = { + 'x-ratelimit-remaining': '0', + 'x-ratelimit-reset': str(future_reset) + } + delay = calculate_retry_delay(0, headers) + # Should be approximately 60 seconds (with some tolerance for execution time) + assert 55 <= delay <= 65 + + def test_exponential_backoff(self): + delay_0 = calculate_retry_delay(0, {}) + delay_1 = calculate_retry_delay(1, {}) + delay_2 = calculate_retry_delay(2, {}) + # Base delay is 1s, so delays should be roughly 1, 2, 4 (plus jitter) + assert 0.9 <= delay_0 <= 1.2 # ~1s + up to 10% jitter + assert 1.8 <= delay_1 <= 2.4 # ~2s + up to 10% jitter + assert 3.6 <= delay_2 <= 4.8 # ~4s + up to 10% jitter + + def test_max_delay_cap(self): + # Very high attempt number should not exceed 120s + jitter + delay = calculate_retry_delay(100, {}) + assert delay <= 120 * 1.1 # 120s max + 10% jitter + + def test_minimum_rate_limit_delay(self): + import time + import calendar + # Set reset time in the past (already reset) + past_reset = calendar.timegm(time.gmtime()) - 100 + headers = { + 'x-ratelimit-remaining': '0', + 'x-ratelimit-reset': str(past_reset) + } + delay = calculate_retry_delay(0, headers) + # Should be minimum 10 seconds even if reset time is in past + assert delay >= 10 + + +class TestRetrieveDataRetry: + """Tests for retry behavior in retrieve_data.""" + + @pytest.fixture + def mock_args(self): + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = "fake_token" + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + return args + + def test_json_parse_error_retries_and_fails(self, mock_args): + """HTTP 200 with invalid JSON should retry and eventually fail.""" + mock_response = Mock() + 
mock_response.getcode.return_value = 200 + mock_response.read.return_value = b"not valid json {" + mock_response.headers = {"x-ratelimit-remaining": "5000"} + + call_count = 0 + + def mock_make_request(*args, **kwargs): + nonlocal call_count + call_count += 1 + return mock_response + + with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request): + with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): # No delay in tests + with pytest.raises(Exception) as exc_info: + github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues") + + assert "Failed to read response after" in str(exc_info.value) + assert call_count == MAX_RETRIES + + def test_json_parse_error_recovers_on_retry(self, mock_args): + """HTTP 200 with invalid JSON should succeed if retry returns valid JSON.""" + bad_response = Mock() + bad_response.getcode.return_value = 200 + bad_response.read.return_value = b"not valid json {" + bad_response.headers = {"x-ratelimit-remaining": "5000"} + + good_response = Mock() + good_response.getcode.return_value = 200 + good_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8") + good_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""} + + responses = [bad_response, bad_response, good_response] + call_count = 0 + + def mock_make_request(*args, **kwargs): + nonlocal call_count + result = responses[call_count] + call_count += 1 + return result + + with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request): + with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + result = github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues") + + assert result == [{"id": 1}] + assert call_count == 3 # Failed twice, succeeded on third + + def test_http_error_raises_exception(self, mock_args): + """Non-success HTTP status codes should raise Exception.""" + mock_response = Mock() 
+ mock_response.getcode.return_value = 404 + mock_response.read.return_value = b'{"message": "Not Found"}' + mock_response.headers = {"x-ratelimit-remaining": "5000"} + mock_response.reason = "Not Found" + + with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + with pytest.raises(Exception) as exc_info: + github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/notfound/issues") + + assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError) + assert "404" in str(exc_info.value) + + +class TestMakeRequestWithRetry: + """Tests for HTTP error retry behavior in make_request_with_retry.""" + + def test_502_error_retries_and_succeeds(self): + """HTTP 502 should retry and succeed if subsequent request works.""" + good_response = Mock() + good_response.read.return_value = b'{"ok": true}' + + call_count = 0 + fail_count = MAX_RETRIES - 1 # Fail all but last attempt + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + if call_count <= fail_count: + raise HTTPError( + url="https://api.github.com/test", + code=502, + msg="Bad Gateway", + hdrs={"x-ratelimit-remaining": "5000"}, + fp=None, + ) + return good_response + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + result = make_request_with_retry(Mock(), None) + + assert result == good_response + assert call_count == MAX_RETRIES + + def test_503_error_retries_until_exhausted(self): + """HTTP 503 should retry MAX_RETRIES times then raise.""" + call_count = 0 + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + raise HTTPError( + url="https://api.github.com/test", + code=503, + msg="Service Unavailable", + hdrs={"x-ratelimit-remaining": "5000"}, + fp=None, + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with 
patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + with pytest.raises(HTTPError) as exc_info: + make_request_with_retry(Mock(), None) + + assert exc_info.value.code == 503 + assert call_count == MAX_RETRIES + + def test_404_error_not_retried(self): + """HTTP 404 should not be retried - raise immediately.""" + call_count = 0 + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + raise HTTPError( + url="https://api.github.com/test", + code=404, + msg="Not Found", + hdrs={"x-ratelimit-remaining": "5000"}, + fp=None, + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with pytest.raises(HTTPError) as exc_info: + make_request_with_retry(Mock(), None) + + assert exc_info.value.code == 404 + assert call_count == 1 # No retries + + def test_rate_limit_403_retried_when_remaining_zero(self): + """HTTP 403 with x-ratelimit-remaining=0 should retry.""" + good_response = Mock() + call_count = 0 + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + if call_count == 1: + raise HTTPError( + url="https://api.github.com/test", + code=403, + msg="Forbidden", + hdrs={"x-ratelimit-remaining": "0"}, + fp=None, + ) + return good_response + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + result = make_request_with_retry(Mock(), None) + + assert result == good_response + assert call_count == 2 + + def test_403_not_retried_when_remaining_nonzero(self): + """HTTP 403 with x-ratelimit-remaining>0 should not retry (permission error).""" + call_count = 0 + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + raise HTTPError( + url="https://api.github.com/test", + code=403, + msg="Forbidden", + hdrs={"x-ratelimit-remaining": "5000"}, + fp=None, + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with 
pytest.raises(HTTPError) as exc_info: + make_request_with_retry(Mock(), None) + + assert exc_info.value.code == 403 + assert call_count == 1 # No retries + + def test_connection_error_retries_and_succeeds(self): + """URLError (connection error) should retry and succeed if subsequent request works.""" + good_response = Mock() + call_count = 0 + fail_count = MAX_RETRIES - 1 # Fail all but last attempt + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + if call_count <= fail_count: + raise URLError("Connection refused") + return good_response + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + result = make_request_with_retry(Mock(), None) + + assert result == good_response + assert call_count == MAX_RETRIES + + def test_socket_error_retries_until_exhausted(self): + """socket.error should retry MAX_RETRIES times then raise.""" + call_count = 0 + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + raise socket.error("Connection reset by peer") + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + with pytest.raises(socket.error): + make_request_with_retry(Mock(), None) + + assert call_count == MAX_RETRIES + + +class TestRetrieveDataThrottling: + """Tests for throttling behavior in retrieve_data.""" + + @pytest.fixture + def mock_args(self): + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = "fake_token" + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = 10 # Throttle when remaining <= 10 + args.throttle_pause = 5 # Pause 5 seconds + return args + + def test_throttling_pauses_when_rate_limit_low(self, mock_args): + """Should pause when x-ratelimit-remaining is at 
or below throttle_limit.""" + mock_response = Mock() + mock_response.getcode.return_value = 200 + mock_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8") + mock_response.headers = {"x-ratelimit-remaining": "5", "Link": ""} # Below throttle_limit + + with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + with patch("github_backup.github_backup.time.sleep") as mock_sleep: + github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues") + + mock_sleep.assert_called_once_with(5) # throttle_pause value + + +class TestRetrieveDataSingleItem: + """Tests for single item (dict) responses in retrieve_data.""" + + @pytest.fixture + def mock_args(self): + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = "fake_token" + args.username = None + args.password = None + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + return args + + def test_dict_response_returned_as_list(self, mock_args): + """Single dict response should be returned as a list with one item.""" + mock_response = Mock() + mock_response.getcode.return_value = 200 + mock_response.read.return_value = json.dumps({"login": "testuser", "id": 123}).encode("utf-8") + mock_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""} + + with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + result = github_backup.retrieve_data(mock_args, "https://api.github.com/user") + + assert result == [{"login": "testuser", "id": 123}] From c70cc43f5774fd2cbbff126255604b2e159c3cc5 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Tue, 16 Dec 2025 15:17:23 +0000 Subject: [PATCH 082/148] Release version 0.58.0 --- CHANGES.rst | 31 ++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 31 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 
1a8809e..697b39f 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,38 @@ Changelog ========= -0.57.0 (2025-12-12) +0.58.0 (2025-12-16) ------------------- ------------------------ +- Fix retry logic for HTTP 5xx errors and network failures. [Rodos] + + Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29. + + Fixes #140, #110, #138 +- Chore: remove transitive deps from release-requirements.txt. [Rodos] +- Chore(deps): bump urllib3 in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [urllib3](https://github.com/urllib3/urllib3). + + + Updates `urllib3` from 2.6.1 to 2.6.2 + - [Release notes](https://github.com/urllib3/urllib3/releases) + - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) + - [Commits](https://github.com/urllib3/urllib3/compare/2.6.1...2.6.2) + + --- + updated-dependencies: + - dependency-name: urllib3 + dependency-version: 2.6.2 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... + + +0.57.0 (2025-12-12) +------------------- - Add GitHub Apps documentation and remove outdated header. 
[Rodos] - Add GitHub Apps authentication section with setup steps diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 6e6e624..45dbfca 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.57.0" +__version__ = "0.58.0" From db36c3c137ced1469c8ccf6f5619d10bb04d169a Mon Sep 17 00:00:00 2001 From: Rodos Date: Sat, 20 Dec 2025 19:16:11 +1100 Subject: [PATCH 083/148] chore: remove deprecated -u/-p password authentication options --- README.rst | 90 +++++++++++++++++----------------- github_backup/cli.py | 2 +- github_backup/github_backup.py | 23 --------- tests/test_all_starred.py | 2 - tests/test_attachments.py | 2 - tests/test_http_451.py | 6 --- tests/test_pagination.py | 2 - tests/test_retrieve_data.py | 6 --- tests/test_skip_assets_on.py | 2 - 9 files changed, 47 insertions(+), 88 deletions(-) diff --git a/README.rst b/README.rst index e4300a7..943f8ec 100644 --- a/README.rst +++ b/README.rst @@ -36,23 +36,26 @@ Show the CLI help output:: CLI Help output:: - github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN_CLASSIC] - [-f TOKEN_FINE] [--as-app] [-o OUTPUT_DIRECTORY] - [-l LOG_LEVEL] [-i] [--starred] [--all-starred] - [--watched] [--followers] [--following] [--all] [--issues] - [--issue-comments] [--issue-events] [--pulls] + github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [-q] [--as-app] + [-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i] + [--incremental-by-files] + [--starred] [--all-starred] + [--watched] [--followers] [--following] [--all] + [--issues] [--issue-comments] [--issue-events] [--pulls] [--pull-comments] [--pull-commits] [--pull-details] [--labels] [--hooks] [--milestones] [--repositories] - [--bare] [--lfs] [--wikis] [--gists] [--starred-gists] - [--skip-archived] [--skip-existing] [-L [LANGUAGES ...]] - [-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY] - [-P] [-F] [--prefer-ssh] [-v] + [--bare] [--no-prune] [--lfs] [--wikis] [--gists] + [--starred-gists] [--skip-archived] 
[--skip-existing] + [-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST] + [-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v] [--keychain-name OSX_KEYCHAIN_ITEM_NAME] [--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT] [--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES] - [--skip-prerelease] [--assets] [--skip-assets-on [REPO ...]] - [--attachments] [--exclude [REPOSITORY [REPOSITORY ...]] - [--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE] + [--skip-prerelease] [--assets] + [--skip-assets-on [SKIP_ASSETS_ON ...]] [--attachments] + [--throttle-limit THROTTLE_LIMIT] + [--throttle-pause THROTTLE_PAUSE] + [--exclude [EXCLUDE ...]] USER Backup a github account @@ -60,27 +63,25 @@ CLI Help output:: positional arguments: USER github username - optional arguments: + options: -h, --help show this help message and exit - -u USERNAME, --username USERNAME - username for basic auth - -p PASSWORD, --password PASSWORD - password for basic auth. If a username is given but - not a password, the password will be prompted for. - -f TOKEN_FINE, --token-fine TOKEN_FINE - fine-grained personal access token or path to token - (file://...) - -t TOKEN_CLASSIC, --token TOKEN_CLASSIC + -t, --token TOKEN_CLASSIC personal access, OAuth, or JSON Web token, or path to token (file://...) + -f, --token-fine TOKEN_FINE + fine-grained personal access token (github_pat_....), + or path to token (file://...) + -q, --quiet supress log messages less severe than warning, e.g. + info --as-app authenticate as github app instead of as a user. 
- -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY + -o, --output-directory OUTPUT_DIRECTORY directory at which to backup the repositories - -l LOG_LEVEL, --log-level LOG_LEVEL + -l, --log-level LOG_LEVEL log level to use (default: info, possible levels: debug, info, warning, error, critical) -i, --incremental incremental backup - --incremental-by-files incremental backup using modified time of files + --incremental-by-files + incremental backup based on modification date of files --starred include JSON output of starred repositories in backup --all-starred include starred repositories in backup [*] --watched include JSON output of watched repositories in backup @@ -100,20 +101,22 @@ CLI Help output:: --milestones include milestones in backup --repositories include repository clone in backup --bare clone bare repositories + --no-prune disable prune option for git fetch --lfs clone LFS repositories (requires Git LFS to be installed, https://git-lfs.github.com) [*] --wikis include wiki clone in backup --gists include gists in backup [*] --starred-gists include starred gists in backup [*] + --skip-archived skip project if it is archived --skip-existing skip project if a backup directory exists - -L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]] + -L, --languages [LANGUAGES ...] 
only allow these languages - -N NAME_REGEX, --name-regex NAME_REGEX + -N, --name-regex NAME_REGEX python regex to match names against - -H GITHUB_HOST, --github-host GITHUB_HOST + -H, --github-host GITHUB_HOST GitHub Enterprise hostname -O, --organization whether or not this is an organization user - -R REPOSITORY, --repository REPOSITORY + -R, --repository REPOSITORY name of repository to limit backup to -P, --private include private repositories [*] -F, --fork include forked repositories [*] @@ -128,19 +131,16 @@ CLI Help output:: --releases include release information, not including assets or binaries --latest-releases NUMBER_OF_LATEST_RELEASES - include certain number of the latest releases; - only applies if including releases - --skip-prerelease skip prerelease and draft versions; only applies if including releases + include certain number of the latest releases; only + applies if including releases + --skip-prerelease skip prerelease and draft versions; only applies if + including releases --assets include assets alongside release information; only applies if including releases - --skip-assets-on [REPO ...] - skip asset downloads for these repositories (e.g. - --skip-assets-on repo1 owner/repo2) - --attachments download user-attachments from issues and pull requests - to issues/attachments/{issue_number}/ and - pulls/attachments/{pull_number}/ directories - --exclude [REPOSITORY [REPOSITORY ...]] - names of repositories to exclude from backup. + --skip-assets-on [SKIP_ASSETS_ON ...] + skip asset downloads for these repositories + --attachments download user-attachments from issues and pull + requests --throttle-limit THROTTLE_LIMIT start throttling of GitHub API requests after this amount of API requests remain @@ -148,6 +148,8 @@ CLI Help output:: wait this amount of seconds when API request throttling is active (default: 30.0, requires --throttle-limit to be set) + --exclude [EXCLUDE ...] 
+ names of repositories to exclude Usage Details @@ -156,13 +158,13 @@ Usage Details Authentication -------------- -**Password-based authentication** will fail if you have two-factor authentication enabled, and will `be deprecated `_ by 2023 EOY. +GitHub requires token-based authentication for API access. Password authentication was `removed in November 2020 `_. -``--username`` is used for basic password authentication and separate from the positional argument ``USER``, which specifies the user account you wish to back up. +The positional argument ``USER`` specifies the user or organization account you wish to back up. -**Classic tokens** are `slightly less secure `_ as they provide very coarse-grained permissions. +**Fine-grained tokens** (``-f TOKEN_FINE``) are recommended for most use cases, especially long-running backups (e.g. cron jobs), as they provide precise permission control. -If you need authentication for long-running backups (e.g. for a cron job) it is recommended to use **fine-grained personal access token** ``-f TOKEN_FINE``. +**Classic tokens** (``-t TOKEN``) are `slightly less secure `_ as they provide very coarse-grained permissions. Fine Tokens diff --git a/github_backup/cli.py b/github_backup/cli.py index 98f8d4a..54849d4 100644 --- a/github_backup/cli.py +++ b/github_backup/cli.py @@ -43,7 +43,7 @@ def main(): if args.private and not get_auth(args): logger.warning( "The --private flag has no effect without authentication. " - "Use -t/--token, -f/--token-fine, or -u/--username to authenticate." + "Use -t/--token or -f/--token-fine to authenticate." 
) if args.quiet: diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 34d529a..d62afc3 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -7,7 +7,6 @@ import calendar import codecs import errno -import getpass import json import logging import os @@ -24,7 +23,6 @@ from datetime import datetime from http.client import IncompleteRead from urllib.error import HTTPError, URLError -from urllib.parse import quote as urlquote from urllib.parse import urlencode, urlparse from urllib.request import HTTPRedirectHandler, Request, build_opener, urlopen @@ -149,17 +147,6 @@ def mask_password(url, secret="*****"): def parse_args(args=None): parser = argparse.ArgumentParser(description="Backup a github account") parser.add_argument("user", metavar="USER", type=str, help="github username") - parser.add_argument( - "-u", "--username", dest="username", help="username for basic auth" - ) - parser.add_argument( - "-p", - "--password", - dest="password", - help="password for basic auth. 
" - "If a username is given but not a password, the " - "password will be prompted for.", - ) parser.add_argument( "-t", "--token", @@ -533,16 +520,6 @@ def get_auth(args, encode=True, for_git_cli=False): auth = args.token_classic else: auth = "x-access-token:" + args.token_classic - elif args.username: - if not args.password: - args.password = getpass.getpass() - if encode: - password = args.password - else: - password = urlquote(args.password) - auth = args.username + ":" + password - elif args.password: - raise Exception("You must specify a username for basic auth") if not auth: return None diff --git a/tests/test_all_starred.py b/tests/test_all_starred.py index f59a67e..0fab048 100644 --- a/tests/test_all_starred.py +++ b/tests/test_all_starred.py @@ -46,8 +46,6 @@ def _create_mock_args(self, **overrides): args.prefer_ssh = False args.token_classic = None args.token_fine = None - args.username = None - args.password = None args.as_app = False args.osx_keychain_item_name = None args.osx_keychain_item_account = None diff --git a/tests/test_attachments.py b/tests/test_attachments.py index 07c1b33..b338caf 100644 --- a/tests/test_attachments.py +++ b/tests/test_attachments.py @@ -24,8 +24,6 @@ def attachment_test_setup(tmp_path): args.as_app = False args.token_fine = None args.token_classic = None - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.user = "testuser" diff --git a/tests/test_http_451.py b/tests/test_http_451.py index 51218d2..d53d65c 100644 --- a/tests/test_http_451.py +++ b/tests/test_http_451.py @@ -17,8 +17,6 @@ def test_repository_unavailable_error_raised(self): args.as_app = False args.token_fine = None args.token_classic = None - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.throttle_limit = None @@ -52,8 +50,6 @@ def test_repository_unavailable_error_without_dmca_url(self): args.as_app = False 
args.token_fine = None args.token_classic = None - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.throttle_limit = None @@ -78,8 +74,6 @@ def test_repository_unavailable_error_with_malformed_json(self): args.as_app = False args.token_fine = None args.token_classic = None - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.throttle_limit = None diff --git a/tests/test_pagination.py b/tests/test_pagination.py index 75dfccd..831b913 100644 --- a/tests/test_pagination.py +++ b/tests/test_pagination.py @@ -45,8 +45,6 @@ def mock_args(): args.as_app = False args.token_fine = None args.token_classic = "fake_token" - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.throttle_limit = None diff --git a/tests/test_retrieve_data.py b/tests/test_retrieve_data.py index c358ff0..adb1152 100644 --- a/tests/test_retrieve_data.py +++ b/tests/test_retrieve_data.py @@ -70,8 +70,6 @@ def mock_args(self): args.as_app = False args.token_fine = None args.token_classic = "fake_token" - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.throttle_limit = None @@ -313,8 +311,6 @@ def mock_args(self): args.as_app = False args.token_fine = None args.token_classic = "fake_token" - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.throttle_limit = 10 # Throttle when remaining <= 10 @@ -344,8 +340,6 @@ def mock_args(self): args.as_app = False args.token_fine = None args.token_classic = "fake_token" - args.username = None - args.password = None args.osx_keychain_item_name = None args.osx_keychain_item_account = None args.throttle_limit = None diff --git a/tests/test_skip_assets_on.py b/tests/test_skip_assets_on.py index 2437e05..ce28287 100644 
--- a/tests/test_skip_assets_on.py +++ b/tests/test_skip_assets_on.py @@ -48,8 +48,6 @@ def _create_mock_args(self, **overrides): args.prefer_ssh = False args.token_classic = "test-token" args.token_fine = None - args.username = None - args.password = None args.as_app = False args.osx_keychain_item_name = None args.osx_keychain_item_account = None From 3c43e0f481e6f4a9f5885ca92e9c87552f3010ee Mon Sep 17 00:00:00 2001 From: Rodos Date: Sat, 20 Dec 2025 18:04:25 +1100 Subject: [PATCH 084/148] Add --starred-skip-size-over flag to limit starred repo size (#108) Allow users to skip starred repositories exceeding a size threshold when using --all-starred. Size is specified in MB and checked against the GitHub API's repository size field. - Only affects starred repos; user's own repos always included - Logs each skipped repo with name and size Closes #108 --- README.rst | 20 ++- github_backup/github_backup.py | 26 ++++ tests/test_case_sensitivity.py | 6 + tests/test_starred_skip_size_over.py | 224 +++++++++++++++++++++++++++ 4 files changed, 272 insertions(+), 4 deletions(-) create mode 100644 tests/test_starred_skip_size_over.py diff --git a/README.rst b/README.rst index 943f8ec..ffa80ac 100644 --- a/README.rst +++ b/README.rst @@ -39,7 +39,7 @@ CLI Help output:: github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [-q] [--as-app] [-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i] [--incremental-by-files] - [--starred] [--all-starred] + [--starred] [--all-starred] [--starred-skip-size-over MB] [--watched] [--followers] [--following] [--all] [--issues] [--issue-comments] [--issue-events] [--pulls] [--pull-comments] [--pull-commits] [--pull-details] @@ -84,6 +84,8 @@ CLI Help output:: incremental backup based on modification date of files --starred include JSON output of starred repositories in backup --all-starred include starred repositories in backup [*] + --starred-skip-size-over MB + skip starred repositories larger than this size in MB --watched include JSON output of 
watched repositories in backup --followers include JSON output of followers in backup --following include JSON output of following users in backup @@ -292,10 +294,20 @@ All is not everything The ``--all`` argument does not include: cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more. -Cloning all starred size ------------------------- +Starred repository size +----------------------- + +Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space. + +To see your starred repositories sorted by size (requires `GitHub CLI `_):: + + gh api user/starred --paginate --jq 'sort_by(-.size)[]|"\(.full_name) \(.size/1024|round)MB"' + +To limit which starred repositories are cloned, use ``--starred-skip-size-over SIZE`` where SIZE is in MB. For example, ``--starred-skip-size-over 500`` will skip any starred repository where the git repository size (code and history) exceeds 500 MB. Note that this size limit only applies to the repository itself, not issues, release assets or other metadata. This filter only affects starred repositories; your own repositories are always included regardless of size. + +For finer control, avoid using ``--assets`` with starred repos, or use ``--skip-assets-on`` for specific repositories with large release binaries. -Using the ``--all-starred`` argument to clone all starred repositories may use a large amount of storage space, especially if ``--all`` or more arguments are used. e.g. commonly starred repos can have tens of thousands of issues, many large assets and the repo itself etc. Consider just storing links to starred repos in JSON format with ``--starred``. +Alternatively, consider just storing links to starred repos in JSON format with ``--starred``. 
Incremental Backup ------------------ diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index d62afc3..1d4e354 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -211,6 +211,13 @@ def parse_args(args=None): dest="all_starred", help="include starred repositories in backup [*]", ) + parser.add_argument( + "--starred-skip-size-over", + type=int, + metavar="MB", + dest="starred_skip_size_over", + help="skip starred repositories larger than this size in MB", + ) parser.add_argument( "--watched", action="store_true", @@ -1570,6 +1577,25 @@ def filter_repositories(args, unfiltered_repositories): ] if args.skip_archived: repositories = [r for r in repositories if not r.get("archived")] + if args.starred_skip_size_over is not None: + if args.starred_skip_size_over <= 0: + logger.warning( + "--starred-skip-size-over must be greater than 0, ignoring" + ) + else: + size_limit_kb = args.starred_skip_size_over * 1024 + filtered = [] + for r in repositories: + if r.get("is_starred") and r.get("size", 0) > size_limit_kb: + size_mb = r.get("size", 0) / 1024 + logger.info( + "Skipping starred repo {0} ({1:.0f} MB) due to --starred-skip-size-over {2}".format( + r.get("full_name", r.get("name")), size_mb, args.starred_skip_size_over + ) + ) + else: + filtered.append(r) + repositories = filtered if args.exclude: repositories = [ r for r in repositories if "name" not in r or r["name"] not in args.exclude diff --git a/tests/test_case_sensitivity.py b/tests/test_case_sensitivity.py index 1398d0d..058a7df 100644 --- a/tests/test_case_sensitivity.py +++ b/tests/test_case_sensitivity.py @@ -26,6 +26,8 @@ def test_filter_repositories_case_insensitive_user(self): args.private = False args.public = False args.all = True + args.skip_archived = False + args.starred_skip_size_over = None # Simulate GitHub API returning canonical case repos = [ @@ -65,6 +67,8 @@ def test_filter_repositories_case_insensitive_org(self): args.private = False 
args.public = False args.all = True + args.skip_archived = False + args.starred_skip_size_over = None repos = [ { @@ -93,6 +97,8 @@ def test_filter_repositories_case_variations(self): args.private = False args.public = False args.all = True + args.skip_archived = False + args.starred_skip_size_over = None repos = [ {"name": "repo1", "owner": {"login": "test-user"}, "private": False, "fork": False}, diff --git a/tests/test_starred_skip_size_over.py b/tests/test_starred_skip_size_over.py new file mode 100644 index 0000000..2deb72a --- /dev/null +++ b/tests/test_starred_skip_size_over.py @@ -0,0 +1,224 @@ +"""Tests for --starred-skip-size-over flag behavior (issue #108).""" + +import pytest +from unittest.mock import Mock + +from github_backup import github_backup + + +class TestStarredSkipSizeOver: + """Test suite for --starred-skip-size-over flag. + + Issue #108: Allow restricting size of starred repositories before cloning. + The size is based on the GitHub API's 'size' field (in KB), but the CLI + argument accepts MB for user convenience. 
+ """ + + def _create_mock_args(self, **overrides): + """Create a mock args object with sensible defaults.""" + args = Mock() + args.user = "testuser" + args.repository = None + args.name_regex = None + args.languages = None + args.fork = False + args.private = False + args.skip_archived = False + args.starred_skip_size_over = None + args.exclude = None + + for key, value in overrides.items(): + setattr(args, key, value) + + return args + + +class TestStarredSkipSizeOverArgumentParsing(TestStarredSkipSizeOver): + """Tests for --starred-skip-size-over argument parsing.""" + + def test_starred_skip_size_over_not_set_defaults_to_none(self): + """When --starred-skip-size-over is not specified, it should default to None.""" + args = github_backup.parse_args(["testuser"]) + assert args.starred_skip_size_over is None + + def test_starred_skip_size_over_accepts_integer(self): + """--starred-skip-size-over should accept an integer value.""" + args = github_backup.parse_args(["testuser", "--starred-skip-size-over", "500"]) + assert args.starred_skip_size_over == 500 + + def test_starred_skip_size_over_rejects_non_integer(self): + """--starred-skip-size-over should reject non-integer values.""" + with pytest.raises(SystemExit): + github_backup.parse_args(["testuser", "--starred-skip-size-over", "abc"]) + + +class TestStarredSkipSizeOverFiltering(TestStarredSkipSizeOver): + """Tests for --starred-skip-size-over filtering behavior.""" + + def test_starred_repo_under_limit_is_kept(self): + """Starred repos under the size limit should be kept.""" + args = self._create_mock_args(starred_skip_size_over=500) + + repos = [ + { + "name": "small-repo", + "owner": {"login": "otheruser"}, + "size": 100 * 1024, # 100 MB in KB + "is_starred": True, + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 1 + assert result[0]["name"] == "small-repo" + + def test_starred_repo_over_limit_is_filtered(self): + """Starred repos over the size limit should be 
filtered out.""" + args = self._create_mock_args(starred_skip_size_over=500) + + repos = [ + { + "name": "huge-repo", + "owner": {"login": "otheruser"}, + "size": 600 * 1024, # 600 MB in KB + "is_starred": True, + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 0 + + def test_own_repo_over_limit_is_kept(self): + """User's own repos should not be affected by the size limit.""" + args = self._create_mock_args(starred_skip_size_over=500) + + repos = [ + { + "name": "my-huge-repo", + "owner": {"login": "testuser"}, + "size": 600 * 1024, # 600 MB in KB + # No is_starred flag - this is the user's own repo + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 1 + assert result[0]["name"] == "my-huge-repo" + + def test_starred_repo_at_exact_limit_is_kept(self): + """Starred repos at exactly the size limit should be kept.""" + args = self._create_mock_args(starred_skip_size_over=500) + + repos = [ + { + "name": "exact-limit-repo", + "owner": {"login": "otheruser"}, + "size": 500 * 1024, # Exactly 500 MB in KB + "is_starred": True, + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 1 + assert result[0]["name"] == "exact-limit-repo" + + def test_mixed_repos_filtered_correctly(self): + """Mix of own and starred repos should be filtered correctly.""" + args = self._create_mock_args(starred_skip_size_over=500) + + repos = [ + { + "name": "my-huge-repo", + "owner": {"login": "testuser"}, + "size": 1000 * 1024, # 1 GB - own repo, should be kept + }, + { + "name": "starred-small", + "owner": {"login": "otheruser"}, + "size": 100 * 1024, # 100 MB - under limit + "is_starred": True, + }, + { + "name": "starred-huge", + "owner": {"login": "anotheruser"}, + "size": 2000 * 1024, # 2 GB - over limit + "is_starred": True, + }, + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 2 + names = [r["name"] for r in result] + assert 
"my-huge-repo" in names + assert "starred-small" in names + assert "starred-huge" not in names + + def test_no_size_limit_keeps_all_starred(self): + """When no size limit is set, all starred repos should be kept.""" + args = self._create_mock_args(starred_skip_size_over=None) + + repos = [ + { + "name": "huge-starred-repo", + "owner": {"login": "otheruser"}, + "size": 10000 * 1024, # 10 GB + "is_starred": True, + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 1 + + def test_repo_without_size_field_is_kept(self): + """Repos without a size field should be kept (size defaults to 0).""" + args = self._create_mock_args(starred_skip_size_over=500) + + repos = [ + { + "name": "no-size-repo", + "owner": {"login": "otheruser"}, + "is_starred": True, + # No size field + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 1 + + def test_zero_value_warns_and_is_ignored(self, caplog): + """Zero value should warn and keep all repos.""" + args = self._create_mock_args(starred_skip_size_over=0) + + repos = [ + { + "name": "huge-starred-repo", + "owner": {"login": "otheruser"}, + "size": 10000 * 1024, # 10 GB + "is_starred": True, + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 1 + assert "must be greater than 0" in caplog.text + + def test_negative_value_warns_and_is_ignored(self, caplog): + """Negative value should warn and keep all repos.""" + args = self._create_mock_args(starred_skip_size_over=-5) + + repos = [ + { + "name": "huge-starred-repo", + "owner": {"login": "otheruser"}, + "size": 10000 * 1024, # 10 GB + "is_starred": True, + } + ] + + result = github_backup.filter_repositories(args, repos) + assert len(result) == 1 + assert "must be greater than 0" in caplog.text + + +if __name__ == "__main__": + pytest.main([__file__, "-v"]) From 81a72ac8af02a39b79bf74c37bbd21938294c9d8 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Sun, 21 Dec 2025 
23:48:36 +0000 Subject: [PATCH 085/148] Release version 0.59.0 --- CHANGES.rst | 19 ++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 697b39f..a6a1c4d 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,26 @@ Changelog ========= -0.58.0 (2025-12-16) +0.59.0 (2025-12-21) ------------------- ------------------------ +- Add --starred-skip-size-over flag to limit starred repo size (#108) + [Rodos] + + Allow users to skip starred repositories exceeding a size threshold + when using --all-starred. Size is specified in MB and checked against + the GitHub API's repository size field. + + - Only affects starred repos; user's own repos always included + - Logs each skipped repo with name and size + + Closes #108 +- Chore: remove deprecated -u/-p password authentication options. + [Rodos] + + +0.58.0 (2025-12-16) +------------------- - Fix retry logic for HTTP 5xx errors and network failures. [Rodos] Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a9048 to fix #29. diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 45dbfca..25dbb4b 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.58.0" +__version__ = "0.59.0" From 89502c326d0aab93d4e60b7103f5738593d93d6b Mon Sep 17 00:00:00 2001 From: michaelmartinez Date: Mon, 22 Dec 2025 14:23:02 -0800 Subject: [PATCH 086/148] update retry logic and logging ### What 1. configurable retry count 2.
additional logging ### Why 1. pass retry count as a command line arg; default 5 2. show details when api requests fail ### Testing before merge compiles cleanly ### Validation after merge compile and test ### Issue addressed by this PR https://github.com/stellar/ops/issues/2039 --- github_backup/cli.py | 2 ++ github_backup/github_backup.py | 21 ++++++++++++++++----- github_backup/max_retries.py | 1 + 3 files changed, 19 insertions(+), 5 deletions(-) create mode 100644 github_backup/max_retries.py diff --git a/github_backup/cli.py b/github_backup/cli.py index 54849d4..cdc9c5f 100644 --- a/github_backup/cli.py +++ b/github_backup/cli.py @@ -4,6 +4,7 @@ import logging import os import sys +from github_backup import max_retries from github_backup.github_backup import ( backup_account, @@ -39,6 +40,7 @@ def main(): """Main entry point for github-backup CLI.""" args = parse_args() + max_retries.MAX_RETRIES = args.max_retries if args.private and not get_auth(args): logger.warning( diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 1d4e354..13cda22 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -25,6 +25,7 @@ from urllib.error import HTTPError, URLError from urllib.parse import urlencode, urlparse from urllib.request import HTTPRedirectHandler, Request, build_opener, urlopen +from github_backup import max_retries try: from . 
import __version__ @@ -75,7 +76,7 @@ def __init__(self, message, dmca_url=None): ) # Retry configuration -MAX_RETRIES = 5 +MAX_RETRIES = max_retries.MAX_RETRIES def logging_subprocess( @@ -468,6 +469,13 @@ def parse_args(args=None): parser.add_argument( "--exclude", dest="exclude", help="names of repositories to exclude", nargs="*" ) + parser.add_argument( + "--retries", + dest="max_retries", + type=int, + default=5, + help="maximum number of retries for API calls (default: 5)", + ) return parser.parse_args(args) @@ -737,16 +745,19 @@ def is_retryable_status(status_code, headers): except HTTPError as exc: # HTTPError can be used as a response-like object if not is_retryable_status(exc.code, exc.headers): + logger.error(f"API Error: {exc.code} {exc.reason} for {request.full_url}") raise # Non-retryable error if attempt >= MAX_RETRIES - 1: logger.error(f"HTTP {exc.code} failed after {MAX_RETRIES} attempts") + logger.error(f"HTTP {exc.code} failed after {MAX_RETRIES} attempts for {request.full_url}") raise delay = calculate_retry_delay(attempt, exc.headers) logger.warning( - f"HTTP {exc.code}, retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{MAX_RETRIES})" + f"HTTP {exc.code} ({exc.reason}), retrying in {delay:.1f}s " + f"(attempt {attempt + 1}/{MAX_RETRIES}) for {request.full_url}" + ) if auth is None and exc.code in (403, 429): logger.info("Hint: Authenticate to raise your GitHub rate limit") @@ -754,12 +765,12 @@ def is_retryable_status(status_code, headers): except (URLError, socket.error) as e: if attempt >= MAX_RETRIES - 1: - logger.error(f"Connection error failed after {MAX_RETRIES} attempts: {e}") + logger.error(f"Connection error failed after {MAX_RETRIES} attempts: {e} for {request.full_url}") raise delay = calculate_retry_delay(attempt, {}) logger.warning( f"Connection error: {e}, retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{MAX_RETRIES})" + f"(attempt {attempt + 1}/{MAX_RETRIES}) for {request.full_url}" ) time.sleep(delay) diff --git 
a/github_backup/max_retries.py b/github_backup/max_retries.py new file mode 100644 index 0000000..3bd0f5d --- /dev/null +++ b/github_backup/max_retries.py @@ -0,0 +1 @@ +MAX_RETRIES=None From 8b1b632d8962a868f7ebfb1d2c38bde93983ee58 Mon Sep 17 00:00:00 2001 From: michaelmartinez Date: Mon, 22 Dec 2025 14:47:26 -0800 Subject: [PATCH 087/148] max_retries 5 --- github_backup/max_retries.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/github_backup/max_retries.py b/github_backup/max_retries.py index 3bd0f5d..43594f7 100644 --- a/github_backup/max_retries.py +++ b/github_backup/max_retries.py @@ -1 +1 @@ -MAX_RETRIES=None +MAX_RETRIES=5 From 1f2ec016d561e0c73faa22519730dc47aaf70d44 Mon Sep 17 00:00:00 2001 From: michaelmartinez Date: Mon, 22 Dec 2025 16:13:12 -0800 Subject: [PATCH 088/148] readme, simplify the logic a bit --- github_backup/github_backup.py | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 13cda22..23bb836 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -75,9 +75,6 @@ def __init__(self, message, dmca_url=None): " 3. 
Debian/Ubuntu: apt-get install ca-certificates\n\n" ) -# Retry configuration -MAX_RETRIES = max_retries.MAX_RETRIES - def logging_subprocess( popenargs, stdout_log_level=logging.DEBUG, stderr_log_level=logging.ERROR, **kwargs @@ -639,7 +636,7 @@ def fetch_all() -> Generator[dict, None, None]: while True: # FIRST: Fetch response - for attempt in range(MAX_RETRIES): + for attempt in range(max_retries.MAX_RETRIES): request = _construct_request( per_page=per_page if paginated else None, query_args=query_args, @@ -662,10 +659,10 @@ def fetch_all() -> Generator[dict, None, None]: TimeoutError, ) as e: logger.warning(f"{type(e).__name__} reading response") - if attempt < MAX_RETRIES - 1: + if attempt < max_retries.MAX_RETRIES - 1: delay = calculate_retry_delay(attempt, {}) logger.warning( - f"Retrying in {delay:.1f}s (attempt {attempt + 1}/{MAX_RETRIES})" + f"Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries.MAX_RETRIES})" ) time.sleep(delay) continue # Next retry attempt @@ -691,10 +688,10 @@ def fetch_all() -> Generator[dict, None, None]: ) else: logger.error( - f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}" + f"Failed to read response after {max_retries.MAX_RETRIES} attempts for {next_url or template}" ) raise Exception( - f"Failed to read response after {MAX_RETRIES} attempts for {next_url or template}" + f"Failed to read response after {max_retries.MAX_RETRIES} attempts for {next_url or template}" ) # SECOND: Process and paginate @@ -738,7 +735,7 @@ def is_retryable_status(status_code, headers): return int(headers.get("x-ratelimit-remaining", 1)) < 1 return False - for attempt in range(MAX_RETRIES): + for attempt in range(max_retries.MAX_RETRIES): try: return urlopen(request, context=https_ctx) @@ -748,15 +745,14 @@ def is_retryable_status(status_code, headers): logger.error(f"API Error: {exc.code} {exc.reason} for {request.full_url}") raise # Non-retryable error - if attempt >= MAX_RETRIES - 1: - logger.error(f"HTTP 
{exc.code} failed after {MAX_RETRIES} attempts") - logger.error(f"HTTP {exc.code} failed after {MAX_RETRIES} attempts for {request.full_url}") + if attempt >= max_retries.MAX_RETRIES - 1: + logger.error(f"HTTP {exc.code} failed after {max_retries.MAX_RETRIES} attempts for {request.full_url}") raise delay = calculate_retry_delay(attempt, exc.headers) logger.warning( f"HTTP {exc.code} ({exc.reason}), retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{MAX_RETRIES}) for {request.full_url}" + f"(attempt {attempt + 1}/{max_retries.MAX_RETRIES}) for {request.full_url}" ) if auth is None and exc.code in (403, 429): @@ -764,17 +760,17 @@ def is_retryable_status(status_code, headers): time.sleep(delay) except (URLError, socket.error) as e: - if attempt >= MAX_RETRIES - 1: - logger.error(f"Connection error failed after {MAX_RETRIES} attempts: {e} for {request.full_url}") + if attempt >= max_retries.MAX_RETRIES - 1: + logger.error(f"Connection error failed after {max_retries.MAX_RETRIES} attempts: {e} for {request.full_url}") raise delay = calculate_retry_delay(attempt, {}) logger.warning( f"Connection error: {e}, retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{MAX_RETRIES}) for {request.full_url}" + f"(attempt {attempt + 1}/{max_retries.MAX_RETRIES}) for {request.full_url}" ) time.sleep(delay) - raise Exception(f"Request failed after {MAX_RETRIES} attempts") # pragma: no cover + raise Exception(f"Request failed after {max_retries.MAX_RETRIES} attempts") # pragma: no cover def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False): From f9827da342a5306ed904acfac116d0afeaab4109 Mon Sep 17 00:00:00 2001 From: michaelmartinez Date: Tue, 23 Dec 2025 08:53:54 -0800 Subject: [PATCH 089/148] don't use a global variable, pass the args instead --- github_backup/cli.py | 2 -- github_backup/github_backup.py | 32 +++++++++++++++----------------- 2 files changed, 15 insertions(+), 19 deletions(-) diff --git a/github_backup/cli.py b/github_backup/cli.py 
index cdc9c5f..54849d4 100644 --- a/github_backup/cli.py +++ b/github_backup/cli.py @@ -4,7 +4,6 @@ import logging import os import sys -from github_backup import max_retries from github_backup.github_backup import ( backup_account, @@ -40,7 +39,6 @@ def main(): """Main entry point for github-backup CLI.""" args = parse_args() - max_retries.MAX_RETRIES = args.max_retries if args.private and not get_auth(args): logger.warning( diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 23bb836..7aaf722 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -25,7 +25,6 @@ from urllib.error import HTTPError, URLError from urllib.parse import urlencode, urlparse from urllib.request import HTTPRedirectHandler, Request, build_opener, urlopen -from github_backup import max_retries try: from . import __version__ @@ -636,7 +635,7 @@ def fetch_all() -> Generator[dict, None, None]: while True: # FIRST: Fetch response - for attempt in range(max_retries.MAX_RETRIES): + for attempt in range(args.max_retries): request = _construct_request( per_page=per_page if paginated else None, query_args=query_args, @@ -645,7 +644,7 @@ def fetch_all() -> Generator[dict, None, None]: as_app=args.as_app, fine=args.token_fine is not None, ) - http_response = make_request_with_retry(request, auth) + http_response = make_request_with_retry(request, auth, args.max_retries) match http_response.getcode(): case 200: @@ -659,10 +658,10 @@ def fetch_all() -> Generator[dict, None, None]: TimeoutError, ) as e: logger.warning(f"{type(e).__name__} reading response") - if attempt < max_retries.MAX_RETRIES - 1: + if attempt < args.max_retries - 1: delay = calculate_retry_delay(attempt, {}) logger.warning( - f"Retrying in {delay:.1f}s (attempt {attempt + 1}/{max_retries.MAX_RETRIES})" + f"Retrying read in {delay:.1f}s (attempt {attempt + 1}/{args.max_retries})" ) time.sleep(delay) continue # Next retry attempt @@ -688,10 +687,10 @@ def fetch_all() -> 
Generator[dict, None, None]: ) else: logger.error( - f"Failed to read response after {max_retries.MAX_RETRIES} attempts for {next_url or template}" + f"Failed to read response after {args.max_retries} attempts for {next_url or template}" ) raise Exception( - f"Failed to read response after {max_retries.MAX_RETRIES} attempts for {next_url or template}" + f"Failed to read response after {args.max_retries} attempts for {next_url or template}" ) # SECOND: Process and paginate @@ -723,7 +722,7 @@ def fetch_all() -> Generator[dict, None, None]: return list(fetch_all()) -def make_request_with_retry(request, auth): +def make_request_with_retry(request, auth, max_retries=5): """Make HTTP request with automatic retry for transient errors.""" def is_retryable_status(status_code, headers): @@ -735,7 +734,7 @@ def is_retryable_status(status_code, headers): return int(headers.get("x-ratelimit-remaining", 1)) < 1 return False - for attempt in range(max_retries.MAX_RETRIES): + for attempt in range(max_retries): try: return urlopen(request, context=https_ctx) @@ -745,32 +744,31 @@ def is_retryable_status(status_code, headers): logger.error(f"API Error: {exc.code} {exc.reason} for {request.full_url}") raise # Non-retryable error - if attempt >= max_retries.MAX_RETRIES - 1: - logger.error(f"HTTP {exc.code} failed after {max_retries.MAX_RETRIES} attempts for {request.full_url}") + if attempt >= max_retries - 1: + logger.error(f"HTTP {exc.code} failed after {max_retries} attempts for {request.full_url}") raise delay = calculate_retry_delay(attempt, exc.headers) logger.warning( f"HTTP {exc.code} ({exc.reason}), retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{max_retries.MAX_RETRIES}) for {request.full_url}" - + f"(attempt {attempt + 1}/{max_retries}) for {request.full_url}" ) if auth is None and exc.code in (403, 429): logger.info("Hint: Authenticate to raise your GitHub rate limit") time.sleep(delay) except (URLError, socket.error) as e: - if attempt >= max_retries.MAX_RETRIES - 
1: - logger.error(f"Connection error failed after {max_retries.MAX_RETRIES} attempts: {e} for {request.full_url}") + if attempt >= max_retries - 1: + logger.error(f"Connection error failed after {max_retries} attempts: {e} for {request.full_url}") raise delay = calculate_retry_delay(attempt, {}) logger.warning( f"Connection error: {e}, retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{max_retries.MAX_RETRIES}) for {request.full_url}" + f"(attempt {attempt + 1}/{max_retries}) for {request.full_url}" ) time.sleep(delay) - raise Exception(f"Request failed after {max_retries.MAX_RETRIES} attempts") # pragma: no cover + raise Exception(f"Request failed after {max_retries} attempts") # pragma: no cover def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False): From 8b21e2501c8111cd3aa2a67ceec1ea1b9ec746dc Mon Sep 17 00:00:00 2001 From: michaelmartinez Date: Tue, 23 Dec 2025 08:55:52 -0800 Subject: [PATCH 090/148] readme --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index ffa80ac..df31e28 100644 --- a/README.rst +++ b/README.rst @@ -152,7 +152,8 @@ CLI Help output:: --throttle-limit to be set) --exclude [EXCLUDE ...] 
names of repositories to exclude - + --retries MAX_RETRIES + maximum number of retries for API calls (default: 5) Usage Details ============= From 5ab3852476d4c387f04473a1ea1b1b76cd6a4878 Mon Sep 17 00:00:00 2001 From: michaelmartinez Date: Tue, 23 Dec 2025 08:57:57 -0800 Subject: [PATCH 091/148] rm max_retries.py --- github_backup/max_retries.py | 1 - 1 file changed, 1 deletion(-) delete mode 100644 github_backup/max_retries.py diff --git a/github_backup/max_retries.py b/github_backup/max_retries.py deleted file mode 100644 index 43594f7..0000000 --- a/github_backup/max_retries.py +++ /dev/null @@ -1 +0,0 @@ -MAX_RETRIES=5 From 44b0003ec9766759f39e23084db1ba152d90d1a1 Mon Sep 17 00:00:00 2001 From: michaelmartinez Date: Tue, 23 Dec 2025 14:07:38 -0800 Subject: [PATCH 092/148] updates to the tests, and fixes to the retry --- github_backup/github_backup.py | 59 ++++++--- tests/test_http_451.py | 41 ++++-- tests/test_pagination.py | 1 + tests/test_retrieve_data.py | 235 +++++++++++++++++++++++++++------ 4 files changed, 266 insertions(+), 70 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 7aaf722..12b354b 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -141,6 +141,17 @@ def mask_password(url, secret="*****"): return url.replace(parsed.password, secret) +def non_negative_int(value): + """Argparse type validator for non-negative integers.""" + try: + ivalue = int(value) + except ValueError: + raise argparse.ArgumentTypeError(f"'{value}' is not a valid integer") + if ivalue < 0: + raise argparse.ArgumentTypeError(f"{value} must be 0 or greater") + return ivalue + + def parse_args(args=None): parser = argparse.ArgumentParser(description="Backup a github account") parser.add_argument("user", metavar="USER", type=str, help="github username") @@ -468,7 +479,7 @@ def parse_args(args=None): parser.add_argument( "--retries", dest="max_retries", - type=int, + type=non_negative_int, default=5, 
help="maximum number of retries for API calls (default: 5)", ) @@ -626,7 +637,7 @@ def retrieve_data(args, template, query_args=None, paginated=True): def _extract_next_page_url(link_header): for link in link_header.split(","): if 'rel="next"' in link: - return link[link.find("<") + 1:link.find(">")] + return link[link.find("<") + 1 : link.find(">")] return None def fetch_all() -> Generator[dict, None, None]: @@ -635,7 +646,7 @@ def fetch_all() -> Generator[dict, None, None]: while True: # FIRST: Fetch response - for attempt in range(args.max_retries): + for attempt in range(args.max_retries + 1): request = _construct_request( per_page=per_page if paginated else None, query_args=query_args, @@ -658,10 +669,10 @@ def fetch_all() -> Generator[dict, None, None]: TimeoutError, ) as e: logger.warning(f"{type(e).__name__} reading response") - if attempt < args.max_retries - 1: + if attempt < args.max_retries: delay = calculate_retry_delay(attempt, {}) logger.warning( - f"Retrying read in {delay:.1f}s (attempt {attempt + 1}/{args.max_retries})" + f"Retrying read in {delay:.1f}s (attempt {attempt + 1}/{args.max_retries + 1})" ) time.sleep(delay) continue # Next retry attempt @@ -687,10 +698,10 @@ def fetch_all() -> Generator[dict, None, None]: ) else: logger.error( - f"Failed to read response after {args.max_retries} attempts for {next_url or template}" + f"Failed to read response after {args.max_retries + 1} attempts for {next_url or template}" ) raise Exception( - f"Failed to read response after {args.max_retries} attempts for {next_url or template}" + f"Failed to read response after {args.max_retries + 1} attempts for {next_url or template}" ) # SECOND: Process and paginate @@ -734,41 +745,49 @@ def is_retryable_status(status_code, headers): return int(headers.get("x-ratelimit-remaining", 1)) < 1 return False - for attempt in range(max_retries): + for attempt in range(max_retries + 1): try: return urlopen(request, context=https_ctx) except HTTPError as exc: # HTTPError 
can be used as a response-like object if not is_retryable_status(exc.code, exc.headers): - logger.error(f"API Error: {exc.code} {exc.reason} for {request.full_url}") + logger.error( + f"API Error: {exc.code} {exc.reason} for {request.full_url}" + ) raise # Non-retryable error - if attempt >= max_retries - 1: - logger.error(f"HTTP {exc.code} failed after {max_retries} attempts for {request.full_url}") + if attempt >= max_retries: + logger.error( + f"HTTP {exc.code} failed after {max_retries + 1} attempts for {request.full_url}" + ) raise delay = calculate_retry_delay(attempt, exc.headers) logger.warning( f"HTTP {exc.code} ({exc.reason}), retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{max_retries}) for {request.full_url}" + f"(attempt {attempt + 1}/{max_retries + 1}) for {request.full_url}" ) if auth is None and exc.code in (403, 429): logger.info("Hint: Authenticate to raise your GitHub rate limit") time.sleep(delay) except (URLError, socket.error) as e: - if attempt >= max_retries - 1: - logger.error(f"Connection error failed after {max_retries} attempts: {e} for {request.full_url}") + if attempt >= max_retries: + logger.error( + f"Connection error failed after {max_retries + 1} attempts: {e} for {request.full_url}" + ) raise delay = calculate_retry_delay(attempt, {}) logger.warning( f"Connection error: {e}, retrying in {delay:.1f}s " - f"(attempt {attempt + 1}/{max_retries}) for {request.full_url}" + f"(attempt {attempt + 1}/{max_retries + 1}) for {request.full_url}" ) time.sleep(delay) - raise Exception(f"Request failed after {max_retries} attempts") # pragma: no cover + raise Exception( + f"Request failed after {max_retries + 1} attempts" + ) # pragma: no cover def _construct_request(per_page, query_args, template, auth, as_app=None, fine=False): @@ -1584,9 +1603,7 @@ def filter_repositories(args, unfiltered_repositories): repositories = [r for r in repositories if not r.get("archived")] if args.starred_skip_size_over is not None: if 
args.starred_skip_size_over <= 0: - logger.warning( - "--starred-skip-size-over must be greater than 0, ignoring" - ) + logger.warning("--starred-skip-size-over must be greater than 0, ignoring") else: size_limit_kb = args.starred_skip_size_over * 1024 filtered = [] @@ -1595,7 +1612,9 @@ def filter_repositories(args, unfiltered_repositories): size_mb = r.get("size", 0) / 1024 logger.info( "Skipping starred repo {0} ({1:.0f} MB) due to --starred-skip-size-over {2}".format( - r.get("full_name", r.get("name")), size_mb, args.starred_skip_size_over + r.get("full_name", r.get("name")), + size_mb, + args.starred_skip_size_over, ) ) else: diff --git a/tests/test_http_451.py b/tests/test_http_451.py index d53d65c..bb825f7 100644 --- a/tests/test_http_451.py +++ b/tests/test_http_451.py @@ -21,6 +21,7 @@ def test_repository_unavailable_error_raised(self): args.osx_keychain_item_account = None args.throttle_limit = None args.throttle_pause = 0 + args.max_retries = 5 mock_response = Mock() mock_response.getcode.return_value = 451 @@ -30,18 +31,26 @@ def test_repository_unavailable_error_raised(self): "block": { "reason": "dmca", "created_at": "2024-11-12T14:38:04Z", - "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md" - } + "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md", + }, } mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8") mock_response.headers = {"x-ratelimit-remaining": "5000"} mock_response.reason = "Unavailable For Legal Reasons" - with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + with patch( + "github_backup.github_backup.make_request_with_retry", + return_value=mock_response, + ): with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: - github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues") - - assert exc_info.value.dmca_url == 
"https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md" + github_backup.retrieve_data( + args, "https://api.github.com/repos/test/dmca/issues" + ) + + assert ( + exc_info.value.dmca_url + == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md" + ) assert "451" in str(exc_info.value) def test_repository_unavailable_error_without_dmca_url(self): @@ -54,6 +63,7 @@ def test_repository_unavailable_error_without_dmca_url(self): args.osx_keychain_item_account = None args.throttle_limit = None args.throttle_pause = 0 + args.max_retries = 5 mock_response = Mock() mock_response.getcode.return_value = 451 @@ -61,9 +71,14 @@ def test_repository_unavailable_error_without_dmca_url(self): mock_response.headers = {"x-ratelimit-remaining": "5000"} mock_response.reason = "Unavailable For Legal Reasons" - with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + with patch( + "github_backup.github_backup.make_request_with_retry", + return_value=mock_response, + ): with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: - github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues") + github_backup.retrieve_data( + args, "https://api.github.com/repos/test/dmca/issues" + ) assert exc_info.value.dmca_url is None assert "451" in str(exc_info.value) @@ -78,6 +93,7 @@ def test_repository_unavailable_error_with_malformed_json(self): args.osx_keychain_item_account = None args.throttle_limit = None args.throttle_pause = 0 + args.max_retries = 5 mock_response = Mock() mock_response.getcode.return_value = 451 @@ -85,9 +101,14 @@ def test_repository_unavailable_error_with_malformed_json(self): mock_response.headers = {"x-ratelimit-remaining": "5000"} mock_response.reason = "Unavailable For Legal Reasons" - with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + with patch( + "github_backup.github_backup.make_request_with_retry", + 
return_value=mock_response, + ): with pytest.raises(github_backup.RepositoryUnavailableError): - github_backup.retrieve_data(args, "https://api.github.com/repos/test/dmca/issues") + github_backup.retrieve_data( + args, "https://api.github.com/repos/test/dmca/issues" + ) if __name__ == "__main__": diff --git a/tests/test_pagination.py b/tests/test_pagination.py index 831b913..e35ff38 100644 --- a/tests/test_pagination.py +++ b/tests/test_pagination.py @@ -49,6 +49,7 @@ def mock_args(): args.osx_keychain_item_account = None args.throttle_limit = None args.throttle_pause = 0 + args.max_retries = 5 return args diff --git a/tests/test_retrieve_data.py b/tests/test_retrieve_data.py index adb1152..fa82bd7 100644 --- a/tests/test_retrieve_data.py +++ b/tests/test_retrieve_data.py @@ -9,26 +9,27 @@ from github_backup import github_backup from github_backup.github_backup import ( - MAX_RETRIES, calculate_retry_delay, make_request_with_retry, ) +# Default retry count used in tests (matches argparse default) +# With max_retries=5, total attempts = 6 (1 initial + 5 retries) +DEFAULT_MAX_RETRIES = 5 + class TestCalculateRetryDelay: def test_respects_retry_after_header(self): - headers = {'retry-after': '30'} + headers = {"retry-after": "30"} assert calculate_retry_delay(0, headers) == 30 def test_respects_rate_limit_reset(self): import time import calendar + # Set reset time 60 seconds in the future future_reset = calendar.timegm(time.gmtime()) + 60 - headers = { - 'x-ratelimit-remaining': '0', - 'x-ratelimit-reset': str(future_reset) - } + headers = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": str(future_reset)} delay = calculate_retry_delay(0, headers) # Should be approximately 60 seconds (with some tolerance for execution time) assert 55 <= delay <= 65 @@ -50,12 +51,10 @@ def test_max_delay_cap(self): def test_minimum_rate_limit_delay(self): import time import calendar + # Set reset time in the past (already reset) past_reset = calendar.timegm(time.gmtime()) - 100 - 
headers = { - 'x-ratelimit-remaining': '0', - 'x-ratelimit-reset': str(past_reset) - } + headers = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": str(past_reset)} delay = calculate_retry_delay(0, headers) # Should be minimum 10 seconds even if reset time is in past assert delay >= 10 @@ -74,6 +73,7 @@ def mock_args(self): args.osx_keychain_item_account = None args.throttle_limit = None args.throttle_pause = 0 + args.max_retries = DEFAULT_MAX_RETRIES return args def test_json_parse_error_retries_and_fails(self, mock_args): @@ -90,13 +90,22 @@ def mock_make_request(*args, **kwargs): call_count += 1 return mock_response - with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request): - with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): # No delay in tests + with patch( + "github_backup.github_backup.make_request_with_retry", + side_effect=mock_make_request, + ): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): # No delay in tests with pytest.raises(Exception) as exc_info: - github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues") + github_backup.retrieve_data( + mock_args, "https://api.github.com/repos/test/repo/issues" + ) assert "Failed to read response after" in str(exc_info.value) - assert call_count == MAX_RETRIES + assert ( + call_count == DEFAULT_MAX_RETRIES + 1 + ) # 1 initial + 5 retries = 6 attempts def test_json_parse_error_recovers_on_retry(self, mock_args): """HTTP 200 with invalid JSON should succeed if retry returns valid JSON.""" @@ -119,9 +128,16 @@ def mock_make_request(*args, **kwargs): call_count += 1 return result - with patch("github_backup.github_backup.make_request_with_retry", side_effect=mock_make_request): - with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): - result = github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues") + with patch( + 
"github_backup.github_backup.make_request_with_retry", + side_effect=mock_make_request, + ): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): + result = github_backup.retrieve_data( + mock_args, "https://api.github.com/repos/test/repo/issues" + ) assert result == [{"id": 1}] assert call_count == 3 # Failed twice, succeeded on third @@ -134,11 +150,18 @@ def test_http_error_raises_exception(self, mock_args): mock_response.headers = {"x-ratelimit-remaining": "5000"} mock_response.reason = "Not Found" - with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + with patch( + "github_backup.github_backup.make_request_with_retry", + return_value=mock_response, + ): with pytest.raises(Exception) as exc_info: - github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/notfound/issues") + github_backup.retrieve_data( + mock_args, "https://api.github.com/repos/test/notfound/issues" + ) - assert not isinstance(exc_info.value, github_backup.RepositoryUnavailableError) + assert not isinstance( + exc_info.value, github_backup.RepositoryUnavailableError + ) assert "404" in str(exc_info.value) @@ -151,7 +174,7 @@ def test_502_error_retries_and_succeeds(self): good_response.read.return_value = b'{"ok": true}' call_count = 0 - fail_count = MAX_RETRIES - 1 # Fail all but last attempt + fail_count = DEFAULT_MAX_RETRIES # Fail all retries, succeed on last attempt def mock_urlopen(*args, **kwargs): nonlocal call_count @@ -167,14 +190,18 @@ def mock_urlopen(*args, **kwargs): return good_response with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): result = make_request_with_retry(Mock(), None) assert result == good_response - assert call_count == MAX_RETRIES + assert ( + call_count == 
DEFAULT_MAX_RETRIES + 1 + ) # 1 initial + 5 retries = 6 attempts def test_503_error_retries_until_exhausted(self): - """HTTP 503 should retry MAX_RETRIES times then raise.""" + """HTTP 503 should make 1 initial + DEFAULT_MAX_RETRIES retry attempts then raise.""" call_count = 0 def mock_urlopen(*args, **kwargs): @@ -189,12 +216,16 @@ def mock_urlopen(*args, **kwargs): ) with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): with pytest.raises(HTTPError) as exc_info: make_request_with_retry(Mock(), None) assert exc_info.value.code == 503 - assert call_count == MAX_RETRIES + assert ( + call_count == DEFAULT_MAX_RETRIES + 1 + ) # 1 initial + 5 retries = 6 attempts def test_404_error_not_retried(self): """HTTP 404 should not be retried - raise immediately.""" @@ -237,7 +268,9 @@ def mock_urlopen(*args, **kwargs): return good_response with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): result = make_request_with_retry(Mock(), None) assert result == good_response @@ -269,7 +302,7 @@ def test_connection_error_retries_and_succeeds(self): """URLError (connection error) should retry and succeed if subsequent request works.""" good_response = Mock() call_count = 0 - fail_count = MAX_RETRIES - 1 # Fail all but last attempt + fail_count = DEFAULT_MAX_RETRIES # Fail all retries, succeed on last attempt def mock_urlopen(*args, **kwargs): nonlocal call_count @@ -279,14 +312,18 @@ def mock_urlopen(*args, **kwargs): return good_response with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + with 
patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): result = make_request_with_retry(Mock(), None) assert result == good_response - assert call_count == MAX_RETRIES + assert ( + call_count == DEFAULT_MAX_RETRIES + 1 + ) # 1 initial + 5 retries = 6 attempts def test_socket_error_retries_until_exhausted(self): - """socket.error should retry MAX_RETRIES times then raise.""" + """socket.error should make 1 initial + DEFAULT_MAX_RETRIES retry attempts then raise.""" call_count = 0 def mock_urlopen(*args, **kwargs): @@ -295,11 +332,15 @@ def mock_urlopen(*args, **kwargs): raise socket.error("Connection reset by peer") with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): - with patch("github_backup.github_backup.calculate_retry_delay", return_value=0): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): with pytest.raises(socket.error): make_request_with_retry(Mock(), None) - assert call_count == MAX_RETRIES + assert ( + call_count == DEFAULT_MAX_RETRIES + 1 + ) # 1 initial + 5 retries = 6 attempts class TestRetrieveDataThrottling: @@ -315,6 +356,7 @@ def mock_args(self): args.osx_keychain_item_account = None args.throttle_limit = 10 # Throttle when remaining <= 10 args.throttle_pause = 5 # Pause 5 seconds + args.max_retries = DEFAULT_MAX_RETRIES return args def test_throttling_pauses_when_rate_limit_low(self, mock_args): @@ -322,11 +364,19 @@ def test_throttling_pauses_when_rate_limit_low(self, mock_args): mock_response = Mock() mock_response.getcode.return_value = 200 mock_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8") - mock_response.headers = {"x-ratelimit-remaining": "5", "Link": ""} # Below throttle_limit - - with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): + mock_response.headers = { + "x-ratelimit-remaining": "5", + "Link": "", + } # Below throttle_limit + + with patch( + 
"github_backup.github_backup.make_request_with_retry", + return_value=mock_response, + ): with patch("github_backup.github_backup.time.sleep") as mock_sleep: - github_backup.retrieve_data(mock_args, "https://api.github.com/repos/test/repo/issues") + github_backup.retrieve_data( + mock_args, "https://api.github.com/repos/test/repo/issues" + ) mock_sleep.assert_called_once_with(5) # throttle_pause value @@ -344,16 +394,121 @@ def mock_args(self): args.osx_keychain_item_account = None args.throttle_limit = None args.throttle_pause = 0 + args.max_retries = DEFAULT_MAX_RETRIES return args def test_dict_response_returned_as_list(self, mock_args): """Single dict response should be returned as a list with one item.""" mock_response = Mock() mock_response.getcode.return_value = 200 - mock_response.read.return_value = json.dumps({"login": "testuser", "id": 123}).encode("utf-8") + mock_response.read.return_value = json.dumps( + {"login": "testuser", "id": 123} + ).encode("utf-8") mock_response.headers = {"x-ratelimit-remaining": "5000", "Link": ""} - with patch("github_backup.github_backup.make_request_with_retry", return_value=mock_response): - result = github_backup.retrieve_data(mock_args, "https://api.github.com/user") + with patch( + "github_backup.github_backup.make_request_with_retry", + return_value=mock_response, + ): + result = github_backup.retrieve_data( + mock_args, "https://api.github.com/user" + ) assert result == [{"login": "testuser", "id": 123}] + + +class TestRetriesCliArgument: + """Tests for --retries CLI argument validation and behavior.""" + + def test_retries_argument_accepted(self): + """--retries flag should be accepted and parsed correctly.""" + args = github_backup.parse_args(["--retries", "3", "testuser"]) + assert args.max_retries == 3 + + def test_retries_default_value(self): + """--retries should default to 5 if not specified.""" + args = github_backup.parse_args(["testuser"]) + assert args.max_retries == 5 + + def 
test_retries_zero_is_valid(self): + """--retries 0 should be valid and mean 1 attempt (no retries).""" + args = github_backup.parse_args(["--retries", "0", "testuser"]) + assert args.max_retries == 0 + + def test_retries_negative_rejected(self): + """--retries with negative value should be rejected by argparse.""" + with pytest.raises(SystemExit): + github_backup.parse_args(["--retries", "-1", "testuser"]) + + def test_retries_non_integer_rejected(self): + """--retries with non-integer value should be rejected by argparse.""" + with pytest.raises(SystemExit): + github_backup.parse_args(["--retries", "abc", "testuser"]) + + def test_retries_one_with_transient_error_succeeds(self): + """--retries 1 should allow one retry after initial failure.""" + good_response = Mock() + good_response.read.return_value = b'{"ok": true}' + + call_count = 0 + + def mock_urlopen(*args, **kwargs): + nonlocal call_count + call_count += 1 + if call_count == 1: + raise HTTPError( + url="https://api.github.com/test", + code=502, + msg="Bad Gateway", + hdrs={"x-ratelimit-remaining": "5000"}, + fp=None, + ) + return good_response + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): + result = make_request_with_retry(Mock(), None, max_retries=1) + + assert result == good_response + assert call_count == 2 # 1 initial + 1 retry = 2 attempts + + def test_custom_retry_count_limits_attempts(self): + """Custom --retries value should limit actual retry attempts.""" + args = Mock() + args.as_app = False + args.token_fine = None + args.token_classic = "fake_token" + args.osx_keychain_item_name = None + args.osx_keychain_item_account = None + args.throttle_limit = None + args.throttle_pause = 0 + args.max_retries = 2 # 2 retries = 3 total attempts (1 initial + 2 retries) + + mock_response = Mock() + mock_response.getcode.return_value = 200 + mock_response.read.return_value = b"not valid 
json {" + mock_response.headers = {"x-ratelimit-remaining": "5000"} + + call_count = 0 + + def mock_make_request(*args, **kwargs): + nonlocal call_count + call_count += 1 + return mock_response + + with patch( + "github_backup.github_backup.make_request_with_retry", + side_effect=mock_make_request, + ): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): + with pytest.raises(Exception) as exc_info: + github_backup.retrieve_data( + args, "https://api.github.com/repos/test/repo/issues" + ) + + assert "Failed to read response after 3 attempts" in str(exc_info.value) + assert call_count == 3 # 1 initial + 2 retries = 3 attempts From 858731ebbd609c9eb5caecce9bbb8b5e04b490bb Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Wed, 24 Dec 2025 00:45:01 +0000 Subject: [PATCH 093/148] Release version 0.60.0 --- CHANGES.rst | 11 ++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index a6a1c4d..ee2a1d4 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,18 @@ Changelog ========= -0.59.0 (2025-12-21) +0.60.0 (2025-12-24) ------------------- ------------------------ +- Rm max_retries.py. [michaelmartinez] +- Readme. [michaelmartinez] +- Don't use a global variable, pass the args instead. [michaelmartinez] +- Readme, simplify the logic a bit. [michaelmartinez] +- Max_retries 5. 
[michaelmartinez] + + +0.59.0 (2025-12-21) +------------------- - Add --starred-skip-size-over flag to limit starred repo size (#108) [Rodos] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 25dbb4b..5684ec7 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.59.0" +__version__ = "0.60.0" From 9a6f0b4c21be4f9157a110b96a5561d672dbf6b1 Mon Sep 17 00:00:00 2001 From: Lukas Bestle Date: Fri, 9 Jan 2026 21:04:21 +0100 Subject: [PATCH 094/148] feat: Backup of repository security advisories --- README.rst | 10 ++++---- github_backup/github_backup.py | 44 ++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index df31e28..8e00d49 100644 --- a/README.rst +++ b/README.rst @@ -43,9 +43,9 @@ CLI Help output:: [--watched] [--followers] [--following] [--all] [--issues] [--issue-comments] [--issue-events] [--pulls] [--pull-comments] [--pull-commits] [--pull-details] - [--labels] [--hooks] [--milestones] [--repositories] - [--bare] [--no-prune] [--lfs] [--wikis] [--gists] - [--starred-gists] [--skip-archived] [--skip-existing] + [--labels] [--hooks] [--milestones] [--security-advisories] + [--repositories] [--bare] [--no-prune] [--lfs] [--wikis] + [--gists] [--starred-gists] [--skip-archived] [--skip-existing] [-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v] [--keychain-name OSX_KEYCHAIN_ITEM_NAME] @@ -101,6 +101,8 @@ CLI Help output:: --hooks include hooks in backup (works only when authenticated) --milestones include milestones in backup + --security-advisories + include security advisories in backup --repositories include repository clone in backup --bare clone bare repositories --no-prune disable prune option for git fetch @@ -401,7 +403,7 @@ Quietly and incrementally backup useful Github user data (public and private rep export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN 
GH_USER=YOUR-GITHUB-USER - github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER + github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --security-advisories --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. :: diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 12b354b..8a60f66 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -310,6 +310,12 @@ def parse_args(args=None): dest="include_milestones", help="include milestones in backup", ) + parser.add_argument( + "--security-advisories", + action="store_true", + dest="include_security_advisories", + help="include security advisories in backup", + ) parser.add_argument( "--repositories", action="store_true", @@ -1718,6 +1724,9 @@ def backup_repositories(args, output_directory, repositories): if args.include_milestones or args.include_everything: backup_milestones(args, repo_cwd, repository, repos_template) + if args.include_security_advisories or args.include_everything: + backup_security_advisories(args, repo_cwd, repository, repos_template) + if args.include_labels or args.include_everything: backup_labels(args, repo_cwd, repository, repos_template) @@ -1934,6 +1943,41 @@ def backup_milestones(args, repo_cwd, repository, repos_template): ) +def backup_security_advisories(args, repo_cwd, 
repository, repos_template): + advisory_cwd = os.path.join(repo_cwd, "security-advisories") + if args.skip_existing and os.path.isdir(advisory_cwd): + return + + logger.info("Retrieving {0} security advisories".format(repository["full_name"])) + mkdir_p(repo_cwd, advisory_cwd) + + template = "{0}/{1}/security-advisories".format(repos_template, repository["full_name"]) + + _advisories = retrieve_data(args, template) + + advisories = {} + for advisory in _advisories: + advisories[advisory["ghsa_id"]] = advisory + + written_count = 0 + for ghsa_id, advisory in list(advisories.items()): + advisory_file = "{0}/{1}.json".format(advisory_cwd, ghsa_id) + if json_dump_if_changed(advisory, advisory_file): + written_count += 1 + + total = len(advisories) + if written_count == total: + logger.info("Saved {0} security advisories to disk".format(total)) + elif written_count == 0: + logger.info("{0} security advisories unchanged, skipped write".format(total)) + else: + logger.info( + "Saved {0} of {1} security advisories to disk ({2} unchanged)".format( + written_count, total, total - written_count + ) + ) + + def backup_labels(args, repo_cwd, repository, repos_template): label_cwd = os.path.join(repo_cwd, "labels") output_file = "{0}/labels.json".format(label_cwd) From a175ac3ed90cbcb5aa29785f8ce5adc7567e9123 Mon Sep 17 00:00:00 2001 From: Lukas Bestle Date: Sat, 10 Jan 2026 11:12:42 +0100 Subject: [PATCH 095/148] test: Adapt tests to new argument --- tests/test_all_starred.py | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/test_all_starred.py b/tests/test_all_starred.py index 0fab048..297d148 100644 --- a/tests/test_all_starred.py +++ b/tests/test_all_starred.py @@ -37,6 +37,7 @@ def _create_mock_args(self, **overrides): args.include_labels = False args.include_hooks = False args.include_milestones = False + args.include_security_advisories = False args.include_releases = False args.include_assets = False args.include_attachments = False From 
b3d35f9d9f7f3c1223c2eb94a8e0cd3c8a466e79 Mon Sep 17 00:00:00 2001 From: Lukas Bestle Date: Sat, 10 Jan 2026 15:44:37 +0100 Subject: [PATCH 096/148] docs: Add missing `--retries` argument to README --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index df31e28..f5149e6 100644 --- a/README.rst +++ b/README.rst @@ -55,7 +55,7 @@ CLI Help output:: [--skip-assets-on [SKIP_ASSETS_ON ...]] [--attachments] [--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE] - [--exclude [EXCLUDE ...]] + [--exclude [EXCLUDE ...]] [--retries MAX_RETRIES] USER Backup a github account From c63fb37d30fc5547f39c2ba798c30a97545ea285 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Mon, 12 Jan 2026 16:30:28 +0000 Subject: [PATCH 097/148] Release version 0.61.0 --- CHANGES.rst | 9 ++++++++- github_backup/__init__.py | 2 +- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index ee2a1d4..0e66663 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,16 @@ Changelog ========= -0.60.0 (2025-12-24) +0.61.0 (2026-01-12) ------------------- ------------------------ +- Docs: Add missing `--retries` argument to README. [Lukas Bestle] +- Test: Adapt tests to new argument. [Lukas Bestle] +- Feat: Backup of repository security advisories. [Lukas Bestle] + + +0.60.0 (2025-12-24) +------------------- - Rm max_retries.py. [michaelmartinez] - Readme. [michaelmartinez] - Don't use a global variable, pass the args instead. 
[michaelmartinez] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 5684ec7..a076e5d 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.60.0" +__version__ = "0.61.0" From fce4abb74ae729679d5a6dc7b0b5cf57044efcf2 Mon Sep 17 00:00:00 2001 From: Rodos Date: Tue, 13 Jan 2026 13:15:38 +1100 Subject: [PATCH 098/148] Fix fine-grained PAT attachment downloads for private repos (#477) Fine-grained personal access tokens cannot download attachments from private repositories directly due to a GitHub platform limitation. This adds a workaround for image attachments (/assets/ URLs) using GitHub's Markdown API to convert URLs to JWT-signed URLs that can be downloaded without authentication. Changes: - Add get_jwt_signed_url_via_markdown_api() function - Detect fine-grained token + private repo + /assets/ URL upfront - Use JWT workaround for those cases, mark success with jwt_workaround flag - Skip download with skipped_at when workaround fails - Add startup warning when using --attachments with fine-grained tokens - Document limitation in README (file attachments still fail) - Add 6 unit tests for JWT workaround logic --- README.rst | 2 + github_backup/cli.py | 10 +++ github_backup/github_backup.py | 108 ++++++++++++++++++++++++-- tests/test_attachments.py | 136 +++++++++++++++++++++++++++++++++ 4 files changed, 248 insertions(+), 8 deletions(-) diff --git a/README.rst b/README.rst index e2c8fc2..c23027d 100644 --- a/README.rst +++ b/README.rst @@ -281,6 +281,8 @@ The tool automatically extracts file extensions from HTTP headers to ensure file **Repository filtering** for repo files/assets handles renamed and transferred repositories gracefully. URLs are included if they either match the current repository name directly, or redirect to it (e.g., ``willmcgugan/rich`` redirects to ``Textualize/rich`` after transfer). 
+**Fine-grained token limitation:** Due to a GitHub platform limitation, fine-grained personal access tokens (``github_pat_...``) cannot download attachments from private repositories directly. This affects both ``/assets/`` (images) and ``/files/`` (documents) URLs. The tool implements a workaround for image attachments using GitHub's Markdown API, which converts URLs to temporary JWT-signed URLs that can be downloaded. However, this workaround only works for images - document attachments (PDFs, text files, etc.) will fail with 404 errors when using fine-grained tokens on private repos. For full attachment support on private repositories, use a classic token (``-t``) instead of a fine-grained token (``-f``). See `#477 `_ for details. + Run in Docker container ----------------------- diff --git a/github_backup/cli.py b/github_backup/cli.py index 54849d4..987ae71 100644 --- a/github_backup/cli.py +++ b/github_backup/cli.py @@ -46,6 +46,16 @@ def main(): "Use -t/--token or -f/--token-fine to authenticate." ) + # Issue #477: Fine-grained PATs cannot download all attachment types from + # private repos. Image attachments will be retried via Markdown API workaround. + if args.include_attachments and args.token_fine: + logger.warning( + "Using --attachments with fine-grained token. Due to GitHub platform " + "limitations, file attachments (PDFs, etc.) from private repos may fail. " + "Image attachments will be retried via workaround. For full attachment " + "support, use --token-classic instead." 
+ ) + if args.quiet: logger.setLevel(logging.WARNING) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 8a60f66..705f013 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1062,6 +1062,65 @@ def download_attachment_file(url, path, auth, as_app=False, fine=False): return metadata +def get_jwt_signed_url_via_markdown_api(url, token, repo_context): + """Convert a user-attachments/assets URL to a JWT-signed URL via Markdown API. + + GitHub's Markdown API renders image URLs and returns HTML containing + JWT-signed private-user-images.githubusercontent.com URLs that work + without token authentication. + + This is a workaround for issue #477 where fine-grained PATs cannot + download user-attachments URLs from private repos directly. + + Limitations: + - Only works for /assets/ URLs (images) + - Does NOT work for /files/ URLs (PDFs, text files, etc.) + - JWT URLs expire after ~5 minutes + + Args: + url: The github.com/user-attachments/assets/UUID URL + token: Raw fine-grained PAT (github_pat_...) 
+ repo_context: Repository context as "owner/repo" + + Returns: + str: JWT-signed URL from private-user-images.githubusercontent.com + None: If conversion fails + """ + + try: + payload = json.dumps( + {"text": f"![img]({url})", "mode": "gfm", "context": repo_context} + ).encode("utf-8") + + request = Request("https://api.github.com/markdown", data=payload, method="POST") + request.add_header("Authorization", f"token {token}") + request.add_header("Content-Type", "application/json") + request.add_header("Accept", "application/vnd.github+json") + + html = urlopen(request, timeout=30).read().decode("utf-8") + + # Parse JWT-signed URL from HTML response + # Format: + if match := re.search( + r'src="(https://private-user-images\.githubusercontent\.com/[^"]+)"', html + ): + jwt_url = match.group(1) + logger.debug("Converted attachment URL to JWT-signed URL via Markdown API") + return jwt_url + + logger.debug("Markdown API response did not contain JWT-signed URL") + return None + + except HTTPError as e: + logger.debug( + "Markdown API request failed with HTTP {0}: {1}".format(e.code, e.reason) + ) + return None + except Exception as e: + logger.debug("Markdown API request failed: {0}".format(str(e))) + return None + + def extract_attachment_urls(item_data, issue_number=None, repository_full_name=None): """Extract GitHub-hosted attachment URLs from issue/PR body and comments. @@ -1415,15 +1474,46 @@ def download_attachments( filename = get_attachment_filename(url) filepath = os.path.join(attachments_dir, filename) - # Download and get metadata - metadata = download_attachment_file( - url, - filepath, - get_auth(args, encode=not args.as_app), - as_app=args.as_app, - fine=args.token_fine is not None, + # Issue #477: Fine-grained PATs cannot download user-attachments/assets + # from private repos directly (404). Use Markdown API workaround to get + # a JWT-signed URL. Only works for /assets/ (images), not /files/. 
+ needs_jwt = ( + args.token_fine is not None + and repository.get("private", False) + and "github.com/user-attachments/assets/" in url ) + if not needs_jwt: + # NORMAL download path + metadata = download_attachment_file( + url, + filepath, + get_auth(args, encode=not args.as_app), + as_app=args.as_app, + fine=args.token_fine is not None, + ) + elif jwt_url := get_jwt_signed_url_via_markdown_api( + url, args.token_fine, repository["full_name"] + ): + # JWT needed and extracted, download via JWT + metadata = download_attachment_file( + jwt_url, filepath, auth=None, as_app=False, fine=False + ) + metadata["url"] = url # Apply back the original URL + metadata["jwt_workaround"] = True + else: + # Markdown API workaround failed - skip download we know will fail + metadata = { + "url": url, + "success": False, + "skipped_at": datetime.now(timezone.utc).isoformat(), + "error": "Fine-grained token cannot download private repo attachments. " + "Markdown API workaround failed. Use --token-classic instead.", + } + logger.warning( + "Skipping attachment {0}: {1}".format(url, metadata["error"]) + ) + # If download succeeded but we got an extension from Content-Disposition, # we may need to rename the file to add the extension if metadata["success"] and metadata.get("original_filename"): @@ -1951,7 +2041,9 @@ def backup_security_advisories(args, repo_cwd, repository, repos_template): logger.info("Retrieving {0} security advisories".format(repository["full_name"])) mkdir_p(repo_cwd, advisory_cwd) - template = "{0}/{1}/security-advisories".format(repos_template, repository["full_name"]) + template = "{0}/{1}/security-advisories".format( + repos_template, repository["full_name"] + ) _advisories = retrieve_data(args, template) diff --git a/tests/test_attachments.py b/tests/test_attachments.py index b338caf..4613984 100644 --- a/tests/test_attachments.py +++ b/tests/test_attachments.py @@ -349,3 +349,139 @@ def test_manifest_skips_permanent_failures(self, attachment_test_setup): 
downloaded_urls[0] == "https://github.com/user-attachments/assets/unavailable" ) + + +class TestJWTWorkaround: + """Test JWT workaround for fine-grained tokens on private repos (issue #477).""" + + def test_markdown_api_extracts_jwt_url(self): + """Markdown API response with JWT URL is extracted correctly.""" + from unittest.mock import patch, Mock + + html_response = '''

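<p><img src="https://private-user-images.githubusercontent.com/123/abc.png?jwt=eyJhbGciOiJ" alt="img"></p>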

''' + + mock_response = Mock() + mock_response.read.return_value = html_response.encode("utf-8") + + with patch("github_backup.github_backup.urlopen", return_value=mock_response): + result = github_backup.get_jwt_signed_url_via_markdown_api( + "https://github.com/user-attachments/assets/abc123", + "github_pat_token", + "owner/repo" + ) + + assert result == "https://private-user-images.githubusercontent.com/123/abc.png?jwt=eyJhbGciOiJ" + + def test_markdown_api_returns_none_on_http_error(self): + """HTTP errors return None.""" + from unittest.mock import patch + from urllib.error import HTTPError + + with patch("github_backup.github_backup.urlopen", side_effect=HTTPError(None, 403, "Forbidden", {}, None)): + result = github_backup.get_jwt_signed_url_via_markdown_api( + "https://github.com/user-attachments/assets/abc123", + "github_pat_token", + "owner/repo" + ) + + assert result is None + + def test_markdown_api_returns_none_when_no_jwt_url(self): + """Response without JWT URL returns None.""" + from unittest.mock import patch, Mock + + mock_response = Mock() + mock_response.read.return_value = b"

<p>No image here</p>

" + + with patch("github_backup.github_backup.urlopen", return_value=mock_response): + result = github_backup.get_jwt_signed_url_via_markdown_api( + "https://github.com/user-attachments/assets/abc123", + "github_pat_token", + "owner/repo" + ) + + assert result is None + + def test_needs_jwt_only_for_fine_grained_private_assets(self): + """needs_jwt is True only for fine-grained + private + /assets/ URL.""" + assets_url = "https://github.com/user-attachments/assets/abc123" + files_url = "https://github.com/user-attachments/files/123/doc.pdf" + + # Fine-grained + private + assets = True + assert ( + "github_pat_" is not None + and True # private + and "github.com/user-attachments/assets/" in assets_url + ) is True + + # Fine-grained + private + files = False + assert ( + "github_pat_" is not None + and True + and "github.com/user-attachments/assets/" in files_url + ) is False + + # Fine-grained + public + assets = False + assert ( + "github_pat_" is not None + and False # public + and "github.com/user-attachments/assets/" in assets_url + ) is False + + def test_jwt_workaround_sets_manifest_flag(self, attachment_test_setup): + """Successful JWT workaround sets jwt_workaround flag in manifest.""" + from unittest.mock import patch, Mock + + setup = attachment_test_setup + setup["args"].token_fine = "github_pat_test" + setup["repository"]["private"] = True + + issue_data = {"body": "https://github.com/user-attachments/assets/abc123"} + + jwt_url = "https://private-user-images.githubusercontent.com/123/abc.png?jwt=token" + + with patch( + "github_backup.github_backup.get_jwt_signed_url_via_markdown_api", + return_value=jwt_url + ), patch( + "github_backup.github_backup.download_attachment_file", + return_value={"success": True, "http_status": 200, "url": jwt_url} + ): + github_backup.download_attachments( + setup["args"], setup["issue_cwd"], issue_data, 123, setup["repository"] + ) + + manifest_path = os.path.join(setup["issue_cwd"], "attachments", "123", "manifest.json") 
+ with open(manifest_path) as f: + manifest = json.load(f) + + assert manifest["attachments"][0]["jwt_workaround"] is True + assert manifest["attachments"][0]["url"] == "https://github.com/user-attachments/assets/abc123" + + def test_jwt_workaround_failure_uses_skipped_at(self, attachment_test_setup): + """Failed JWT workaround uses skipped_at instead of downloaded_at.""" + from unittest.mock import patch + + setup = attachment_test_setup + setup["args"].token_fine = "github_pat_test" + setup["repository"]["private"] = True + + issue_data = {"body": "https://github.com/user-attachments/assets/abc123"} + + with patch( + "github_backup.github_backup.get_jwt_signed_url_via_markdown_api", + return_value=None # Markdown API failed + ): + github_backup.download_attachments( + setup["args"], setup["issue_cwd"], issue_data, 123, setup["repository"] + ) + + manifest_path = os.path.join(setup["issue_cwd"], "attachments", "123", "manifest.json") + with open(manifest_path) as f: + manifest = json.load(f) + + attachment = manifest["attachments"][0] + assert attachment["success"] is False + assert "skipped_at" in attachment + assert "downloaded_at" not in attachment + assert "Use --token-classic" in attachment["error"] From ab0eebb175009a07727bd23eb78b5e9f9e0f13bc Mon Sep 17 00:00:00 2001 From: Rodos Date: Tue, 13 Jan 2026 13:43:45 +1100 Subject: [PATCH 099/148] Refactor test fixtures to use shared create_args helper Uses the real parse_args() function to get CLI defaults, so when new arguments are added they're automatically available to all tests. Changes: - Add tests/conftest.py with create_args fixture - Update 8 test files to use shared fixture - Remove duplicate _create_mock_args methods - Remove redundant @pytest.fixture mock_args definitions This eliminates the need to update multiple test files when adding new CLI arguments. 
--- tests/conftest.py | 25 ++++++++ tests/test_all_starred.py | 62 +++----------------- tests/test_attachments.py | 72 +++++++++++------------ tests/test_case_sensitivity.py | 46 ++------------- tests/test_http_451.py | 36 ++---------- tests/test_pagination.py | 34 ++++------- tests/test_retrieve_data.py | 87 +++++++++------------------- tests/test_skip_assets_on.py | 76 +++++------------------- tests/test_starred_skip_size_over.py | 75 +++++++++--------------- 9 files changed, 158 insertions(+), 355 deletions(-) create mode 100644 tests/conftest.py diff --git a/tests/conftest.py b/tests/conftest.py new file mode 100644 index 0000000..b36fe64 --- /dev/null +++ b/tests/conftest.py @@ -0,0 +1,25 @@ +"""Shared pytest fixtures for github-backup tests.""" + +import pytest + +from github_backup.github_backup import parse_args + + +@pytest.fixture +def create_args(): + """Factory fixture that creates args with real CLI defaults. + + Uses the actual argument parser so new CLI args are automatically + available with their defaults - no test updates needed. + + Usage: + def test_something(self, create_args): + args = create_args(include_releases=True, user="myuser") + """ + def _create(**overrides): + # Use real parser to get actual defaults + args = parse_args(["testuser"]) + for key, value in overrides.items(): + setattr(args, key, value) + return args + return _create diff --git a/tests/test_all_starred.py b/tests/test_all_starred.py index 297d148..9776926 100644 --- a/tests/test_all_starred.py +++ b/tests/test_all_starred.py @@ -1,7 +1,7 @@ """Tests for --all-starred flag behavior (issue #225).""" import pytest -from unittest.mock import Mock, patch +from unittest.mock import patch from github_backup import github_backup @@ -12,58 +12,14 @@ class TestAllStarredCloning: Issue #225: --all-starred should clone starred repos without requiring --repositories. 
""" - def _create_mock_args(self, **overrides): - """Create a mock args object with sensible defaults.""" - args = Mock() - args.user = "testuser" - args.output_directory = "/tmp/backup" - args.include_repository = False - args.include_everything = False - args.include_gists = False - args.include_starred_gists = False - args.all_starred = False - args.skip_existing = False - args.bare_clone = False - args.lfs_clone = False - args.no_prune = False - args.include_wiki = False - args.include_issues = False - args.include_issue_comments = False - args.include_issue_events = False - args.include_pulls = False - args.include_pull_comments = False - args.include_pull_commits = False - args.include_pull_details = False - args.include_labels = False - args.include_hooks = False - args.include_milestones = False - args.include_security_advisories = False - args.include_releases = False - args.include_assets = False - args.include_attachments = False - args.incremental = False - args.incremental_by_files = False - args.github_host = None - args.prefer_ssh = False - args.token_classic = None - args.token_fine = None - args.as_app = False - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - - for key, value in overrides.items(): - setattr(args, key, value) - - return args - @patch('github_backup.github_backup.fetch_repository') @patch('github_backup.github_backup.get_github_repo_url') - def test_all_starred_clones_without_repositories_flag(self, mock_get_url, mock_fetch): + def test_all_starred_clones_without_repositories_flag(self, mock_get_url, mock_fetch, create_args): """--all-starred should clone starred repos without --repositories flag. This is the core fix for issue #225. 
""" - args = self._create_mock_args(all_starred=True) + args = create_args(all_starred=True) mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git" # A starred repository (is_starred flag set by retrieve_repositories) @@ -88,9 +44,9 @@ def test_all_starred_clones_without_repositories_flag(self, mock_get_url, mock_f @patch('github_backup.github_backup.fetch_repository') @patch('github_backup.github_backup.get_github_repo_url') - def test_starred_repo_not_cloned_without_all_starred_flag(self, mock_get_url, mock_fetch): + def test_starred_repo_not_cloned_without_all_starred_flag(self, mock_get_url, mock_fetch, create_args): """Starred repos should NOT be cloned if --all-starred is not set.""" - args = self._create_mock_args(all_starred=False) + args = create_args(all_starred=False) mock_get_url.return_value = "https://github.com/otheruser/awesome-project.git" starred_repo = { @@ -111,9 +67,9 @@ def test_starred_repo_not_cloned_without_all_starred_flag(self, mock_get_url, mo @patch('github_backup.github_backup.fetch_repository') @patch('github_backup.github_backup.get_github_repo_url') - def test_non_starred_repo_not_cloned_with_only_all_starred(self, mock_get_url, mock_fetch): + def test_non_starred_repo_not_cloned_with_only_all_starred(self, mock_get_url, mock_fetch, create_args): """Non-starred repos should NOT be cloned when only --all-starred is set.""" - args = self._create_mock_args(all_starred=True) + args = create_args(all_starred=True) mock_get_url.return_value = "https://github.com/testuser/my-project.git" # A regular (non-starred) repository @@ -135,9 +91,9 @@ def test_non_starred_repo_not_cloned_with_only_all_starred(self, mock_get_url, m @patch('github_backup.github_backup.fetch_repository') @patch('github_backup.github_backup.get_github_repo_url') - def test_repositories_flag_still_works(self, mock_get_url, mock_fetch): + def test_repositories_flag_still_works(self, mock_get_url, mock_fetch, create_args): """--repositories flag 
should still clone repos as before.""" - args = self._create_mock_args(include_repository=True) + args = create_args(include_repository=True) mock_get_url.return_value = "https://github.com/testuser/my-project.git" regular_repo = { diff --git a/tests/test_attachments.py b/tests/test_attachments.py index 4613984..241a08f 100644 --- a/tests/test_attachments.py +++ b/tests/test_attachments.py @@ -4,7 +4,7 @@ import os import tempfile from pathlib import Path -from unittest.mock import Mock +from unittest.mock import Mock, patch import pytest @@ -12,22 +12,13 @@ @pytest.fixture -def attachment_test_setup(tmp_path): +def attachment_test_setup(tmp_path, create_args): """Fixture providing setup and helper for attachment download tests.""" - from unittest.mock import patch - issue_cwd = tmp_path / "issues" issue_cwd.mkdir() - # Mock args - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = None - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.user = "testuser" - args.repository = "testrepo" + # Create args using shared fixture + args = create_args(user="testuser", repository="testrepo") repository = {"full_name": "testuser/testrepo"} @@ -356,9 +347,12 @@ class TestJWTWorkaround: def test_markdown_api_extracts_jwt_url(self): """Markdown API response with JWT URL is extracted correctly.""" - from unittest.mock import patch, Mock - - html_response = '''

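<p><img src="https://private-user-images.githubusercontent.com/123/abc.png?jwt=eyJhbGciOiJ" alt="img"></p>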

''' + html_response = ( + '

' + ) mock_response = Mock() mock_response.read.return_value = html_response.encode("utf-8") @@ -370,14 +364,18 @@ def test_markdown_api_extracts_jwt_url(self): "owner/repo" ) - assert result == "https://private-user-images.githubusercontent.com/123/abc.png?jwt=eyJhbGciOiJ" + expected = ( + "https://private-user-images.githubusercontent.com" + "/123/abc.png?jwt=eyJhbGciOiJ" + ) + assert result == expected def test_markdown_api_returns_none_on_http_error(self): """HTTP errors return None.""" - from unittest.mock import patch from urllib.error import HTTPError - with patch("github_backup.github_backup.urlopen", side_effect=HTTPError(None, 403, "Forbidden", {}, None)): + error = HTTPError("http://test", 403, "Forbidden", {}, None) + with patch("github_backup.github_backup.urlopen", side_effect=error): result = github_backup.get_jwt_signed_url_via_markdown_api( "https://github.com/user-attachments/assets/abc123", "github_pat_token", @@ -388,8 +386,6 @@ def test_markdown_api_returns_none_on_http_error(self): def test_markdown_api_returns_none_when_no_jwt_url(self): """Response without JWT URL returns None.""" - from unittest.mock import patch, Mock - mock_response = Mock() mock_response.read.return_value = b"

<p>No image here</p>

" @@ -406,32 +402,36 @@ def test_needs_jwt_only_for_fine_grained_private_assets(self): """needs_jwt is True only for fine-grained + private + /assets/ URL.""" assets_url = "https://github.com/user-attachments/assets/abc123" files_url = "https://github.com/user-attachments/files/123/doc.pdf" + token_fine = "github_pat_test" + private = True + public = False # Fine-grained + private + assets = True - assert ( - "github_pat_" is not None - and True # private + needs_jwt = ( + token_fine is not None + and private and "github.com/user-attachments/assets/" in assets_url - ) is True + ) + assert needs_jwt is True # Fine-grained + private + files = False - assert ( - "github_pat_" is not None - and True + needs_jwt = ( + token_fine is not None + and private and "github.com/user-attachments/assets/" in files_url - ) is False + ) + assert needs_jwt is False # Fine-grained + public + assets = False - assert ( - "github_pat_" is not None - and False # public + needs_jwt = ( + token_fine is not None + and public and "github.com/user-attachments/assets/" in assets_url - ) is False + ) + assert needs_jwt is False def test_jwt_workaround_sets_manifest_flag(self, attachment_test_setup): """Successful JWT workaround sets jwt_workaround flag in manifest.""" - from unittest.mock import patch, Mock - setup = attachment_test_setup setup["args"].token_fine = "github_pat_test" setup["repository"]["private"] = True @@ -460,8 +460,6 @@ def test_jwt_workaround_sets_manifest_flag(self, attachment_test_setup): def test_jwt_workaround_failure_uses_skipped_at(self, attachment_test_setup): """Failed JWT workaround uses skipped_at instead of downloaded_at.""" - from unittest.mock import patch - setup = attachment_test_setup setup["args"].token_fine = "github_pat_test" setup["repository"]["private"] = True diff --git a/tests/test_case_sensitivity.py b/tests/test_case_sensitivity.py index 058a7df..795c14b 100644 --- a/tests/test_case_sensitivity.py +++ b/tests/test_case_sensitivity.py @@ -1,7 +1,6 
@@ """Tests for case-insensitive username/organization filtering.""" import pytest -from unittest.mock import Mock from github_backup import github_backup @@ -9,25 +8,14 @@ class TestCaseSensitivity: """Test suite for case-insensitive username matching in filter_repositories.""" - def test_filter_repositories_case_insensitive_user(self): + def test_filter_repositories_case_insensitive_user(self, create_args): """Should filter repositories case-insensitively for usernames. Reproduces issue #198 where typing 'iamrodos' fails to match repositories with owner.login='Iamrodos' (the canonical case from GitHub API). """ # Simulate user typing lowercase username - args = Mock() - args.user = "iamrodos" # lowercase (what user typed) - args.repository = None - args.name_regex = None - args.languages = None - args.exclude = None - args.fork = False - args.private = False - args.public = False - args.all = True - args.skip_archived = False - args.starred_skip_size_over = None + args = create_args(user="iamrodos") # Simulate GitHub API returning canonical case repos = [ @@ -52,23 +40,12 @@ def test_filter_repositories_case_insensitive_user(self): assert filtered[0]["name"] == "repo1" assert filtered[1]["name"] == "repo2" - def test_filter_repositories_case_insensitive_org(self): + def test_filter_repositories_case_insensitive_org(self, create_args): """Should filter repositories case-insensitively for organizations. Tests the example from issue #198 where 'prai-org' doesn't match 'PRAI-Org'. 
""" - args = Mock() - args.user = "prai-org" # lowercase (what user typed) - args.repository = None - args.name_regex = None - args.languages = None - args.exclude = None - args.fork = False - args.private = False - args.public = False - args.all = True - args.skip_archived = False - args.starred_skip_size_over = None + args = create_args(user="prai-org") repos = [ { @@ -85,20 +62,9 @@ def test_filter_repositories_case_insensitive_org(self): assert len(filtered) == 1 assert filtered[0]["name"] == "repo1" - def test_filter_repositories_case_variations(self): + def test_filter_repositories_case_variations(self, create_args): """Should handle various case combinations correctly.""" - args = Mock() - args.user = "TeSt-UsEr" # Mixed case - args.repository = None - args.name_regex = None - args.languages = None - args.exclude = None - args.fork = False - args.private = False - args.public = False - args.all = True - args.skip_archived = False - args.starred_skip_size_over = None + args = create_args(user="TeSt-UsEr") repos = [ {"name": "repo1", "owner": {"login": "test-user"}, "private": False, "fork": False}, diff --git a/tests/test_http_451.py b/tests/test_http_451.py index bb825f7..b556069 100644 --- a/tests/test_http_451.py +++ b/tests/test_http_451.py @@ -11,17 +11,9 @@ class TestHTTP451Exception: """Test suite for HTTP 451 DMCA takedown exception handling.""" - def test_repository_unavailable_error_raised(self): + def test_repository_unavailable_error_raised(self, create_args): """HTTP 451 should raise RepositoryUnavailableError with DMCA URL.""" - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = None - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - args.max_retries = 5 + args = create_args() mock_response = Mock() mock_response.getcode.return_value = 451 @@ -53,17 +45,9 @@ def test_repository_unavailable_error_raised(self): ) assert "451" in 
str(exc_info.value) - def test_repository_unavailable_error_without_dmca_url(self): + def test_repository_unavailable_error_without_dmca_url(self, create_args): """HTTP 451 without DMCA details should still raise exception.""" - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = None - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - args.max_retries = 5 + args = create_args() mock_response = Mock() mock_response.getcode.return_value = 451 @@ -83,17 +67,9 @@ def test_repository_unavailable_error_without_dmca_url(self): assert exc_info.value.dmca_url is None assert "451" in str(exc_info.value) - def test_repository_unavailable_error_with_malformed_json(self): + def test_repository_unavailable_error_with_malformed_json(self, create_args): """HTTP 451 with malformed JSON should still raise exception.""" - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = None - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - args.max_retries = 5 + args = create_args() mock_response = Mock() mock_response.getcode.return_value = 451 diff --git a/tests/test_pagination.py b/tests/test_pagination.py index e35ff38..1931042 100644 --- a/tests/test_pagination.py +++ b/tests/test_pagination.py @@ -1,9 +1,7 @@ """Tests for Link header pagination handling.""" import json -from unittest.mock import Mock, patch - -import pytest +from unittest.mock import patch from github_backup import github_backup @@ -38,23 +36,9 @@ def headers(self): return headers -@pytest.fixture -def mock_args(): - """Mock args for retrieve_data.""" - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = "fake_token" - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - args.max_retries 
= 5 - return args - - -def test_cursor_based_pagination(mock_args): +def test_cursor_based_pagination(create_args): """Link header with 'after' cursor parameter works correctly.""" + args = create_args(token_classic="fake_token") # Simulate issues endpoint behavior: returns cursor in Link header responses = [ @@ -77,7 +61,7 @@ def mock_urlopen(request, *args, **kwargs): with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): results = github_backup.retrieve_data( - mock_args, "https://api.github.com/repos/owner/repo/issues" + args, "https://api.github.com/repos/owner/repo/issues" ) # Verify all items retrieved and cursor was used in second request @@ -86,8 +70,9 @@ def mock_urlopen(request, *args, **kwargs): assert "after=ABC123" in requests_made[1] -def test_page_based_pagination(mock_args): +def test_page_based_pagination(create_args): """Link header with 'page' parameter works correctly.""" + args = create_args(token_classic="fake_token") # Simulate pulls/repos endpoint behavior: returns page numbers in Link header responses = [ @@ -110,7 +95,7 @@ def mock_urlopen(request, *args, **kwargs): with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): results = github_backup.retrieve_data( - mock_args, "https://api.github.com/repos/owner/repo/pulls" + args, "https://api.github.com/repos/owner/repo/pulls" ) # Verify all items retrieved and page parameter was used (not cursor) @@ -120,8 +105,9 @@ def mock_urlopen(request, *args, **kwargs): assert "after" not in requests_made[1] -def test_no_link_header_stops_pagination(mock_args): +def test_no_link_header_stops_pagination(create_args): """Pagination stops when Link header is absent.""" + args = create_args(token_classic="fake_token") # Simulate endpoint with results that fit in a single page responses = [ @@ -138,7 +124,7 @@ def mock_urlopen(request, *args, **kwargs): with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): results = 
github_backup.retrieve_data( - mock_args, "https://api.github.com/repos/owner/repo/labels" + args, "https://api.github.com/repos/owner/repo/labels" ) # Verify pagination stopped after first request diff --git a/tests/test_retrieve_data.py b/tests/test_retrieve_data.py index fa82bd7..159f06e 100644 --- a/tests/test_retrieve_data.py +++ b/tests/test_retrieve_data.py @@ -63,21 +63,9 @@ def test_minimum_rate_limit_delay(self): class TestRetrieveDataRetry: """Tests for retry behavior in retrieve_data.""" - @pytest.fixture - def mock_args(self): - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = "fake_token" - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - args.max_retries = DEFAULT_MAX_RETRIES - return args - - def test_json_parse_error_retries_and_fails(self, mock_args): + def test_json_parse_error_retries_and_fails(self, create_args): """HTTP 200 with invalid JSON should retry and eventually fail.""" + args = create_args(token_classic="fake_token") mock_response = Mock() mock_response.getcode.return_value = 200 mock_response.read.return_value = b"not valid json {" @@ -85,7 +73,7 @@ def test_json_parse_error_retries_and_fails(self, mock_args): call_count = 0 - def mock_make_request(*args, **kwargs): + def mock_make_request(*a, **kw): nonlocal call_count call_count += 1 return mock_response @@ -99,7 +87,7 @@ def mock_make_request(*args, **kwargs): ): # No delay in tests with pytest.raises(Exception) as exc_info: github_backup.retrieve_data( - mock_args, "https://api.github.com/repos/test/repo/issues" + args, "https://api.github.com/repos/test/repo/issues" ) assert "Failed to read response after" in str(exc_info.value) @@ -107,8 +95,9 @@ def mock_make_request(*args, **kwargs): call_count == DEFAULT_MAX_RETRIES + 1 ) # 1 initial + 5 retries = 6 attempts - def test_json_parse_error_recovers_on_retry(self, mock_args): + def 
test_json_parse_error_recovers_on_retry(self, create_args): """HTTP 200 with invalid JSON should succeed if retry returns valid JSON.""" + args = create_args(token_classic="fake_token") bad_response = Mock() bad_response.getcode.return_value = 200 bad_response.read.return_value = b"not valid json {" @@ -122,7 +111,7 @@ def test_json_parse_error_recovers_on_retry(self, mock_args): responses = [bad_response, bad_response, good_response] call_count = 0 - def mock_make_request(*args, **kwargs): + def mock_make_request(*a, **kw): nonlocal call_count result = responses[call_count] call_count += 1 @@ -136,14 +125,15 @@ def mock_make_request(*args, **kwargs): "github_backup.github_backup.calculate_retry_delay", return_value=0 ): result = github_backup.retrieve_data( - mock_args, "https://api.github.com/repos/test/repo/issues" + args, "https://api.github.com/repos/test/repo/issues" ) assert result == [{"id": 1}] assert call_count == 3 # Failed twice, succeeded on third - def test_http_error_raises_exception(self, mock_args): + def test_http_error_raises_exception(self, create_args): """Non-success HTTP status codes should raise Exception.""" + args = create_args(token_classic="fake_token") mock_response = Mock() mock_response.getcode.return_value = 404 mock_response.read.return_value = b'{"message": "Not Found"}' @@ -156,7 +146,7 @@ def test_http_error_raises_exception(self, mock_args): ): with pytest.raises(Exception) as exc_info: github_backup.retrieve_data( - mock_args, "https://api.github.com/repos/test/notfound/issues" + args, "https://api.github.com/repos/test/notfound/issues" ) assert not isinstance( @@ -346,21 +336,13 @@ def mock_urlopen(*args, **kwargs): class TestRetrieveDataThrottling: """Tests for throttling behavior in retrieve_data.""" - @pytest.fixture - def mock_args(self): - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = "fake_token" - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - 
args.throttle_limit = 10 # Throttle when remaining <= 10 - args.throttle_pause = 5 # Pause 5 seconds - args.max_retries = DEFAULT_MAX_RETRIES - return args - - def test_throttling_pauses_when_rate_limit_low(self, mock_args): + def test_throttling_pauses_when_rate_limit_low(self, create_args): """Should pause when x-ratelimit-remaining is at or below throttle_limit.""" + args = create_args( + token_classic="fake_token", + throttle_limit=10, + throttle_pause=5, + ) mock_response = Mock() mock_response.getcode.return_value = 200 mock_response.read.return_value = json.dumps([{"id": 1}]).encode("utf-8") @@ -375,7 +357,7 @@ def test_throttling_pauses_when_rate_limit_low(self, mock_args): ): with patch("github_backup.github_backup.time.sleep") as mock_sleep: github_backup.retrieve_data( - mock_args, "https://api.github.com/repos/test/repo/issues" + args, "https://api.github.com/repos/test/repo/issues" ) mock_sleep.assert_called_once_with(5) # throttle_pause value @@ -384,21 +366,9 @@ def test_throttling_pauses_when_rate_limit_low(self, mock_args): class TestRetrieveDataSingleItem: """Tests for single item (dict) responses in retrieve_data.""" - @pytest.fixture - def mock_args(self): - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = "fake_token" - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - args.max_retries = DEFAULT_MAX_RETRIES - return args - - def test_dict_response_returned_as_list(self, mock_args): + def test_dict_response_returned_as_list(self, create_args): """Single dict response should be returned as a list with one item.""" + args = create_args(token_classic="fake_token") mock_response = Mock() mock_response.getcode.return_value = 200 mock_response.read.return_value = json.dumps( @@ -411,7 +381,7 @@ def test_dict_response_returned_as_list(self, mock_args): return_value=mock_response, ): result = github_backup.retrieve_data( - mock_args, 
"https://api.github.com/user" + args, "https://api.github.com/user" ) assert result == [{"login": "testuser", "id": 123}] @@ -474,17 +444,12 @@ def mock_urlopen(*args, **kwargs): assert result == good_response assert call_count == 2 # 1 initial + 1 retry = 2 attempts - def test_custom_retry_count_limits_attempts(self): + def test_custom_retry_count_limits_attempts(self, create_args): """Custom --retries value should limit actual retry attempts.""" - args = Mock() - args.as_app = False - args.token_fine = None - args.token_classic = "fake_token" - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.throttle_limit = None - args.throttle_pause = 0 - args.max_retries = 2 # 2 retries = 3 total attempts (1 initial + 2 retries) + args = create_args( + token_classic="fake_token", + max_retries=2, # 2 retries = 3 total attempts (1 initial + 2 retries) + ) mock_response = Mock() mock_response.getcode.return_value = 200 diff --git a/tests/test_skip_assets_on.py b/tests/test_skip_assets_on.py index ce28287..519750e 100644 --- a/tests/test_skip_assets_on.py +++ b/tests/test_skip_assets_on.py @@ -1,7 +1,7 @@ """Tests for --skip-assets-on flag behavior (issue #135).""" import pytest -from unittest.mock import Mock, patch +from unittest.mock import patch from github_backup import github_backup @@ -13,52 +13,6 @@ class TestSkipAssetsOn: while still backing up release metadata. 
""" - def _create_mock_args(self, **overrides): - """Create a mock args object with sensible defaults.""" - args = Mock() - args.user = "testuser" - args.output_directory = "/tmp/backup" - args.include_repository = False - args.include_everything = False - args.include_gists = False - args.include_starred_gists = False - args.all_starred = False - args.skip_existing = False - args.bare_clone = False - args.lfs_clone = False - args.no_prune = False - args.include_wiki = False - args.include_issues = False - args.include_issue_comments = False - args.include_issue_events = False - args.include_pulls = False - args.include_pull_comments = False - args.include_pull_commits = False - args.include_pull_details = False - args.include_labels = False - args.include_hooks = False - args.include_milestones = False - args.include_releases = True - args.include_assets = True - args.skip_assets_on = [] - args.include_attachments = False - args.incremental = False - args.incremental_by_files = False - args.github_host = None - args.prefer_ssh = False - args.token_classic = "test-token" - args.token_fine = None - args.as_app = False - args.osx_keychain_item_name = None - args.osx_keychain_item_account = None - args.skip_prerelease = False - args.number_of_latest_releases = None - - for key, value in overrides.items(): - setattr(args, key, value) - - return args - def _create_mock_repository(self, name="test-repo", owner="testuser"): """Create a mock repository object.""" return { @@ -123,10 +77,10 @@ class TestSkipAssetsOnBehavior(TestSkipAssetsOn): @patch("github_backup.github_backup.mkdir_p") @patch("github_backup.github_backup.json_dump_if_changed") def test_assets_downloaded_when_not_skipped( - self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args ): """Assets should be downloaded when repo is not in skip list.""" - args = self._create_mock_args(skip_assets_on=[]) + args = 
create_args(skip_assets_on=[]) repository = self._create_mock_repository(name="normal-repo") release = self._create_mock_release() asset = self._create_mock_asset() @@ -154,10 +108,10 @@ def test_assets_downloaded_when_not_skipped( @patch("github_backup.github_backup.mkdir_p") @patch("github_backup.github_backup.json_dump_if_changed") def test_assets_skipped_when_repo_name_matches( - self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args ): """Assets should be skipped when repo name is in skip list.""" - args = self._create_mock_args(skip_assets_on=["big-repo"]) + args = create_args(skip_assets_on=["big-repo"]) repository = self._create_mock_repository(name="big-repo") release = self._create_mock_release() @@ -180,10 +134,10 @@ def test_assets_skipped_when_repo_name_matches( @patch("github_backup.github_backup.mkdir_p") @patch("github_backup.github_backup.json_dump_if_changed") def test_assets_skipped_when_full_name_matches( - self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args ): """Assets should be skipped when owner/repo format matches.""" - args = self._create_mock_args(skip_assets_on=["otheruser/big-repo"]) + args = create_args(skip_assets_on=["otheruser/big-repo"]) repository = self._create_mock_repository(name="big-repo", owner="otheruser") release = self._create_mock_release() @@ -206,11 +160,11 @@ def test_assets_skipped_when_full_name_matches( @patch("github_backup.github_backup.mkdir_p") @patch("github_backup.github_backup.json_dump_if_changed") def test_case_insensitive_matching( - self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args ): """Skip matching should be case-insensitive.""" # User types uppercase, repo name is lowercase - args = self._create_mock_args(skip_assets_on=["BIG-REPO"]) + 
args = create_args(skip_assets_on=["BIG-REPO"]) repository = self._create_mock_repository(name="big-repo") release = self._create_mock_release() @@ -233,10 +187,10 @@ def test_case_insensitive_matching( @patch("github_backup.github_backup.mkdir_p") @patch("github_backup.github_backup.json_dump_if_changed") def test_multiple_skip_repos( - self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args ): """Multiple repos in skip list should all be skipped.""" - args = self._create_mock_args(skip_assets_on=["repo1", "repo2", "repo3"]) + args = create_args(skip_assets_on=["repo1", "repo2", "repo3"]) repository = self._create_mock_repository(name="repo2") release = self._create_mock_release() @@ -259,10 +213,10 @@ def test_multiple_skip_repos( @patch("github_backup.github_backup.mkdir_p") @patch("github_backup.github_backup.json_dump_if_changed") def test_release_metadata_still_saved_when_assets_skipped( - self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args ): """Release JSON should still be saved even when assets are skipped.""" - args = self._create_mock_args(skip_assets_on=["big-repo"]) + args = create_args(skip_assets_on=["big-repo"]) repository = self._create_mock_repository(name="big-repo") release = self._create_mock_release() @@ -287,10 +241,10 @@ def test_release_metadata_still_saved_when_assets_skipped( @patch("github_backup.github_backup.mkdir_p") @patch("github_backup.github_backup.json_dump_if_changed") def test_non_matching_repo_still_downloads_assets( - self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download + self, mock_json_dump, mock_mkdir, mock_retrieve, mock_download, create_args ): """Repos not in skip list should still download assets.""" - args = self._create_mock_args(skip_assets_on=["other-repo"]) + args = create_args(skip_assets_on=["other-repo"]) repository = 
self._create_mock_repository(name="normal-repo") release = self._create_mock_release() asset = self._create_mock_asset() diff --git a/tests/test_starred_skip_size_over.py b/tests/test_starred_skip_size_over.py index 2deb72a..250d191 100644 --- a/tests/test_starred_skip_size_over.py +++ b/tests/test_starred_skip_size_over.py @@ -1,39 +1,11 @@ """Tests for --starred-skip-size-over flag behavior (issue #108).""" import pytest -from unittest.mock import Mock from github_backup import github_backup -class TestStarredSkipSizeOver: - """Test suite for --starred-skip-size-over flag. - - Issue #108: Allow restricting size of starred repositories before cloning. - The size is based on the GitHub API's 'size' field (in KB), but the CLI - argument accepts MB for user convenience. - """ - - def _create_mock_args(self, **overrides): - """Create a mock args object with sensible defaults.""" - args = Mock() - args.user = "testuser" - args.repository = None - args.name_regex = None - args.languages = None - args.fork = False - args.private = False - args.skip_archived = False - args.starred_skip_size_over = None - args.exclude = None - - for key, value in overrides.items(): - setattr(args, key, value) - - return args - - -class TestStarredSkipSizeOverArgumentParsing(TestStarredSkipSizeOver): +class TestStarredSkipSizeOverArgumentParsing: """Tests for --starred-skip-size-over argument parsing.""" def test_starred_skip_size_over_not_set_defaults_to_none(self): @@ -52,12 +24,17 @@ def test_starred_skip_size_over_rejects_non_integer(self): github_backup.parse_args(["testuser", "--starred-skip-size-over", "abc"]) -class TestStarredSkipSizeOverFiltering(TestStarredSkipSizeOver): - """Tests for --starred-skip-size-over filtering behavior.""" +class TestStarredSkipSizeOverFiltering: + """Tests for --starred-skip-size-over filtering behavior. + + Issue #108: Allow restricting size of starred repositories before cloning. 
+ The size is based on the GitHub API's 'size' field (in KB), but the CLI + argument accepts MB for user convenience. + """ - def test_starred_repo_under_limit_is_kept(self): + def test_starred_repo_under_limit_is_kept(self, create_args): """Starred repos under the size limit should be kept.""" - args = self._create_mock_args(starred_skip_size_over=500) + args = create_args(starred_skip_size_over=500) repos = [ { @@ -72,9 +49,9 @@ def test_starred_repo_under_limit_is_kept(self): assert len(result) == 1 assert result[0]["name"] == "small-repo" - def test_starred_repo_over_limit_is_filtered(self): + def test_starred_repo_over_limit_is_filtered(self, create_args): """Starred repos over the size limit should be filtered out.""" - args = self._create_mock_args(starred_skip_size_over=500) + args = create_args(starred_skip_size_over=500) repos = [ { @@ -88,9 +65,9 @@ def test_starred_repo_over_limit_is_filtered(self): result = github_backup.filter_repositories(args, repos) assert len(result) == 0 - def test_own_repo_over_limit_is_kept(self): + def test_own_repo_over_limit_is_kept(self, create_args): """User's own repos should not be affected by the size limit.""" - args = self._create_mock_args(starred_skip_size_over=500) + args = create_args(starred_skip_size_over=500) repos = [ { @@ -105,9 +82,9 @@ def test_own_repo_over_limit_is_kept(self): assert len(result) == 1 assert result[0]["name"] == "my-huge-repo" - def test_starred_repo_at_exact_limit_is_kept(self): + def test_starred_repo_at_exact_limit_is_kept(self, create_args): """Starred repos at exactly the size limit should be kept.""" - args = self._create_mock_args(starred_skip_size_over=500) + args = create_args(starred_skip_size_over=500) repos = [ { @@ -122,9 +99,9 @@ def test_starred_repo_at_exact_limit_is_kept(self): assert len(result) == 1 assert result[0]["name"] == "exact-limit-repo" - def test_mixed_repos_filtered_correctly(self): + def test_mixed_repos_filtered_correctly(self, create_args): """Mix of own 
and starred repos should be filtered correctly.""" - args = self._create_mock_args(starred_skip_size_over=500) + args = create_args(starred_skip_size_over=500) repos = [ { @@ -153,9 +130,9 @@ def test_mixed_repos_filtered_correctly(self): assert "starred-small" in names assert "starred-huge" not in names - def test_no_size_limit_keeps_all_starred(self): + def test_no_size_limit_keeps_all_starred(self, create_args): """When no size limit is set, all starred repos should be kept.""" - args = self._create_mock_args(starred_skip_size_over=None) + args = create_args(starred_skip_size_over=None) repos = [ { @@ -169,9 +146,9 @@ def test_no_size_limit_keeps_all_starred(self): result = github_backup.filter_repositories(args, repos) assert len(result) == 1 - def test_repo_without_size_field_is_kept(self): + def test_repo_without_size_field_is_kept(self, create_args): """Repos without a size field should be kept (size defaults to 0).""" - args = self._create_mock_args(starred_skip_size_over=500) + args = create_args(starred_skip_size_over=500) repos = [ { @@ -185,9 +162,9 @@ def test_repo_without_size_field_is_kept(self): result = github_backup.filter_repositories(args, repos) assert len(result) == 1 - def test_zero_value_warns_and_is_ignored(self, caplog): + def test_zero_value_warns_and_is_ignored(self, create_args, caplog): """Zero value should warn and keep all repos.""" - args = self._create_mock_args(starred_skip_size_over=0) + args = create_args(starred_skip_size_over=0) repos = [ { @@ -202,9 +179,9 @@ def test_zero_value_warns_and_is_ignored(self, caplog): assert len(result) == 1 assert "must be greater than 0" in caplog.text - def test_negative_value_warns_and_is_ignored(self, caplog): + def test_negative_value_warns_and_is_ignored(self, create_args, caplog): """Negative value should warn and keep all repos.""" - args = self._create_mock_args(starred_skip_size_over=-5) + args = create_args(starred_skip_size_over=-5) repos = [ { From 
6780d3ad6c86228f6eaf06f5656efdbee6870d9f Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Tue, 13 Jan 2026 23:10:05 +0000 Subject: [PATCH 100/148] Release version 0.61.1 --- CHANGES.rst | 37 ++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 0e66663..e44cd3f 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,44 @@ Changelog ========= -0.61.0 (2026-01-12) +0.61.1 (2026-01-13) ------------------- ------------------------ +- Refactor test fixtures to use shared create_args helper. [Rodos] + + Uses the real parse_args() function to get CLI defaults, so when + new arguments are added they're automatically available to all tests. + + Changes: + - Add tests/conftest.py with create_args fixture + - Update 8 test files to use shared fixture + - Remove duplicate _create_mock_args methods + - Remove redundant @pytest.fixture mock_args definitions + + This eliminates the need to update multiple test files when + adding new CLI arguments. +- Fix fine-grained PAT attachment downloads for private repos (#477) + [Rodos] + + Fine-grained personal access tokens cannot download attachments from + private repositories directly due to a GitHub platform limitation. + + This adds a workaround for image attachments (/assets/ URLs) using + GitHub's Markdown API to convert URLs to JWT-signed URLs that can be + downloaded without authentication. 
+ + Changes: + - Add get_jwt_signed_url_via_markdown_api() function + - Detect fine-grained token + private repo + /assets/ URL upfront + - Use JWT workaround for those cases, mark success with jwt_workaround flag + - Skip download with skipped_at when workaround fails + - Add startup warning when using --attachments with fine-grained tokens + - Document limitation in README (file attachments still fail) + - Add 6 unit tests for JWT workaround logic + + +0.61.0 (2026-01-12) +------------------- - Docs: Add missing `--retries` argument to README. [Lukas Bestle] - Test: Adapt tests to new argument. [Lukas Bestle] - Feat: Backup of repository security advisories. [Lukas Bestle] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index a076e5d..daa1407 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.61.0" +__version__ = "0.61.1" From 93e505c07da4cf02e4257933c003471a2ecc53f8 Mon Sep 17 00:00:00 2001 From: Lukas Bestle Date: Wed, 14 Jan 2026 21:01:59 +0100 Subject: [PATCH 101/148] fix: Handle 404 errors on security advisories --- github_backup/github_backup.py | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 705f013..9d96f3b 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -2045,7 +2045,13 @@ def backup_security_advisories(args, repo_cwd, repository, repos_template): repos_template, repository["full_name"] ) - _advisories = retrieve_data(args, template) + try: + _advisories = retrieve_data(args, template) + except Exception as e: + if "404" in str(e): + logger.info("Security advisories are not available for this repository, skipping") + return + raise advisories = {} for advisory in _advisories: From c6fa8c76955e881cbcc5fa9b9cf301e114fdcea7 Mon Sep 17 00:00:00 2001 From: Lukas Bestle Date: Wed, 14 Jan 2026 21:02:51 +0100 Subject: [PATCH 102/148] feat: Only make security 
advisory dir if successful Avoids empty directories for private repos --- github_backup/github_backup.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 9d96f3b..fdc18f9 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -2039,7 +2039,6 @@ def backup_security_advisories(args, repo_cwd, repository, repos_template): return logger.info("Retrieving {0} security advisories".format(repository["full_name"])) - mkdir_p(repo_cwd, advisory_cwd) template = "{0}/{1}/security-advisories".format( repos_template, repository["full_name"] @@ -2053,6 +2052,8 @@ def backup_security_advisories(args, repo_cwd, repository, repos_template): return raise + mkdir_p(repo_cwd, advisory_cwd) + advisories = {} for advisory in _advisories: advisories[advisory["ghsa_id"]] = advisory From 856ad5db415f0df0e94462b7929c264ec2aeb818 Mon Sep 17 00:00:00 2001 From: Lukas Bestle Date: Wed, 14 Jan 2026 21:03:17 +0100 Subject: [PATCH 103/148] fix: Skip security advisories for private repos unless explicitly requested --- github_backup/github_backup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index fdc18f9..346d541 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1814,7 +1814,7 @@ def backup_repositories(args, output_directory, repositories): if args.include_milestones or args.include_everything: backup_milestones(args, repo_cwd, repository, repos_template) - if args.include_security_advisories or args.include_everything: + if args.include_security_advisories or (args.include_everything and not repository["Private"]): backup_security_advisories(args, repo_cwd, repository, repos_template) if args.include_labels or args.include_everything: From 1181f811b704d58e971a7686240694c63c3e6a50 Mon Sep 17 00:00:00 2001 From: Lukas Bestle Date: Fri, 16 Jan 2026 08:52:45 +0100 
Subject: [PATCH 104/148] docs: Explain security advisories in README --- README.rst | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/README.rst b/README.rst index c23027d..cd7be1f 100644 --- a/README.rst +++ b/README.rst @@ -284,6 +284,17 @@ The tool automatically extracts file extensions from HTTP headers to ensure file **Fine-grained token limitation:** Due to a GitHub platform limitation, fine-grained personal access tokens (``github_pat_...``) cannot download attachments from private repositories directly. This affects both ``/assets/`` (images) and ``/files/`` (documents) URLs. The tool implements a workaround for image attachments using GitHub's Markdown API, which converts URLs to temporary JWT-signed URLs that can be downloaded. However, this workaround only works for images - document attachments (PDFs, text files, etc.) will fail with 404 errors when using fine-grained tokens on private repos. For full attachment support on private repositories, use a classic token (``-t``) instead of a fine-grained token (``-f``). See `#477 `_ for details. +About security advisories +------------------------- + +GitHub security advisories are only available in public repositories. GitHub does not provide the respective API endpoint for private repositories. + +Therefore the logic is implemented as follows: +- Security advisories are included in the `--all` option. +- If only the `--all` option was provided, backups of security advisories are skipped for private repositories. +- If the `--security-advisories` option is provided (on its own or in addition to `--all`), a backup of security advisories is attempted for all repositories, with graceful handling if the GitHub API doesn't return any. 
+ + Run in Docker container ----------------------- From e6283f93847b5378bf6f2800d8b15fb60ac44b61 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 19 Jan 2026 14:50:28 +0000 Subject: [PATCH 105/148] chore(deps): bump black in the python-packages group Bumps the python-packages group with 1 update: [black](https://github.com/psf/black). Updates `black` from 25.12.0 to 26.1.0 - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/25.12.0...26.1.0) --- updated-dependencies: - dependency-name: black dependency-version: 26.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index dd2d73f..1d3c36f 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,6 +1,6 @@ # Linting & Formatting autopep8==2.3.2 -black==25.12.0 +black==26.1.0 flake8==7.3.0 # Testing From 712d22d124d2922a4a4a3f35433ccf2a8903392c Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Mon, 19 Jan 2026 17:40:27 +0000 Subject: [PATCH 106/148] Release version 0.61.2 --- CHANGES.rst | 38 +++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 38 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index e44cd3f..1811a4f 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,45 @@ Changelog ========= -0.61.1 (2026-01-13) +0.61.2 (2026-01-19) ------------------- ------------------------ + +Fix +~~~ +- Skip security advisories for private repos unless explicitly + requested. [Lukas Bestle] +- Handle 404 errors on security advisories. [Lukas Bestle] + +Other +~~~~~ +- Chore(deps): bump black in the python-packages group. 
+ [dependabot[bot]] + + Bumps the python-packages group with 1 update: [black](https://github.com/psf/black). + + + Updates `black` from 25.12.0 to 26.1.0 + - [Release notes](https://github.com/psf/black/releases) + - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) + - [Commits](https://github.com/psf/black/compare/25.12.0...26.1.0) + + --- + updated-dependencies: + - dependency-name: black + dependency-version: 26.1.0 + dependency-type: direct:production + update-type: version-update:semver-major + dependency-group: python-packages + ... +- Docs: Explain security advisories in README. [Lukas Bestle] +- Feat: Only make security advisory dir if successful. [Lukas Bestle] + + Avoids empty directories for private repos + + +0.61.1 (2026-01-13) +------------------- - Refactor test fixtures to use shared create_args helper. [Rodos] Uses the real parse_args() function to get CLI defaults, so when diff --git a/github_backup/__init__.py b/github_backup/__init__.py index daa1407..bbe1689 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.61.1" +__version__ = "0.61.2" From 0d8a504b024a73096f00175ffbac51a8100cf08c Mon Sep 17 00:00:00 2001 From: Rodos Date: Wed, 21 Jan 2026 21:12:03 +1100 Subject: [PATCH 107/148] Fix KeyError: 'Private' when using --all flag (#481) The repository dictionary uses lowercase "private" key. Use .get() with the correct case to match the pattern used elsewhere in the codebase. The bug only affects --all users since --security-advisories short-circuits before the key access. 
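A minimal sketch of the failure mode described above, not the project's code. The helper name ``should_backup_advisories`` and the sample repository dictionary are hypothetical; only the boolean expression mirrors the fixed condition in ``backup_repositories``.

```python
def should_backup_advisories(include_advisories, include_everything, repository):
    """Mirror the fixed condition: explicit flag, or --all on a non-private repo."""
    return include_advisories or (
        include_everything and not repository.get("private", False)
    )


# Hypothetical repository record using the lowercase key the API data carries.
repo = {"full_name": "example/private-repo", "private": True}

# With the explicit flag, `or` short-circuits and the dictionary is never read,
# which is why --security-advisories users never hit the KeyError.
explicit = should_backup_advisories(True, False, repo)

# With only --all, the right-hand side is evaluated. The old code indexed
# repository["Private"] at this point, which raises KeyError.
try:
    repo["Private"]
    old_lookup_failed = False
except KeyError:
    old_lookup_failed = True

# The fixed lookup reads the lowercase key and skips the private repository.
all_only = should_backup_advisories(False, True, repo)
```

Using ``.get("private", False)`` also means a record without the key is treated as public rather than raising.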
--- github_backup/github_backup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 346d541..0b7e1f8 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1814,7 +1814,7 @@ def backup_repositories(args, output_directory, repositories): if args.include_milestones or args.include_everything: backup_milestones(args, repo_cwd, repository, repos_template) - if args.include_security_advisories or (args.include_everything and not repository["Private"]): + if args.include_security_advisories or (args.include_everything and not repository.get("private", False)): backup_security_advisories(args, repo_cwd, repository, repos_template) if args.include_labels or args.include_everything: From 2f5e7c2dcfa0446d7dd2ae9368e4397b4a878c0e Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 21 Jan 2026 13:05:17 +0000 Subject: [PATCH 108/148] chore(deps): bump setuptools in the python-packages group Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools). Updates `setuptools` from 80.9.0 to 80.10.1 - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/setuptools/compare/v80.9.0...v80.10.1) --- updated-dependencies: - dependency-name: setuptools dependency-version: 80.10.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 1d3c36f..1a533c0 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -9,7 +9,7 @@ pytest==9.0.2 # Release & Publishing twine==6.2.0 gitchangelog==3.0.4 -setuptools==80.9.0 +setuptools==80.10.1 # Documentation restructuredtext-lint==2.0.2 From 9be6282719862f58dd59a6a29b61e45b95e31296 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Sat, 24 Jan 2026 05:45:42 +0000 Subject: [PATCH 109/148] Release version 0.61.3 --- CHANGES.rst | 32 +++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 32 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 1811a4f..094f1ee 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,39 @@ Changelog ========= -0.61.2 (2026-01-19) +0.61.3 (2026-01-24) ------------------- ------------------------ +- Fix KeyError: 'Private' when using --all flag (#481) [Rodos] + + The repository dictionary uses lowercase "private" key. Use .get() with + the correct case to match the pattern used elsewhere in the codebase. + + The bug only affects --all users since --security-advisories short-circuits + before the key access. +- Chore(deps): bump setuptools in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools). + + + Updates `setuptools` from 80.9.0 to 80.10.1 + - [Release notes](https://github.com/pypa/setuptools/releases) + - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) + - [Commits](https://github.com/pypa/setuptools/compare/v80.9.0...v80.10.1) + + --- + updated-dependencies: + - dependency-name: setuptools + dependency-version: 80.10.1 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + ... 
+ + +0.61.2 (2026-01-19) +------------------- Fix ~~~ diff --git a/github_backup/__init__.py b/github_backup/__init__.py index bbe1689..ce11d35 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.61.2" +__version__ = "0.61.3" From be900d1f3ffb0a0a010cad0d6c0e9ac22d14ed65 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 26 Jan 2026 14:08:53 +0000 Subject: [PATCH 110/148] chore(deps): bump setuptools in the python-packages group Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools). Updates `setuptools` from 80.10.1 to 80.10.2 - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/setuptools/compare/v80.10.1...v80.10.2) --- updated-dependencies: - dependency-name: setuptools dependency-version: 80.10.2 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 1a533c0..4c614e9 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -9,7 +9,7 @@ pytest==9.0.2 # Release & Publishing twine==6.2.0 gitchangelog==3.0.4 -setuptools==80.10.1 +setuptools==80.10.2 # Documentation restructuredtext-lint==2.0.2 From 6268a4c5c6116929c380f58d227529ef97d700a9 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 9 Feb 2026 14:31:40 +0000 Subject: [PATCH 111/148] chore(deps): bump setuptools in the python-packages group Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools). 
Updates `setuptools` from 80.10.2 to 82.0.0 - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/setuptools/compare/v80.10.2...v82.0.0) --- updated-dependencies: - dependency-name: setuptools dependency-version: 82.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 4c614e9..6742290 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -9,7 +9,7 @@ pytest==9.0.2 # Release & Publishing twine==6.2.0 gitchangelog==3.0.4 -setuptools==80.10.2 +setuptools==82.0.0 # Documentation restructuredtext-lint==2.0.2 From 0162f7ed465ebaf459b694060948b464dbf62c22 Mon Sep 17 00:00:00 2001 From: Rodos Date: Mon, 16 Feb 2026 10:12:36 +1100 Subject: [PATCH 112/148] Fix HTTP 451 DMCA and 403 TOS handling regression (#487) The DMCA handling added in PR #454 had a bug: make_request_with_retry() raises HTTPError before retrieve_data() could check the status code via getcode(), making the case 451 handler dead code. This also affected HTTP 403 TOS violations (e.g. jumoog/MagiskOnWSA). Fix by catching HTTPError in retrieve_data() and converting 451 and blocked 403 responses (identified by "block" key in response body) to RepositoryUnavailableError. Non-block 403s (permissions, scopes) still propagate as HTTPError. Also handle RepositoryUnavailableError in retrieve_repositories() for the --repository case. Rewrote tests to mock urlopen (not make_request_with_retry) to exercise the real code path that was previously untested. 
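The conversion described above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code in this patch: ``classify_http_error`` and ``_parse_body`` are hypothetical helpers, and the exception class is redefined locally so the snippet runs on its own. The real change raises inside ``retrieve_data()`` rather than returning a value.

```python
import io
import json
from urllib.error import HTTPError


class RepositoryUnavailableError(Exception):
    """Local stand-in for the project's exception of the same name."""

    def __init__(self, message, legal_url=None):
        super().__init__(message)
        self.legal_url = legal_url


def _parse_body(body_bytes):
    """Return the decoded JSON object, or {} if the body is not a JSON object."""
    try:
        data = json.loads(body_bytes.decode("utf-8"))
    except Exception:
        return {}
    return data if isinstance(data, dict) else {}


def classify_http_error(exc):
    """Return a RepositoryUnavailableError for 451 / blocked 403, else None.

    None means the caller should re-raise the original HTTPError.
    """
    if exc.code == 451:
        block = _parse_body(exc.read()).get("block")
        legal_url = block.get("html_url") if isinstance(block, dict) else None
        return RepositoryUnavailableError(
            f"Repository unavailable due to legal reasons (HTTP {exc.code})",
            legal_url=legal_url,
        )
    if exc.code == 403:
        # An exhausted rate limit is a different kind of 403: leave it alone.
        if int(exc.headers.get("x-ratelimit-remaining", 1)) < 1:
            return None
        data = _parse_body(exc.read())
        if "block" in data:
            block = data["block"]
            legal_url = block.get("html_url") if isinstance(block, dict) else None
            return RepositoryUnavailableError(
                "Repository access blocked (HTTP 403)", legal_url=legal_url
            )
    return None


# Usage: build an HTTPError with a readable body, as urlopen would raise it.
body = json.dumps({"block": {"html_url": "https://example.com/notice"}}).encode()
error = HTTPError(
    url="https://api.github.com/repos/test/repo",
    code=451,
    msg="Unavailable For Legal Reasons",
    hdrs={"x-ratelimit-remaining": "5000"},
    fp=io.BytesIO(body),
)
converted = classify_http_error(error)
```

Returning None for everything except 451 and blocked 403 keeps permission and scope errors visible as plain HTTPError, which matches the intent stated above.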
Closes #487 --- github_backup/github_backup.py | 123 +++++++++++++--------- tests/test_http_451.py | 180 ++++++++++++++++++++++++++------- tests/test_retrieve_data.py | 22 ++++ 3 files changed, 245 insertions(+), 80 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 0b7e1f8..ada2d40 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -39,11 +39,11 @@ class RepositoryUnavailableError(Exception): - """Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown).""" + """Raised when a repository is unavailable due to legal reasons (e.g., DMCA takedown, TOS violation).""" - def __init__(self, message, dmca_url=None): + def __init__(self, message, legal_url=None): super().__init__(message) - self.dmca_url = dmca_url + self.legal_url = legal_url # Setup SSL context with fallback chain @@ -647,6 +647,14 @@ def _extract_next_page_url(link_header): return None def fetch_all() -> Generator[dict, None, None]: + def _extract_legal_url(response_body_bytes): + """Extract DMCA/legal notice URL from GitHub API error response body.""" + try: + data = json.loads(response_body_bytes.decode("utf-8")) + return data.get("block", {}).get("html_url") + except Exception: + return None + next_url = None while True: @@ -661,47 +669,66 @@ def fetch_all() -> Generator[dict, None, None]: as_app=args.as_app, fine=args.token_fine is not None, ) - http_response = make_request_with_retry(request, auth, args.max_retries) - - match http_response.getcode(): - case 200: - # Success - Parse JSON response - try: - response = json.loads(http_response.read().decode("utf-8")) - break # Exit retry loop and handle the data returned - except ( - IncompleteRead, - json.decoder.JSONDecodeError, - TimeoutError, - ) as e: - logger.warning(f"{type(e).__name__} reading response") - if attempt < args.max_retries: - delay = calculate_retry_delay(attempt, {}) - logger.warning( - f"Retrying read in {delay:.1f}s (attempt 
{attempt + 1}/{args.max_retries + 1})" - ) - time.sleep(delay) - continue # Next retry attempt - - case 451: - # DMCA takedown - extract URL if available, then raise - dmca_url = None - try: - response_data = json.loads( - http_response.read().decode("utf-8") - ) - dmca_url = response_data.get("block", {}).get("html_url") - except Exception: - pass + try: + http_response = make_request_with_retry( + request, auth, args.max_retries + ) + except HTTPError as exc: + if exc.code == 451: + legal_url = _extract_legal_url(exc.read()) raise RepositoryUnavailableError( - "Repository unavailable due to legal reasons (HTTP 451)", - dmca_url=dmca_url, + f"Repository unavailable due to legal reasons (HTTP {exc.code})", + legal_url=legal_url, ) + elif exc.code == 403: + # Rate-limit 403s (x-ratelimit-remaining=0) are retried + # by make_request_with_retry — re-raise if exhausted. + if int(exc.headers.get("x-ratelimit-remaining", 1)) < 1: + raise + # Only convert to RepositoryUnavailableError if GitHub + # indicates a TOS/DMCA block (response contains "block" + # key). Other 403s (permissions, scopes) should propagate. + body = exc.read() + try: + data = json.loads(body.decode("utf-8")) + except Exception: + data = {} + if "block" in data: + raise RepositoryUnavailableError( + "Repository access blocked (HTTP 403)", + legal_url=data.get("block", {}).get("html_url"), + ) + raise + else: + raise + + # urlopen raises HTTPError for non-2xx, so only success gets here. + # Guard against unexpected status codes from proxies, future Python + # changes, or other edge cases we haven't considered. 
+ status = http_response.getcode() + if status != 200: + raise Exception( + f"Unexpected HTTP {status} from {next_url or template} " + f"(expected non-2xx to raise HTTPError)" + ) - case _: - raise Exception( - f"API request returned HTTP {http_response.getcode()}: {http_response.reason}" + # Parse JSON response + try: + response = json.loads(http_response.read().decode("utf-8")) + break # Exit retry loop and handle the data returned + except ( + IncompleteRead, + json.decoder.JSONDecodeError, + TimeoutError, + ) as e: + logger.warning(f"{type(e).__name__} reading response") + if attempt < args.max_retries: + delay = calculate_retry_delay(attempt, {}) + logger.warning( + f"Retrying read in {delay:.1f}s (attempt {attempt + 1}/{args.max_retries + 1})" ) + time.sleep(delay) + continue # Next retry attempt else: logger.error( f"Failed to read response after {args.max_retries + 1} attempts for {next_url or template}" @@ -1614,7 +1641,13 @@ def retrieve_repositories(args, authenticated_user): paginated = False template = "https://{0}/repos/{1}".format(get_github_api_host(args), repo_path) - repos = retrieve_data(args, template, paginated=paginated) + try: + repos = retrieve_data(args, template, paginated=paginated) + except RepositoryUnavailableError as e: + logger.warning(f"Repository is unavailable: {e}") + if e.legal_url: + logger.warning(f"Legal notice: {e.legal_url}") + return [] if args.all_starred: starred_template = "https://{0}/users/{1}/starred".format( @@ -1832,11 +1865,9 @@ def backup_repositories(args, output_directory, repositories): include_assets=args.include_assets or args.include_everything, ) except RepositoryUnavailableError as e: - logger.warning( - f"Repository {repository['full_name']} is unavailable (HTTP 451)" - ) - if e.dmca_url: - logger.warning(f"DMCA notice: {e.dmca_url}") + logger.warning(f"Repository {repository['full_name']} is unavailable: {e}") + if e.legal_url: + logger.warning(f"Legal notice: {e.legal_url}") logger.info(f"Skipping 
remaining resources for {repository['full_name']}") continue diff --git a/tests/test_http_451.py b/tests/test_http_451.py index b556069..bba866e 100644 --- a/tests/test_http_451.py +++ b/tests/test_http_451.py @@ -1,13 +1,28 @@ -"""Tests for HTTP 451 (DMCA takedown) handling.""" +"""Tests for HTTP 451 (DMCA takedown) and HTTP 403 (TOS) handling.""" +import io import json -from unittest.mock import Mock, patch +from unittest.mock import patch +from urllib.error import HTTPError import pytest from github_backup import github_backup +def _make_http_error(code, body_bytes, msg="Error", headers=None): + """Create an HTTPError with a readable body (like a real urllib response).""" + if headers is None: + headers = {"x-ratelimit-remaining": "5000"} + return HTTPError( + url="https://api.github.com/repos/test/repo", + code=code, + msg=msg, + hdrs=headers, + fp=io.BytesIO(body_bytes), + ) + + class TestHTTP451Exception: """Test suite for HTTP 451 DMCA takedown exception handling.""" @@ -15,9 +30,6 @@ def test_repository_unavailable_error_raised(self, create_args): """HTTP 451 should raise RepositoryUnavailableError with DMCA URL.""" args = create_args() - mock_response = Mock() - mock_response.getcode.return_value = 451 - dmca_data = { "message": "Repository access blocked", "block": { @@ -26,66 +38,166 @@ def test_repository_unavailable_error_raised(self, create_args): "html_url": "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md", }, } - mock_response.read.return_value = json.dumps(dmca_data).encode("utf-8") - mock_response.headers = {"x-ratelimit-remaining": "5000"} - mock_response.reason = "Unavailable For Legal Reasons" - - with patch( - "github_backup.github_backup.make_request_with_retry", - return_value=mock_response, - ): + body = json.dumps(dmca_data).encode("utf-8") + + def mock_urlopen(*a, **kw): + raise _make_http_error(451, body, msg="Unavailable For Legal Reasons") + + with patch("github_backup.github_backup.urlopen", 
side_effect=mock_urlopen): with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: github_backup.retrieve_data( args, "https://api.github.com/repos/test/dmca/issues" ) assert ( - exc_info.value.dmca_url + exc_info.value.legal_url == "https://github.com/github/dmca/blob/master/2024/11/2024-11-04-source-code.md" ) assert "451" in str(exc_info.value) - def test_repository_unavailable_error_without_dmca_url(self, create_args): + def test_repository_unavailable_error_without_legal_url(self, create_args): """HTTP 451 without DMCA details should still raise exception.""" args = create_args() - mock_response = Mock() - mock_response.getcode.return_value = 451 - mock_response.read.return_value = b'{"message": "Blocked"}' - mock_response.headers = {"x-ratelimit-remaining": "5000"} - mock_response.reason = "Unavailable For Legal Reasons" + def mock_urlopen(*a, **kw): + raise _make_http_error(451, b'{"message": "Blocked"}') - with patch( - "github_backup.github_backup.make_request_with_retry", - return_value=mock_response, - ): + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: github_backup.retrieve_data( args, "https://api.github.com/repos/test/dmca/issues" ) - assert exc_info.value.dmca_url is None + assert exc_info.value.legal_url is None assert "451" in str(exc_info.value) def test_repository_unavailable_error_with_malformed_json(self, create_args): """HTTP 451 with malformed JSON should still raise exception.""" args = create_args() - mock_response = Mock() - mock_response.getcode.return_value = 451 - mock_response.read.return_value = b"invalid json {" - mock_response.headers = {"x-ratelimit-remaining": "5000"} - mock_response.reason = "Unavailable For Legal Reasons" + def mock_urlopen(*a, **kw): + raise _make_http_error(451, b"invalid json {") - with patch( - "github_backup.github_backup.make_request_with_retry", - return_value=mock_response, - ): + 
with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): with pytest.raises(github_backup.RepositoryUnavailableError): github_backup.retrieve_data( args, "https://api.github.com/repos/test/dmca/issues" ) +class TestHTTP403TOS: + """Test suite for HTTP 403 TOS violation handling.""" + + def test_403_tos_raises_repository_unavailable(self, create_args): + """HTTP 403 (non-rate-limit) should raise RepositoryUnavailableError.""" + args = create_args() + + tos_data = { + "message": "Repository access blocked", + "block": { + "reason": "tos", + "html_url": "https://github.com/contact/tos-violation", + }, + } + body = json.dumps(tos_data).encode("utf-8") + + def mock_urlopen(*a, **kw): + raise _make_http_error( + 403, + body, + msg="Forbidden", + headers={"x-ratelimit-remaining": "5000"}, + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with pytest.raises(github_backup.RepositoryUnavailableError) as exc_info: + github_backup.retrieve_data( + args, "https://api.github.com/repos/test/blocked/issues" + ) + + assert ( + exc_info.value.legal_url == "https://github.com/contact/tos-violation" + ) + assert "403" in str(exc_info.value) + + def test_403_permission_denied_not_converted(self, create_args): + """HTTP 403 without 'block' in body should propagate as HTTPError, not RepositoryUnavailableError.""" + args = create_args() + + body = json.dumps({"message": "Must have admin rights to Repository."}).encode( + "utf-8" + ) + + def mock_urlopen(*a, **kw): + raise _make_http_error( + 403, + body, + msg="Forbidden", + headers={"x-ratelimit-remaining": "5000"}, + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with pytest.raises(HTTPError) as exc_info: + github_backup.retrieve_data( + args, "https://api.github.com/repos/test/private/issues" + ) + + assert exc_info.value.code == 403 + + def test_403_rate_limit_not_converted(self, create_args): + """HTTP 403 with rate limit exhausted should 
NOT become RepositoryUnavailableError.""" + args = create_args() + + call_count = 0 + + def mock_urlopen(*a, **kw): + nonlocal call_count + call_count += 1 + raise _make_http_error( + 403, + b'{"message": "rate limit"}', + msg="Forbidden", + headers={"x-ratelimit-remaining": "0"}, + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with patch( + "github_backup.github_backup.calculate_retry_delay", return_value=0 + ): + with pytest.raises(HTTPError) as exc_info: + github_backup.retrieve_data( + args, "https://api.github.com/repos/test/ratelimit/issues" + ) + + assert exc_info.value.code == 403 + # Should have retried (not raised immediately as RepositoryUnavailableError) + assert call_count > 1 + + +class TestRetrieveRepositoriesUnavailable: + """Test that retrieve_repositories handles RepositoryUnavailableError gracefully.""" + + def test_unavailable_repo_returns_empty_list(self, create_args): + """retrieve_repositories should return [] when the repo is unavailable.""" + args = create_args(repository="blocked-repo") + + def mock_urlopen(*a, **kw): + raise _make_http_error( + 451, + json.dumps( + { + "message": "Blocked", + "block": {"html_url": "https://example.com/dmca"}, + } + ).encode("utf-8"), + msg="Unavailable For Legal Reasons", + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + repos = github_backup.retrieve_repositories(args, {"login": None}) + + assert repos == [] + + if __name__ == "__main__": pytest.main([__file__, "-v"]) diff --git a/tests/test_retrieve_data.py b/tests/test_retrieve_data.py index 159f06e..014c309 100644 --- a/tests/test_retrieve_data.py +++ b/tests/test_retrieve_data.py @@ -288,6 +288,28 @@ def mock_urlopen(*args, **kwargs): assert exc_info.value.code == 403 assert call_count == 1 # No retries + def test_451_error_not_retried(self): + """HTTP 451 should not be retried - raise immediately.""" + call_count = 0 + + def mock_urlopen(*args, **kwargs): + nonlocal 
call_count + call_count += 1 + raise HTTPError( + url="https://api.github.com/test", + code=451, + msg="Unavailable For Legal Reasons", + hdrs={"x-ratelimit-remaining": "5000"}, + fp=None, + ) + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + with pytest.raises(HTTPError) as exc_info: + make_request_with_retry(Mock(), None) + + assert exc_info.value.code == 451 + assert call_count == 1 # No retries + def test_connection_error_retries_and_succeeds(self): """URLError (connection error) should retry and succeed if subsequent request works.""" good_response = Mock() From 60067650b070b73f8d1821064c8edc9affa6884c Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Mon, 16 Feb 2026 05:46:39 +0000 Subject: [PATCH 113/148] Release version 0.61.4 --- CHANGES.rst | 61 ++++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 094f1ee..808da6b 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,68 @@ Changelog ========= -0.61.3 (2026-01-24) +0.61.4 (2026-02-16) ------------------- ------------------------ +- Fix HTTP 451 DMCA and 403 TOS handling regression (#487) [Rodos] + + The DMCA handling added in PR #454 had a bug: make_request_with_retry() + raises HTTPError before retrieve_data() could check the status code via + getcode(), making the case 451 handler dead code. This also affected + HTTP 403 TOS violations (e.g. jumoog/MagiskOnWSA). + + Fix by catching HTTPError in retrieve_data() and converting 451 and + blocked 403 responses (identified by "block" key in response body) to + RepositoryUnavailableError. Non-block 403s (permissions, scopes) still + propagate as HTTPError. Also handle RepositoryUnavailableError in + retrieve_repositories() for the --repository case. + + Rewrote tests to mock urlopen (not make_request_with_retry) to exercise + the real code path that was previously untested. 
+ + Closes #487 +- Chore(deps): bump setuptools in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools). + + + Updates `setuptools` from 80.10.2 to 82.0.0 + - [Release notes](https://github.com/pypa/setuptools/releases) + - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) + - [Commits](https://github.com/pypa/setuptools/compare/v80.10.2...v82.0.0) + + --- + updated-dependencies: + - dependency-name: setuptools + dependency-version: 82.0.0 + dependency-type: direct:production + update-type: version-update:semver-major + dependency-group: python-packages + ... +- Chore(deps): bump setuptools in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [setuptools](https://github.com/pypa/setuptools). + + + Updates `setuptools` from 80.10.1 to 80.10.2 + - [Release notes](https://github.com/pypa/setuptools/releases) + - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) + - [Commits](https://github.com/pypa/setuptools/compare/v80.10.1...v80.10.2) + + --- + updated-dependencies: + - dependency-name: setuptools + dependency-version: 80.10.2 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... + + +0.61.3 (2026-01-24) +------------------- - Fix KeyError: 'Private' when using --all flag (#481) [Rodos] The repository dictionary uses lowercase "private" key. 
Use .get() with diff --git a/github_backup/__init__.py b/github_backup/__init__.py index ce11d35..03f7dee 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.61.3" +__version__ = "0.61.4" From f54a5458f6db668a5ff4d6395d792e00d20999e7 Mon Sep 17 00:00:00 2001 From: Rodos Date: Wed, 18 Feb 2026 20:10:48 +1100 Subject: [PATCH 114/148] Fix empty repository crash due to None timestamp comparison (#489) Empty repositories have None for pushed_at/updated_at, causing a TypeError when compared to the last_update string. Use .get() with truthiness check to skip None timestamps in incremental tracking. --- github_backup/github_backup.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index ada2d40..4d5394e 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1772,9 +1772,9 @@ def backup_repositories(args, output_directory, repositories): last_update = "0000-00-00T00:00:00Z" for repository in repositories: - if "updated_at" in repository and repository["updated_at"] > last_update: + if repository.get("updated_at") and repository["updated_at"] > last_update: last_update = repository["updated_at"] - elif "pushed_at" in repository and repository["pushed_at"] > last_update: + elif repository.get("pushed_at") and repository["pushed_at"] > last_update: last_update = repository["pushed_at"] if repository.get("is_gist"): From 68af1d406a5ee0249829b24972e0d9bc77320a5a Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Wed, 18 Feb 2026 21:04:32 +0000 Subject: [PATCH 115/148] Release version 0.61.5 --- CHANGES.rst | 12 +++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 808da6b..6041b9e 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,19 @@ Changelog ========= -0.61.4 (2026-02-16) +0.61.5 (2026-02-18) ------------------- 
------------------------ +- Fix empty repository crash due to None timestamp comparison (#489) + [Rodos] + + Empty repositories have None for pushed_at/updated_at, causing a + TypeError when compared to the last_update string. Use .get() with + truthiness check to skip None timestamps in incremental tracking. + + +0.61.4 (2026-02-16) +------------------- - Fix HTTP 451 DMCA and 403 TOS handling regression (#487) [Rodos] The DMCA handling added in PR #454 had a bug: make_request_with_retry() diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 03f7dee..294be4d 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.61.4" +__version__ = "0.61.5" From 8a0553a5b175a9f91449e6a29b37ceffeff26c1e Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 9 Mar 2026 04:33:49 +0000 Subject: [PATCH 116/148] chore(deps): bump docker/metadata-action from 5 to 6 Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. - [Release notes](https://github.com/docker/metadata-action/releases) - [Commits](https://github.com/docker/metadata-action/compare/v5...v6) --- updated-dependencies: - dependency-name: docker/metadata-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] --- .github/workflows/docker.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index f367b99..1aa81fe 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -57,7 +57,7 @@ jobs: - name: Extract metadata (tags, labels) for Docker id: meta - uses: docker/metadata-action@v5 + uses: docker/metadata-action@v6 with: images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} tags: | From 7f1807aaf82ac3565e1e4f1261644b376d0a5600 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 9 Mar 2026 04:33:53 +0000 Subject: [PATCH 117/148] chore(deps): bump docker/setup-buildx-action from 3 to 4 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] --- .github/workflows/docker.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index f367b99..b9103c5 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -46,7 +46,7 @@ jobs: uses: docker/setup-qemu-action@v3 - name: Set up Docker Buildx - uses: docker/setup-buildx-action@v3 + uses: docker/setup-buildx-action@v4 - name: Log in to the Container registry uses: docker/login-action@v3 From cceef92346fb8c6fb672b29b8f0917e95cbcb591 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 9 Mar 2026 04:33:55 +0000 Subject: [PATCH 118/148] chore(deps): bump docker/setup-qemu-action from 3 to 4 Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](https://github.com/docker/setup-qemu-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] --- .github/workflows/docker.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index f367b99..749ed52 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -43,7 +43,7 @@ jobs: persist-credentials: false - name: Set up QEMU - uses: docker/setup-qemu-action@v3 + uses: docker/setup-qemu-action@v4 - name: Set up Docker Buildx uses: docker/setup-buildx-action@v3 From 5758e489e82305bfcdc02cf643c6c543b489ebb7 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 9 Mar 2026 04:33:58 +0000 Subject: [PATCH 119/148] chore(deps): bump docker/build-push-action from 6 to 7 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/v6...v7) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] --- .github/workflows/docker.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index f367b99..00fdec3 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -68,7 +68,7 @@ jobs: type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', 'main') }} - name: Build and push Docker image - uses: docker/build-push-action@v6 + uses: docker/build-push-action@v7 with: context: . 
push: true From d5be07ec809c9c0ca7bfafc80345f09c9baf532b Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 9 Mar 2026 13:28:37 +0000 Subject: [PATCH 120/148] chore(deps): bump the python-packages group with 2 updates Bumps the python-packages group with 2 updates: [black](https://github.com/psf/black) and [setuptools](https://github.com/pypa/setuptools). Updates `black` from 26.1.0 to 26.3.0 - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/26.1.0...26.3.0) Updates `setuptools` from 82.0.0 to 82.0.1 - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/setuptools/compare/v82.0.0...v82.0.1) --- updated-dependencies: - dependency-name: black dependency-version: 26.3.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages - dependency-name: setuptools dependency-version: 82.0.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/release-requirements.txt b/release-requirements.txt index 6742290..65a036b 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,6 +1,6 @@ # Linting & Formatting autopep8==2.3.2 -black==26.1.0 +black==26.3.0 flake8==7.3.0 # Testing @@ -9,7 +9,7 @@ pytest==9.0.2 # Release & Publishing twine==6.2.0 gitchangelog==3.0.4 -setuptools==82.0.0 +setuptools==82.0.1 # Documentation restructuredtext-lint==2.0.2 From 3d961d11184f1fc384a8be290347b1de1e5064fe Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 9 Mar 2026 17:26:41 +0000 Subject: [PATCH 121/148] chore(deps): bump docker/login-action from 3 to 4 Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/v3...v4) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: '4' dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] --- .github/workflows/docker.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/docker.yml b/.github/workflows/docker.yml index 9508f94..4e5c89b 100644 --- a/.github/workflows/docker.yml +++ b/.github/workflows/docker.yml @@ -49,7 +49,7 @@ jobs: uses: docker/setup-buildx-action@v4 - name: Log in to the Container registry - uses: docker/login-action@v3 + uses: docker/login-action@v4 with: registry: ${{ env.REGISTRY }} username: ${{ github.actor }} From f85c759e5df58bb5c1c680943bedbf03b9141afb Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu, 12 Mar 2026 13:05:24 +0000 Subject: [PATCH 122/148] chore(deps): bump black in the python-packages group Bumps the python-packages group with 1 update: [black](https://github.com/psf/black). Updates `black` from 26.3.0 to 26.3.1 - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/26.3.0...26.3.1) --- updated-dependencies: - dependency-name: black dependency-version: 26.3.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index 65a036b..ddc1430 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,6 +1,6 @@ # Linting & Formatting autopep8==2.3.2 -black==26.3.0 +black==26.3.1 flake8==7.3.0 # Testing From 9fde6ed1ffff0660b8ead272c4993bd472312762 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed, 8 Apr 2026 13:05:48 +0000 Subject: [PATCH 123/148] chore(deps): bump pytest in the python-packages group Bumps the python-packages group with 1 update: [pytest](https://github.com/pytest-dev/pytest). Updates `pytest` from 9.0.2 to 9.0.3 - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3) --- updated-dependencies: - dependency-name: pytest dependency-version: 9.0.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: python-packages ... 
Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index ddc1430..ad8bc5c 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -4,7 +4,7 @@ black==26.3.1 flake8==7.3.0 # Testing -pytest==9.0.2 +pytest==9.0.3 # Release & Publishing twine==6.2.0 From f4117990b29b8f50ad3c57c86c5af1f9700c1b9c Mon Sep 17 00:00:00 2001 From: Duncan Ogilvie Date: Sun, 26 Apr 2026 13:42:14 +0200 Subject: [PATCH 124/148] Add --token-from-gh authentication option --- CHANGES.rst | 5 +++ README.rst | 7 ++-- github_backup/github_backup.py | 48 +++++++++++++++++++++++-- tests/test_auth.py | 65 ++++++++++++++++++++++++++++++++++ 4 files changed, 121 insertions(+), 4 deletions(-) create mode 100644 tests/test_auth.py diff --git a/CHANGES.rst b/CHANGES.rst index 6041b9e..364bd3d 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,6 +1,11 @@ Changelog ========= +Unreleased +---------- +- Add ``--token-from-gh`` to read authentication from ``gh auth token``. + + 0.61.5 (2026-02-18) ------------------- ------------------------ diff --git a/README.rst b/README.rst index cd7be1f..030f260 100644 --- a/README.rst +++ b/README.rst @@ -36,8 +36,8 @@ Show the CLI help output:: CLI Help output:: - github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [-q] [--as-app] - [-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i] + github-backup [-h] [-t TOKEN_CLASSIC] [-f TOKEN_FINE] [--token-from-gh] + [-q] [--as-app] [-o OUTPUT_DIRECTORY] [-l LOG_LEVEL] [-i] [--incremental-by-files] [--starred] [--all-starred] [--starred-skip-size-over MB] [--watched] [--followers] [--following] [--all] @@ -71,6 +71,7 @@ CLI Help output:: -f, --token-fine TOKEN_FINE fine-grained personal access token (github_pat_....), or path to token (file://...) + --token-from-gh read token from GitHub CLI (gh auth token) -q, --quiet supress log messages less severe than warning, e.g. 
info --as-app authenticate as github app instead of as a user. @@ -171,6 +172,8 @@ The positional argument ``USER`` specifies the user or organization account you **Classic tokens** (``-t TOKEN``) are `slightly less secure `_ as they provide very coarse-grained permissions. +If you already authenticate with the `GitHub CLI `_, you can use ``--token-from-gh`` to read the token with ``gh auth token`` instead of passing a token directly. This avoids placing the token in shell history or process arguments. When ``--github-host`` is set, the token is read with ``gh auth token --hostname HOST``. + Fine Tokens ~~~~~~~~~~~ diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 4d5394e..fd2fd99 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -167,6 +167,12 @@ def parse_args(args=None): dest="token_fine", help="fine-grained personal access token (github_pat_....), or path to token (file://...)", ) # noqa + parser.add_argument( + "--token-from-gh", + action="store_true", + dest="token_from_gh", + help="read token from GitHub CLI (gh auth token)", + ) parser.add_argument( "-q", "--quiet", @@ -537,8 +543,14 @@ def get_auth(args, encode=True, for_git_cli=False): raise Exception( "Fine-grained token supplied does not look like a GitHub PAT" ) - elif args.token_classic: - if args.token_classic.startswith(FILE_URI_PREFIX): + elif args.token_classic or args.token_from_gh: + if args.token_from_gh: + if args.as_app: + raise Exception( + "--token-from-gh cannot be used with --as-app; provide the app token with --token instead" + ) + args.token_classic = read_token_from_gh_cli(args) + elif args.token_classic.startswith(FILE_URI_PREFIX): args.token_classic = read_file_contents(args.token_classic) if not args.as_app: @@ -580,6 +592,38 @@ def read_file_contents(file_uri): return open(file_uri[len(FILE_URI_PREFIX) :], "rt").readline().strip() +def read_token_from_gh_cli(args): + cached_token = getattr(args, 
"_token_from_gh_value", None) + if cached_token: + return cached_token + + command = ["gh", "auth", "token"] + if args.github_host: + command.extend(["--hostname", get_github_host(args)]) + + try: + token = subprocess.check_output(command, stderr=subprocess.PIPE).decode( + "utf-8" + ).strip() + except FileNotFoundError: + raise Exception( + "Unable to read token from GitHub CLI: 'gh' executable not found" + ) + except subprocess.CalledProcessError as e: + stderr = e.stderr.decode("utf-8", errors="replace").strip() + if stderr: + raise Exception( + "Unable to read token from GitHub CLI: {0}".format(stderr) + ) + raise Exception("Unable to read token from GitHub CLI") + + if not token: + raise Exception("Unable to read token from GitHub CLI: token was empty") + + args._token_from_gh_value = token + return token + + def get_github_repo_url(args, repository): if repository.get("is_gist"): if args.prefer_ssh: diff --git a/tests/test_auth.py b/tests/test_auth.py new file mode 100644 index 0000000..504c822 --- /dev/null +++ b/tests/test_auth.py @@ -0,0 +1,65 @@ +"""Tests for authentication helpers.""" + +from unittest.mock import patch + +import pytest + +from github_backup import github_backup + + +def test_token_from_gh_flag_parses(): + args = github_backup.parse_args(["--token-from-gh", "testuser"]) + assert args.token_from_gh is True + + +def test_get_auth_reads_token_from_gh_cli(create_args): + args = create_args(token_from_gh=True) + + with patch( + "github_backup.github_backup.subprocess.check_output", + return_value=b"gho_test_token\n", + ) as mock_check_output: + auth = github_backup.get_auth(args, encode=False) + + assert auth == "gho_test_token:x-oauth-basic" + mock_check_output.assert_called_once_with( + ["gh", "auth", "token"], stderr=github_backup.subprocess.PIPE + ) + + +def test_get_auth_reads_token_from_gh_cli_for_enterprise_host(create_args): + args = create_args(token_from_gh=True, github_host="ghe.example.com") + + with patch( + 
"github_backup.github_backup.subprocess.check_output", + return_value=b"gho_enterprise_token\n", + ) as mock_check_output: + auth = github_backup.get_auth(args, encode=False) + + assert auth == "gho_enterprise_token:x-oauth-basic" + mock_check_output.assert_called_once_with( + ["gh", "auth", "token", "--hostname", "ghe.example.com"], + stderr=github_backup.subprocess.PIPE, + ) + + +def test_token_from_gh_is_cached(create_args): + args = create_args(token_from_gh=True) + + with patch( + "github_backup.github_backup.subprocess.check_output", + return_value=b"gho_cached_token\n", + ) as mock_check_output: + assert github_backup.get_auth(args, encode=False) == "gho_cached_token:x-oauth-basic" + assert github_backup.get_auth(args, encode=False) == "gho_cached_token:x-oauth-basic" + + mock_check_output.assert_called_once() + + +def test_token_from_gh_rejects_as_app(create_args): + args = create_args(token_from_gh=True, as_app=True) + + with pytest.raises(Exception) as exc_info: + github_backup.get_auth(args, encode=False) + + assert "--token-from-gh cannot be used with --as-app" in str(exc_info.value) From 4d022d94d0c7656a481651d8310a23e97a7db7fd Mon Sep 17 00:00:00 2001 From: Duncan Ogilvie Date: Sun, 26 Apr 2026 13:45:29 +0200 Subject: [PATCH 125/148] Add support for discussions Closes #290 --- CHANGES.rst | 2 + README.rst | 34 ++- github_backup/github_backup.py | 495 +++++++++++++++++++++++++++++-- github_backup/graphql_queries.py | 292 ++++++++++++++++++ tests/test_auth.py | 10 + tests/test_discussions.py | 222 ++++++++++++++ tests/test_retrieve_data.py | 28 ++ 7 files changed, 1042 insertions(+), 41 deletions(-) create mode 100644 github_backup/graphql_queries.py create mode 100644 tests/test_discussions.py diff --git a/CHANGES.rst b/CHANGES.rst index 364bd3d..50f8d54 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -3,6 +3,8 @@ Changelog Unreleased ---------- +- Add GitHub Discussions backups via GraphQL, including comments, replies, + optional attachment downloads, 
and per-repository incremental checkpoints. - Add ``--token-from-gh`` to read authentication from ``gh auth token``. diff --git a/README.rst b/README.rst index 030f260..4135743 100644 --- a/README.rst +++ b/README.rst @@ -4,7 +4,7 @@ github-backup |PyPI| |Python Versions| -The package can be used to backup an *entire* `Github `_ organization, repository or user account, including starred repos, issues and wikis in the most appropriate format (clones for wikis, json files for issues). +The package can be used to backup an *entire* `Github `_ organization, repository or user account, including starred repos, issues, discussions and wikis in the most appropriate format (clones for wikis, json files for issues and discussions). Requirements ============ @@ -44,8 +44,9 @@ CLI Help output:: [--issues] [--issue-comments] [--issue-events] [--pulls] [--pull-comments] [--pull-commits] [--pull-details] [--labels] [--hooks] [--milestones] [--security-advisories] - [--repositories] [--bare] [--no-prune] [--lfs] [--wikis] - [--gists] [--starred-gists] [--skip-archived] [--skip-existing] + [--discussions] [--repositories] [--bare] [--no-prune] + [--lfs] [--wikis] [--gists] [--starred-gists] + [--skip-archived] [--skip-existing] [-L [LANGUAGES ...]] [-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY] [-P] [-F] [--prefer-ssh] [-v] [--keychain-name OSX_KEYCHAIN_ITEM_NAME] @@ -104,6 +105,7 @@ CLI Help output:: --milestones include milestones in backup --security-advisories include security advisories in backup + --discussions include discussions in backup --repositories include repository clone in backup --bare clone bare repositories --no-prune disable prune option for git fetch @@ -144,8 +146,8 @@ CLI Help output:: applies if including releases --skip-assets-on [SKIP_ASSETS_ON ...] 
skip asset downloads for these repositories - --attachments download user-attachments from issues and pull - requests + --attachments download user-attachments from issues, pull requests, + and discussions --throttle-limit THROTTLE_LIMIT start throttling of GitHub API requests after this amount of API requests remain @@ -184,7 +186,7 @@ Customise the permissions for your use case, but for a personal account full bac **User permissions**: Read access to followers, starring, and watching. -**Repository permissions**: Read access to contents, issues, metadata, pull requests, and webhooks. +**Repository permissions**: Read access to contents, discussions, issues, metadata, pull requests, and webhooks. GitHub Apps @@ -265,9 +267,9 @@ LFS objects are fetched for all refs, not just the current checkout, ensuring a About Attachments ----------------- -When you use the ``--attachments`` option with ``--issues`` or ``--pulls``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue and pull request descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently. +When you use the ``--attachments`` option with ``--issues``, ``--pulls`` or ``--discussions``, the tool will download user-uploaded attachments (images, videos, documents, etc.) from issue, pull request and discussion descriptions and comments. In some circumstances attachments contain valuable data related to the topic, and without their backup important information or context might be lost inadvertently. -Attachments are saved to ``issues/attachments/{issue_number}/`` and ``pulls/attachments/{pull_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). 
Each attachment directory contains: +Attachments are saved to ``issues/attachments/{issue_number}/``, ``pulls/attachments/{pull_number}/`` and ``discussions/attachments/{discussion_number}/`` directories, where ``{issue_number}`` is the GitHub issue number (e.g., issue #123 saves to ``issues/attachments/123/``). Each attachment directory contains: - The downloaded attachment files (named by their GitHub identifier with appropriate file extensions) - If multiple attachments have the same filename, conflicts are resolved with numeric suffixes (e.g., ``report.pdf``, ``report_1.pdf``, ``report_2.pdf``) @@ -287,6 +289,16 @@ The tool automatically extracts file extensions from HTTP headers to ensure file **Fine-grained token limitation:** Due to a GitHub platform limitation, fine-grained personal access tokens (``github_pat_...``) cannot download attachments from private repositories directly. This affects both ``/assets/`` (images) and ``/files/`` (documents) URLs. The tool implements a workaround for image attachments using GitHub's Markdown API, which converts URLs to temporary JWT-signed URLs that can be downloaded. However, this workaround only works for images - document attachments (PDFs, text files, etc.) will fail with 404 errors when using fine-grained tokens on private repos. For full attachment support on private repositories, use a classic token (``-t``) instead of a fine-grained token (``-f``). See `#477 `_ for details. +About Discussions +----------------- + +GitHub Discussions are backed up with GitHub's GraphQL API because the REST API does not expose discussions. Use ``--discussions`` to save each discussion as JSON under ``repositories/{repo}/discussions/{number}.json``. Discussion backups include the discussion body and metadata, category information, comments, and comment replies. + +``--discussions`` is included in ``--all``. Unlike most REST API-backed resources, discussions require authentication because GitHub's GraphQL API requires a token. 
Fine-grained personal access tokens and GitHub Apps need read access to the repository's Discussions permission. + +Incremental backups use a per-repository checkpoint at ``repositories/{repo}/discussions/last_update`` based on discussion ``updatedAt`` timestamps. This is separate from the repository-level ``last_update`` file so discussion activity is not missed if the repository's own update timestamp does not change. If you enable ``--discussions`` on an existing incremental backup, the first run performs a full discussions backup for each repository and creates the discussions checkpoint for future runs. + + About security advisories ------------------------- @@ -419,14 +431,14 @@ Quietly and incrementally backup useful Github user data (public and private rep export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN GH_USER=YOUR-GITHUB-USER - github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --security-advisories --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER + github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. 
:: export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN GH_USER=YOUR-GITHUB-USER - github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER + github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --discussions --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only):: @@ -442,7 +454,7 @@ This tool creates backups only, there is no inbuilt restore command. cd /tmp/white-house/repositories/petitions/repository git push --mirror git@github.com:WhiteHouse/petitions.git -**Issues, pull requests, comments, and other metadata** are saved as JSON files for archival purposes. The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations: +**Issues, pull requests, discussions, comments, and other metadata** are saved as JSON files for archival purposes. 
The GitHub API does not support recreating this data faithfully, creating issues via the API has limitations: - New issue/PR numbers are assigned (original numbers cannot be set) - Timestamps reflect creation time (original dates cannot be set) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index fd2fd99..c1245bd 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -33,6 +33,13 @@ except ImportError: VERSION = "unknown" +from .graphql_queries import ( + DISCUSSION_DETAIL_QUERY, + DISCUSSION_LIST_QUERY, + DISCUSSION_PAGE_SIZE, + DISCUSSION_REPLIES_QUERY, +) + FNULL = open(os.devnull, "w") FILE_URI_PREFIX = "file://" logger = logging.getLogger(__name__) @@ -322,6 +329,12 @@ def parse_args(args=None): dest="include_security_advisories", help="include security advisories in backup", ) + parser.add_argument( + "--discussions", + action="store_true", + dest="include_discussions", + help="include discussions in backup", + ) parser.add_argument( "--repositories", action="store_true", @@ -469,7 +482,7 @@ def parse_args(args=None): "--attachments", action="store_true", dest="include_attachments", - help="download user-attachments from issues and pull requests", + help="download user-attachments from issues, pull requests, and discussions", ) parser.add_argument( "--throttle-limit", @@ -579,6 +592,31 @@ def get_github_api_host(args): return host +def get_github_graphql_url(args): + if args.github_host: + return "https://{0}/api/graphql".format(args.github_host) + + return "https://api.github.com/graphql" + + +def get_graphql_auth(args): + auth = get_auth(args, encode=False) + if not auth: + return None + + # GraphQL expects a bearer token. Classic tokens and keychain tokens use + # "token:x-oauth-basic" for REST Basic auth, so strip the synthetic + # password before sending the GraphQL Authorization header. 
+ if ( + not getattr(args, "as_app", False) + and getattr(args, "token_fine", None) is None + and ":" in auth + ): + auth = auth.split(":", 1)[0] + + return auth + + def get_github_host(args): if args.github_host: host = args.github_host @@ -810,6 +848,87 @@ def _extract_legal_url(response_body_bytes): return list(fetch_all()) +def retrieve_graphql_data(args, query, variables=None, log_context=None): + """Fetch data from GitHub's GraphQL API.""" + auth = get_graphql_auth(args) + if not auth: + raise Exception("GitHub GraphQL API requires authentication") + + variables = variables or {} + payload = json.dumps( + {"query": query, "variables": variables}, ensure_ascii=False + ).encode("utf-8") + endpoint = get_github_graphql_url(args) + + for attempt in range(args.max_retries + 1): + request = Request(endpoint, data=payload, method="POST") + request.add_header("Accept", "application/json") + request.add_header("Content-Type", "application/json") + request.add_header("Authorization", "bearer " + auth) + log_url = endpoint + if log_context: + log_url = "{0} ({1})".format(log_url, log_context) + logger.info("Requesting {0}".format(log_url)) + + http_response = make_request_with_retry(request, auth, args.max_retries) + + status = http_response.getcode() + if status != 200: + raise Exception( + f"Unexpected HTTP {status} from {endpoint} " + f"(expected non-2xx to raise HTTPError)" + ) + + try: + response = json.loads(http_response.read().decode("utf-8")) + except (IncompleteRead, json.decoder.JSONDecodeError, TimeoutError) as e: + logger.warning(f"{type(e).__name__} reading GraphQL response") + if attempt < args.max_retries: + delay = calculate_retry_delay(attempt, {}) + logger.warning( + f"Retrying GraphQL read in {delay:.1f}s " + f"(attempt {attempt + 1}/{args.max_retries + 1})" + ) + time.sleep(delay) + continue + raise Exception( + f"Failed to read GraphQL response after {args.max_retries + 1} " + f"attempts for {endpoint}" + ) + + if ( + remaining := 
int(http_response.headers.get("x-ratelimit-remaining", 0)) + ) <= (args.throttle_limit or 0): + if args.throttle_limit: + logger.info( + f"Throttling: {remaining} requests left, pausing {args.throttle_pause}s" + ) + time.sleep(args.throttle_pause) + + errors = response.get("errors") or [] + if errors: + if any(error.get("type") == "RATE_LIMITED" for error in errors): + if attempt < args.max_retries: + delay = calculate_retry_delay(attempt, http_response.headers) + logger.warning( + f"GraphQL rate limit hit, retrying in {delay:.1f}s " + f"(attempt {attempt + 1}/{args.max_retries + 1})" + ) + time.sleep(delay) + continue + + messages = "; ".join( + error.get("message", str(error)) for error in errors + ) + raise Exception("GraphQL Error: {0}".format(messages)) + + return response.get("data", {}) + + raise Exception( + f"GraphQL request failed after {args.max_retries + 1} attempts" + ) # pragma: no cover + + def make_request_with_retry(request, auth, max_retries=5): """Make HTTP request with automatic retry for transient errors.""" @@ -1193,7 +1312,7 @@ def get_jwt_signed_url_via_markdown_api(url, token, repo_context): def extract_attachment_urls(item_data, issue_number=None, repository_full_name=None): - """Extract GitHub-hosted attachment URLs from issue/PR body and comments. + """Extract GitHub-hosted attachment URLs from issue/PR/discussion body and comments. What qualifies as an attachment? 
There is no "attachment" concept in the GitHub API - it's a user behavior pattern @@ -1335,33 +1454,29 @@ def redirect_request(self, req, fp, code, msg, headers, newurl): # and exclude the URL to avoid downloading from wrong repos return False + def extract_from_text(text): + text_cleaned = remove_code_blocks(text or "") + for pattern in patterns: + found_urls = re.findall(pattern, text_cleaned) + urls.extend([clean_url(url) for url in found_urls]) + + def extract_from_comments(comments): + for comment in comments: + extract_from_text(comment.get("body") or "") + # GitHub Discussions support one level of replies. Issues and pull + # requests don't have reply_data, so this is a no-op for them. + extract_from_comments(comment.get("reply_data") or []) + # Extract from body - body = item_data.get("body") or "" - # Remove code blocks before searching for URLs - body_cleaned = remove_code_blocks(body) - for pattern in patterns: - found_urls = re.findall(pattern, body_cleaned) - urls.extend([clean_url(url) for url in found_urls]) - - # Extract from issue comments + extract_from_text(item_data.get("body") or "") + + # Extract from issue comments and discussion comments if "comment_data" in item_data: - for comment in item_data["comment_data"]: - comment_body = comment.get("body") or "" - # Remove code blocks before searching for URLs - comment_cleaned = remove_code_blocks(comment_body) - for pattern in patterns: - found_urls = re.findall(pattern, comment_cleaned) - urls.extend([clean_url(url) for url in found_urls]) + extract_from_comments(item_data["comment_data"]) # Extract from PR regular comments if "comment_regular_data" in item_data: - for comment in item_data["comment_regular_data"]: - comment_body = comment.get("body") or "" - # Remove code blocks before searching for URLs - comment_cleaned = remove_code_blocks(comment_body) - for pattern in patterns: - found_urls = re.findall(pattern, comment_cleaned) - urls.extend([clean_url(url) for url in found_urls]) + 
extract_from_comments(item_data["comment_regular_data"]) regex_urls = list(set(urls)) # dedupe @@ -1463,20 +1578,24 @@ def resolve_filename_collision(filepath): def download_attachments( args, item_cwd, item_data, number, repository, item_type="issue" ): - """Download user-attachments from issue/PR body and comments with manifest. + """Download user-attachments from issue/PR/discussion body and comments with manifest. Args: args: Command line arguments - item_cwd: Working directory (issue_cwd or pulls_cwd) - item_data: Issue or PR data dict - number: Issue or PR number + item_cwd: Working directory (issue_cwd, pulls_cwd, or discussion_cwd) + item_data: Issue, PR, or discussion data dict + number: Issue, PR, or discussion number repository: Repository dict - item_type: "issue" or "pull" for logging/manifest + item_type: "issue", "pull", or "discussion" for logging/manifest """ import json from datetime import datetime, timezone - item_type_display = "issue" if item_type == "issue" else "pull request" + item_type_display = { + "issue": "issue", + "pull": "pull request", + "discussion": "discussion", + }.get(item_type, item_type) urls = extract_attachment_urls( item_data, issue_number=number, repository_full_name=repository["full_name"] @@ -1621,6 +1740,8 @@ def download_attachments( # Write manifest if attachment_metadata_list: manifest = { + "item_number": number, + "item_type": item_type, "issue_number": number, "issue_type": item_type, "repository": ( @@ -1888,6 +2009,9 @@ def backup_repositories(args, output_directory, repositories): if args.include_pulls or args.include_everything: backup_pulls(args, repo_cwd, repository, repos_template) + if args.include_discussions or args.include_everything: + backup_discussions(args, repo_cwd, repository) + if args.include_milestones or args.include_everything: backup_milestones(args, repo_cwd, repository, repos_template) @@ -1922,6 +2046,317 @@ def backup_repositories(args, output_directory, repositories): 
open(last_update_path, "w").write(last_update) +def _repository_owner_name(repository): + return repository["full_name"].split("/", 1) + + +def _connection_nodes(connection): + return [node for node in (connection or {}).get("nodes") or [] if node] + + +def retrieve_discussion_summaries(args, repository, since=None): + owner, name = _repository_owner_name(repository) + after = None + page = 1 + summaries = [] + newest_seen = None + discussions_enabled = None + total_count = 0 + + while True: + data = retrieve_graphql_data( + args, + DISCUSSION_LIST_QUERY, + { + "owner": owner, + "name": name, + "after": after, + "pageSize": DISCUSSION_PAGE_SIZE, + }, + log_context="discussion summaries {0} page {1}".format( + repository["full_name"], page + ), + ) + repository_data = data.get("repository") + if repository_data is None: + raise Exception( + "Repository {0} not found in GraphQL response".format( + repository["full_name"] + ) + ) + + discussions_enabled = repository_data.get("hasDiscussionsEnabled") + if not discussions_enabled: + return [], None, False, 0 + + discussions = repository_data.get("discussions") or {} + total_count = discussions.get("totalCount", total_count) + stop = False + + for discussion in _connection_nodes(discussions): + updated_at = discussion.get("updatedAt") + if updated_at and (newest_seen is None or updated_at > newest_seen): + newest_seen = updated_at + + if since and updated_at and updated_at < since: + stop = True + break + + summaries.append(discussion) + + page_info = discussions.get("pageInfo") or {} + if stop or not page_info.get("hasNextPage"): + break + + after = page_info.get("endCursor") + page += 1 + + return summaries, newest_seen, discussions_enabled, total_count + + +def retrieve_discussion_comment_replies(args, comment_id, after=None, log_context=None): + data = retrieve_graphql_data( + args, + DISCUSSION_REPLIES_QUERY, + { + "commentId": comment_id, + "repliesCursor": after, + "pageSize": DISCUSSION_PAGE_SIZE, + }, + 
log_context=log_context, + ) + node = data.get("node") or {} + return node.get("replies") or {} + + +def _discussion_comment_log_identifier(comment_node): + return ( + comment_node.get("databaseId") + or comment_node.get("url") + or comment_node.get("id") + ) + + +def _discussion_comment_with_replies( + args, comment_node, repository_full_name=None, discussion_number=None +): + replies_connection = comment_node.get("replies") or {} + replies = _connection_nodes(replies_connection) + reply_total_count = replies_connection.get("totalCount", len(replies)) + page_info = replies_connection.get("pageInfo") or {} + reply_page = 2 + + while page_info.get("hasNextPage"): + log_context = None + if repository_full_name and discussion_number is not None: + log_context = "discussion {0}#{1} comment {2} replies page {3}".format( + repository_full_name, + discussion_number, + _discussion_comment_log_identifier(comment_node), + reply_page, + ) + + replies_connection = retrieve_discussion_comment_replies( + args, + comment_node["id"], + page_info.get("endCursor"), + log_context=log_context, + ) + replies.extend(_connection_nodes(replies_connection)) + page_info = replies_connection.get("pageInfo") or {} + reply_page += 1 + + comment = {key: value for key, value in comment_node.items() if key != "replies"} + comment["reply_count"] = reply_total_count + comment["reply_data"] = replies + return comment + + +def retrieve_discussion(args, repository, number): + owner, name = _repository_owner_name(repository) + comments_cursor = None + comments_page = 1 + discussion_data = None + comments = [] + comment_total_count = 0 + + while True: + data = retrieve_graphql_data( + args, + DISCUSSION_DETAIL_QUERY, + { + "owner": owner, + "name": name, + "number": number, + "commentsCursor": comments_cursor, + "pageSize": DISCUSSION_PAGE_SIZE, + }, + log_context="discussion {0}#{1} details/comments page {2}".format( + repository["full_name"], number, comments_page + ), + ) + repository_data = 
data.get("repository") or {} + discussion = repository_data.get("discussion") + if discussion is None: + raise Exception( + "Discussion #{0} not found in {1}".format( + number, repository["full_name"] + ) + ) + + if discussion_data is None: + discussion_data = { + key: value for key, value in discussion.items() if key != "comments" + } + + comments_connection = discussion.get("comments") or {} + comment_total_count = comments_connection.get( + "totalCount", comment_total_count + ) + for comment_node in _connection_nodes(comments_connection): + comments.append( + _discussion_comment_with_replies( + args, comment_node, repository["full_name"], number + ) + ) + + page_info = comments_connection.get("pageInfo") or {} + if not page_info.get("hasNextPage"): + break + + comments_cursor = page_info.get("endCursor") + comments_page += 1 + + discussion_data["comment_count"] = comment_total_count + discussion_data["comment_data"] = comments + return discussion_data + + +def backup_discussions(args, repo_cwd, repository): + discussion_cwd = os.path.join(repo_cwd, "discussions") + if args.skip_existing and os.path.isdir(discussion_cwd): + return + + if not get_graphql_auth(args): + logger.info( + "Skipping {0} discussions since GitHub GraphQL API requires authentication".format( + repository["full_name"] + ) + ) + return + + discussions_since = None + discussion_last_update_path = os.path.join(discussion_cwd, "last_update") + if args.incremental and os.path.exists(discussion_last_update_path): + discussions_since = open(discussion_last_update_path).read().strip() + + logger.info("Retrieving {0} discussions".format(repository["full_name"])) + try: + ( + summaries, + newest_seen, + discussions_enabled, + total_count, + ) = retrieve_discussion_summaries(args, repository, since=discussions_since) + except Exception as e: + logger.warning( + "Unable to retrieve discussions for {0}, skipping: {1}".format( + repository["full_name"], e + ) + ) + return + + if not discussions_enabled: + 
logger.info( + "Discussions are not enabled for {0}, skipping".format( + repository["full_name"] + ) + ) + return + + mkdir_p(repo_cwd, discussion_cwd) + + if discussions_since: + logger.info( + "Saving {0} updated discussions to disk ({1} total)".format( + len(summaries), total_count + ) + ) + else: + logger.info("Saving {0} discussions to disk".format(len(summaries))) + + written_count = 0 + skipped_count = 0 + had_errors = False + for summary in summaries: + number = summary["number"] + discussion_file = os.path.join(discussion_cwd, "{0}.json".format(number)) + + if args.incremental_by_files and os.path.isfile(discussion_file): + modified = os.path.getmtime(discussion_file) + modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ") + if modified > summary["updatedAt"]: + logger.info( + "Skipping discussion {0} because it wasn't modified since last backup".format( + number + ) + ) + skipped_count += 1 + continue + + try: + discussion = retrieve_discussion(args, repository, number) + except Exception as e: + logger.warning( + "Unable to retrieve discussion {0}#{1}, skipping: {2}".format( + repository["full_name"], number, e + ) + ) + had_errors = True + continue + + if args.include_attachments: + download_attachments( + args, + discussion_cwd, + discussion, + number, + repository, + item_type="discussion", + ) + + if json_dump_if_changed(discussion, discussion_file): + written_count += 1 + + if ( + args.incremental + and not had_errors + and newest_seen + and (not discussions_since or newest_seen > discussions_since) + ): + open(discussion_last_update_path, "w").write(newest_seen) + + attempted_count = len(summaries) - skipped_count + if not summaries: + logger.info("No discussions to save") + elif attempted_count == 0: + logger.info("{0} discussions skipped".format(skipped_count)) + elif written_count == attempted_count: + logger.info("Saved {0} discussions to disk".format(written_count)) + elif written_count == 0: + logger.info( + "{0} 
discussions unchanged, skipped write".format(attempted_count) + ) + else: + logger.info( + "Saved {0} discussions to disk ({1} unchanged, {2} skipped)".format( + written_count, + attempted_count - written_count, + skipped_count, + ) + ) + + def backup_issues(args, repo_cwd, repository, repos_template): has_issues_dir = os.path.isdir("{0}/issues/.git".format(repo_cwd)) if args.skip_existing and has_issues_dir: diff --git a/github_backup/graphql_queries.py b/github_backup/graphql_queries.py new file mode 100644 index 0000000..96eb552 --- /dev/null +++ b/github_backup/graphql_queries.py @@ -0,0 +1,292 @@ +"""GraphQL query templates used by github-backup.""" + +DISCUSSION_PAGE_SIZE = 100 + +DISCUSSION_LIST_QUERY = """ +query($owner: String!, $name: String!, $after: String, $pageSize: Int!) { + repository(owner: $owner, name: $name) { + hasDiscussionsEnabled + discussions( + first: $pageSize, + after: $after, + orderBy: {field: UPDATED_AT, direction: DESC} + ) { + totalCount + nodes { + id + number + title + updatedAt + } + pageInfo { + hasNextPage + endCursor + } + } + } +} +""" + +DISCUSSION_DETAIL_QUERY = """ +query( + $owner: String!, + $name: String!, + $number: Int!, + $commentsCursor: String, + $pageSize: Int! 
+) { + repository(owner: $owner, name: $name) { + discussion(number: $number) { + activeLockReason + answer { + id + databaseId + url + } + answerChosenAt + answerChosenBy { + ...ActorFields + } + author { + ...ActorFields + } + authorAssociation + body + bodyHTML + bodyText + category { + createdAt + description + emoji + emojiHTML + id + isAnswerable + name + slug + updatedAt + } + closed + closedAt + createdAt + createdViaEmail + databaseId + editor { + ...ActorFields + } + id + includesCreatedEdit + isAnswered + labels(first: 100) { + totalCount + nodes { + id + name + color + description + } + } + lastEditedAt + locked + number + poll { + id + question + totalVoteCount + options(first: 100) { + totalCount + nodes { + id + option + totalVoteCount + } + } + } + publishedAt + reactionGroups { + ...ReactionGroupFields + } + resourcePath + stateReason + title + updatedAt + upvoteCount + url + comments(first: $pageSize, after: $commentsCursor) { + totalCount + nodes { + ...DiscussionCommentFields + replies(first: $pageSize) { + totalCount + nodes { + ...DiscussionReplyFields + } + pageInfo { + hasNextPage + endCursor + } + } + } + pageInfo { + hasNextPage + endCursor + } + } + } + } +} + +fragment ActorFields on Actor { + avatarUrl + login + resourcePath + url +} + +fragment ReactionGroupFields on ReactionGroup { + content + reactors { + totalCount + } +} + +fragment DiscussionCommentFields on DiscussionComment { + author { + ...ActorFields + } + authorAssociation + body + bodyHTML + bodyText + createdAt + createdViaEmail + databaseId + deletedAt + editor { + ...ActorFields + } + id + includesCreatedEdit + isAnswer + isMinimized + lastEditedAt + minimizedReason + publishedAt + reactionGroups { + ...ReactionGroupFields + } + replyTo { + id + databaseId + url + } + resourcePath + updatedAt + upvoteCount + url +} + +fragment DiscussionReplyFields on DiscussionComment { + author { + ...ActorFields + } + authorAssociation + body + bodyHTML + bodyText + createdAt + 
createdViaEmail + databaseId + deletedAt + editor { + ...ActorFields + } + id + includesCreatedEdit + isAnswer + isMinimized + lastEditedAt + minimizedReason + publishedAt + reactionGroups { + ...ReactionGroupFields + } + replyTo { + id + databaseId + url + } + resourcePath + updatedAt + upvoteCount + url +} +""" + +DISCUSSION_REPLIES_QUERY = """ +query($commentId: ID!, $repliesCursor: String, $pageSize: Int!) { + node(id: $commentId) { + ... on DiscussionComment { + replies(first: $pageSize, after: $repliesCursor) { + totalCount + nodes { + ...DiscussionReplyFields + } + pageInfo { + hasNextPage + endCursor + } + } + } + } +} + +fragment ActorFields on Actor { + avatarUrl + login + resourcePath + url +} + +fragment ReactionGroupFields on ReactionGroup { + content + reactors { + totalCount + } +} + +fragment DiscussionReplyFields on DiscussionComment { + author { + ...ActorFields + } + authorAssociation + body + bodyHTML + bodyText + createdAt + createdViaEmail + databaseId + deletedAt + editor { + ...ActorFields + } + id + includesCreatedEdit + isAnswer + isMinimized + lastEditedAt + minimizedReason + publishedAt + reactionGroups { + ...ReactionGroupFields + } + replyTo { + id + databaseId + url + } + resourcePath + updatedAt + upvoteCount + url +} +""" diff --git a/tests/test_auth.py b/tests/test_auth.py index 504c822..0102878 100644 --- a/tests/test_auth.py +++ b/tests/test_auth.py @@ -56,6 +56,16 @@ def test_token_from_gh_is_cached(create_args): mock_check_output.assert_called_once() +def test_graphql_auth_strips_basic_auth_suffix_for_gh_cli_token(create_args): + args = create_args(token_from_gh=True) + + with patch( + "github_backup.github_backup.subprocess.check_output", + return_value=b"gho_graphql_token\n", + ): + assert github_backup.get_graphql_auth(args) == "gho_graphql_token" + + def test_token_from_gh_rejects_as_app(create_args): args = create_args(token_from_gh=True, as_app=True) diff --git a/tests/test_discussions.py b/tests/test_discussions.py new 
file mode 100644 index 0000000..89fd8dd --- /dev/null +++ b/tests/test_discussions.py @@ -0,0 +1,222 @@ +"""Tests for GitHub Discussions backup support.""" + +import json +import os +from unittest.mock import patch + +from github_backup import github_backup + + +def test_parse_args_discussions_flag(): + args = github_backup.parse_args(["--discussions", "testuser"]) + assert args.include_discussions is True + + +def test_retrieve_discussion_summaries_stops_at_incremental_since(create_args): + args = create_args() + repository = {"full_name": "owner/repo"} + + page = { + "repository": { + "hasDiscussionsEnabled": True, + "discussions": { + "totalCount": 3, + "nodes": [ + {"number": 3, "title": "new", "updatedAt": "2026-02-01T00:00:00Z"}, + {"number": 2, "title": "also new", "updatedAt": "2026-01-10T00:00:00Z"}, + {"number": 1, "title": "old", "updatedAt": "2025-12-01T00:00:00Z"}, + ], + "pageInfo": {"hasNextPage": True, "endCursor": "NEXT"}, + }, + } + } + + with patch( + "github_backup.github_backup.retrieve_graphql_data", return_value=page + ) as mock_retrieve: + summaries, newest, enabled, total = github_backup.retrieve_discussion_summaries( + args, repository, since="2026-01-01T00:00:00Z" + ) + + assert enabled is True + assert total == 3 + assert newest == "2026-02-01T00:00:00Z" + assert [item["number"] for item in summaries] == [3, 2] + # The old discussion stops pagination, so the next page is not requested. 
+ assert mock_retrieve.call_count == 1 + assert ( + mock_retrieve.call_args.kwargs["log_context"] + == "discussion summaries owner/repo page 1" + ) + + +def test_retrieve_discussion_summaries_disabled_discussions(create_args): + args = create_args() + repository = {"full_name": "owner/repo"} + + with patch( + "github_backup.github_backup.retrieve_graphql_data", + return_value={"repository": {"hasDiscussionsEnabled": False}}, + ): + summaries, newest, enabled, total = github_backup.retrieve_discussion_summaries( + args, repository + ) + + assert summaries == [] + assert newest is None + assert enabled is False + assert total == 0 + + +def _comment(comment_id, body, replies=None, replies_has_next=False): + replies = replies or [] + return { + "id": comment_id, + "body": body, + "replies": { + "totalCount": len(replies) + (1 if replies_has_next else 0), + "nodes": replies, + "pageInfo": { + "hasNextPage": replies_has_next, + "endCursor": "REPLIES2" if replies_has_next else None, + }, + }, + } + + +def _discussion_page(comment_nodes, has_next=False): + return { + "repository": { + "discussion": { + "number": 42, + "title": "Discussion title", + "updatedAt": "2026-02-01T00:00:00Z", + "comments": { + "totalCount": 2, + "nodes": comment_nodes, + "pageInfo": { + "hasNextPage": has_next, + "endCursor": "COMMENTS2" if has_next else None, + }, + }, + } + } + } + + +def test_retrieve_discussion_paginates_comments_and_replies(create_args): + args = create_args() + repository = {"full_name": "owner/repo"} + + reply_1 = {"id": "reply-1", "body": "first reply"} + reply_2 = {"id": "reply-2", "body": "second reply"} + comment_1 = _comment("comment-1", "first comment", [reply_1], replies_has_next=True) + comment_2 = _comment("comment-2", "second comment") + + responses = [ + _discussion_page([comment_1], has_next=True), + { + "node": { + "replies": { + "totalCount": 2, + "nodes": [reply_2], + "pageInfo": {"hasNextPage": False, "endCursor": None}, + } + } + }, + 
_discussion_page([comment_2], has_next=False), + ] + + with patch( + "github_backup.github_backup.retrieve_graphql_data", side_effect=responses + ) as mock_retrieve: + discussion = github_backup.retrieve_discussion(args, repository, 42) + + assert discussion["number"] == 42 + assert discussion["comment_count"] == 2 + assert len(discussion["comment_data"]) == 2 + assert discussion["comment_data"][0]["body"] == "first comment" + assert discussion["comment_data"][0]["reply_count"] == 2 + assert [r["body"] for r in discussion["comment_data"][0]["reply_data"]] == [ + "first reply", + "second reply", + ] + assert discussion["comment_data"][1]["body"] == "second comment" + assert mock_retrieve.call_count == 3 + assert [ + call.kwargs["log_context"] for call in mock_retrieve.call_args_list + ] == [ + "discussion owner/repo#42 details/comments page 1", + "discussion owner/repo#42 comment comment-1 replies page 2", + "discussion owner/repo#42 details/comments page 2", + ] + + +def test_backup_discussions_uses_incremental_checkpoint(create_args, tmp_path): + args = create_args(token_classic="fake_token", include_discussions=True, incremental=True) + repository = {"full_name": "owner/repo"} + discussions_dir = tmp_path / "discussions" + discussions_dir.mkdir() + (discussions_dir / "last_update").write_text("2026-01-01T00:00:00Z") + + def fake_summaries(passed_args, passed_repository, since=None): + assert passed_args is args + assert passed_repository == repository + assert since == "2026-01-01T00:00:00Z" + return ( + [{"number": 7, "title": "updated", "updatedAt": "2026-02-01T00:00:00Z"}], + "2026-02-01T00:00:00Z", + True, + 1, + ) + + with patch( + "github_backup.github_backup.retrieve_discussion_summaries", + side_effect=fake_summaries, + ), patch( + "github_backup.github_backup.retrieve_discussion", + return_value={"number": 7, "title": "updated"}, + ): + github_backup.backup_discussions(args, tmp_path, repository) + + with open(discussions_dir / "7.json", 
encoding="utf-8") as f: + assert json.load(f) == {"number": 7, "title": "updated"} + assert (discussions_dir / "last_update").read_text() == "2026-02-01T00:00:00Z" + + +def test_backup_discussions_does_not_advance_checkpoint_on_discussion_error( + create_args, tmp_path +): + args = create_args(token_classic="fake_token", include_discussions=True, incremental=True) + repository = {"full_name": "owner/repo"} + discussions_dir = tmp_path / "discussions" + discussions_dir.mkdir() + (discussions_dir / "last_update").write_text("2026-01-01T00:00:00Z") + + with patch( + "github_backup.github_backup.retrieve_discussion_summaries", + return_value=( + [{"number": 7, "title": "updated", "updatedAt": "2026-02-01T00:00:00Z"}], + "2026-02-01T00:00:00Z", + True, + 1, + ), + ), patch( + "github_backup.github_backup.retrieve_discussion", + side_effect=Exception("temporary GraphQL error"), + ): + github_backup.backup_discussions(args, tmp_path, repository) + + assert (discussions_dir / "last_update").read_text() == "2026-01-01T00:00:00Z" + assert not os.path.exists(discussions_dir / "7.json") + + +def test_backup_discussions_skips_without_auth(create_args, tmp_path): + args = create_args(include_discussions=True) + repository = {"full_name": "owner/repo"} + + with patch("github_backup.github_backup.retrieve_discussion_summaries") as mock_retrieve: + github_backup.backup_discussions(args, tmp_path, repository) + + assert not mock_retrieve.called + assert not os.path.exists(tmp_path / "discussions") diff --git a/tests/test_retrieve_data.py b/tests/test_retrieve_data.py index 014c309..51848ef 100644 --- a/tests/test_retrieve_data.py +++ b/tests/test_retrieve_data.py @@ -1,6 +1,7 @@ """Tests for retrieve_data function.""" import json +import logging import socket from unittest.mock import Mock, patch from urllib.error import HTTPError, URLError @@ -355,6 +356,33 @@ def mock_urlopen(*args, **kwargs): ) # 1 initial + 5 retries = 6 attempts +class TestRetrieveGraphqlDataLogging: + """Tests 
for GraphQL request logging.""" + + def test_logs_graphql_context(self, create_args, caplog): + args = create_args(token_classic="fake_token") + mock_response = Mock() + mock_response.getcode.return_value = 200 + mock_response.read.return_value = json.dumps({"data": {}}).encode("utf-8") + mock_response.headers = {"x-ratelimit-remaining": "5000"} + + caplog.set_level(logging.INFO, logger="github_backup.github_backup") + with patch( + "github_backup.github_backup.make_request_with_retry", + return_value=mock_response, + ): + github_backup.retrieve_graphql_data( + args, + "query { viewer { login } }", + log_context="discussion owner/repo#1", + ) + + assert ( + "Requesting https://api.github.com/graphql (discussion owner/repo#1)" + in caplog.text + ) + + class TestRetrieveDataThrottling: """Tests for throttling behavior in retrieve_data.""" From 24b3fdb4f34f85be090c335426e41403331e3ddf Mon Sep 17 00:00:00 2001 From: Duncan Ogilvie Date: Sun, 26 Apr 2026 14:08:42 +0200 Subject: [PATCH 126/148] Add support for pull request reviews Closes #124 --- CHANGES.rst | 2 + README.rst | 16 ++- github_backup/github_backup.py | 148 ++++++++++++++++++-- tests/test_pull_reviews.py | 237 +++++++++++++++++++++++++++++++++ 4 files changed, 388 insertions(+), 15 deletions(-) create mode 100644 tests/test_pull_reviews.py diff --git a/CHANGES.rst b/CHANGES.rst index 50f8d54..b790ce1 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -5,6 +5,8 @@ Unreleased ---------- - Add GitHub Discussions backups via GraphQL, including comments, replies, optional attachment downloads, and per-repository incremental checkpoints. +- Add pull request review backups with ``--pull-reviews`` and one-time + incremental backfill for existing backups. - Add ``--token-from-gh`` to read authentication from ``gh auth token``. 
diff --git a/README.rst b/README.rst index 4135743..52d7222 100644 --- a/README.rst +++ b/README.rst @@ -42,7 +42,8 @@ CLI Help output:: [--starred] [--all-starred] [--starred-skip-size-over MB] [--watched] [--followers] [--following] [--all] [--issues] [--issue-comments] [--issue-events] [--pulls] - [--pull-comments] [--pull-commits] [--pull-details] + [--pull-comments] [--pull-reviews] [--pull-commits] + [--pull-details] [--labels] [--hooks] [--milestones] [--security-advisories] [--discussions] [--repositories] [--bare] [--no-prune] [--lfs] [--wikis] [--gists] [--starred-gists] @@ -97,6 +98,7 @@ CLI Help output:: --issue-events include issue events in backup --pulls include pull requests in backup --pull-comments include pull request review comments in backup + --pull-reviews include pull request reviews in backup --pull-commits include pull request commits in backup --pull-details include more pull request details in backup [*] --labels include labels in backup @@ -340,6 +342,14 @@ For finer control, avoid using ``--assets`` with starred repos, or use ``--skip- Alternatively, consider just storing links to starred repos in JSON format with ``--starred``. +About pull request reviews +-------------------------- + +Use ``--pull-reviews`` with ``--pulls`` to include GitHub pull request review metadata under each pull request's ``review_data`` key. Reviews are separate from review comments: ``--pull-comments`` backs up inline review comments via ``comment_data`` and regular PR conversation comments via ``comment_regular_data``, while ``--pull-reviews`` backs up review state, submitted time, commit ID, and the top-level review body. + +``--pull-reviews`` is included in ``--all``. Incremental backups use a per-repository checkpoint at ``repositories/{repo}/pulls/reviews_last_update``. 
If ``--pull-reviews`` is enabled on an existing incremental backup, the first run performs a one-time backfill for pull request reviews so older PRs are not skipped by the existing repository checkpoint. Existing ``comment_data``, ``comment_regular_data`` and ``commit_data`` fields are preserved when only review data is being added. + + Incremental Backup ------------------ @@ -431,14 +441,14 @@ Quietly and incrementally backup useful Github user data (public and private rep export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN GH_USER=YOUR-GITHUB-USER - github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER + github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. 
:: export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN GH_USER=YOUR-GITHUB-USER - github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --discussions --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER + github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --discussions --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER Pipe a token from stdin to avoid storing it in environment variables or command history (Unix-like systems only):: diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index c1245bd..054d0c6 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -293,6 +293,12 @@ def parse_args(args=None): dest="include_pull_comments", help="include pull request review comments in backup", ) + parser.add_argument( + "--pull-reviews", + action="store_true", + dest="include_pull_reviews", + help="include pull request reviews in backup", + ) parser.add_argument( "--pull-commits", action="store_true", @@ -2427,6 +2433,57 @@ def backup_issues(args, repo_cwd, repository, repos_template): os.replace(issue_file + ".temp", issue_file) # Atomic write +PULL_OPTIONAL_DATA_KEYS = ( + "comment_regular_data", + "comment_data", + "commit_data", + "review_data", +) +PULL_REVIEWS_LAST_UPDATE_FILENAME = "reviews_last_update" + + +def read_json_file_if_exists(path): + if not os.path.isfile(path): + return None + + try: + with codecs.open(path, "r", encoding="utf-8") as f: + return json.load(f) + except (OSError, UnicodeDecodeError, json.decoder.JSONDecodeError) as e: + logger.debug("Error reading 
existing JSON file {0}: {1}".format(path, e)) + return None + + +def restore_existing_pull_optional_data(pull, existing_pull): + if not existing_pull: + return + + for key in PULL_OPTIONAL_DATA_KEYS: + if key not in pull and key in existing_pull: + pull[key] = existing_pull[key] + + +def get_pull_reviews_since(args, pulls_cwd): + args_since = getattr(args, "since", None) + if not args.incremental: + return args_since, None, None + + reviews_last_update_path = os.path.join( + pulls_cwd, PULL_REVIEWS_LAST_UPDATE_FILENAME + ) + if not os.path.exists(reviews_last_update_path): + # One-time backfill for existing incremental backups: if the user adds + # --pull-reviews after a repository checkpoint already exists, the + # repository-level checkpoint would otherwise skip old PRs forever. + return None, None, reviews_last_update_path + + reviews_since = open(reviews_last_update_path).read().strip() + if args_since and reviews_since: + return min(args_since, reviews_since), reviews_since, reviews_last_update_path + + return args_since or reviews_since, reviews_since, reviews_last_update_path + + def backup_pulls(args, repo_cwd, repository, repos_template): has_pulls_dir = os.path.isdir("{0}/pulls/.git".format(repo_cwd)) if args.skip_existing and has_pulls_dir: @@ -2436,7 +2493,20 @@ def backup_pulls(args, repo_cwd, repository, repos_template): pulls_cwd = os.path.join(repo_cwd, "pulls") mkdir_p(repo_cwd, pulls_cwd) + include_pull_reviews = args.include_pull_reviews or args.include_everything + repository_since = getattr(args, "since", None) + pulls_since = repository_since + pull_reviews_since = None + pull_reviews_last_update_path = None + if include_pull_reviews: + ( + pulls_since, + pull_reviews_since, + pull_reviews_last_update_path, + ) = get_pull_reviews_since(args, pulls_cwd) + pulls = {} + newest_pull_update = None _pulls_template = "{0}/{1}/pulls".format(repos_template, repository["full_name"]) _issue_template = "{0}/{1}/issues".format(repos_template, 
repository["full_name"]) query_args = { @@ -2446,27 +2516,43 @@ def backup_pulls(args, repo_cwd, repository, repos_template): "direction": "desc", } + def track_newest_pull_update(pull): + nonlocal newest_pull_update + updated_at = pull.get("updated_at") + if updated_at and ( + newest_pull_update is None or updated_at > newest_pull_update + ): + newest_pull_update = updated_at + + def pull_is_due_for_repository_checkpoint(pull): + return not repository_since or pull["updated_at"] >= repository_since + if not args.include_pull_details: pull_states = ["open", "closed"] for pull_state in pull_states: query_args["state"] = pull_state _pulls = retrieve_data(args, _pulls_template, query_args=query_args) for pull in _pulls: - if args.since and pull["updated_at"] < args.since: + track_newest_pull_update(pull) + if pulls_since and pull["updated_at"] < pulls_since: break - if not args.since or pull["updated_at"] >= args.since: + if not pulls_since or pull["updated_at"] >= pulls_since: pulls[pull["number"]] = pull else: _pulls = retrieve_data(args, _pulls_template, query_args=query_args) for pull in _pulls: - if args.since and pull["updated_at"] < args.since: + track_newest_pull_update(pull) + if pulls_since and pull["updated_at"] < pulls_since: break - if not args.since or pull["updated_at"] >= args.since: - pulls[pull["number"]] = retrieve_data( - args, - _pulls_template + "/{}".format(pull["number"]), - paginated=False, - )[0] + if not pulls_since or pull["updated_at"] >= pulls_since: + if pull_is_due_for_repository_checkpoint(pull): + pulls[pull["number"]] = retrieve_data( + args, + _pulls_template + "/{}".format(pull["number"]), + paginated=False, + )[0] + else: + pulls[pull["number"]] = pull logger.info("Saving {0} pull requests to disk".format(len(list(pulls.keys())))) # Comments from pulls API are only _review_ comments @@ -2476,24 +2562,50 @@ def backup_pulls(args, repo_cwd, repository, repos_template): comments_regular_template = _issue_template + "/{0}/comments" 
comments_template = _pulls_template + "/{0}/comments" commits_template = _pulls_template + "/{0}/commits" + reviews_template = _pulls_template + "/{0}/reviews" + pull_review_errors = False + for number, pull in list(pulls.items()): pull_file = "{0}/{1}.json".format(pulls_cwd, number) + existing_pull = read_json_file_if_exists(pull_file) + needs_review_backfill = ( + include_pull_reviews + and (not existing_pull or "review_data" not in existing_pull) + ) + if args.incremental_by_files and os.path.isfile(pull_file): modified = os.path.getmtime(pull_file) modified = datetime.fromtimestamp(modified).strftime("%Y-%m-%dT%H:%M:%SZ") - if modified > pull["updated_at"]: + if modified > pull["updated_at"] and not needs_review_backfill: logger.info( "Skipping pull request {0} because it wasn't modified since last backup".format( number ) ) continue - if args.include_pull_comments or args.include_everything: + + should_fetch_non_review_data = pull_is_due_for_repository_checkpoint(pull) + if ( + args.include_pull_comments or args.include_everything + ) and should_fetch_non_review_data: template = comments_regular_template.format(number) pulls[number]["comment_regular_data"] = retrieve_data(args, template) template = comments_template.format(number) pulls[number]["comment_data"] = retrieve_data(args, template) - if args.include_pull_commits or args.include_everything: + if include_pull_reviews: + template = reviews_template.format(number) + try: + pulls[number]["review_data"] = retrieve_data(args, template) + except Exception as e: + pull_review_errors = True + logger.warning( + "Unable to retrieve reviews for pull request {0}#{1}, skipping reviews: {2}".format( + repository["full_name"], number, e + ) + ) + if ( + args.include_pull_commits or args.include_everything + ) and should_fetch_non_review_data: template = commits_template.format(number) pulls[number]["commit_data"] = retrieve_data(args, template) if args.include_attachments: @@ -2501,10 +2613,22 @@ def 
backup_pulls(args, repo_cwd, repository, repos_template): args, pulls_cwd, pulls[number], number, repository, item_type="pull" ) + restore_existing_pull_optional_data(pull, existing_pull) + with codecs.open(pull_file + ".temp", "w", encoding="utf-8") as f: json_dump(pull, f) os.replace(pull_file + ".temp", pull_file) # Atomic write + if ( + include_pull_reviews + and args.incremental + and pull_reviews_last_update_path + and newest_pull_update + and not pull_review_errors + and (not pull_reviews_since or newest_pull_update > pull_reviews_since) + ): + open(pull_reviews_last_update_path, "w").write(newest_pull_update) + def backup_milestones(args, repo_cwd, repository, repos_template): milestone_cwd = os.path.join(repo_cwd, "milestones") diff --git a/tests/test_pull_reviews.py b/tests/test_pull_reviews.py new file mode 100644 index 0000000..6130269 --- /dev/null +++ b/tests/test_pull_reviews.py @@ -0,0 +1,237 @@ +"""Tests for pull request review backups.""" + +import json +import os + +from github_backup import github_backup + + +def test_parse_args_pull_reviews_flag(): + args = github_backup.parse_args(["--pull-reviews", "testuser"]) + assert args.include_pull_reviews is True + + +def test_backup_pulls_includes_review_data(create_args, tmp_path, monkeypatch): + args = create_args(include_pulls=True, include_pull_reviews=True) + repository = {"full_name": "owner/repo"} + calls = [] + + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + calls.append((template, query_args)) + if template == "https://api.github.com/repos/owner/repo/pulls": + if query_args["state"] == "open": + return [ + { + "number": 1, + "updated_at": "2026-02-01T00:00:00Z", + "title": "Add feature", + } + ] + return [] + if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews": + return [ + { + "id": 123, + "state": "APPROVED", + "body": "Looks good", + "submitted_at": "2026-02-01T00:00:00Z", + } + ] + raise AssertionError("Unexpected template: 
{0}".format(template)) + + monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data) + + github_backup.backup_pulls( + args, tmp_path, repository, "https://api.github.com/repos" + ) + + with open(tmp_path / "pulls" / "1.json", encoding="utf-8") as f: + pull = json.load(f) + + assert pull["review_data"] == [ + { + "body": "Looks good", + "id": 123, + "state": "APPROVED", + "submitted_at": "2026-02-01T00:00:00Z", + } + ] + assert ( + "https://api.github.com/repos/owner/repo/pulls/1/reviews", + None, + ) in calls + + +def test_pull_reviews_backfill_ignores_repository_checkpoint( + create_args, tmp_path, monkeypatch +): + args = create_args( + include_pulls=True, + include_pull_reviews=True, + incremental=True, + ) + args.since = "2026-01-01T00:00:00Z" + repository = {"full_name": "owner/repo"} + + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + if template == "https://api.github.com/repos/owner/repo/pulls": + if query_args["state"] == "open": + return [ + { + "number": 1, + "updated_at": "2025-01-01T00:00:00Z", + "title": "Old pull request", + } + ] + return [] + if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews": + return [{"id": 123, "state": "APPROVED"}] + raise AssertionError("Unexpected template: {0}".format(template)) + + monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data) + + github_backup.backup_pulls( + args, tmp_path, repository, "https://api.github.com/repos" + ) + + with open(tmp_path / "pulls" / "1.json", encoding="utf-8") as f: + pull = json.load(f) + + assert pull["review_data"] == [{"id": 123, "state": "APPROVED"}] + assert (tmp_path / "pulls" / "reviews_last_update").read_text() == ( + "2025-01-01T00:00:00Z" + ) + + +def test_pull_reviews_uses_review_checkpoint_when_older_than_repository_checkpoint( + create_args, tmp_path, monkeypatch +): + args = create_args( + include_pulls=True, + include_pull_reviews=True, + incremental=True, + ) + args.since = 
"2026-01-01T00:00:00Z" + repository = {"full_name": "owner/repo"} + pulls_dir = tmp_path / "pulls" + pulls_dir.mkdir() + (pulls_dir / "reviews_last_update").write_text("2025-01-01T00:00:00Z") + + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + if template == "https://api.github.com/repos/owner/repo/pulls": + if query_args["state"] == "open": + return [ + { + "number": 1, + "updated_at": "2025-06-01T00:00:00Z", + "title": "Review changed while feature was disabled", + }, + { + "number": 2, + "updated_at": "2024-12-01T00:00:00Z", + "title": "Too old", + }, + ] + return [] + if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews": + return [{"id": 123, "state": "COMMENTED"}] + raise AssertionError("Unexpected template: {0}".format(template)) + + monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data) + + github_backup.backup_pulls( + args, tmp_path, repository, "https://api.github.com/repos" + ) + + assert os.path.exists(tmp_path / "pulls" / "1.json") + assert not os.path.exists(tmp_path / "pulls" / "2.json") + assert (tmp_path / "pulls" / "reviews_last_update").read_text() == ( + "2025-06-01T00:00:00Z" + ) + + +def test_pull_reviews_preserves_existing_optional_pull_data( + create_args, tmp_path, monkeypatch +): + args = create_args(include_pulls=True, include_pull_reviews=True) + repository = {"full_name": "owner/repo"} + pulls_dir = tmp_path / "pulls" + pulls_dir.mkdir() + with open(pulls_dir / "1.json", "w", encoding="utf-8") as f: + json.dump( + { + "number": 1, + "updated_at": "2026-01-01T00:00:00Z", + "comment_data": [{"id": 10, "body": "inline comment"}], + "comment_regular_data": [{"id": 11, "body": "regular comment"}], + "commit_data": [{"sha": "abc"}], + }, + f, + ) + + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + if template == "https://api.github.com/repos/owner/repo/pulls": + if query_args["state"] == "open": + return [ + { + "number": 1, + 
"updated_at": "2026-02-01T00:00:00Z", + "title": "Add reviews", + } + ] + return [] + if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews": + return [{"id": 123, "state": "APPROVED"}] + raise AssertionError("Unexpected template: {0}".format(template)) + + monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data) + + github_backup.backup_pulls( + args, tmp_path, repository, "https://api.github.com/repos" + ) + + with open(pulls_dir / "1.json", encoding="utf-8") as f: + pull = json.load(f) + + assert pull["review_data"] == [{"id": 123, "state": "APPROVED"}] + assert pull["comment_data"] == [{"id": 10, "body": "inline comment"}] + assert pull["comment_regular_data"] == [{"id": 11, "body": "regular comment"}] + assert pull["commit_data"] == [{"sha": "abc"}] + + +def test_pull_reviews_does_not_advance_checkpoint_on_review_error( + create_args, tmp_path, monkeypatch +): + args = create_args( + include_pulls=True, + include_pull_reviews=True, + incremental=True, + ) + args.since = "2026-01-01T00:00:00Z" + repository = {"full_name": "owner/repo"} + pulls_dir = tmp_path / "pulls" + pulls_dir.mkdir() + (pulls_dir / "reviews_last_update").write_text("2025-01-01T00:00:00Z") + + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + if template == "https://api.github.com/repos/owner/repo/pulls": + if query_args["state"] == "open": + return [ + { + "number": 1, + "updated_at": "2025-06-01T00:00:00Z", + "title": "Review retrieval fails", + } + ] + return [] + if template == "https://api.github.com/repos/owner/repo/pulls/1/reviews": + raise Exception("temporary API failure") + raise AssertionError("Unexpected template: {0}".format(template)) + + monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data) + + github_backup.backup_pulls( + args, tmp_path, repository, "https://api.github.com/repos" + ) + + assert (pulls_dir / "reviews_last_update").read_text() == "2025-01-01T00:00:00Z" From 
b3a8241c9ab5930acfae2014d6a48a4feabe95ae Mon Sep 17 00:00:00 2001 From: Duncan Ogilvie Date: Sun, 26 Apr 2026 15:03:48 +0200 Subject: [PATCH 127/148] Implement per-resource last_update timestamps Closes #62 --- CHANGES.rst | 5 + README.rst | 12 +- github_backup/github_backup.py | 167 +++++++++++++++++--- tests/test_incremental_per_repository.py | 189 +++++++++++++++++++++++ 4 files changed, 348 insertions(+), 25 deletions(-) create mode 100644 tests/test_incremental_per_repository.py diff --git a/CHANGES.rst b/CHANGES.rst index b790ce1..6cf9f17 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -7,6 +7,11 @@ Unreleased optional attachment downloads, and per-repository incremental checkpoints. - Add pull request review backups with ``--pull-reviews`` and one-time incremental backfill for existing backups. +- Store incremental ``last_update`` checkpoints per repository resource instead + of using one global checkpoint for the whole output directory. Existing + backups use the legacy global checkpoint as a migration fallback, and the + legacy file is removed once existing issue/pull backups have resource + checkpoints (#62). - Add ``--token-from-gh`` to read authentication from ``gh auth token``. diff --git a/README.rst b/README.rst index 52d7222..3a4be3b 100644 --- a/README.rst +++ b/README.rst @@ -347,15 +347,19 @@ About pull request reviews Use ``--pull-reviews`` with ``--pulls`` to include GitHub pull request review metadata under each pull request's ``review_data`` key. Reviews are separate from review comments: ``--pull-comments`` backs up inline review comments via ``comment_data`` and regular PR conversation comments via ``comment_regular_data``, while ``--pull-reviews`` backs up review state, submitted time, commit ID, and the top-level review body. -``--pull-reviews`` is included in ``--all``. Incremental backups use a per-repository checkpoint at ``repositories/{repo}/pulls/reviews_last_update``. 
If ``--pull-reviews`` is enabled on an existing incremental backup, the first run performs a one-time backfill for pull request reviews so older PRs are not skipped by the existing repository checkpoint. Existing ``comment_data``, ``comment_regular_data`` and ``commit_data`` fields are preserved when only review data is being added. +``--pull-reviews`` is included in ``--all``. Incremental backups use a per-repository checkpoint at ``repositories/{repo}/pulls/reviews_last_update``. If ``--pull-reviews`` is enabled on an existing incremental backup, the first run performs a one-time backfill for pull request reviews so older PRs are not skipped by the existing pull request checkpoint. Existing ``comment_data``, ``comment_regular_data`` and ``commit_data`` fields are preserved when only review data is being added. Incremental Backup ------------------ -Using (``-i, --incremental``) will only request new data from the API **since the last run (successful or not)**. e.g. only request issues from the API since the last run. +Using (``-i, --incremental``) will only request new data from the API **since the last successful resource backup**. e.g. only request issues from the API since the last issue backup for that repository. -This means any blocking errors on previous runs can cause a large amount of missing data in backups. +Incremental checkpoints for issue and pull request API backups are stored per resource in that repository's backup directory (for example ``repositories/{repo}/issues/last_update``, ``repositories/{repo}/pulls/last_update`` or ``starred/{owner}/{repo}/pulls/last_update``). Older versions stored a single global ``last_update`` file in the output directory root. During migration, the legacy global checkpoint is used as a fallback only for resource directories that already contain backup data but do not yet have their own checkpoint. 
New repositories or newly enabled resources with no existing data get a full backup instead of inheriting an unrelated global checkpoint. + +After all existing issue and pull request resource directories have per-resource checkpoints, the legacy global ``last_update`` file is removed automatically. + +This means any blocking errors on previous runs can cause missing data in backups for the affected repository resource. Using (``--incremental-by-files``) will request new data from the API **based on when the file was modified on filesystem**. e.g. if you modify the file yourself you may miss something. @@ -368,7 +372,7 @@ Known blocking errors Some errors will block the backup run by exiting the script. e.g. receiving a 403 Forbidden error from the Github API. -If the incremental argument is used, this will result in the next backup only requesting API data since the last blocked/failed run. Potentially causing unexpected large amounts of missing data. +If the incremental argument is used, per-resource checkpoints are only advanced after that resource's backup work completes. A blocking error can still abort the overall run, but repositories and resources that were not processed will keep their previous checkpoints. It's therefore recommended to only use the incremental argument if the output/result is being actively monitored, or complimented with periodic full non-incremental runs, to avoid unexpected missing data in a regular backup runs. 
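Reviewer note on the README text above: the migration rule can be sketched as a standalone function. This is a simplified illustration, not the code added by this patch. The name `resolve_since`, the `CHECKPOINT_NAMES` set and the `demo` helper are hypothetical; only the file names `last_update` and `reviews_last_update` and the fallback rule itself come from the series.

```python
import os
import tempfile

# Checkpoint files do not count as backup data when deciding on the fallback.
CHECKPOINT_NAMES = {"last_update", "reviews_last_update"}


def resolve_since(resource_dir, legacy_checkpoint=None):
    """Hypothetical helper: pick the timestamp an incremental run resumes from.

    Returns None when the resource should get a full backup.
    """
    own_checkpoint = os.path.join(resource_dir, "last_update")
    if os.path.exists(own_checkpoint):
        with open(own_checkpoint) as f:
            return f.read().strip()

    # Fall back to the legacy global checkpoint only when the directory
    # already holds backup data (checkpoint and .temp files are ignored).
    has_data = os.path.isdir(resource_dir) and any(
        name not in CHECKPOINT_NAMES and not name.endswith(".temp")
        for name in os.listdir(resource_dir)
    )
    return legacy_checkpoint if (legacy_checkpoint and has_data) else None


def demo():
    """Build three resource directories and resolve a checkpoint for each."""
    legacy = "2026-01-01T00:00:00Z"
    with tempfile.TemporaryDirectory() as root:
        # Already migrated: has its own per-resource checkpoint.
        migrated = os.path.join(root, "migrated")
        os.makedirs(migrated)
        with open(os.path.join(migrated, "last_update"), "w") as f:
            f.write("2026-02-01T00:00:00Z")

        # Existing backup data but no checkpoint yet: legacy fallback applies.
        existing = os.path.join(root, "existing")
        os.makedirs(existing)
        with open(os.path.join(existing, "1.json"), "w") as f:
            f.write("{}")

        # Never backed up: the directory does not exist, so no fallback.
        brand_new = os.path.join(root, "brand_new")

        return (
            resolve_since(migrated, legacy),
            resolve_since(existing, legacy),
            resolve_since(brand_new, legacy),
        )


if __name__ == "__main__":
    print(demo())
```

The three cases correspond to a migrated resource, an existing backup that still relies on the legacy global file, and a newly enabled resource that gets a full backup.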
diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 054d0c6..e56bb28 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1928,26 +1928,138 @@ def filter_repositories(args, unfiltered_repositories): return repositories +INCREMENTAL_LAST_UPDATE_FILENAME = "last_update" +INCREMENTAL_RESOURCE_DIRECTORIES = ("issues", "pulls") + + +def get_repository_checkpoint_time(repository): + timestamps = [ + timestamp + for timestamp in (repository.get("updated_at"), repository.get("pushed_at")) + if timestamp + ] + if timestamps: + return max(timestamps) + + return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()) + + +def resource_backup_exists(resource_cwd): + if not os.path.isdir(resource_cwd): + return False + + ignored_names = { + INCREMENTAL_LAST_UPDATE_FILENAME, + PULL_REVIEWS_LAST_UPDATE_FILENAME, + } + for name in os.listdir(resource_cwd): + if name in ignored_names or name.endswith(".temp"): + continue + return True + + return False + + +def read_legacy_last_update(args, output_directory): + if not args.incremental: + return None, None + + last_update_path = os.path.join(output_directory, INCREMENTAL_LAST_UPDATE_FILENAME) + if os.path.exists(last_update_path): + return last_update_path, open(last_update_path).read().strip() + + return last_update_path, None + + +def read_resource_last_update(args, resource_cwd, legacy_last_update=None): + if not args.incremental: + return None + + last_update_path = os.path.join(resource_cwd, INCREMENTAL_LAST_UPDATE_FILENAME) + if os.path.exists(last_update_path): + return open(last_update_path).read().strip() + + if legacy_last_update and resource_backup_exists(resource_cwd): + return legacy_last_update + + return None + + +def write_resource_last_update(args, resource_cwd, repository): + if not args.incremental: + return + + mkdir_p(resource_cwd) + last_update_path = os.path.join(resource_cwd, INCREMENTAL_LAST_UPDATE_FILENAME) + open(last_update_path, 
"w").write(get_repository_checkpoint_time(repository)) + + +def iter_incremental_resource_dirs(output_directory): + repositories_dir = os.path.join(output_directory, "repositories") + if os.path.isdir(repositories_dir): + for repository_name in os.listdir(repositories_dir): + repo_cwd = os.path.join(repositories_dir, repository_name) + if not os.path.isdir(repo_cwd): + continue + for resource_name in INCREMENTAL_RESOURCE_DIRECTORIES: + yield os.path.join(repo_cwd, resource_name) + + starred_dir = os.path.join(output_directory, "starred") + if os.path.isdir(starred_dir): + for owner_name in os.listdir(starred_dir): + owner_cwd = os.path.join(starred_dir, owner_name) + if not os.path.isdir(owner_cwd): + continue + for repository_name in os.listdir(owner_cwd): + repo_cwd = os.path.join(owner_cwd, repository_name) + if not os.path.isdir(repo_cwd): + continue + for resource_name in INCREMENTAL_RESOURCE_DIRECTORIES: + yield os.path.join(repo_cwd, resource_name) + + +def has_unmigrated_incremental_resources(output_directory): + for resource_cwd in iter_incremental_resource_dirs(output_directory): + last_update_path = os.path.join( + resource_cwd, INCREMENTAL_LAST_UPDATE_FILENAME + ) + if resource_backup_exists(resource_cwd) and not os.path.exists( + last_update_path + ): + return True + + return False + + +def remove_legacy_last_update_if_migrated( + args, output_directory, legacy_last_update_path +): + if not args.incremental or not legacy_last_update_path: + return + if not os.path.exists(legacy_last_update_path): + return + if has_unmigrated_incremental_resources(output_directory): + logger.info( + "Keeping legacy global last_update until all existing issue/pull " + "backups have per-resource checkpoints" + ) + return + + os.remove(legacy_last_update_path) + logger.info( + "Removed legacy global last_update after migrating incremental checkpoints" + ) + + def backup_repositories(args, output_directory, repositories): logger.info("Backing up repositories") 
repos_template = "https://{0}/repos".format(get_github_api_host(args)) + legacy_last_update_path, legacy_last_update = read_legacy_last_update( + args, output_directory + ) + incremental_resource_work_attempted = False - if args.incremental: - last_update_path = os.path.join(output_directory, "last_update") - if os.path.exists(last_update_path): - args.since = open(last_update_path).read().strip() - else: - args.since = None - else: - args.since = None - - last_update = "0000-00-00T00:00:00Z" for repository in repositories: - if repository.get("updated_at") and repository["updated_at"] > last_update: - last_update = repository["updated_at"] - elif repository.get("pushed_at") and repository["pushed_at"] > last_update: - last_update = repository["pushed_at"] - if repository.get("is_gist"): repo_cwd = os.path.join(output_directory, "gists", repository["id"]) elif repository.get("is_starred"): @@ -2010,10 +2122,22 @@ def backup_repositories(args, output_directory, repositories): no_prune=args.no_prune, ) if args.include_issues or args.include_everything: + incremental_resource_work_attempted = True + issue_cwd = os.path.join(repo_cwd, "issues") + args.since = read_resource_last_update( + args, issue_cwd, legacy_last_update + ) backup_issues(args, repo_cwd, repository, repos_template) + write_resource_last_update(args, issue_cwd, repository) if args.include_pulls or args.include_everything: + incremental_resource_work_attempted = True + pulls_cwd = os.path.join(repo_cwd, "pulls") + args.since = read_resource_last_update( + args, pulls_cwd, legacy_last_update + ) backup_pulls(args, repo_cwd, repository, repos_template) + write_resource_last_update(args, pulls_cwd, repository) if args.include_discussions or args.include_everything: backup_discussions(args, repo_cwd, repository) @@ -2021,7 +2145,9 @@ def backup_repositories(args, output_directory, repositories): if args.include_milestones or args.include_everything: backup_milestones(args, repo_cwd, repository, 
repos_template) - if args.include_security_advisories or (args.include_everything and not repository.get("private", False)): + if args.include_security_advisories or ( + args.include_everything and not repository.get("private", False) + ): backup_security_advisories(args, repo_cwd, repository, repos_template) if args.include_labels or args.include_everything: @@ -2045,11 +2171,10 @@ def backup_repositories(args, output_directory, repositories): logger.info(f"Skipping remaining resources for {repository['full_name']}") continue - if args.incremental: - if last_update == "0000-00-00T00:00:00Z": - last_update = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.localtime()) - - open(last_update_path, "w").write(last_update) + if incremental_resource_work_attempted: + remove_legacy_last_update_if_migrated( + args, output_directory, legacy_last_update_path + ) def _repository_owner_name(repository): diff --git a/tests/test_incremental_per_repository.py b/tests/test_incremental_per_repository.py new file mode 100644 index 0000000..f1fd67a --- /dev/null +++ b/tests/test_incremental_per_repository.py @@ -0,0 +1,189 @@ +"""Tests for per-resource incremental checkpoints.""" + +import json +import os + +from github_backup import github_backup + + +def _repo(name, updated_at, pushed_at=None): + return { + "name": name, + "full_name": "owner/{0}".format(name), + "owner": {"login": "owner"}, + "clone_url": "https://github.com/owner/{0}.git".format(name), + "private": False, + "fork": False, + "has_wiki": False, + "updated_at": updated_at, + "pushed_at": pushed_at, + } + + +def test_incremental_uses_per_resource_last_update( + create_args, tmp_path, monkeypatch +): + args = create_args(incremental=True, include_issues=True) + repositories = [ + _repo("repo-one", "2026-02-01T00:00:00Z"), + _repo("repo-two", "2026-03-01T00:00:00Z"), + ] + repo_one_issues = tmp_path / "repositories" / "repo-one" / "issues" + repo_two_issues = tmp_path / "repositories" / "repo-two" / "issues" + 
repo_one_issues.mkdir(parents=True) + repo_two_issues.mkdir(parents=True) + (repo_one_issues / "last_update").write_text("2026-01-01T00:00:00Z") + (repo_two_issues / "last_update").write_text("2025-01-01T00:00:00Z") + + seen_since = [] + + def fake_backup_issues(passed_args, repo_cwd, repository, repos_template): + seen_since.append((repository["name"], passed_args.since)) + + monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues) + + github_backup.backup_repositories(args, tmp_path, repositories) + + assert seen_since == [ + ("repo-one", "2026-01-01T00:00:00Z"), + ("repo-two", "2025-01-01T00:00:00Z"), + ] + assert (repo_one_issues / "last_update").read_text() == "2026-02-01T00:00:00Z" + assert (repo_two_issues / "last_update").read_text() == "2026-03-01T00:00:00Z" + assert not os.path.exists(tmp_path / "last_update") + + +def test_incremental_uses_independent_issue_and_pull_checkpoints( + create_args, tmp_path, monkeypatch +): + args = create_args(incremental=True, include_issues=True, include_pulls=True) + repository = _repo("repo-one", "2026-02-01T00:00:00Z") + repo_dir = tmp_path / "repositories" / "repo-one" + issues_dir = repo_dir / "issues" + pulls_dir = repo_dir / "pulls" + issues_dir.mkdir(parents=True) + pulls_dir.mkdir(parents=True) + (issues_dir / "last_update").write_text("2026-01-01T00:00:00Z") + (pulls_dir / "last_update").write_text("2025-01-01T00:00:00Z") + + seen_since = [] + + def fake_backup_issues(passed_args, repo_cwd, repository, repos_template): + seen_since.append(("issues", passed_args.since)) + + def fake_backup_pulls(passed_args, repo_cwd, repository, repos_template): + seen_since.append(("pulls", passed_args.since)) + + monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues) + monkeypatch.setattr(github_backup, "backup_pulls", fake_backup_pulls) + + github_backup.backup_repositories(args, tmp_path, [repository]) + + assert seen_since == [ + ("issues", "2026-01-01T00:00:00Z"), + ("pulls", 
"2025-01-01T00:00:00Z"), + ] + assert (issues_dir / "last_update").read_text() == "2026-02-01T00:00:00Z" + assert (pulls_dir / "last_update").read_text() == "2026-02-01T00:00:00Z" + + +def test_incremental_uses_legacy_global_last_update_for_existing_resource_backup( + create_args, tmp_path, monkeypatch +): + args = create_args(incremental=True, include_issues=True) + repository = _repo("repo-one", "2026-02-01T00:00:00Z") + (tmp_path / "last_update").write_text("2026-01-01T00:00:00Z") + issues_dir = tmp_path / "repositories" / "repo-one" / "issues" + issues_dir.mkdir(parents=True) + with open(issues_dir / "1.json", "w", encoding="utf-8") as f: + json.dump({"number": 1}, f) + + seen_since = [] + + def fake_backup_issues(passed_args, repo_cwd, repository, repos_template): + seen_since.append(passed_args.since) + + monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues) + + github_backup.backup_repositories(args, tmp_path, [repository]) + + assert seen_since == ["2026-01-01T00:00:00Z"] + assert (issues_dir / "last_update").read_text() == "2026-02-01T00:00:00Z" + assert not os.path.exists(tmp_path / "last_update") + + +def test_incremental_does_not_use_legacy_global_last_update_for_new_resource_backup( + create_args, tmp_path, monkeypatch +): + args = create_args(incremental=True, include_issues=True) + repository = _repo("repo-one", "2026-02-01T00:00:00Z") + (tmp_path / "last_update").write_text("2099-01-01T00:00:00Z") + + seen_since = [] + + def fake_backup_issues(passed_args, repo_cwd, repository, repos_template): + seen_since.append(passed_args.since) + + monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues) + + github_backup.backup_repositories(args, tmp_path, [repository]) + + assert seen_since == [None] + assert ( + tmp_path / "repositories" / "repo-one" / "issues" / "last_update" + ).read_text() == "2026-02-01T00:00:00Z" + assert not os.path.exists(tmp_path / "last_update") + + +def 
test_incremental_keeps_legacy_global_last_update_until_all_existing_resources_migrated( + create_args, tmp_path, monkeypatch +): + args = create_args(incremental=True, include_issues=True) + repository = _repo("repo-one", "2026-02-01T00:00:00Z") + (tmp_path / "last_update").write_text("2026-01-01T00:00:00Z") + repo_one_issues = tmp_path / "repositories" / "repo-one" / "issues" + repo_two_issues = tmp_path / "repositories" / "repo-two" / "issues" + repo_one_issues.mkdir(parents=True) + repo_two_issues.mkdir(parents=True) + with open(repo_one_issues / "1.json", "w", encoding="utf-8") as f: + json.dump({"number": 1}, f) + with open(repo_two_issues / "2.json", "w", encoding="utf-8") as f: + json.dump({"number": 2}, f) + + def fake_backup_issues(passed_args, repo_cwd, repository, repos_template): + pass + + monkeypatch.setattr(github_backup, "backup_issues", fake_backup_issues) + + github_backup.backup_repositories(args, tmp_path, [repository]) + + assert (repo_one_issues / "last_update").read_text() == "2026-02-01T00:00:00Z" + assert not os.path.exists(repo_two_issues / "last_update") + assert (tmp_path / "last_update").read_text() == "2026-01-01T00:00:00Z" + + +def test_incremental_does_not_remove_legacy_checkpoint_without_resource_work( + create_args, tmp_path +): + args = create_args(incremental=True, include_repository=True) + repository = _repo("repo-one", "2026-02-01T00:00:00Z") + (tmp_path / "last_update").write_text("2026-01-01T00:00:00Z") + + github_backup.backup_repositories(args, tmp_path, [repository]) + + assert (tmp_path / "last_update").read_text() == "2026-01-01T00:00:00Z" + assert not os.path.exists( + tmp_path / "repositories" / "repo-one" / "issues" / "last_update" + ) + + +def test_repository_checkpoint_time_uses_newest_available_repo_timestamp(): + repository = _repo( + "repo-one", + updated_at="2026-02-01T00:00:00Z", + pushed_at="2026-03-01T00:00:00Z", + ) + + assert github_backup.get_repository_checkpoint_time(repository) == ( + 
"2026-03-01T00:00:00Z" + ) From 6cd0ab3633df812ab586968b5b2e448e0e1b3efc Mon Sep 17 00:00:00 2001 From: Duncan Ogilvie Date: Sun, 26 Apr 2026 15:15:22 +0200 Subject: [PATCH 128/148] Reduce unnecessary pull requests with incremental fetching --- CHANGES.rst | 2 + github_backup/github_backup.py | 18 +++-- tests/test_pull_incremental_pagination.py | 85 +++++++++++++++++++++++ tests/test_pull_reviews.py | 10 +-- 4 files changed, 104 insertions(+), 11 deletions(-) create mode 100644 tests/test_pull_incremental_pagination.py diff --git a/CHANGES.rst b/CHANGES.rst index 6cf9f17..8b62d33 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -12,6 +12,8 @@ Unreleased backups use the legacy global checkpoint as a migration fallback, and the legacy file is removed once existing issue/pull backups have resource checkpoints (#62). +- Stop paginating pull requests during incremental backups once the sorted + results are older than the active checkpoint. - Add ``--token-from-gh`` to read authentication from ``gh auth token``. diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index e56bb28..f83bdb3 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -717,11 +717,12 @@ def calculate_retry_delay(attempt, headers): return delay + random.uniform(0, delay * 0.1) -def retrieve_data(args, template, query_args=None, paginated=True): +def retrieve_data(args, template, query_args=None, paginated=True, lazy=False): """ Fetch the data from GitHub API. - Handle both single requests and pagination with yield of individual dicts. + Handle both single requests and pagination. Returns a list by default, or + a generator when lazy=True so callers can stop before fetching every page. Handles throttling, retries, read errors, and DMCA takedowns. 
""" query_args = query_args or {} @@ -851,6 +852,9 @@ def _extract_legal_url(response_body_bytes): ): break # No more data + if lazy: + return fetch_all() + return list(fetch_all()) @@ -2656,16 +2660,18 @@ def pull_is_due_for_repository_checkpoint(pull): pull_states = ["open", "closed"] for pull_state in pull_states: query_args["state"] = pull_state - _pulls = retrieve_data(args, _pulls_template, query_args=query_args) - for pull in _pulls: + for pull in retrieve_data( + args, _pulls_template, query_args=query_args, lazy=True + ): track_newest_pull_update(pull) if pulls_since and pull["updated_at"] < pulls_since: break if not pulls_since or pull["updated_at"] >= pulls_since: pulls[pull["number"]] = pull else: - _pulls = retrieve_data(args, _pulls_template, query_args=query_args) - for pull in _pulls: + for pull in retrieve_data( + args, _pulls_template, query_args=query_args, lazy=True + ): track_newest_pull_update(pull) if pulls_since and pull["updated_at"] < pulls_since: break diff --git a/tests/test_pull_incremental_pagination.py b/tests/test_pull_incremental_pagination.py new file mode 100644 index 0000000..11230b0 --- /dev/null +++ b/tests/test_pull_incremental_pagination.py @@ -0,0 +1,85 @@ +"""Tests for incremental pull request pagination.""" + +import json +import os +from unittest.mock import patch + +from github_backup import github_backup + + +class MockHTTPResponse: + def __init__(self, data, link_header=None): + self._content = json.dumps(data).encode("utf-8") + self._link_header = link_header + self._read = False + self.reason = "OK" + + def getcode(self): + return 200 + + def read(self): + if self._read: + return b"" + self._read = True + return self._content + + @property + def headers(self): + headers = {"x-ratelimit-remaining": "5000"} + if self._link_header: + headers["Link"] = self._link_header + return headers + + +def test_backup_pulls_incremental_stops_before_fetching_old_pages( + create_args, tmp_path +): + args = 
create_args(include_pulls=True, incremental=True) + args.since = "2026-04-26T08:13:46Z" + repository = {"full_name": "owner/repo"} + + responses = [ + MockHTTPResponse([]), + MockHTTPResponse( + [ + { + "number": 2, + "title": "new pull", + "updated_at": "2026-04-26T09:00:00Z", + }, + { + "number": 1, + "title": "old pull", + "updated_at": "2026-04-26T07:00:00Z", + }, + ], + link_header='; rel="next"', + ), + MockHTTPResponse( + [ + { + "number": 0, + "title": "older pull on page 2", + "updated_at": "2026-04-25T07:00:00Z", + } + ] + ), + ] + requests_made = [] + + def mock_urlopen(request, *args, **kwargs): + requests_made.append(request.get_full_url()) + return responses[len(requests_made) - 1] + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + github_backup.backup_pulls( + args, tmp_path, repository, "https://api.github.com/repos" + ) + + assert len(requests_made) == 2 + assert "state=open" in requests_made[0] + assert "state=closed" in requests_made[1] + assert all("page=2" not in url for url in requests_made) + assert os.path.exists(tmp_path / "pulls" / "2.json") + assert not os.path.exists(tmp_path / "pulls" / "1.json") + assert not os.path.exists(tmp_path / "pulls" / "0.json") diff --git a/tests/test_pull_reviews.py b/tests/test_pull_reviews.py index 6130269..2ce9ad1 100644 --- a/tests/test_pull_reviews.py +++ b/tests/test_pull_reviews.py @@ -16,7 +16,7 @@ def test_backup_pulls_includes_review_data(create_args, tmp_path, monkeypatch): repository = {"full_name": "owner/repo"} calls = [] - def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs): calls.append((template, query_args)) if template == "https://api.github.com/repos/owner/repo/pulls": if query_args["state"] == "open": @@ -73,7 +73,7 @@ def test_pull_reviews_backfill_ignores_repository_checkpoint( args.since = "2026-01-01T00:00:00Z" repository = 
{"full_name": "owner/repo"} - def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs): if template == "https://api.github.com/repos/owner/repo/pulls": if query_args["state"] == "open": return [ @@ -117,7 +117,7 @@ def test_pull_reviews_uses_review_checkpoint_when_older_than_repository_checkpoi pulls_dir.mkdir() (pulls_dir / "reviews_last_update").write_text("2025-01-01T00:00:00Z") - def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs): if template == "https://api.github.com/repos/owner/repo/pulls": if query_args["state"] == "open": return [ @@ -169,7 +169,7 @@ def test_pull_reviews_preserves_existing_optional_pull_data( f, ) - def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs): if template == "https://api.github.com/repos/owner/repo/pulls": if query_args["state"] == "open": return [ @@ -213,7 +213,7 @@ def test_pull_reviews_does_not_advance_checkpoint_on_review_error( pulls_dir.mkdir() (pulls_dir / "reviews_last_update").write_text("2025-01-01T00:00:00Z") - def fake_retrieve_data(passed_args, template, query_args=None, paginated=True): + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs): if template == "https://api.github.com/repos/owner/repo/pulls": if query_args["state"] == "open": return [ From 9d0cfdb61da1cea97b381c2177ccc4e52e9a6352 Mon Sep 17 00:00:00 2001 From: Duncan Ogilvie Date: Sun, 26 Apr 2026 16:05:20 +0200 Subject: [PATCH 129/148] Avoid redundant release asset list requests --- CHANGES.rst | 2 + github_backup/github_backup.py | 7 ++- tests/test_releases.py | 95 ++++++++++++++++++++++++++++++++++ 3 files changed, 103 insertions(+), 1 deletion(-) create mode 100644 
tests/test_releases.py diff --git a/CHANGES.rst b/CHANGES.rst index 8b62d33..3d2ceb0 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -14,6 +14,8 @@ Unreleased checkpoints (#62). - Stop paginating pull requests during incremental backups once the sorted results are older than the active checkpoint. +- Avoid extra release asset list requests by using asset metadata already + included in GitHub's releases response. - Add ``--token-from-gh`` to read authentication from ``gh auth token``. diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index f83bdb3..6edfb05 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -2919,7 +2919,12 @@ def backup_releases(args, repo_cwd, repository, repos_template, include_assets=F written_count += 1 if include_assets and not skip_assets: - assets = retrieve_data(args, release["assets_url"]) + # The releases list API already includes release asset metadata. Use + # it to avoid an extra /releases/{id}/assets request per release. + # Keep a fallback for older/enterprise responses that might omit it. 
+ assets = release.get("assets") + if assets is None: + assets = retrieve_data(args, release["assets_url"]) if len(assets) > 0: # give release asset files somewhere to live & download them (not including source archives) release_assets_cwd = os.path.join(release_cwd, release_name_safe) diff --git a/tests/test_releases.py b/tests/test_releases.py new file mode 100644 index 0000000..b8584f4 --- /dev/null +++ b/tests/test_releases.py @@ -0,0 +1,95 @@ +"""Tests for release backup behavior.""" + +from github_backup import github_backup + + +def test_backup_releases_uses_embedded_assets_without_extra_asset_list_request( + create_args, tmp_path, monkeypatch +): + args = create_args(include_releases=True, include_assets=True) + repository = {"full_name": "owner/repo", "name": "repo"} + calls = [] + downloads = [] + + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs): + calls.append(template) + if template == "https://api.github.com/repos/owner/repo/releases": + return [ + { + "tag_name": "v1.0.0", + "created_at": "2026-01-01T00:00:00Z", + "updated_at": "2026-01-01T00:00:00Z", + "prerelease": False, + "draft": False, + "assets_url": "https://api.github.com/repos/owner/repo/releases/1/assets", + "assets": [ + { + "name": "artifact.zip", + "url": "https://api.github.com/repos/owner/repo/releases/assets/1", + } + ], + } + ] + raise AssertionError("Unexpected API request: {0}".format(template)) + + def fake_download_file(url, path, auth, as_app=False, fine=False): + downloads.append((url, path)) + + monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data) + monkeypatch.setattr(github_backup, "download_file", fake_download_file) + + github_backup.backup_releases( + args, + tmp_path, + repository, + "https://api.github.com/repos", + include_assets=True, + ) + + assert calls == ["https://api.github.com/repos/owner/repo/releases"] + assert downloads == [ + ( + "https://api.github.com/repos/owner/repo/releases/assets/1", + 
str(tmp_path / "releases" / "v1.0.0" / "artifact.zip"), + ) + ] + + +def test_backup_releases_falls_back_to_assets_url_when_assets_missing( + create_args, tmp_path, monkeypatch +): + args = create_args(include_releases=True, include_assets=True) + repository = {"full_name": "owner/repo", "name": "repo"} + calls = [] + + def fake_retrieve_data(passed_args, template, query_args=None, paginated=True, **kwargs): + calls.append(template) + if template == "https://api.github.com/repos/owner/repo/releases": + return [ + { + "tag_name": "v1.0.0", + "created_at": "2026-01-01T00:00:00Z", + "updated_at": "2026-01-01T00:00:00Z", + "prerelease": False, + "draft": False, + "assets_url": "https://api.github.com/repos/owner/repo/releases/1/assets", + } + ] + if template == "https://api.github.com/repos/owner/repo/releases/1/assets": + return [] + raise AssertionError("Unexpected API request: {0}".format(template)) + + monkeypatch.setattr(github_backup, "retrieve_data", fake_retrieve_data) + + github_backup.backup_releases( + args, + tmp_path, + repository, + "https://api.github.com/repos", + include_assets=True, + ) + + assert calls == [ + "https://api.github.com/repos/owner/repo/releases", + "https://api.github.com/repos/owner/repo/releases/1/assets", + ] From 014eff395a999f82674547efd77a6470b038ce91 Mon Sep 17 00:00:00 2001 From: Duncan Ogilvie Date: Sun, 26 Apr 2026 16:09:42 +0200 Subject: [PATCH 130/148] Skip checkpoint-equal incremental items --- CHANGES.rst | 4 +- github_backup/github_backup.py | 12 +++--- tests/test_discussions.py | 35 +++++++++++++++++ tests/test_pull_incremental_pagination.py | 46 +++++++++++++++++++++++ 4 files changed, 90 insertions(+), 7 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 3d2ceb0..3d4cdce 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -13,7 +13,9 @@ Unreleased legacy file is removed once existing issue/pull backups have resource checkpoints (#62). 
- Stop paginating pull requests during incremental backups once the sorted - results are older than the active checkpoint. + results are at or older than the active checkpoint. +- Avoid re-fetching discussions and pull requests whose ``updated_at`` exactly + matches the active incremental checkpoint. - Avoid extra release asset list requests by using asset metadata already included in GitHub's releases response. - Add ``--token-from-gh`` to read authentication from ``gh auth token``. diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 6edfb05..ae4ef2e 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -2233,7 +2233,7 @@ def retrieve_discussion_summaries(args, repository, since=None): if updated_at and (newest_seen is None or updated_at > newest_seen): newest_seen = updated_at - if since and updated_at and updated_at < since: + if since and updated_at and updated_at <= since: stop = True break @@ -2654,7 +2654,7 @@ def track_newest_pull_update(pull): newest_pull_update = updated_at def pull_is_due_for_repository_checkpoint(pull): - return not repository_since or pull["updated_at"] >= repository_since + return not repository_since or pull["updated_at"] > repository_since if not args.include_pull_details: pull_states = ["open", "closed"] @@ -2664,18 +2664,18 @@ def pull_is_due_for_repository_checkpoint(pull): args, _pulls_template, query_args=query_args, lazy=True ): track_newest_pull_update(pull) - if pulls_since and pull["updated_at"] < pulls_since: + if pulls_since and pull["updated_at"] <= pulls_since: break - if not pulls_since or pull["updated_at"] >= pulls_since: + if not pulls_since or pull["updated_at"] > pulls_since: pulls[pull["number"]] = pull else: for pull in retrieve_data( args, _pulls_template, query_args=query_args, lazy=True ): track_newest_pull_update(pull) - if pulls_since and pull["updated_at"] < pulls_since: + if pulls_since and pull["updated_at"] <= pulls_since: break - if not 
pulls_since or pull["updated_at"] >= pulls_since: + if not pulls_since or pull["updated_at"] > pulls_since: if pull_is_due_for_repository_checkpoint(pull): pulls[pull["number"]] = retrieve_data( args, diff --git a/tests/test_discussions.py b/tests/test_discussions.py index 89fd8dd..2b5e3fb 100644 --- a/tests/test_discussions.py +++ b/tests/test_discussions.py @@ -50,6 +50,41 @@ def test_retrieve_discussion_summaries_stops_at_incremental_since(create_args): ) +def test_retrieve_discussion_summaries_excludes_checkpoint_timestamp(create_args): + args = create_args() + repository = {"full_name": "owner/repo"} + + page = { + "repository": { + "hasDiscussionsEnabled": True, + "discussions": { + "totalCount": 1, + "nodes": [ + { + "number": 1, + "title": "already backed up", + "updatedAt": "2026-01-01T00:00:00Z", + }, + ], + "pageInfo": {"hasNextPage": True, "endCursor": "NEXT"}, + }, + } + } + + with patch( + "github_backup.github_backup.retrieve_graphql_data", return_value=page + ) as mock_retrieve: + summaries, newest, enabled, total = github_backup.retrieve_discussion_summaries( + args, repository, since="2026-01-01T00:00:00Z" + ) + + assert enabled is True + assert total == 1 + assert newest == "2026-01-01T00:00:00Z" + assert summaries == [] + assert mock_retrieve.call_count == 1 + + def test_retrieve_discussion_summaries_disabled_discussions(create_args): args = create_args() repository = {"full_name": "owner/repo"} diff --git a/tests/test_pull_incremental_pagination.py b/tests/test_pull_incremental_pagination.py index 11230b0..ac0f83f 100644 --- a/tests/test_pull_incremental_pagination.py +++ b/tests/test_pull_incremental_pagination.py @@ -31,6 +31,52 @@ def headers(self): return headers +def test_backup_pulls_incremental_excludes_checkpoint_timestamp(create_args, tmp_path): + args = create_args(include_pulls=True, incremental=True) + args.since = "2026-04-26T08:13:46Z" + repository = {"full_name": "owner/repo"} + + responses = [ + MockHTTPResponse([]), + 
MockHTTPResponse( + [ + { + "number": 1, + "title": "already backed up", + "updated_at": "2026-04-26T08:13:46Z", + }, + ], + link_header='; rel="next"', + ), + MockHTTPResponse( + [ + { + "number": 0, + "title": "older pull on page 2", + "updated_at": "2026-04-25T07:00:00Z", + } + ] + ), + ] + requests_made = [] + + def mock_urlopen(request, *args, **kwargs): + requests_made.append(request.get_full_url()) + return responses[len(requests_made) - 1] + + with patch("github_backup.github_backup.urlopen", side_effect=mock_urlopen): + github_backup.backup_pulls( + args, tmp_path, repository, "https://api.github.com/repos" + ) + + assert len(requests_made) == 2 + assert "state=open" in requests_made[0] + assert "state=closed" in requests_made[1] + assert all("page=2" not in url for url in requests_made) + assert not os.path.exists(tmp_path / "pulls" / "1.json") + assert not os.path.exists(tmp_path / "pulls" / "0.json") + + def test_backup_pulls_incremental_stops_before_fetching_old_pages( create_args, tmp_path ): From f8cdf55050770bbcb1b5ba178d73b346988f0f89 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Wed, 29 Apr 2026 12:10:11 +0000 Subject: [PATCH 131/148] Release version 0.62.0 --- CHANGES.rst | 172 +++++++++++++++++++++++++++++++++----- github_backup/__init__.py | 2 +- 2 files changed, 154 insertions(+), 20 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 3d4cdce..86bcb32 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,29 +1,163 @@ Changelog ========= -Unreleased ----------- -- Add GitHub Discussions backups via GraphQL, including comments, replies, - optional attachment downloads, and per-repository incremental checkpoints. -- Add pull request review backups with ``--pull-reviews`` and one-time - incremental backfill for existing backups. -- Store incremental ``last_update`` checkpoints per repository resource instead - of using one global checkpoint for the whole output directory. 
Existing - backups use the legacy global checkpoint as a migration fallback, and the - legacy file is removed once existing issue/pull backups have resource - checkpoints (#62). -- Stop paginating pull requests during incremental backups once the sorted - results are at or older than the active checkpoint. -- Avoid re-fetching discussions and pull requests whose ``updated_at`` exactly - matches the active incremental checkpoint. -- Avoid extra release asset list requests by using asset metadata already - included in GitHub's releases response. -- Add ``--token-from-gh`` to read authentication from ``gh auth token``. +0.62.0 (2026-04-29) +------------------- +------------------------ +- Skip checkpoint-equal incremental items. [Duncan Ogilvie] +- Avoid redundant release asset list requests. [Duncan Ogilvie] +- Reduce unnecessary pull requests with incremental fetching. [Duncan + Ogilvie] +- Implement per-resource last_update timestamps. [Duncan Ogilvie] + + Closes #62 +- Add support for pull request reviews. [Duncan Ogilvie] + + Closes #124 +- Add support for discussions. [Duncan Ogilvie] + + Closes #290 +- Add --token-from-gh authentication option. [Duncan Ogilvie] +- Chore(deps): bump pytest in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [pytest](https://github.com/pytest-dev/pytest). + + + Updates `pytest` from 9.0.2 to 9.0.3 + - [Release notes](https://github.com/pytest-dev/pytest/releases) + - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) + - [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3) + + --- + updated-dependencies: + - dependency-name: pytest + dependency-version: 9.0.3 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... +- Chore(deps): bump black in the python-packages group. + [dependabot[bot]] + + Bumps the python-packages group with 1 update: [black](https://github.com/psf/black). 
+ + + Updates `black` from 26.3.0 to 26.3.1 + - [Release notes](https://github.com/psf/black/releases) + - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) + - [Commits](https://github.com/psf/black/compare/26.3.0...26.3.1) + + --- + updated-dependencies: + - dependency-name: black + dependency-version: 26.3.1 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... +- Chore(deps): bump docker/login-action from 3 to 4. [dependabot[bot]] + + Bumps [docker/login-action](https://github.com/docker/login-action) from 3 to 4. + - [Release notes](https://github.com/docker/login-action/releases) + - [Commits](https://github.com/docker/login-action/compare/v3...v4) + + --- + updated-dependencies: + - dependency-name: docker/login-action + dependency-version: '4' + dependency-type: direct:production + update-type: version-update:semver-major + ... +- Chore(deps): bump docker/setup-qemu-action from 3 to 4. + [dependabot[bot]] + + Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 3 to 4. + - [Release notes](https://github.com/docker/setup-qemu-action/releases) + - [Commits](https://github.com/docker/setup-qemu-action/compare/v3...v4) + + --- + updated-dependencies: + - dependency-name: docker/setup-qemu-action + dependency-version: '4' + dependency-type: direct:production + update-type: version-update:semver-major + ... +- Chore(deps): bump docker/build-push-action from 6 to 7. + [dependabot[bot]] + + Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 6 to 7. + - [Release notes](https://github.com/docker/build-push-action/releases) + - [Commits](https://github.com/docker/build-push-action/compare/v6...v7) + + --- + updated-dependencies: + - dependency-name: docker/build-push-action + dependency-version: '7' + dependency-type: direct:production + update-type: version-update:semver-major + ... 
+- Chore(deps): bump docker/setup-buildx-action from 3 to 4. + [dependabot[bot]] + + Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 3 to 4. + - [Release notes](https://github.com/docker/setup-buildx-action/releases) + - [Commits](https://github.com/docker/setup-buildx-action/compare/v3...v4) + + --- + updated-dependencies: + - dependency-name: docker/setup-buildx-action + dependency-version: '4' + dependency-type: direct:production + update-type: version-update:semver-major + ... +- Chore(deps): bump docker/metadata-action from 5 to 6. + [dependabot[bot]] + + Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6. + - [Release notes](https://github.com/docker/metadata-action/releases) + - [Commits](https://github.com/docker/metadata-action/compare/v5...v6) + + --- + updated-dependencies: + - dependency-name: docker/metadata-action + dependency-version: '6' + dependency-type: direct:production + update-type: version-update:semver-major + ... +- Chore(deps): bump the python-packages group with 2 updates. + [dependabot[bot]] + + Bumps the python-packages group with 2 updates: [black](https://github.com/psf/black) and [setuptools](https://github.com/pypa/setuptools). 
+ + + Updates `black` from 26.1.0 to 26.3.0 + - [Release notes](https://github.com/psf/black/releases) + - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) + - [Commits](https://github.com/psf/black/compare/26.1.0...26.3.0) + + Updates `setuptools` from 82.0.0 to 82.0.1 + - [Release notes](https://github.com/pypa/setuptools/releases) + - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) + - [Commits](https://github.com/pypa/setuptools/compare/v82.0.0...v82.0.1) + + --- + updated-dependencies: + - dependency-name: black + dependency-version: 26.3.0 + dependency-type: direct:production + update-type: version-update:semver-minor + dependency-group: python-packages + - dependency-name: setuptools + dependency-version: 82.0.1 + dependency-type: direct:production + update-type: version-update:semver-patch + dependency-group: python-packages + ... 0.61.5 (2026-02-18) ------------------- ------------------------- - Fix empty repository crash due to None timestamp comparison (#489) [Rodos] diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 294be4d..647040d 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.61.5" +__version__ = "0.62.0" From 0638666bc7ebc9c55134648d0c4f3cb21932a680 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 13:38:23 +0000 Subject: [PATCH 132/148] handle more network errors ```python-traceback Traceback (most recent call last): File ".local/bin/github-backup", line 6, in sys.exit(main()) ~~~~^^ File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/cli.py", line 83, in main backup_repositories(args, output_directory, repositories) ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 1845, in backup_repositories backup_pulls(args, repo_cwd, repository, repos_template) 
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 2019, in backup_pulls pulls[number]["commit_data"] = retrieve_data(args, template) ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^ File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 766, in retrieve_data return list(fetch_all()) File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 717, in fetch_all response = json.loads(http_response.read().decode("utf-8")) ~~~~~~~~~~~~~~~~~~^^ File "/usr/lib/python3.14/http/client.py", line 500, in read s = self._safe_read(self.length) File "/usr/lib/python3.14/http/client.py", line 648, in _safe_read data = self.fp.read(cursize) File "/usr/lib/python3.14/socket.py", line 725, in readinto return self._sock.recv_into(b) ~~~~~~~~~~~~~~~~~~~~^^^ File "/usr/lib/python3.14/ssl.py", line 1304, in recv_into return self.read(nbytes, buffer) ~~~~~~~~~^^^^^^^^^^^^^^^^ File "/usr/lib/python3.14/ssl.py", line 1138, in read return self._sslobj.read(len, buffer) ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^ ConnectionResetError: [Errno 104] Connection reset by peer ``` --- github_backup/github_backup.py | 1 + 1 file changed, 1 insertion(+) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index ae4ef2e..73a8a75 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -806,6 +806,7 @@ def _extract_legal_url(response_body_bytes): response = json.loads(http_response.read().decode("utf-8")) break # Exit retry loop and handle the data returned except ( + ConnectionError, IncompleteRead, json.decoder.JSONDecodeError, TimeoutError, From ddf82f1115f7d635993aa44454fb58c034624272 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 15:25:05 +0000 Subject: [PATCH 133/148] suppress output of call to `git lfs version` --- 
github_backup/github_backup.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index ae4ef2e..317a803 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1781,7 +1781,10 @@ def get_authenticated_user(args): def check_git_lfs_install(): - exit_code = subprocess.call(["git", "lfs", "version"]) + exit_code = subprocess.call( + ["git", "lfs", "version"], + stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, + ) if exit_code != 0: raise Exception( "The argument --lfs requires you to have Git LFS installed.\nYou can get it from https://git-lfs.github.com." From ddf7f82e65e5e57f0d5c499ed6f56234cb686eb3 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 13:46:44 +0000 Subject: [PATCH 134/148] add missing `context` argument to `urlopen` call --- github_backup/github_backup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index ae4ef2e..6670d2d 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1297,7 +1297,7 @@ def get_jwt_signed_url_via_markdown_api(url, token, repo_context): request.add_header("Content-Type", "application/json") request.add_header("Accept", "application/vnd.github+json") - html = urlopen(request, timeout=30).read().decode("utf-8") + html = urlopen(request, context=https_ctx, timeout=30).read().decode("utf-8") # Parse JWT-signed URL from HTML response # Format: From 2f130ecd6692bf8bc6e51bade07b5f36e56181ff Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 13:54:13 +0000 Subject: [PATCH 135/148] remove bad invocation of the system shell --- github_backup/github_backup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 6670d2d..80689b8 100644 --- a/github_backup/github_backup.py +++ 
b/github_backup/github_backup.py @@ -2980,7 +2980,7 @@ def fetch_repository( masked_remote_url = mask_password(remote_url) initialized = subprocess.call( - "git ls-remote " + remote_url, stdout=FNULL, stderr=FNULL, shell=True + ["git", "ls-remote", remote_url], stdout=FNULL, stderr=FNULL ) if initialized == 128: if ".wiki.git" in remote_url: From b92aee6f114f98502fea616abeefbbe924229ff0 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 15:12:13 +0000 Subject: [PATCH 136/148] use `subprocess.DEVNULL` instead of emulating it --- github_backup/github_backup.py | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 8b96622..990993b 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -40,7 +40,6 @@ DISCUSSION_REPLIES_QUERY, ) -FNULL = open(os.devnull, "w") FILE_URI_PREFIX = "file://" logger = logging.getLogger(__name__) @@ -529,19 +528,18 @@ def get_auth(args, encode=True, for_git_cli=False): if platform.system() != "Darwin": raise Exception("Keychain arguments are only supported on Mac OSX") try: - with open(os.devnull, "w") as devnull: - token = subprocess.check_output( - [ - "security", - "find-generic-password", - "-s", - args.osx_keychain_item_name, - "-a", - args.osx_keychain_item_account, - "-w", - ], - stderr=devnull, - ).strip() + token = subprocess.check_output( + [ + "security", + "find-generic-password", + "-s", + args.osx_keychain_item_name, + "-a", + args.osx_keychain_item_account, + "-w", + ], + stderr=subprocess.DEVNULL, + ).strip() token = token.decode("utf-8") auth = token + ":" + "x-oauth-basic" except subprocess.SubprocessError: @@ -2984,7 +2982,8 @@ def fetch_repository( masked_remote_url = mask_password(remote_url) initialized = subprocess.call( - ["git", "ls-remote", remote_url], stdout=FNULL, stderr=FNULL + ["git", "ls-remote", remote_url], + stdout=subprocess.DEVNULL, 
stderr=subprocess.DEVNULL, ) if initialized == 128: if ".wiki.git" in remote_url: From f3eabf0bfe522b7749d693ceaa65c5de4f13d8bc Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 16:23:03 +0000 Subject: [PATCH 137/148] don't pass stdin when doing so can't do any good When the child process doesn't inherit stderr, it can't ask the user for input, so it shouldn't inherit stdin either. --- github_backup/github_backup.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 990993b..b76322a 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -1781,7 +1781,7 @@ def get_authenticated_user(args): def check_git_lfs_install(): exit_code = subprocess.call( - ["git", "lfs", "version"], + ["git", "lfs", "version"], stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, ) if exit_code != 0: @@ -2982,7 +2982,7 @@ def fetch_repository( masked_remote_url = mask_password(remote_url) initialized = subprocess.call( - ["git", "ls-remote", remote_url], + ["git", "ls-remote", remote_url], stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, ) if initialized == 128: From ccc27b95f7203ec42bf695cc270317fdd73f4489 Mon Sep 17 00:00:00 2001 From: Changaco Date: Thu, 30 Apr 2026 10:46:46 +0000 Subject: [PATCH 138/148] remove legacy code in `mkdir_p` function --- github_backup/github_backup.py | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index b76322a..4c07808 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -6,7 +6,6 @@ import base64 import calendar import codecs -import errno import json import logging import os @@ -127,13 +126,7 @@ def check_io(): def mkdir_p(*args): for path in args: - try: - os.makedirs(path) - except OSError as exc: # Python >2.5 - if exc.errno == errno.EEXIST and 
os.path.isdir(path): - pass - else: - raise + os.makedirs(path, exist_ok=True) def mask_password(url, secret="*****"): From f1fca0f9b7379e02c3d0903daee9d1954d7009eb Mon Sep 17 00:00:00 2001 From: Changaco Date: Thu, 30 Apr 2026 10:53:40 +0000 Subject: [PATCH 139/148] don't leave files open --- github_backup/github_backup.py | 41 ++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 17 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 4c07808..e567d3e 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -624,7 +624,8 @@ def get_github_host(args): def read_file_contents(file_uri): - return open(file_uri[len(FILE_URI_PREFIX) :], "rt").readline().strip() + with open(file_uri[len(FILE_URI_PREFIX) :], "rt") as f: + return f.readline().strip() def read_token_from_gh_cli(args): @@ -1964,10 +1965,11 @@ def read_legacy_last_update(args, output_directory): return None, None last_update_path = os.path.join(output_directory, INCREMENTAL_LAST_UPDATE_FILENAME) - if os.path.exists(last_update_path): - return last_update_path, open(last_update_path).read().strip() - - return last_update_path, None + try: + with open(last_update_path) as f: + return last_update_path, f.read().strip() + except FileNotFoundError: + return last_update_path, None def read_resource_last_update(args, resource_cwd, legacy_last_update=None): @@ -1975,13 +1977,13 @@ def read_resource_last_update(args, resource_cwd, legacy_last_update=None): return None last_update_path = os.path.join(resource_cwd, INCREMENTAL_LAST_UPDATE_FILENAME) - if os.path.exists(last_update_path): - return open(last_update_path).read().strip() - - if legacy_last_update and resource_backup_exists(resource_cwd): - return legacy_last_update - - return None + try: + with open(last_update_path) as f: + return f.read().strip() + except FileNotFoundError: + if legacy_last_update and resource_backup_exists(resource_cwd): + return legacy_last_update + 
return None def write_resource_last_update(args, resource_cwd, repository): @@ -1990,7 +1992,8 @@ def write_resource_last_update(args, resource_cwd, repository): mkdir_p(resource_cwd) last_update_path = os.path.join(resource_cwd, INCREMENTAL_LAST_UPDATE_FILENAME) - open(last_update_path, "w").write(get_repository_checkpoint_time(repository)) + with open(last_update_path, "w") as f: + f.write(get_repository_checkpoint_time(repository)) def iter_incremental_resource_dirs(output_directory): @@ -2378,7 +2381,8 @@ def backup_discussions(args, repo_cwd, repository): discussions_since = None discussion_last_update_path = os.path.join(discussion_cwd, "last_update") if args.incremental and os.path.exists(discussion_last_update_path): - discussions_since = open(discussion_last_update_path).read().strip() + with open(discussion_last_update_path) as f: + discussions_since = f.read().strip() logger.info("Retrieving {0} discussions".format(repository["full_name"])) try: @@ -2464,7 +2468,8 @@ def backup_discussions(args, repo_cwd, repository): and newest_seen and (not discussions_since or newest_seen > discussions_since) ): - open(discussion_last_update_path, "w").write(newest_seen) + with open(discussion_last_update_path, "w") as f: + f.write(newest_seen) attempted_count = len(summaries) - skipped_count if not summaries: @@ -2601,7 +2606,8 @@ def get_pull_reviews_since(args, pulls_cwd): # repository-level checkpoint would otherwise skip old PRs forever. 
return None, None, reviews_last_update_path - reviews_since = open(reviews_last_update_path).read().strip() + with open(reviews_last_update_path) as f: + reviews_since = f.read().strip() if args_since and reviews_since: return min(args_since, reviews_since), reviews_since, reviews_last_update_path @@ -2753,7 +2759,8 @@ def pull_is_due_for_repository_checkpoint(pull): and not pull_review_errors and (not pull_reviews_since or newest_pull_update > pull_reviews_since) ): - open(pull_reviews_last_update_path, "w").write(newest_pull_update) + with open(pull_reviews_last_update_path, "w") as f: + f.write(newest_pull_update) def backup_milestones(args, repo_cwd, repository, repos_template): From 17b79fcbef880e529ab376090fbd193f102300ac Mon Sep 17 00:00:00 2001 From: Changaco Date: Thu, 30 Apr 2026 10:58:08 +0000 Subject: [PATCH 140/148] rename a function to match what it actually does --- github_backup/github_backup.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index e567d3e..f4a94b9 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -545,7 +545,7 @@ def get_auth(args, encode=True, for_git_cli=False): ) elif args.token_fine: if args.token_fine.startswith(FILE_URI_PREFIX): - args.token_fine = read_file_contents(args.token_fine) + args.token_fine = read_first_line(args.token_fine) if args.token_fine.startswith("github_pat_"): auth = args.token_fine @@ -561,7 +561,7 @@ def get_auth(args, encode=True, for_git_cli=False): ) args.token_classic = read_token_from_gh_cli(args) elif args.token_classic.startswith(FILE_URI_PREFIX): - args.token_classic = read_file_contents(args.token_classic) + args.token_classic = read_first_line(args.token_classic) if not args.as_app: auth = args.token_classic + ":" + "x-oauth-basic" @@ -623,7 +623,7 @@ def get_github_host(args): return host -def read_file_contents(file_uri): +def read_first_line(file_uri): with 
open(file_uri[len(FILE_URI_PREFIX) :], "rt") as f: return f.readline().strip() From 3cda5a01fdf094ea33de7d3c02aa7cc60d553e9b Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 20:32:16 +0000 Subject: [PATCH 141/148] document that `--all` doesn't imply `--attachments` --- README.rst | 2 +- github_backup/github_backup.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 3a4be3b..ed037fd 100644 --- a/README.rst +++ b/README.rst @@ -325,7 +325,7 @@ Gotchas / Known-issues All is not everything --------------------- -The ``--all`` argument does not include: cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more. +The ``--all`` argument does not include: downloading attachments from issue and pull request comments (``--attachments``), cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more. 
Starred repository size ----------------------- diff --git a/github_backup/github_backup.py b/github_backup/github_backup.py index 8b96622..dc872c7 100644 --- a/github_backup/github_backup.py +++ b/github_backup/github_backup.py @@ -488,7 +488,7 @@ def parse_args(args=None): "--attachments", action="store_true", dest="include_attachments", - help="download user-attachments from issues, pull requests, and discussions", + help="download user-attachments from issues, pull requests, and discussions [*]", ) parser.add_argument( "--throttle-limit", From 543d76f24bc4eb808618e7a8b5ccbabea80fa700 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 20:35:06 +0000 Subject: [PATCH 142/148] fix a typo in the README --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index ed037fd..e5f0f14 100644 --- a/README.rst +++ b/README.rst @@ -363,7 +363,7 @@ This means any blocking errors on previous runs can cause missing data in backup Using (``--incremental-by-files``) will request new data from the API **based on when the file was modified on filesystem**. e.g. if you modify the file yourself you may miss something. -Still saver than the previous version. +Still safer than the previous version. Specifically, issues and pull requests are handled like this. From 9340aa3aaada4c2d41aa8f9c1b6164f9ee9ed082 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 20:35:47 +0000 Subject: [PATCH 143/148] try to clarify what `--incremental` actually does --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index e5f0f14..1bd3ff6 100644 --- a/README.rst +++ b/README.rst @@ -365,7 +365,7 @@ Using (``--incremental-by-files``) will request new data from the API **based on Still safer than the previous version. -Specifically, issues and pull requests are handled like this. +Incremental backup only changes how issue and pull request data is fetched. 
Known blocking errors --------------------- From a2391a550e45ff4882f006696599fcd408317781 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 20:37:05 +0000 Subject: [PATCH 144/148] remove pointless and unsafe `export`s in examples --- README.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 1bd3ff6..33a89fb 100644 --- a/README.rst +++ b/README.rst @@ -429,12 +429,12 @@ Github Backup Examples Backup all repositories, including private ones using a classic token:: - export ACCESS_TOKEN=SOME-GITHUB-TOKEN + ACCESS_TOKEN=SOME-GITHUB-TOKEN github-backup WhiteHouse --token $ACCESS_TOKEN --organization --output-directory /tmp/white-house --repositories --private Use a fine-grained access token to backup a single organization repository with everything else (wiki, pull requests, comments, issues etc):: - export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN + FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN ORGANIZATION=docker REPO=cli # e.g. git@github.com:docker/cli.git @@ -442,14 +442,14 @@ Use a fine-grained access token to backup a single organization repository with Quietly and incrementally backup useful Github user data (public and private repos with SSH) including; all issues, pulls, all public starred repos and gists (omitting "hooks", "releases" and therefore "assets" to prevent blocking). *Great for a cron job.* :: - export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN + FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN GH_USER=YOUR-GITHUB-USER github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER Debug an error/block or incomplete backup into a temporary directory. 
Omit "incremental" to fill a previous incomplete backup. :: - export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN + FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN GH_USER=YOUR-GITHUB-USER github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --discussions --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER From d30d9bfe6034b174ae3839f7aa13f4ad2eff4dc3 Mon Sep 17 00:00:00 2001 From: Changaco Date: Fri, 10 Apr 2026 20:38:31 +0000 Subject: [PATCH 145/148] eliminate trailing spaces --- README.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index 33a89fb..c4d0fd0 100644 --- a/README.rst +++ b/README.rst @@ -22,7 +22,7 @@ Using PIP via PyPI:: Using PIP via Github (more likely the latest version):: pip install git+https://github.com/josegonzalez/python-github-backup.git#egg=github-backup - + *Install note for python newcomers:* Python scripts are unlikely to be included in your ``$PATH`` by default, this means it cannot be run directly in terminal with ``$ github-backup ...``, you can either add python's install path to your environments ``$PATH`` or call the script directly e.g. using ``$ ~/.local/bin/github-backup``.* @@ -249,7 +249,7 @@ Note: When you run github-backup, you will be asked whether you want to allow " Github Rate-limit and Throttling -------------------------------- -"github-backup" will automatically throttle itself based on feedback from the Github API. +"github-backup" will automatically throttle itself based on feedback from the Github API. Their API is usually rate-limited to 5000 calls per hour. The API will ask github-backup to pause until a specific time when the limit is reset again (at the start of the next hour). This continues until the backup is complete. 
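The pause described in the rate-limit paragraph above can be sketched as follows. This is a minimal illustration, not github-backup's actual code: `seconds_to_wait` is a hypothetical helper, and it assumes the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers that GitHub documents for its REST API, passed in as a plain dict with lowercase keys.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how many seconds to sleep before the next API call.

    `headers` is assumed to be a dict of response headers with lowercase
    keys; `now` is a Unix timestamp (defaults to the current time).
    """
    now = time.time() if now is None else now
    # Number of calls left in the current rate-limit window.
    remaining = int(headers.get("x-ratelimit-remaining", 1))
    # Unix timestamp at which the window resets.
    reset_at = int(headers.get("x-ratelimit-reset", 0))
    if remaining > 0:
        return 0.0  # budget left, no need to pause
    # Budget exhausted: wait until the reset time, never a negative amount.
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    example = {"x-ratelimit-remaining": "0", "x-ratelimit-reset": "1000"}
    print(seconds_to_wait(example, now=940))
```

A caller would pass the result to `time.sleep()` before retrying. Taking `now` as a parameter keeps the helper easy to test without touching the clock.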
@@ -446,7 +446,7 @@ Quietly and incrementally backup useful Github user data (public and private rep GH_USER=YOUR-GITHUB-USER github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-reviews --pull-commits --labels --milestones --security-advisories --discussions --repositories --wikis --releases --assets --attachments --pull-details --gists --starred-gists $GH_USER - + Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup. :: FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN From 8e76089565d7822bd94816433c2509daee40f26b Mon Sep 17 00:00:00 2001 From: Changaco Date: Sat, 25 Apr 2026 07:07:24 +0000 Subject: [PATCH 146/148] document that nothing is saved by default --- README.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/README.rst b/README.rst index c4d0fd0..c3d5d5d 100644 --- a/README.rst +++ b/README.rst @@ -327,6 +327,11 @@ All is not everything The ``--all`` argument does not include: downloading attachments from issue and pull request comments (``--attachments``), cloning private repos (``-P, --private``), cloning forks (``-F, --fork``), cloning starred repositories (``--all-starred``), ``--pull-details``, cloning LFS repositories (``--lfs``), cloning gists (``--gists``) or cloning starred gist repos (``--starred-gists``). See examples for more. +Saves nothing if no arguments are passed +---------------------------------------- + +At least one argument like ``--all`` or ``--repositories`` is needed for github-backup to actually save data. Without relevant arguments, github-backup fetches some data from GitHub but doesn't put any of it into files. 
+ Starred repository size ----------------------- From bd6eea02d5095a83d25f2d57202bb78c93be1cc2 Mon Sep 17 00:00:00 2001 From: GitHub Action Date: Thu, 30 Apr 2026 15:52:41 +0000 Subject: [PATCH 147/148] Release version 0.62.1 --- CHANGES.rst | 58 ++++++++++++++++++++++++++++++++++++++- github_backup/__init__.py | 2 +- 2 files changed, 58 insertions(+), 2 deletions(-) diff --git a/CHANGES.rst b/CHANGES.rst index 86bcb32..20ac838 100644 --- a/CHANGES.rst +++ b/CHANGES.rst @@ -1,9 +1,65 @@ Changelog ========= -0.62.0 (2026-04-29) +0.62.1 (2026-04-30) ------------------- ------------------------ +- Document that nothing is saved by default. [Changaco] +- Eliminate trailing spaces. [Changaco] +- Remove pointless and unsafe `export`s in examples. [Changaco] +- Try to clarify what `--incremental` actually does. [Changaco] +- Fix a typo in the README. [Changaco] +- Document that `--all` doesn't imply `--attachments` [Changaco] +- Rename a function to match what it actually does. [Changaco] +- Don't leave files open. [Changaco] +- Remove legacy code in `mkdir_p` function. [Changaco] +- Don't pass stdin when doing so can't do any good. [Changaco] + + When the child process doesn't inherit stderr, it can't ask the user for input, so it shouldn't inherit stdin either. +- Use `subprocess.DEVNULL` instead of emulating it. [Changaco] +- Remove bad invocation of the system shell. [Changaco] +- Add missing `context` argument to `urlopen` call. [Changaco] +- Suppress output of call to `git lfs version` [Changaco] +- Handle more network errors. 
[Changaco] + + ```python-traceback + Traceback (most recent call last): + File ".local/bin/github-backup", line 6, in <module> + sys.exit(main()) + ~~~~^^ + File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/cli.py", line 83, in main + backup_repositories(args, output_directory, repositories) + ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 1845, in backup_repositories + backup_pulls(args, repo_cwd, repository, repos_template) + ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 2019, in backup_pulls + pulls[number]["commit_data"] = retrieve_data(args, template) + ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^ + File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 766, in retrieve_data + return list(fetch_all()) + File ".local/share/pipx/venvs/github-backup/lib/python3.14/site-packages/github_backup/github_backup.py", line 717, in fetch_all + response = json.loads(http_response.read().decode("utf-8")) + ~~~~~~~~~~~~~~~~~~^^ + File "/usr/lib/python3.14/http/client.py", line 500, in read + s = self._safe_read(self.length) + File "/usr/lib/python3.14/http/client.py", line 648, in _safe_read + data = self.fp.read(cursize) + File "/usr/lib/python3.14/socket.py", line 725, in readinto + return self._sock.recv_into(b) + ~~~~~~~~~~~~~~~~~~~~^^^ + File "/usr/lib/python3.14/ssl.py", line 1304, in recv_into + return self.read(nbytes, buffer) + ~~~~~~~~~^^^^^^^^^^^^^^^^ + File "/usr/lib/python3.14/ssl.py", line 1138, in read + return self._sslobj.read(len, buffer) + ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^ + ConnectionResetError: [Errno 104] Connection reset by peer + ``` + + +0.62.0 (2026-04-29) +------------------- - Skip checkpoint-equal incremental items.
[Duncan Ogilvie] - Avoid redundant release asset list requests. [Duncan Ogilvie] - Reduce unnecessary pull requests with incremental fetching. [Duncan diff --git a/github_backup/__init__.py b/github_backup/__init__.py index 647040d..b7b61f3 100644 --- a/github_backup/__init__.py +++ b/github_backup/__init__.py @@ -1 +1 @@ -__version__ = "0.62.0" +__version__ = "0.62.1" From 2cbce1425cbb2a2f00ba7996f795415d2ede6c37 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 18 May 2026 22:45:36 +0000 Subject: [PATCH 148/148] chore(deps): bump black in the python-packages group Bumps the python-packages group with 1 update: [black](https://github.com/psf/black). Updates `black` from 26.3.1 to 26.5.1 - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](https://github.com/psf/black/compare/26.3.1...26.5.1) --- updated-dependencies: - dependency-name: black dependency-version: 26.5.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: python-packages ... Signed-off-by: dependabot[bot] --- release-requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/release-requirements.txt b/release-requirements.txt index ad8bc5c..117aeea 100644 --- a/release-requirements.txt +++ b/release-requirements.txt @@ -1,6 +1,6 @@ # Linting & Formatting autopep8==2.3.2 -black==26.3.1 +black==26.5.1 flake8==7.3.0 # Testing