From ff2409daf0404783f265669767d38cd4ea228f00 Mon Sep 17 00:00:00 2001
From: "roger.yu"
Date: Thu, 31 Jul 2025 13:27:53 -0700
Subject: [PATCH 1/5] update instructions

---
 .github/chatmodes/GitHub Tools.chatmode.md | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 .github/chatmodes/GitHub Tools.chatmode.md

diff --git a/.github/chatmodes/GitHub Tools.chatmode.md b/.github/chatmodes/GitHub Tools.chatmode.md
new file mode 100644
index 0000000..5060b7a
--- /dev/null
+++ b/.github/chatmodes/GitHub Tools.chatmode.md
@@ -0,0 +1,5 @@
+---
+description: 'Description of the custom chat mode.'
+tools: ['changes', 'codebase', 'editFiles', 'extensions', 'fetch', 'findTestFiles', 'githubRepo', 'new', 'openSimpleBrowser', 'problems', 'runCommands', 'runNotebooks', 'runTasks', 'runTests', 'search', 'searchResults', 'terminalLastCommand', 'terminalSelection', 'testFailure', 'usages', 'vscodeAPI', 'github', 'figma', 'filesystem', 'pylance mcp server', 'activePullRequest', 'copilotCodingAgent', 'configurePythonEnvironment', 'getPythonEnvironmentInfo', 'getPythonExecutableCommand', 'installPythonPackage']
+---
+Define the purpose of this chat mode and how AI should behave: response style, available tools, focus areas, and any mode-specific instructions or constraints.
\ No newline at end of file

From b58df65b51e31cd630cebdeb155e0faa5ad21234 Mon Sep 17 00:00:00 2001
From: "roger.yu"
Date: Thu, 31 Jul 2025 13:42:57 -0700
Subject: [PATCH 2/5] Add project description HTML and external CSS for documentation page

---
 github_backup/docs/assets/css/style.css | 21 ++++++++++++
 github_backup/docs/idex.html            | 43 +++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 github_backup/docs/assets/css/style.css
 create mode 100644 github_backup/docs/idex.html

diff --git a/github_backup/docs/assets/css/style.css b/github_backup/docs/assets/css/style.css
new file mode 100644
index 0000000..d64d354
--- /dev/null
+++ b/github_backup/docs/assets/css/style.css
@@ -0,0 +1,21 @@
+body {
+    font-family: Arial, sans-serif;
+    margin: 2em;
+    background: #f9f9f9;
+    color: #222;
+}
+h1 {
+    color: #0366d6;
+}
+code {
+    background: #eee;
+    padding: 2px 4px;
+    border-radius: 3px;
+}
+.section {
+    margin-bottom: 2em;
+}
+footer hr {
+    margin-top: 2em;
+    margin-bottom: 1em;
+}
diff --git a/github_backup/docs/idex.html b/github_backup/docs/idex.html
new file mode 100644
index 0000000..828e345
--- /dev/null
+++ b/github_backup/docs/idex.html
@@ -0,0 +1,43 @@
+
+
+
+
+
+ python-github-backup
+
+
+
+

python-github-backup

+
+

python-github-backup is a command-line tool to back up your entire GitHub account, including repositories, issues, pull requests, releases, wikis, gists, and more. It is designed for disaster recovery, migration, and archiving purposes.

+
+
+

Features

+
    +
  • Back up all repositories (including private, forks, and gists)
  • Back up issues, pull requests, releases, labels, milestones, and hooks
  • Back up repository wikis and GitHub Pages content (gh-pages branch)
  • Incremental and full backup modes
  • Support for personal, organization, and GitHub Enterprise accounts
  • Flexible authentication (token, OAuth, username/password, keychain)
  • JSON output for metadata and configuration
+
+
+

Usage Example

+
github-backup <username> --token <your_token> --repositories --issues --pulls --releases --wikis --gists --output-directory ./backup
+

See the README.rst for full usage instructions and options.

+
+
+

Project Links

+ +
+
+
+

© 2025 AvePoint. Project: python-github-backup

+
+
+
+

From 3c00fa157dec6764ad0633ab20b36add6f1b8625 Mon Sep 17 00:00:00 2001
From: "roger.yu"
Date: Thu, 31 Jul 2025 13:44:36 -0700
Subject: [PATCH 3/5] css styles

---
 github_backup/docs/assets/style.css | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 github_backup/docs/assets/style.css

diff --git a/github_backup/docs/assets/style.css b/github_backup/docs/assets/style.css
new file mode 100644
index 0000000..e69de29

From 1bcd0f784770e10a00521c30ea8736f009e5b097 Mon Sep 17 00:00:00 2001
From: "roger.yu"
Date: Thu, 31 Jul 2025 13:47:24 -0700
Subject: [PATCH 4/5] move pages files location to /docs

---
 {github_backup/docs => docs}/assets/css/style.css | 0
 {github_backup/docs => docs}/idex.html            | 0
 github_backup/docs/assets/style.css               | 0
 3 files changed, 0 insertions(+), 0 deletions(-)
 rename {github_backup/docs => docs}/assets/css/style.css (100%)
 rename {github_backup/docs => docs}/idex.html (100%)
 delete mode 100644 github_backup/docs/assets/style.css

diff --git a/github_backup/docs/assets/css/style.css b/docs/assets/css/style.css
similarity index 100%
rename from github_backup/docs/assets/css/style.css
rename to docs/assets/css/style.css
diff --git a/github_backup/docs/idex.html b/docs/idex.html
similarity index 100%
rename from github_backup/docs/idex.html
rename to docs/idex.html
diff --git a/github_backup/docs/assets/style.css b/github_backup/docs/assets/style.css
deleted file mode 100644
index e69de29..0000000

From bb04736a2dd7d72d12d8e9706ad9f8eaded198cc Mon Sep 17 00:00:00 2001
From: "roger.yu"
Date: Thu, 31 Jul 2025 17:54:22 -0700
Subject: [PATCH 5/5] add mcp config

---
 .github/copilot-instructions.md | 110 ++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..6cd5617
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,110 @@
+# GitHub Backup - AI Coding Instructions
+
+## Project Overview
+
+This is a Python CLI tool for comprehensive GitHub data backup. The architecture follows a single-module design with clear separation of concerns across functional areas.
+
+## Core Architecture
+
+### Main Entry Points
+- **`bin/github-backup`**: CLI entry point that orchestrates the backup workflow
+- **`github_backup/github_backup.py`**: Single module containing all core functionality (~1400+ lines)
+- **`github_backup/__init__.py`**: Version tracking only
+
+### Data Flow Pattern
+1. **Parse & Authenticate** → `parse_args()` → `get_auth()` → `get_authenticated_user()`
+2. **Discover** → `retrieve_repositories()` → `filter_repositories()`
+3. **Backup** → `backup_repositories()` + `backup_account()`
+
+### GitHub API Integration
+- Uses `retrieve_data_gen()` for paginated API calls with automatic rate limiting
+- Template-based URL construction: `"https://{host}/repos/{owner}/{name}/issues"`
+- Built-in retry logic for 502 errors and incomplete reads
+- Supports both classic tokens (`-t`) and fine-grained tokens (`-f`)
+
+## Key Development Patterns
+
+### Authentication Flexibility
+```python
+# Supports multiple auth methods in get_auth():
+# - Fine-grained tokens (github_pat_...)
+# - Classic tokens with x-oauth-basic
+# - Basic username/password
+# - OSX Keychain integration
+# - GitHub App authentication (--as-app)
+```
+
+### Incremental Backup Strategy
+- **Time-based**: `--incremental` uses API `since` parameter with last backup timestamp
+- **File-based**: `--incremental-by-files` compares filesystem modification times
+- State stored in `{output_dir}/last_update` file
+
+### Git Repository Handling
+- Uses `logging_subprocess()` wrapper for all git operations
+- Supports both regular clones and bare/mirror clones (`--bare` → `git clone --mirror`)
+- SSH vs HTTPS preference via `--prefer-ssh` flag
+- LFS support with `git lfs fetch --all --prune`
+
+### Output Directory Structure
+```
+{output_dir}/
+├── repositories/{repo_name}/repository/ # Git clones
+├── starred/{owner}/{repo_name}/ # Starred repos
+├── gists/{gist_id}/ # User gists
+├── account/{starred,followers,following}.json
+└── {repo}/issues/{number}.json # Per-repo data
+```
+
+## Development Workflows
+
+### Testing & Linting
+```bash
+# No unit tests exist - this is acknowledged in README
+pip install flake8
+flake8 --ignore=E501,E203,W503 # Same as CI
+```
+
+### Docker Development
+```bash
+docker run --rm -v /path/to/backup:/data --name github-backup \
+ ghcr.io/josegonzalez/python-github-backup -o /data $OPTIONS $USER
+```
+
+### Release Process
+- Automated via GitHub Actions (`automatic-release.yml`, `tagged-release.yml`)
+- Version bumping in `github_backup/__init__.py`
+- Docker image publishing to ghcr.io
+
+## Critical Implementation Details
+
+### Rate Limiting Strategy
+- Automatic throttling based on `x-ratelimit-remaining` header
+- Custom throttling via `--throttle-limit` and `--throttle-pause`
+- Exponential backoff for 403 rate limit responses
+
+### Error Handling Philosophy
+- Graceful degradation for missing data (404s logged but don't block)
+- Blocking errors (403 auth failures) exit entirely
+- Incomplete reads get 3 retry attempts with 5-second delays
+
+### File I/O Patterns
+- Atomic writes via `.temp` files then `os.rename()`
+- UTF-8 encoding with `codecs.open()` for JSON files
+- JSON formatting: `ensure_ascii=False, sort_keys=True, indent=4`
+
+## Common Gotchas
+
+1. **`--all` doesn't include everything**: Missing private repos, forks, starred repos, LFS, gists
+2. **`--bare` is actually `--mirror`**: Uses `git clone --mirror`, not `git clone --bare`
+3. **Starred gists**: Stored in same directory as user gists, not separately
+4. **Incremental risks**: Failed runs can cause missing data in subsequent incremental backups
+5. **Authentication scope**: Fine-grained tokens need specific repository and user permissions
+
+## Extension Points
+
+When adding new backup types, follow the pattern:
+1. Add CLI argument in `parse_args()`
+2. Create `backup_*()` function following existing patterns
+3. Call from `backup_repositories()` or `backup_account()`
+4. Use `retrieve_data()` for API calls and `mkdir_p()` for directories
+5. Follow atomic file writing pattern with `.temp` files
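
The atomic file writing pattern in step 5 and under "File I/O Patterns" could look roughly like the sketch below. It is an illustration only: the helper name `write_json_atomically` is hypothetical and is not taken from `github_backup.py`. The `.temp` suffix, `codecs.open()`, `os.rename()`, and the JSON formatting options are the ones the instructions name.

```python
import codecs
import json
import os


def write_json_atomically(path, data):
    """Write `data` as JSON to `path` via a sibling `.temp` file.

    Hypothetical helper for illustration; not part of github_backup.py.
    """
    temp_path = path + ".temp"
    # Write the full payload to the temporary file first, so an interrupted
    # run leaves the previous file at `path` untouched.
    with codecs.open(temp_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, sort_keys=True, indent=4)
    # On POSIX systems os.rename() replaces an existing target in one step.
    # On Windows it raises if the target exists; os.replace() is the
    # portable alternative.
    os.rename(temp_path, path)
```

A new `backup_*()` function would call a helper like this once per item, after creating the target directory with `mkdir_p()`.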