Add AI_POLICY for clarification how to use AI agents by leborchuk · Pull Request #1740 · apache/cloudberry · GitHub

leborchuk · 2026-05-12T11:36:37Z

See detailed discussion in https://lists.apache.org/thread/3kq1391n3n0rzo0wchygmt0cyy59rzq9

Based on the discussion results, I've added:

AI_GUIDELINE.md - a note for developers on what using AI agents means
AGENTS.md.template - a template for creating your own AGENTS.md file
.github/pull_request_template.md - a new flag, "This PR contains AI-assisted code generation"
README.md - a link to the guideline from the README file

Fixes #ISSUE_Number

What does this PR do?

Type of Change

Bug fix (non-breaking change)
New feature (non-breaking change)
Breaking change (fix or feature with breaking changes)
Documentation update

Breaking Changes

Test Plan

Unit tests added/updated
Integration tests added/updated
Passed make installcheck
Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Followed contribution guide
Added/updated documentation
Reviewed code for security implications
Requested review from cloudberry committers

Additional Context

CI Skip Instructions

Copilot

Pull request overview

Adds project documentation and process guardrails for AI-assisted contributions, including a new AI policy doc, agent guidance for working in the repository, and a PR-template disclosure checkbox.

Changes:

Add AI_POLICY.md describing expectations for responsible AI-assisted development.
Add AGENTS.md with repository map and rules for agent-style coding tools.
Update the PR template and README to surface AI policy/disclosure.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| README.md | Adds a link to the new AI policy in the contribution resources table. |
| AI_POLICY.md | New AI policy document for AI-assisted development expectations. |
| AGENTS.md | New guidance for agent tools working in this repo (layout, principles, workflow). |
| .github/pull_request_template.md | Adds an AI-assisted code generation disclosure checkbox. |


tuhaihe · 2026-05-13T03:38:47Z

+
+### 6. LLM code review
+
+So far, it is not possible to use paid LLM models for code review in open source ASF projects. However, one could use personal licenses for LLMs to do the same. 


I think we need to improve the description here, like:

Some AI review tools (for example, GitHub Copilot review or CodeRabbit) may not currently be enabled for ASF-hosted repositories due to operational, budget, or permission considerations. Contributors may still use personal AI tools locally, but remain responsible for code quality, licensing, and review outcomes.

+
+- [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) - Official Apache guidance on AI tool usage
+- [GitHub Copilot](https://github.com/features/copilot) - AI pair programmer and code reviewer we use
+- [LLM Leaderboard](https://llm-stats.com/) - LLM Stats Score, it's better to use high-ranked models


jiaqizho · 2026-05-12T11:59:23Z

Hi @leborchuk, I don't think that adding a project-level AGENTS.md or CLAUDE.md to the project is a good idea.

My first concern is that these files are not just passive documentation. For tools like Codex or Claude Code, project-level instruction files can be loaded automatically and affect every local AI session in this repository. As far as Codex is concerned, AGENTS.md is project-scoped; there is no separate global AGENTS.md file that each developer can maintain independently. If the repository ships one, it effectively becomes part of the local agent behavior for everyone who checks out the project. I would prefer each contributor to define their own AI agent style, level of autonomy, communication pattern, and workflow preferences.

My second concern is the content. A good project-level agent file should mainly capture hard project constraints and practical project knowledge: required license/header rules, build and test commands, high-risk subsystems, generated files ... It should constrain or assist the agent where the project has real requirements. It should not try to tune the agent's general coding style, personality, or workflow. Many items in the proposed AGENTS.md read more like agent prompting or personal preferences about how an AI assistant should behave, rather than Cloudberry-specific rules. Those preferences can vary by developer and by tool, and I do not think the project should freeze one particular AI-agent behavior into the repository.

@my-ship-it Please consider this change seriously.

yjhjstz · 2026-05-12T13:15:12Z

AGENTS.md is not suitable for committing to git, as it is platform- and user-specific.

See detailed discussion in https://lists.apache.org/thread/3kq1391n3n0rzo0wchygmt0cyy59rzq9

As for the discussion results I've added:

1. AI_POLICY.md - note for the developer what using AI agents means
2. AGENTS.md - description for LLM models how to work with project code
3. .github/pull_request_template.md - new flag "This PR contains AI-assisted code generation"
4. README.md - link to policy from README file

leborchuk · 2026-05-12T15:20:58Z

Thank you, got it. I have removed AGENTS.md.

tuhaihe

One more small suggestion: it may be good to follow a formatting style similar to README.md or CONTRIBUTING.md, especially for line wrapping. Limiting line length to around 76–80 characters would make the document more consistent with other markdown documentation and improve readability in terminal/editor views.
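A minimal sketch of how the line-length suggestion could be checked locally, assuming an 80-character limit and assuming that fenced code blocks should be exempt. The function name `find_long_lines` is hypothetical and is not part of the Cloudberry repository or its tooling.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def find_long_lines(text: str, limit: int = 80) -> list[tuple[int, int]]:
    """Return (line_number, length) for each prose line longer than `limit`.

    Lines inside fenced code blocks are skipped (an assumption of this
    sketch, since code often cannot be wrapped safely).
    """
    long_lines = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a code block
            continue
        if in_fence:
            continue
        if len(line) > limit:
            long_lines.append((number, len(line)))
    return long_lines


if __name__ == "__main__":
    import sys

    # Usage: python check_wrap.py AI_GUIDELINE.md
    with open(sys.argv[1], encoding="utf-8") as handle:
        for number, length in find_long_lines(handle.read()):
            print(f"line {number}: {length} characters")
```

Long URLs and table rows would also be reported by this sketch; a real check might want to exempt them.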


tuhaihe · 2026-05-13T04:30:14Z

A couple of additional suggestions that may help improve the overall developer workflow around AI-assisted contributions:

It may be useful to add an optional attribution line in the git commit message template, similar to the approach discussed in the Linux kernel coding assistants guidance:
https://docs.kernel.org/process/coding-assistants.html

For example:

Assisted-by: ChatGPT
Assisted-by: GitHub Copilot

This could provide lightweight transparency/provenance information without making AI disclosure a strict requirement. If this direction looks reasonable, we could also update .gitmessage accordingly.
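As an illustration of how such attribution lines could be read back from a commit message, here is a small sketch. It assumes the `Assisted-by:` lines sit in the final paragraph of the message, as trailers conventionally do, and the function name `assisted_by` is hypothetical. It is not Git's own trailer parser; `git interpret-trailers` is the authoritative tool for that.

```python
import re

# Matches a single "Assisted-by: <tool>" trailer line.
TRAILER = re.compile(r"^Assisted-by:\s*(\S.*)$", re.IGNORECASE)


def assisted_by(message: str) -> list[str]:
    """Return the tools named in Assisted-by trailers of a commit message.

    Only the last paragraph is inspected, and only when the message has
    at least a subject plus one more paragraph.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        return []
    tools = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER.match(line.strip())
        if match:
            tools.append(match.group(1).strip())
    return tools


example = (
    "Add AI guideline document\n"
    "\n"
    "Describe expectations for AI-assisted contributions.\n"
    "\n"
    "Assisted-by: ChatGPT\n"
    "Assisted-by: GitHub Copilot\n"
)
print(assisted_by(example))
```

Because the trailer is optional in this proposal, an empty result would not by itself indicate a problem with a commit.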

Regarding AGENTS.md, I think the idea itself is still useful for contributor onboarding, especially for developers who are newer to AI-assisted workflows. To avoid conflicts with local developer files, perhaps we could rename it to something like:

AGENTS.md.template

This would position it more as a reference/example template rather than a required project file, while still helping contributors quickly get started with AI-assisted development practices.

Some other ASF projects also ship AGENTS.md:

tuhaihe

Just left some new comments. Thanks again!

tuhaihe · 2026-05-14T03:28:38Z

+
+## Project overview
+
+Apache Cloudberry is an Apache Incubator project and an open-source massively parallel processing database. It evolved from Greenplum Database and is built on a PostgreSQL kernel. It is used for data warehouse, large-scale analytics, and AI or ML workloads.


Suggested change

Apache Cloudberry is an Apache Incubator project and an open-source massively parallel processing database. It evolved from Greenplum Database and is built on a PostgreSQL kernel. It is used for data warehouse, large-scale analytics, and AI or ML workloads.

Apache Cloudberry is an Apache Incubator project and an open-source massively parallel processing database. It evolved from Greenplum Database and is built on a modern PostgreSQL kernel. It is used for data warehouse, large-scale analytics, and AI or ML workloads.

tuhaihe · 2026-05-14T03:29:23Z

+
+- Keep changes as small and direct as possible.
+- Do not perform broad code refactoring. Cloudberry's core is PostgreSQL-based, and unnecessary refactoring makes familiar code harder for maintainers to recognize and review.
+- Preserve PostgreSQL and Greenplum coding style in the area being edited.


Suggested change

- Preserve PostgreSQL and Greenplum coding style in the area being edited.

- Preserve PostgreSQL and Cloudberry coding style in the area being edited.

tuhaihe · 2026-05-14T03:30:55Z

+
+- [README.md](README.md) — project introduction, community links, contribution overview, and license information.
+- [CONTRIBUTING.md](CONTRIBUTING.md) — contribution expectations and community guidance.
+- [AI_POLICY.md](AI_POLICY.md) — rules for AI-assisted development.


Suggested change

- [AI_POLICY.md](AI_POLICY.md) — rules for AI-assisted development.

- [AI_GUIDELINE.md](AI_GUIDELINE.md) — rules for AI-assisted development.

tuhaihe · 2026-05-14T03:37:59Z

+
+## AI-assisted contribution policy
+
+Follow [AI_POLICY.md](AI_POLICY.md):


Suggested change

Follow [AI_POLICY.md](AI_POLICY.md):

Follow [AI_GUIDELINE.md](AI_GUIDELINE.md):

tuhaihe · 2026-05-14T03:39:45Z

+- AI-assisted changes must pass normal review, testing, and CI standards.
+- The contributor must ensure license compatibility.
+- Significant AI-generated code should be disclosed using the PR template checkbox.
+- Do not use AI to auto-generate responses to maintainer review feedback.


Do not use AI to auto-generate responses to maintainer review feedback.

Maybe we should keep this description aligned with the new wording in the AI guidelines.

tuhaihe · 2026-05-14T03:52:59Z

+- Confirm documentation updates when needed.
+- Confirm security review consideration.
+- Disclose significant AI-assisted code generation.
+


We could add guidelines on the commit message, like:

## Commit Conventions

- Add the standard Apache License header for the newly created files (no need for the third-party files).
- When drafting the commit message, please take the [.gitmessage](.gitmessage) template as a reference.
- ...

leborchuk marked this pull request as ready for review May 12, 2026 11:36

Copilot AI review requested due to automatic review settings May 12, 2026 11:36

Copilot started reviewing on behalf of leborchuk May 12, 2026 11:37

Copilot AI reviewed May 12, 2026


leborchuk added 2 commits May 12, 2026 18:18

Remove AGENTS.md as totally unsuitable

469e2e2

leborchuk force-pushed the AddAIPolicy branch from 991ed21 to 469e2e2 Compare May 12, 2026 15:19

tuhaihe reviewed May 13, 2026


leborchuk added 2 commits May 13, 2026 12:22

Rename AI_POLICY to AI_GUIDELINE and add AGENTS.md.template file

cf3d7a8

Use long -

f022292

tuhaihe reviewed May 14, 2026



5 participants
