(The text below was produced with AI assistance, but the main idea is mine.)
If you've been following the open-source community lately, you've probably heard the frustration firsthand. Just days ago, Peter Steinberger, the creator of OpenClaw, publicly banned an AI agent's GitHub account after it submitted an untested, low-effort pull request to his project — a project maintained by serious, experienced engineers. He's not alone. Maintainers across the ecosystem are reporting the same thing: a rising flood of AI-generated PRs that are half-baked at best, and actively disrespectful of maintainers' time at worst. Some projects are quietly considering moving their repositories off GitHub entirely. The only tools available right now are blunt ones: close, ignore, ban. That's not a solution. It's just noise management.
But here's a thought: what if we didn't fight this tide, but redirected it?
The Idea: Shadow Repository Project
The basic concept is simple. When a maintainer receives a PR that feels too noisy to merge but too interesting to throw away, instead of closing it cold they can route it to a Shadow Repository: a sandboxed mirror of their project, designed specifically to absorb and evaluate experimental contributions.
Inside the Shadow Repository:

- A dedicated Test-Writer AI reads the original issue description and writes unit tests independently, without ever looking at the incoming patch. This keeps the tests honest and prevents the system from gaming its own evaluation.
- A separate Patch-Evaluation AI iterates silently on the incoming contributions, using those independently written tests as its benchmark. No score is assigned at this stage; the AI simply works on its own, refining, evaluating, discarding.
- Only when the AI believes a patch is genuinely promising does it surface a hint to the actual maintainer: "Hey, something in your Shadow repository might be worth a look."
- The maintainer then reviews the patch and assigns a score from -10 to +10. That human judgment is the only real reward signal in the entire system.
- That score, whether positive or negative, feeds directly back into the training pipeline. (More aggressively, I would suggest GitHub pay senior repository maintainers to review these.)
From the maintainer's perspective, this changes almost nothing about their workflow. They just get a new button: "Send to Shadow." The chaos gets quietly absorbed somewhere else, and only the good stuff bubbles back up.
```mermaid
flowchart TD
    A1["🤖 AI Agent PR"] --> S
    A2["👤 Human Contributor PR"] --> S
    A3["💡 VS Code 'Send to Shadow' Button<br/>(idea from non-developer)"] --> S
    S["📦 Shadow Repository<br/>(sandboxed mirror)"]
    S --> TW["✍️ Test-Writer AI<br/>Writes unit tests from issue description<br/>(never sees the patch)"]
    TW --> PA["⚙️ Patch-Evaluation AI<br/>Iterates silently using those tests<br/>No score assigned yet"]
    PA -->|"Not promising yet"| RETRY["🔄 Keep Iterating"]
    RETRY --> PA
    PA -->|"AI believes it's ready"| HINT["💬 LLM hints maintainer:<br/>'Something looks promising'"]
    HINT --> HUMAN["👨‍💻 Human Maintainer<br/>reviews the patch"]
    HUMAN -->|"Score: +1 to +10 ✅ — Merged"| MAIN["🏠 Main Repository"]
    HUMAN -->|"Score: -10 to 0 ❌ — Rejected"| LOG["🗃️ Logged as Negative Training Signal"]
    MAIN --> TRAIN["🧠 Microsoft LLM<br/>Training Pipeline"]
    LOG --> TRAIN
```
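For concreteness, here is a minimal Python sketch of the loop in the flowchart. Everything in it is hypothetical: `ShadowRepo`, `Patch`, the `promising_threshold` cutoff, and the keyword-matching "tests" are toy stand-ins for the real AI components, chosen only to make the information flow explicit.

```python
from dataclasses import dataclass, field

# Toy sketch of the shadow-repository loop. ShadowRepo, Patch, and the
# keyword-based "tests" are hypothetical stand-ins for the AI components
# described above, not any real GitHub or Copilot API.

@dataclass
class Patch:
    author: str
    diff: str

@dataclass
class ShadowRepo:
    promising_threshold: float = 0.9  # assumed cutoff for surfacing a hint
    training_log: list = field(default_factory=list)

    def independent_tests(self, issue_text: str):
        # Test-Writer AI stand-in: derives checks from the issue description
        # alone. The patch is deliberately NOT an input here, so the
        # benchmark cannot be gamed by the patch it will later judge.
        keywords = [w for w in issue_text.lower().split() if w.isalpha()]
        return [lambda diff, w=w: w in diff.lower() for w in keywords]

    def looks_promising(self, patch: Patch, issue_text: str) -> bool:
        # Patch-Evaluation AI stand-in: score the patch against the
        # independent tests; surface a hint only above the threshold.
        tests = self.independent_tests(issue_text)
        passed = sum(t(patch.diff) for t in tests) / max(len(tests), 1)
        return passed >= self.promising_threshold

    def record_review(self, patch: Patch, score: int) -> None:
        # The maintainer's -10..+10 score is the only real reward signal;
        # both positive and negative outcomes feed the training log.
        if not -10 <= score <= 10:
            raise ValueError("score must be in [-10, 10]")
        outcome = "merge" if score > 0 else "negative_signal"
        self.training_log.append((patch.diff, score, outcome))
```

A real system would replace `independent_tests` with an LLM that emits executable unit tests, and `looks_promising` with an agent that actually iterates on the patch; the point here is only the control flow: tests come from the issue, hints come from the evaluator, reward comes from the human.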
Why This Could Be Bigger Than It Looks
The structure of this idea is inspired — loosely — by how AlphaZero works. AlphaZero doesn't try to manually engineer every intermediate step of a chess game. It just plays, gets a result (win or lose), and learns from that signal at massive scale. The process takes care of itself, as long as the feedback is honest and the volume is high enough.
The Shadow Repository works the same way. We don't need every incoming PR to be good. We don't need contributors to know what they're doing. We just need honest human judgment at the end of the pipeline, and enough volume flowing through it. Over millions of such interactions, across thousands of repositories, the model learns not just what "correct code" looks like — but what experienced human engineers actually find valuable. That's a much harder and more useful thing to learn.
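Treating the maintainer's score as a single terminal reward, AlphaZero-style, could look like the sketch below. The function names and the linear mapping are my own illustration, not part of any existing pipeline:

```python
def score_to_reward(score: int) -> float:
    # Map the maintainer's -10..+10 review onto a terminal reward in [-1, 1],
    # analogous to AlphaZero's single win/lose signal per game: no
    # intermediate step is hand-scored, only the final human judgment counts.
    if not -10 <= score <= 10:
        raise ValueError("maintainer score must be in [-10, 10]")
    return score / 10.0

def training_label(score: int) -> str:
    # Per the flowchart: positive scores are merged, while zero or negative
    # scores are logged as negative training signal.
    return "merged" if score > 0 else "negative_signal"
```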
And Microsoft/GitHub is arguably the only organization in the world positioned to build this, because it simultaneously controls the platform (GitHub), the tooling (Copilot, VS Code), and the research infrastructure to close the loop.
If something like this were built into VS Code — say, a lightweight "I have a feature idea but don't know how to implement it, send to Shadow" button — it could also serve as an on-ramp for non-developers to meaningfully participate in open-source, without ever polluting the main branch of serious projects.
What I'm Not Asking For
I'm not asking GitHub to solve AI spam overnight. I know that's hard. I'm also not proposing a fully designed system here — I'm sure there are engineering challenges I'm not seeing.
What I'm asking is: is anyone on the GitHub or Copilot team thinking about this angle? The idea that the noise itself could become the signal — that feels worth at least a conversation.
Happy to elaborate on any part of this. Thanks for reading.