LLM inference is becoming memory-orchestration-bound rather than compute-bound. As context windows and batch sizes expand, the bottleneck shifts from raw FLOPS to the efficient movement and placement of the KV-cache across memory tiers.
This repository provides a reference Linux kernel driver that introduces a semantic memory control plane for a hypothetical KV-CPU device. It demonstrates how high-level transformer inference semantics can be exposed to hardware to enable intelligent, autonomous memory orchestration.
- KV Cache Growth: Modern LLMs require massive KV caches that often exceed available GPU HBM, forcing offloading to slower memory tiers.
- Semantic Blindness: Existing OS memory management (LRU, swapping) is "semantic-blind" to inference workloads. The kernel does not understand autoregressive decode steps, prefix sharing, or the predictable future access patterns of KV blocks, leading to inefficient eviction and high-latency page faults on the hot path.
The KV-CPU architecture proposes a shift where the inference runtime (e.g., vLLM, TensorRT-LLM) signals high-level intent to the hardware via the Linux kernel:
- Decode Step Synchronization: Informs the hardware of the current global iteration ($t$), allowing silicon-level logic to calculate block "freshness."
- Semantic Lifecycle Hints: Explicitly identifies blocks as `HOT` (protected), `EVICTABLE` (candidate for offload), or `PREFETCHABLE` (needed in future steps).
- Hardware-Level Orchestration: Moves the eviction policy from reactive software to an autonomous hardware controller (HEPC) that operates directly on the memory data plane.
This repository is a control-plane prototype and architectural reference. It implements:
- Kernel Driver Skeleton: A standard Linux `pci_driver` with character device registration (`/dev/kvcpu0`).
- IOCTL Control Plane: A standardized UAPI for signaling decode steps and block management.
- MMIO Abstraction: A clean register-access layer with a robust Mock Mode for testing without physical hardware.
- Userspace Utility: A reference tool (`kvctl`) to demonstrate interaction with the driver.
- vLLM Adapter Sketch: A lightweight Python wrapper under `integrations/vllm/` that can issue `STEP_ADVANCE` on each decode step against `/dev/kvcpu0`.
- Hardware Collateral: Supporting RTL, MMIO, thermal, diagram, and specification artifacts under `hardware/`.
- FPGA Emulation Scaffold: A Phase 1 emulation plan and starter integration structure under `fpga/`.
The following table clarifies the scope of this reference implementation versus the conceptual hardware architecture.
| Feature | Status |
|---|---|
| Kernel Driver | Implemented (Reference skeleton & lifecycle) |
| IOCTL Control Plane | Implemented (UAPI and Dispatcher) |
| MMIO Interface | Stub / Mock (Fake register writes) |
| DMA Engines | Not Implemented (Signaling only) |
| HEPC Eviction Logic | Conceptual (Silicon-level logic) |
| RTBD Directory | Conceptual (Hardware tag storage) |
| madvise Integration | Not Implemented |
| io_uring Integration | Not Implemented |
| NUMA / HMAT | Not Implemented |
[ User Space ] [ Kernel Space ] [ Hardware Space ]
(LLM Runtime) (kv_cpu Driver) (KV-CPU Silicon)
| | |
|--- ioctl(STEP) ------->| |
| |--- MMIO Write --------->| [ HEPC Scorer ]
| | | |
|--- ioctl(HOT/EVICT) -->| | [ DMA Engine ]
| |--- MMIO Doorbell ------>| |
| | | v
| | | [ Memory Tiers ]
- STEP: The runtime signals `STEP 120` to the driver. The hardware HEPC re-evaluates all cached KV blocks based on their proximity to step 120.
- HOT: The runtime identifies a specific prefix range as `HOT`. Hardware boosts its priority to prevent eviction.
- PREFETCH: The runtime hints that a block will be needed at `STEP 256`. Hardware DMA autonomously moves the block from DRAM to LPDDR.
- EVICT: The runtime marks a completed request's cache as `EVICTABLE`. Hardware immediately reclaims the space.
In this reference implementation, these lifecycle operations are modeled as MMIO control signals only. No DMA submission, page pinning, or physical data movement is performed by the driver.
```sh
make

# Load without physical hardware requirement
sudo insmod kv_cpu.ko mock=1

# Signal a decode step
./tools/kvctl step 128

# Mark a range as HOT
./tools/kvctl hot 0x7f001000 4096

# Issue the same STEP_ADVANCE ioctl from Python
python3 integrations/vllm/kv_cpu_allocator.py 128
```

The adapter in `integrations/vllm/` is intentionally small:
it wraps a vLLM-style allocator, delegates real block management to vLLM, and
adds an explicit advance_decode_step() hook that can be called once per decode
iteration in mock mode or against future hardware.
This repository demonstrates a control-plane model for a KV-CPU device and is not a production-ready driver. It is intended for systems architects and kernel engineers to evaluate the integration semantics of AI memory accelerators. No actual memory movement or performance simulation is performed by this driver.