kaby76/claudes-c-compiler

CCC: Claude's C Compiler

A C compiler written entirely from scratch in Rust, targeting x86-64, i686, AArch64, and RISC-V 64. It has zero compiler-specific dependencies: the frontend, SSA-based IR, optimizer, code generator, peephole optimizers, assembler, linker, and DWARF debug info generation are all implemented from scratch. Claude's C Compiler produces ELF executables without any external toolchain.

Note: With the exception of this one paragraph that was written by a human, 100% of the code and documentation in this repository was written by Claude Opus 4.6. A human guided some of this process by writing test cases that Claude was told to pass, but never interactively pair-programmed with Claude to debug or to provide feedback on code quality. As a result, I do not recommend you use this code! None of it has been validated for correctness. Claude wrote this exclusively on a Linux host; it probably will not work on macOS/Windows (neither I nor Claude have tried). The docs may be wrong and make claims that are false. See our blog post for more detail.

Prerequisites

  • Rust (stable, 2021 edition): install via rustup
  • Linux host: the compiler targets Linux ELF executables and relies on Linux system headers and C runtime libraries (glibc or musl) being installed on the host
  • Cross-compilation targets (AArch64, RISC-V 64, i686): the corresponding cross-compilation sysroots should be installed (e.g., the headers and libraries that come with the aarch64-linux-gnu-gcc and riscv64-linux-gnu-gcc cross toolchains)

Building

cargo build --release

This produces five binaries in target/release/, all compiled from the same source. The target architecture is selected by the binary name at runtime:

Binary | Target
ccc | x86-64 (default)
ccc-x86 | x86-64
ccc-arm | AArch64
ccc-riscv | RISC-V 64
ccc-i686 | i686 (32-bit x86)
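The README does not show how the driver inspects its own name. As a rough illustration only, here is a C sketch of dispatching on argv[0], mapping each binary name to the triple that -dumpmachine reports (listed under Usage). The function name target_triple_for and the fallback to x86-64 for unrecognized names are assumptions; the real driver is written in Rust and may behave differently.

```c
#include <string.h>

/* Hypothetical sketch: map the executable name (argv[0]) to a target triple.
 * Not CCC's actual driver code, which is written in Rust. */
const char *target_triple_for(const char *argv0)
{
    /* Strip any leading directory: "/opt/bin/ccc-arm" -> "ccc-arm". */
    const char *base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;

    static const struct { const char *name; const char *triple; } table[] = {
        { "ccc-x86",   "x86_64-linux-gnu"  },
        { "ccc-arm",   "aarch64-linux-gnu" },
        { "ccc-riscv", "riscv64-linux-gnu" },
        { "ccc-i686",  "i686-linux-gnu"    },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(base, table[i].name) == 0)
            return table[i].triple;

    /* Assumption: plain "ccc" and any unrecognized name fall back to x86-64. */
    return "x86_64-linux-gnu";
}
```

Under this sketch, calling target_triple_for(argv[0]) from main means that only the name used to invoke the binary decides the target, which matches the statement that all five binaries are built from the same source.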

Quick Start

Compile and run a simple C program:

# Write a test program
cat > hello.c << 'EOF'
#include <stdio.h>
int main(void) {
    printf("Hello from CCC!\n");
    return 0;
}
EOF

# Compile and run (x86-64)
./target/release/ccc -o hello hello.c
./hello

# Cross-compile for AArch64 and run under QEMU
./target/release/ccc-arm -o hello-arm hello.c
qemu-aarch64 -L /usr/aarch64-linux-gnu ./hello-arm

CCC works as a drop-in GCC replacement. Point your build system at it:

# Build a project with make
make CC=/path/to/ccc-x86

# Build a project with CMake
cmake -DCMAKE_C_COMPILER=/path/to/ccc-x86 ..

# Build a project with configure scripts
./configure CC=/path/to/ccc-x86

Usage

# Compile and link
ccc -o output input.c                # x86-64
ccc-arm -o output input.c            # AArch64
ccc-riscv -o output input.c          # RISC-V 64
ccc-i686 -o output input.c           # i686

# GCC-compatible flags
ccc -S input.c                       # Emit assembly
ccc -c input.c                       # Compile to object file
ccc -E input.c                       # Preprocess only
ccc -O2 -o output input.c            # Optimize (accepts -O0 through -O3, -Os, -Oz)
ccc -g -o output input.c             # DWARF debug info
ccc -DFOO=1 -Iinclude/ input.c       # Define macros, add include paths
ccc -Werror -Wall input.c            # Warning control
ccc -fPIC -shared -o lib.so lib.c    # Position-independent code
ccc -x c -E -                        # Read from stdin

# Build system integration (reports as GCC 14.2.0 for compatibility)
ccc -dumpmachine     # x86_64-linux-gnu / aarch64-linux-gnu / riscv64-linux-gnu / i686-linux-gnu
ccc -dumpversion     # 14

The compiler accepts most GCC flags. Unrecognized flags (e.g., architecture-specific -m flags, unknown -f flags) are silently ignored so ccc can serve as a drop-in GCC replacement in build systems.

Assembler and Linker Modes

By default, the compiler uses its builtin assembler and linker for all four architectures. No external toolchain is required. You can verify this with --version, which shows Backend: standalone when using the builtin tools.

To build with optional GCC fallback support (e.g., for debugging), enable Cargo features at compile time:

# Build with GCC assembler and linker fallback
cargo build --release --features gcc_assembler,gcc_linker

# Build with GCC fallback for -m16 boot code only
cargo build --release --features gcc_m16

Feature | Description
gcc_assembler | Use GCC as the assembler instead of the builtin one
gcc_linker | Use GCC as the linker instead of the builtin one
gcc_m16 | Use GCC for -m16 (16-bit real mode boot code)

When compiled with GCC fallback features enabled, --version shows which components use GCC (e.g., Backend: gcc_assembler, gcc_linker).

Status

The compiler can build real-world C codebases across all four architectures, including the Linux kernel. Projects that compile and pass their test suites include PostgreSQL (all 237 regression tests), SQLite, QuickJS, zlib, Lua, libsodium, libpng, jq, libjpeg-turbo, mbedTLS, libuv, Redis, libffi, musl, TCC, and DOOM, all using the fully standalone assembler and linker with no external toolchain. Over 150 additional projects have also been built successfully, including FFmpeg (all 7331 FATE checkasm tests on x86-64 and AArch64), GNU coreutils, Busybox, CPython, QEMU, and LuaJIT.

Known Limitations

  • Optimization levels: All levels (-O0 through -O3, -Os, -Oz) run the same optimization pipeline. Separate tiers will be added as the compiler matures.
  • Long double: x86 80-bit extended precision is supported via x87 FPU instructions. On ARM/RISC-V, long double is IEEE binary128 via compiler-rt/libgcc soft-float libcalls.
  • Complex numbers: _Complex arithmetic has some edge-case failures.
  • GNU extensions: Partial __attribute__ support. NEON intrinsics are partially implemented (core 128-bit operations work).
  • Atomics: _Atomic is parsed but treated as the underlying type (the qualifier is not tracked through the type system).
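Which long double format a target uses can be checked from LDBL_MANT_DIG in <float.h>: 64 mantissa bits corresponds to the x87 80-bit extended format and 113 to IEEE binary128. The helper below is hypothetical and not part of CCC; it assumes the <float.h> visible to the compiler defines LDBL_MANT_DIG correctly for the target, as ISO C requires.

```c
#include <float.h>

/* Hypothetical helper: name the long double format from its mantissa width.
 * LDBL_MANT_DIG is usable in #if, so the choice is made at compile time. */
const char *long_double_format(void)
{
#if LDBL_MANT_DIG == 64
    return "x87 80-bit extended";             /* x86-64 and i686 */
#elif LDBL_MANT_DIG == 113
    return "IEEE binary128";                  /* AArch64 / RISC-V 64 Linux */
#elif LDBL_MANT_DIG == 53
    return "IEEE binary64 (same as double)";
#else
    return "other";
#endif
}
```

Compiled once per target (e.g. with ccc and ccc-arm), this would be expected to report the two formats described in the bullet above.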

C Dialect

CCC does not implement any ISO C standard. Its accepted language is the dialect GCC accepts when building the Linux kernel: C11 as a baseline, a handful of C23 features that the kernel already uses, and the full set of GCC extensions the kernel depends on. The complete formal grammar is in GRAMMAR.md.

What determines the feature set

The target is not a language specification but a codebase. Every syntactic feature present in the parser appears in Linux kernel source or in the headers it includes. Every C23 feature that is absent (see below) is absent because the kernel does not use it. This is also why the parser pre-seeds a large set of typedef names (pid_t, uid_t, __u32, …): the compiler does not process system headers, so it must already know these names are types in order to parse kernel code correctly.
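The need for known typedef names shows up in a single statement: pid_t * p; is a declaration if pid_t names a type, but a multiplication expression if it names a variable. The C sketch below models a pre-seeded typedef set; the helper names are hypothetical, and only pid_t, uid_t and __u32 come from this README (the other entries are illustrative, not CCC's actual list).

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative seed list; only pid_t, uid_t and __u32 are named in the README. */
static const char *const seeded_typedefs[] = {
    "pid_t", "uid_t", "gid_t", "size_t", "ssize_t",
    "__u8", "__u16", "__u32", "__u64",
};

/* Hypothetical lookup: is this identifier known to be a type name? */
bool is_seeded_typedef(const char *ident)
{
    for (size_t i = 0; i < sizeof seeded_typedefs / sizeof seeded_typedefs[0]; i++)
        if (strcmp(ident, seeded_typedefs[i]) == 0)
            return true;
    return false;
}

/* Decide how "IDENT * name ;" must be parsed, as any C parser has to. */
const char *classify_star_statement(const char *ident)
{
    return is_seeded_typedef(ident) ? "declaration" : "multiplication expression";
}
```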

Feature set by origin

Feature | Origin | Notes
K&R function definitions | C89 | Old driver and arch code still has them
Implicit int | C89 | Pre-C99 headers and arch code
_Bool, _Complex, restrict, inline, designated initializers, compound literals, VLA array-parameter qualifiers | C99 | -
_Static_assert, _Alignas, _Alignof, _Generic, _Thread_local, _Atomic, _Noreturn | C11 | _Atomic is parsed but the qualifier is discarded
Single-argument _Static_assert(expr) | C23 | Kernel uses this via macro
typeof as a keyword | C23 | Kernel has used GCC __typeof__ since before standardization
Declarations after labels / case | C23 | Required by scoped_guard() and modern kernel patterns
__attribute__((...)): packed, aligned, section, visibility, alias, weak, constructor, destructor, noreturn, naked, mode, vector_size, cleanup, symver, transparent_union, fastcall, … | GCC extension | Exhaustive because the kernel uses all of these
__seg_gs / __seg_fs | GCC extension | x86 per-CPU variable access via segment registers
__int128 / __uint128_t | GCC extension | Kernel crypto and arithmetic; 64-bit targets only
__auto_type | GCC extension | Used in kernel min()/max() macros
__label__, &&label, goto *expr | GCC extension | Local labels, label-as-value, computed goto
case lo ... hi:, [lo ... hi] = | GCC extension | Range cases and range designators
cond ?: else | GCC extension | Omitted-middle ternary
({ ... }) statement expressions | GCC extension | Used throughout kernel macros
__real__ / __imag__ | GCC extension | Complex-number part extraction
__builtin_va_arg, __builtin_types_compatible_p | GCC extension | -
#pragma pack, #pragma GCC visibility | GCC extension | Device-driver and ABI structs
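Several of the GCC extensions in this table are easiest to understand in use. The following C sketch exercises statement expressions with __auto_type, case ranges, a range designator, the omitted-middle ternary, and computed goto, following the semantics documented in the GCC manual. The macro and function names are hypothetical, not taken from the kernel or from CCC's tests, and the snippet has not been checked against ccc itself.

```c
/* Kernel-style max(): a statement expression plus __auto_type evaluates
 * each argument exactly once. (Simplified: the kernel's real min()/max()
 * also type-check their arguments.) */
#define max_once(a, b) ({        \
    __auto_type _a = (a);        \
    __auto_type _b = (b);        \
    _a > _b ? _a : _b;           \
})

/* Case ranges: `case lo ... hi:` matches every value in [lo, hi]. */
int char_class(int c)
{
    switch (c) {
    case '0' ... '9': return 1;   /* digit */
    case 'a' ... 'z':
    case 'A' ... 'Z': return 2;   /* letter */
    default:          return 0;
    }
}

/* Range designator: elements 1 through 3 are set to 7, the rest to 0. */
const int ranged[5] = { [1 ... 3] = 7 };

/* Omitted-middle ternary: `x ?: y` yields x if nonzero, else y,
 * evaluating x only once. */
int or_default(int x, int fallback)
{
    return x ?: fallback;
}

/* Computed goto: label addresses (&&label) in a table, dispatched with
 * `goto *expr`, as in threaded interpreters. Opcodes: 0 = inc, 1 = double,
 * 2 = halt; ops[] is assumed to contain only these values. */
int run_ops(const unsigned char *ops, int n)
{
    static void *const dispatch[] = { &&op_inc, &&op_dbl, &&op_halt };
    int acc = 0, pc = 0;

    if (n <= 0)
        return acc;
    goto *dispatch[ops[pc]];
op_inc:
    acc += 1;
    if (++pc < n) goto *dispatch[ops[pc]];
    return acc;
op_dbl:
    acc *= 2;
    if (++pc < n) goto *dispatch[ops[pc]];
    return acc;
op_halt:
    return acc;
}
```

Note that max_once(i++, j) increments i only once; a naive ((a) > (b) ? (a) : (b)) macro would evaluate the larger argument twice, which is why kernel macros rely on statement expressions.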

Absent C23 features

constexpr, nullptr, lowercase bool/true/false, lowercase static_assert/thread_local/alignas/alignof, _BitInt(N), _Decimal32/64/128, [[attributes]] standard attribute syntax, #embed. None of these appear in the kernel's GCC C11 dialect.
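Code meant to build with CCC therefore uses the C11 or GNU spelling wherever C23 introduced a new one. A minimal sketch of common substitutions follows, with the C23 form in each comment; it assumes the packed attribute behaves as in GCC (it is listed in the feature table), and names such as BUF_SIZE, struct wire_hdr, struct packet and checked_len are hypothetical.

```c
#include <stdbool.h>   /* C11: bool/true/false are macros over _Bool */
#include <stddef.h>    /* NULL */

/* C23: constexpr int BUF_SIZE = 64;  ->  enumeration constant */
enum { BUF_SIZE = 64 };

/* C23: static_assert(...)  ->  C11 _Static_assert (two-argument form) */
_Static_assert(BUF_SIZE % 8 == 0, "BUF_SIZE must be a multiple of 8");

/* C23: struct [[gnu::packed]] wire_hdr { ... };  ->  GNU __attribute__ */
struct wire_hdr {
    unsigned char kind;
    unsigned int  length;   /* no padding is inserted before this member */
} __attribute__((packed));

/* C23: alignas(16)  ->  C11 _Alignas */
struct packet {
    _Alignas(16) unsigned char payload[BUF_SIZE];
    bool valid;
};

/* C23: nullptr  ->  NULL */
int checked_len(const char *s)
{
    if (s == NULL)
        return -1;
    int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```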

Testing

The compiler has two kinds of tests:

Unit tests (in-source #[test] functions for individual passes and modules):

cargo test --release

Integration tests (end-to-end compilation tests in tests/). Each test is a directory containing a main.c source file and expected output files:

tests/
  some-test-name/
    main.c              # C source to compile
    expected.stdout     # Expected stdout (if any)
    expected.ret        # Expected exit code (if any)
    expected.skip.arm   # Skip marker for specific architectures (optional)

Tests are run by compiling main.c with ccc, executing the resulting binary, and comparing stdout and the exit code against the expected files.
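The README does not say how the test runner is implemented. The following hypothetical C sketch expresses the rules described above: a per-architecture skip marker, and a pass/fail check that compares stdout and the exit code only when the corresponding expected file exists. Byte-for-byte stdout comparison and the helper names should_skip and test_passes are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: does <test_dir>/expected.skip.<arch> exist? */
bool should_skip(const char *test_dir, const char *arch)
{
    char path[4096];
    int len = snprintf(path, sizeof path, "%s/expected.skip.%s", test_dir, arch);
    if (len < 0 || (size_t)len >= sizeof path)
        return false;               /* path too long: treat as not skipped */
    return access(path, F_OK) == 0;
}

/* Hypothetical pass/fail rule. A NULL expectation means the corresponding
 * expected.stdout / expected.ret file is absent, so it is not checked. */
bool test_passes(const char *actual_out, int actual_ret,
                 const char *expected_out, const int *expected_ret)
{
    if (expected_out != NULL && strcmp(actual_out, expected_out) != 0)
        return false;
    if (expected_ret != NULL && actual_ret != *expected_ret)
        return false;
    return true;
}
```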

Environment Variables

Variable | Purpose
CCC_TIME_PHASES | Print per-phase compilation timing to stderr
CCC_TIME_PASSES | Print per-pass optimization timing and change counts to stderr
CCC_DISABLE_PASSES | Disable specific optimization passes (comma-separated list, or all)
CCC_KEEP_ASM | Preserve intermediate .s files next to the output
CCC_ASM_DEBUG | Dump preprocessed assembly to /tmp/asm_debug_<name>.s
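The value format of CCC_DISABLE_PASSES (either all or a comma-separated list) can be interpreted with a small helper. This C sketch is hypothetical: the real driver is written in Rust, pass names such as gvn and licm are placeholders rather than CCC's actual pass names, and whitespace around names is not trimmed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: is `pass` disabled by the given CCC_DISABLE_PASSES value? */
bool pass_disabled(const char *env_value, const char *pass)
{
    if (env_value == NULL || *env_value == '\0')
        return false;                    /* variable unset or empty */
    if (strcmp(env_value, "all") == 0)
        return true;

    size_t plen = strlen(pass);
    const char *p = env_value;
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        /* Whole-name match only: "gvnx" must not disable "gvn". */
        if (len == plen && strncmp(p, pass, len) == 0)
            return true;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return false;
}
```

In a driver this would be called as pass_disabled(getenv("CCC_DISABLE_PASSES"), name) (getenv is declared in <stdlib.h>) before running each pass.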

Project Organization

src/                Compiler source code (Rust)
  frontend/         C source -> typed AST (preprocessor, lexer, parser, sema)
  ir/               Target-independent SSA IR (lowering, mem2reg)
  passes/           SSA optimization passes (15 passes + shared loop analysis)
  backend/          IR -> assembly -> machine code -> ELF (4 architectures)
  common/           Shared types, symbol table, diagnostics
  driver/           CLI parsing, pipeline orchestration

include/            Bundled C headers (x86 SIMD: SSE through AVX-512, AES-NI, FMA, SHA, BMI2; ARM NEON)
tests/              Compiler tests (each test is a directory with main.c and expected output)
ideas/              Future work proposals and improvement notes

Each src/ subdirectory has its own README.md with detailed design documentation. For the full architecture, compilation pipeline data flow, and key design decisions, see DESIGN_DOC.md.

About

Claude Opus 4.6 wrote a dependency-free C compiler in Rust, with backends targeting x86 (64- and 32-bit), ARM, and RISC-V, capable of compiling a booting Linux kernel.

Languages

  • Rust 96.2%
  • C 3.8%