One-click Qwen3.6-27B inference on Windows. 158 tok/s on RTX 5090, 72 tok/s on RTX 3090. Native, no WSL, no Docker, no telemetry.
vLLM patcher for Qwen3.6 on consumer NVIDIA – Qwen3.6-35B-A3B-FP8 (192 tok/s, +68% over stock) + Qwen3.6-27B-int4-AutoRound + 256K context. 126 patches: TurboQuant k8v4 KV, MTP/DFlash spec-decode, FULL cudagraph, hybrid GDN streaming, structured boot summary, one-command installer, 1958 tests. v7.72.2.
First public benchmark of llama.cpp speculative decoding on Qwen3.6-35B-A3B with a single RTX 3090 (post PR #19493 merge, 2026-04-19). 19 configurations covering ngram-cache, ngram-mod, and classic draft with vocab-matched Qwen3.5-0.8B. Finding: no variant achieves net speedup on Ampere + A3B MoE. Raw JSON, plots, full reproducibility.
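The "no net speedup" finding above matches the standard speculative-decoding cost model: with per-token draft acceptance probability a, draft length k, and a draft-to-target per-token cost ratio c, the expected speedup is (1 − a^(k+1)) / ((1 − a)·(kc + 1)). A minimal sketch of that formula (the numbers below are illustrative assumptions, not values from the benchmark):

```python
def expected_speedup(a: float, k: int, c: float) -> float:
    """Expected speedup of speculative decoding over plain decoding.

    a: per-token draft acceptance probability (0 < a < 1)
    k: draft tokens proposed per verification step
    c: draft-model cost per token relative to the target model
    """
    # Expected tokens committed per step: geometric series 1 + a + ... + a^k
    expected_tokens = (1 - a ** (k + 1)) / (1 - a)
    # Cost per step: k draft tokens plus one target verification pass
    cost_per_step = k * c + 1
    return expected_tokens / cost_per_step

# A cheap draft with high acceptance wins:
print(round(expected_speedup(a=0.8, k=4, c=0.1), 2))  # → 2.4
# A relatively costly draft with modest acceptance loses (< 1.0 = slowdown),
# which is the regime the benchmark reports for Ampere + A3B MoE:
print(round(expected_speedup(a=0.5, k=4, c=0.5), 2))  # → 0.65
```

On a MoE target like Qwen3.6-35B-A3B the per-token verification pass is already cheap relative to dense models, which raises the effective c and pushes every configuration toward the slowdown regime.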
The bare metal in my basement
ElizaOS v1.x agent running Gemma 3 27B locally via Ollama on an RTX 3090, dogfooding @thecolony/elizaos-plugin against The Colony (thecolony.cc).
2.28× faster Claude Code on a local Qwen3.6-27B int4 (RTX 3090) – turbo-64k + long-100k profiles, MTP, tool calling, corruption guards.
100% local voice assistant with Tool Calling, neural TTS, and streaming responses. Runs on RTX 3090 with Ollama + Kokoro TTS + FastAPI. Privacy-first AI.
Lightweight GPU & CPU system tray monitor for NVIDIA GPUs (RTX 5090, RTX 6000, RTX 4090, RTX 3090, Tesla, TCC mode). Real-time power, temperature, VRAM & CPU usage badges. Works where HWMonitor, GPU-Z & MSI Afterburner fail.
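Tray monitors like the one above typically poll NVML (e.g. via pynvml), which reports power in milliwatts and memory in bytes. A minimal sketch of turning such raw readings into a compact badge string (`format_badge` is a hypothetical helper, not the project's API):

```python
def format_badge(power_mw: int, temp_c: int, vram_used: int, vram_total: int) -> str:
    """Render NVML-style readings as a compact tray badge.

    power_mw:   power draw in milliwatts (NVML convention)
    temp_c:     GPU temperature in degrees Celsius
    vram_used / vram_total: memory in bytes (NVML convention)
    """
    gib = 1024 ** 3
    return (f"{power_mw / 1000:.0f}W {temp_c}°C "
            f"{vram_used / gib:.1f}/{vram_total / gib:.0f}GiB")

# Readings shaped like nvmlDeviceGetPowerUsage / nvmlDeviceGetMemoryInfo output:
print(format_badge(285_000, 62, 18 * 1024**3, 24 * 1024**3))
# → 285W 62°C 18.0/24GiB
```

Keeping the unit conversions in one place like this is what lets such a tool work in TCC mode too, where display-oriented utilities (HWMonitor, MSI Afterburner) often see no device at all.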
Benchmark speculative decoding performance for Qwen3.6-35B-A3B on an RTX 3090 GPU using llama.cpp to evaluate model throughput and structural regressions.
Local agentic coding stack: Hermes Agent + Qwen3.5-27B + GLM-4.7-Flash on dual RTX 3090s. Daily-driver agentic work, no cloud, no metering. Companion to blog.zacharycangemi.com.