  • Shenzhen (UTC+08:00)


Focused on AI infrastructure, CUDA kernels, and high-performance systems engineering

🔬 Focus: AI Infrastructure · CUDA Kernels · LLM Inference · HPC Systems
🌱 Currently: Building high-throughput inference pipelines and GPU-first systems
🤝 Open to: AI infrastructure and performance engineering work, research collaboration, and open-source contributions





Profile · Selected Work · Background · Stack · Signals · Connect



👨‍💻 About Me


I build AI infrastructure and GPU-first high-performance systems with C++/CUDA, Python, and Go. My main focus is AI infrastructure, GPU operator (kernel) optimization, and practical high-performance systems engineering.

  • 🔥 GPU Kernel Engineering — CUDA/Triton kernels for FlashAttention, GEMM, quantization, and memory-aware operator design
  • 🧠 AI Inference Systems — lightweight LLM runtimes, KV Cache, W8A16/FP8 quantization, and inference path optimization
  • High-Performance Computing — simulation, rendering, and image-processing pipelines tuned for throughput and scalability
  • 🌐 Real-time Systems — RTC signaling, streaming applications, and digital human platforms with system-level integration

Currently: inference acceleration, kernel fusion, and end-to-end GPU system design.



🚀 Selected Work

Featured Projects: if you want a quick read on my technical focus and representative work, start with the four projects below. They give the fastest overview of my work in CUDA kernels, inference systems, HPC simulation, and production-facing applications, and they are the best entry points for collaboration, hiring conversations, and technical review.

Flagship CUDA kernel library covering GEMM, FlashAttention, Conv2D, SpMV, and FP8 quantization.

Compact LLM inference engine focused on W8A16 quantization, KV Cache, and practical runtime design.

Million-particle GPU simulation exploring direct N², Barnes-Hut, and CUDA-OpenGL interop.

3D digital human platform combining real-time rendering, interaction, and behavior control.

⚡ GPU Kernel Optimization

Modern C++17/CUDA kernel library for elementwise ops, GEMM, FlashAttention, Conv2D, SpMV, and FP8 quantization.

C++17 CUDA Tensor Core

Stepwise CUDA SGEMM optimization from naive loops to Tensor Core kernels, reaching 40% of cuBLAS.

CUDA WMMA Roofline

Triton fusion kernels for RMSNorm+RoPE, Gated MLP, and FP8 GEMM with auto-tuning.

Triton FP8 Python

CUDA kernel playground for FlashAttention, FP16/INT8 GEMM, and Tensor Core inference primitives.

CUDA PyTorch FlashAttention
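FlashAttention, which appears in several of the kernel projects above, relies on the online-softmax trick: a running maximum and a running denominator let attention weights be normalized in a single pass over the keys, so the full score row never has to be materialized. The sketch below illustrates that recurrence in plain Python under simplifying assumptions (one query, scalar values, no tiling). It is not code from these repositories, and the function names are hypothetical.

```python
import math


def online_softmax_weighted_sum(scores, values):
    """One-pass softmax(scores) . values using a running max (online softmax)."""
    running_max = float("-inf")
    denom = 0.0  # running sum of exp(score - running_max)
    acc = 0.0    # running sum of exp(score - running_max) * value
    for s, v in zip(scores, values):
        new_max = max(running_max, s)
        # Rescale the partial sums computed against the old maximum.
        # On the first step exp(-inf) == 0.0, so the empty sums stay zero.
        scale = math.exp(running_max - new_max)
        p = math.exp(s - new_max)
        denom = denom * scale + p
        acc = acc * scale + p * v
        running_max = new_max
    return acc / denom


def reference_softmax_weighted_sum(scores, values):
    """Two-pass reference: find the max first, then normalize."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    scores = [1.0, 3.0, 2.0, -1.5]
    values = [10.0, 20.0, 30.0, 5.0]
    print(online_softmax_weighted_sum(scores, values))
    print(reference_softmax_weighted_sum(scores, values))
```

A real attention kernel would apply the same rescaling once per tile of keys, with vector-valued accumulators kept on-chip instead of scalars.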

🧠 AI Inference Engines

Lightweight LLM runtime with W8A16 quantization, KV Cache, and practical multi-sampling support.

CUDA C++17 INT8

Educational CUDA inference engine with seven GEMM optimization stages, reaching 72% of cuBLAS.

CUDA C++17 FP16

WebGPU micro inference engine implementing Conv2d, kernel fusion, Im2Col, and MNIST classification.

WebGPU TypeScript WGSL

Real-time multi-model vision stack combining YOLO, DETR, OWL-ViT, BLIP, and WebSocket streaming.

FastAPI YOLOv8 Docker
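W8A16, used by the lightweight LLM runtime above, stores weights as 8-bit integers while activations stay in 16-bit floating point, and the weights are dequantized on the fly inside the matrix multiply. Below is a minimal sketch assuming symmetric per-output-channel scales, with Python floats standing in for FP16. The project's actual scheme (group size, rounding, memory layout) may differ, and the function names are hypothetical.

```python
def quantize_per_channel_int8(weight):
    """Symmetric INT8 quantization with one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0  # guard all-zero rows
        # Round to the nearest integer and clamp to the symmetric INT8 range.
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def w8a16_matvec(q_weight, scales, x):
    """y = W x using INT8 weights and floating-point activations x."""
    y = []
    for q_row, scale in zip(q_weight, scales):
        # Integer weights times float activations; the shared row scale
        # is factored out of the dot product and applied once per output.
        acc = sum(q * xi for q, xi in zip(q_row, x))
        y.append(acc * scale)
    return y


if __name__ == "__main__":
    W = [[0.5, -1.2, 0.03], [2.0, 0.0, -0.7]]
    x = [1.0, 2.0, -3.0]
    q, s = quantize_per_channel_int8(W)
    print(q, s)
    print(w8a16_matvec(q, s, x))
```

Because every weight in a row shares one scale, the multiply-accumulate runs on the integer values and only one floating-point multiply per output applies the scale, which keeps dequantization cheap.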

🎮 GPU Computing & Simulation

CUDA ray tracer featuring Phong shading, path tracing, BVH acceleration, and warp-divergence tuning.

CUDA Path Tracing BVH

Million-particle CUDA simulation covering direct N², Barnes-Hut, spatial hashing, and OpenGL interop.

CUDA OpenGL Barnes-Hut

Real-time WebGPU fluid simulation with 10K particles, compute shaders, and visual trail effects.

WebGPU TypeScript WGSL

CUDA image-processing library covering convolution, morphology, geometric transforms, and pipeline stages.

CUDA C++17 Image Processing

DAG-based heterogeneous image pipeline with multi-stream scheduling and pinned-memory pools.

CUDA C++17 DAG
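The direct N² method in the particle simulation above computes every pairwise interaction. It is the baseline that Barnes-Hut (roughly O(N log N), approximating distant groups of bodies by their center of mass) and spatial hashing are measured against. The plain-Python sketch below shows only that direct baseline, assuming softened Newtonian gravity with G = 1 by default; the function name and the `softening` parameter are illustrative assumptions, not taken from the project.

```python
import math


def direct_nbody_accelerations(positions, masses, G=1.0, softening=1e-3):
    """All-pairs O(N^2) gravitational accelerations with softening.

    positions: list of (x, y, z) tuples; masses: list of floats.
    The softening length avoids the singularity when two bodies coincide.
    """
    n = len(positions)
    eps2 = softening * softening
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            xj, yj, zj = positions[j]
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            # a_i += G * m_j * (r_j - r_i) / |r_j - r_i|^3
            s = G * masses[j] * inv_r3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc


if __name__ == "__main__":
    pos = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
    m = [1.0, 2.0, 0.5]
    for a in direct_nbody_accelerations(pos, m):
        print(a)
```

On a GPU, a natural mapping would be one thread per body i, with blocks of j positions staged through shared memory; the per-pair arithmetic stays the same.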

🌐 Applications

3D digital human platform integrating real-time rendering, voice interaction, behavior control, and emotion FSM.

React Three.js TypeScript

Minimal WebRTC demo with Go signaling, room management, and peer-to-peer media delivery.

Go WebRTC Docker

End-to-end encrypted note sync with AES-256, mnemonic recovery, and real-time collaboration.

React Express Socket.IO

Browser-based memory training app with N-back, spaced reinforcement, adaptive difficulty, and PWA support.

JavaScript Tailwind PWA


🎓 Background & Experience

🎓 Education

Xidian University

Background in communications and information engineering.

💼 Experience

Mindray · ZEGO · BGI

Engineering across medical imaging, real-time audio/video (RTC) systems, and genomic-scale data workflows.


🛠️ Tech Stack

Category          Technologies
Languages         C++/CUDA · Python · Go
AI & HPC          CUDA · Triton · cuBLAS · Tensor Core · WebGPU · Quantization
System & DevOps   Inference pipelines · Performance tuning
Web & Frontend    Real-time apps · Visualization

📊 Signals & Activity

LessUp's GitHub stats   GitHub Streak

🏆 Highlights & More Stats

📈 Activity Graph
GitHub Activity Graph

🧬 Visual Signature
Snake animation

📫 Collaboration & Contact

Reach out if you're building AI infrastructure, inference acceleration, GPU systems, or performance-critical tooling.
Open to technical collaboration, engineering roles, research discussions, and thoughtful open-source work.
Email · GitHub


Pinned

  1. awesome-cursorrules-zh · Public

    🇨🇳 Curated collection of .cursorrules rules for the Cursor AI editor | 132+ rules · 32 domains · bilingual site

    JavaScript · 176 stars · 23 forks

  2. meta-human · Public

    Browser-native 3D digital human engine with voice, vision & dialogue. Zero-config, offline-ready, production-grade AI avatar platform.

    TypeScript · 16 stars · 6 forks

  3. ⚡ GLM Coding Rush: one-click auto-purchase userscript for the Zhipu coding assistant (GLM Coding) | Auto-unlock sold-out · High-speed retry · Scheduled trigger · Payment guard · Bilingual panel | Tampermonkey/Violentmonkey | Click Raw to install
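    // ==UserScript==
    // @name         GLM Coding Rush - Zhipu Coding Assistant Purchase Script
    // @namespace    https://gist.github.com/LessUp
    // @version      1.1.0
    // @description  Zhipu GLM Coding one-click purchase script: auto-unlock sold-out button / high-speed retry engine / bizId double verification / auto-recovery from error pop-ups / payment pop-up guard / scheduled trigger with one-second precision / draggable floating panel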
  4. micos-2024 · Public

    MICOS-2024: end-to-end integrated metagenomic analysis platform | End-to-end Metagenomic Intelligence and Comprehensive Omics Suite

    R · 11 stars · 3 forks

  5. cpp-high-performance-guide · Public

    High-performance C++ optimization guide with lock-free data structures, SIMD, and memory optimization examples

    C++ · 6 stars · 1 fork

  6. wiki-bioinfo · Public

    Systematic bioinformatics knowledge base for the Chinese-speaking community (written in Chinese)

    MDX · 5 stars · 1 fork