Kun Wu K-Wu

Making the Stack Data-Efficient, Composable & Scalable!⚓@NVIDIA Backend Compiler Engineer⚓PhD (@illinois-impact)⚓BEng (Tsinghua)

196 followers · 289 following

Highlights

Developer Program Member
Pro

Pinned

pytorch-direct_dgl Public

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB)

44 4

hst10/pylog Public

PyLog: An Algorithm-Centric FPGA Programming and Synthesis Flow

Python 67 14

FlashTrain Public

An Activation Offloading Framework to SSDs for Faster Large Language Model Training

Python 3

cwpearson/tempi Public

Topology Experiments for MPI

C++ 10 4

intrasm_engine Public

Enhancing CUDA Intra-Streaming-Multiprocessor Parallelism for Large Language Models via Fine-Grained Task Graph

Jupyter Notebook

CV-tsinghua-template Public template

All hail, Thy Highest University (THU)

TeX 37 12