Making the Stack Data-Efficient, Composable & Scalable! ⚓ @NVIDIA Backend Compiler Engineer ⚓ PhD (@illinois-impact) ⚓ BEng (Tsinghua)
- Santa Clara
- kunwu.me
- https://orcid.org/0000-0002-0149-1409
- in/kun-wu-069a14105
- https://go.kunwu.me/wakatime
Pinned
- pytorch-direct_dgl (Public): Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB)
- hst10/pylog (Public): PyLog: An Algorithm-Centric FPGA Programming and Synthesis Flow
- FlashTrain (Public): An Activation Offloading Framework to SSDs for Faster Large Language Model Training
Python 3
- intrasm_engine (Public): Enhancing CUDA Intra-Streaming-Multiprocessor Parallelism for Large Language Models via Fine-Grained Task Graph
Jupyter Notebook
- CV-tsinghua-template (Public template): All hail, Thy Highest University (THU)