Personal Projects

Figure: HPC recommender, tail latency vs. request rate

HPC-Accelerated Music Recommender

CUDA · cuBLAS · OpenBLAS · OpenMP · C++ · Python

Matrix-factorization recommender for ~120K users and ~5.6M tracks from the Last.fm LFM-1b dataset. Training and inference are each implemented twice, as a CPU baseline and as a GPU-accelerated path, and both are benchmarked end to end on training wall-clock time and tail latency.
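To make the two measured quantities concrete, here is a minimal sketch of matrix-factorization inference and of a tail-latency measurement. It is not the project's code: it is pure Python for readability, where the project itself uses BLAS and CUDA, and the names `top_k`, `percentile`, and `measure_latencies` are hypothetical. It assumes a user is scored against every track by a dot product of latent factor vectors, and it uses the nearest-rank definition of a percentile.

```python
import heapq
import math
import time


def top_k(user_vec, item_factors, k):
    """Return the indices of the k items with the highest dot-product score."""
    scores = [sum(u * v for u, v in zip(user_vec, item)) for item in item_factors]
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies(user_vecs, item_factors, k):
    """Time one top-k query per user and return the per-request latencies in seconds."""
    latencies = []
    for user_vec in user_vecs:
        start = time.perf_counter()
        top_k(user_vec, item_factors, k)
        latencies.append(time.perf_counter() - start)
    return latencies
```

Calling `percentile(measure_latencies(users, items, 10), 99)` would give a p99 latency for one run. A tail-latency-versus-request-rate curve would repeat that measurement at several offered request rates, which this sketch does not model.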