I'm a PhD student at the University of
North Carolina at Chapel Hill in the Department of
Computer Science.
I'm currently working with Prof. Gedas
Bertasius on video understanding and AI for
sports. I build large-scale video benchmarks and datasets
— BASKET, ExAct, and SVI-Bench — that push
models from fine-grained skill recognition toward
higher-level causal and strategic reasoning.
Previously, I graduated from UNC in May 2023 with a B.S. in Computer
Science and a B.A. in Mathematics, and received an
M.S. in Computer Science in May 2025. I worked with Prof. Roni
Sengupta on computer vision during my undergraduate
studies. I was also a research assistant at the Zylka lab, developing
computer vision models for spontaneous pain measurement in
mice.
I'm interested in Computer Vision,
Video Understanding, Video Reasoning, and
AI for Sports. My current focus is on video reasoning
for sports — building large-scale benchmarks that move
models beyond fine-grained skill recognition toward causal,
strategic, and agentic reasoning about complex human actions.
We introduce SVI-Bench, a dynamic microworld for strategic video intelligence built on team sports.
It comprises ~35K hours of broadcast video, 15M annotated actions, and aligned commentary and
statistics across basketball, soccer, and hockey. Spanning nine tasks across four pillars—dynamic
scene understanding, causal reasoning, strategic simulation, and agentic synthesis—it reveals a
sharp performance drop at higher cognitive levels: top models reach ~73% on action-based questions
but only 5% on agentic tasks that require autonomously gathering evidence across 1.8M clips.
We introduce ExAct, a video-language benchmark for expert-level analysis of skilled human actions.
It contains over 3,500 expert-curated video QA pairs across domains like sports, cooking, and music.
Our benchmark reveals a significant performance gap between state-of-the-art VLMs and human experts,
highlighting the need for models with a more nuanced understanding of complex human skills.
We present BASKET, a large-scale basketball video dataset
for fine-grained skill estimation. BASKET contains more
than 4,400 hours of video capturing 32,232 basketball
players from all over the world. We benchmark multiple
SOTA video recognition models and reveal that these models
struggle to achieve good results on our benchmark.
Neural Motion Transfer serves as an effective data
augmentation technique for PPG signal estimation from
facial videos. We devise the best strategy to augment
publicly available datasets with motion augmentation,
improving by up to 75% over SOTA techniques on five benchmark
datasets.
We introduce a multi-reliability and multi-level feature
augmentation framework for semi-supervised semantic
segmentation that makes effective use of both labeled and unlabeled
images and improves segmentation performance on benchmark
datasets.
Misc
I am enthusiastic about helping other students succeed in
computer science. I have shared my knowledge and supported
students' learning in the following course:
University of North Carolina at Chapel Hill,
Undergraduate Learning Assistant: