Short bio
My name is Ardian Umam (禹安銳). You can call me Ardian, or 安銳 (An-Rui) in Chinese. I am currently pursuing my Ph.D. in the Department of Electrical Engineering and Computer Science at National Yang Ming Chiao Tung University, Taiwan, working with Prof. Yen-Yu Lin from VLLab and Prof. Jen-Hui Chuang from Islab.
My research interests include, but are not limited to, deep learning, computer vision, natural language processing, and multi-modal AI. Over the past 8 years, I have worked on a variety of topics involving 1D data (audio and language), 2D data (images), and 3D data (point clouds and meshes). The tasks include audio quality estimation, optical character recognition, camera calibration, 3D point cloud augmentation, 3D segmentation, unsupervised/weakly supervised segmentation, multi-modal (vision-language) recognition, and retrieval-augmented generation (RAG) with large language models. Additionally, I interned at Google with the ChromeOS Audio Team and at ITRI, where I worked on model compression.
Selected publications
- PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Education
- National Yang Ming Chiao Tung University
  PhD in Computer Science, 2020 - Now
  Joined the Vision and Learning Lab. Thesis: 3D recognition under low annotation costs
- National Chiao Tung University
  MSc in Computer Science
  Joined the Intelligent System Lab. Thesis: A light deep learning based method for bank serial number recognition
- Gadjah Mada University
  BSc in Electrical Engineering
  Thesis: Adaptive-PID control system for DC motor speed
Work experience
- Lecturer
  Institut Teknologi Bandung, 2019 - Now
  Faculty member in the School of Electrical Engineering and Informatics
- Graduate Intern
  Google, Apr - Dec 2022
  Reduced audio quality estimation error for ChromeOS by 45% using a self-supervised learning approach. The method leverages large-scale unlabelled audio data from publicly available datasets and internal company datasets to improve the feature representations (audio encoder).
- AI Engineer
  Computer Vision Research Center - NCTU, 2018 - 2019
  Optimized deep learning models for depth map estimation (from 20 to 40 FPS) and object detection (from 5 to 20 FPS) on an edge device (Jetson TX2) through architectural downscaling and TensorRT optimization
- Summer Intern
  ITRI (Industrial Technology Research Institute), Jul - Aug 2018
  Studied techniques from recent papers for reducing the computational cost of deep learning, e.g., network pruning
- Avionic Engineer
  LAPAN (National Institute of Aeronautics and Space), 2015 - 2016
  Developed a UAV telemetry monitoring system to track power consumption and UAV states
Awards
- Doctoral Scholarship Award
- Best M.S. Student Award
- TOP1 Final Project Competition
- TOP1 in-class Kaggle Competition
- Siswa Teladan Putra 1 (First-Place Model Male Student) - Klaten
Community service
- Conference reviewer
- Journal reviewer