Education
-
Mar. 2010 - Feb. 2020
Ph.D. in Electrical Engineering and Computer Science
Seoul National University, Seoul, Korea
Advisor: Prof. Jaejin Lee
Thesis: High-Level Synthesis of OpenCL Kernels for FPGAs
-
Mar. 2006 - Feb. 2010
B.S. in Computer Science and Engineering
Seoul National University, Seoul, Korea
Summa cum laude (GPA: 4.07/4.3)
Received the SNU College of Engineering Alumni Association's citation for academic excellence
-
Mar. 2004 - Feb. 2006
Daegu Science High School
Work and Research Experiences
-
Oct. 2021 - present
Chief Executive Officer
Oct. 2020 - Oct. 2021
Chief Architect
Moreh, Seoul, Korea
- We have been developing a PyTorch/TensorFlow-compatible deep learning framework for large-scale clusters and clouds. It generates an intermediate representation at run time and applies a variety of compiler techniques to automatically parallelize and optimize the given workload. It decouples user applications from underlying hardware resources by adding a virtualization layer to enable efficient and flexible resource management. It supports non-NVIDIA accelerators such as AMD GPUs and NPUs.
- Based on the framework, we have been developing comprehensive AI cloud platform software for various levels of services from IaaS to AIaaS.
- Based on our capabilities in computing infrastructure, we also develop large AI models (LLMs and multimodal models) as well as computing-aware training and compression techniques.
-
Aug. 2016 - Sep. 2020
Chief Technology Officer
ManyCoreSoft, Seoul, Korea
- Led the development of iML, a GPU-accelerated machine learning framework for financial applications such as credit risk management. iML supports novel training algorithms to construct more complex and accurate forest models in a reasonable time. It was used to develop the new credit scoring system of Korea Credit Bureau in 2020.
- Designed and built several high-performance computing systems for companies, research institutes, and universities.
-
Mar. 2010 - Jul. 2016
Research Assistant
Center for Manycore Programming, Seoul National University
- Designed and implemented SOFF (SNU OpenCL Framework for FPGAs). SOFF is an open-source framework (= compiler + runtime system + board support package) to run OpenCL applications with data-parallel kernels on FPGAs. It is the first OpenCL framework that correctly compiles all applications in the SPEC ACCEL benchmark suite.
- Participated in the development of SnuRHAC. SnuRHAC is a CUDA runtime system that provides a single virtual GPU corresponding to multiple GPUs in different nodes. I designed the driver-level cluster unified memory, and developed the memory access pattern analyzer used in SnuRHAC.
- Maintained SnuCL for six years. SnuCL is an OpenCL framework that naturally extends the original OpenCL semantics to heterogeneous cluster environments. It allows applications to use all the accelerators in multiple nodes as if they were in a single node. In particular, I completely rewrote the source code of the runtime system (version 1.3.X) to support the OpenCL ICD standard and to solve many usability issues.
- Was the lead architect of the Chundoong supercomputer. Chundoong is the first GPU supercomputer in Korea (ranked #277 in the TOP500 list of November 2012). It used cost-efficient gaming GPUs and adopted a self-made water cooling system. I also developed a highly optimized LINPACK benchmark implementation for Chundoong.
Honors and Awards
-
Nov. 2022
Innovators Under 35 Korea
MIT Technology Review
-
Aug. 2010 - Feb. 2015
Graduate Student Scholarship
Korea Foundation for Advanced Studies
-
Feb. 2010
19th Place
2010 ACM-ICPC World Finals, Harbin, China
-
Mar. 2007
14th Place
2007 ACM-ICPC World Finals, Tokyo, Japan
-
Mar. 2006 - Feb. 2010
Presidential Science Scholarship
Korea Science and Engineering Foundation
-
Aug. 2005
Silver Medal
17th International Olympiad in Informatics, Nowy Sącz, Poland
Teaching Experiences
-
Sep. 2024 - present
Adjunct Assistant Professor
Graduate School of Data Science, Seoul National University
-
Aug. 2015 - Aug. 2019
Lecturer
2019 Accelerator Programming Summer School
2019 Accelerator Programming Winter School
2018 Accelerator Programming Summer School
2018 Accelerator Programming Winter School
2017 Accelerator Programming Summer School
2017 Accelerator Programming Winter School
2016 National Supercomputing Summer School @ SNU
2016 National Supercomputing Winter School @ SNU
2015 National Supercomputing Summer School @ SNU
-
Mar. 2017 - Jun. 2017
Teaching Assistant
SNU 4190.103A Programming Practice (Spring 2017)
-
Jun. 2014 - Sep. 2014
Lecturer
Multicore Programming Expert Course, Employment-Linked R&D Training Center,
Small and Medium Business Administration
-
Mar. 2010 - Jun. 2014
Teaching Assistant
SNU 4190.414A Multicore Computing (Spring 2014)
SNU 4190.414A Multicore Computing (Spring 2013)
SNU 4190.409 Compilers (Spring 2012)
SNU 010.133 Digital Computer Concept and Practice (Spring 2011)
SNU 010.133 Digital Computer Concept and Practice (Spring 2010)
-
Jul. 2006 - Jan. 2009
Teaching Assistant
National Training Camp for the International Olympiad in Informatics,
Korean Institute of Information Scientists and Engineers
Professional Activities
-
Nov. 2017 - Dec. 2017
Artifact Evaluation Committee
CGO 2018
-
Jun. 2017 - Jul. 2017
Artifact Evaluation Committee
PACT 2017
-
Nov. 2016 - Dec. 2016
Artifact Evaluation Committee
PPoPP-CGO 2017
Publications
Conference Papers
-
SnuRHAC: A Runtime for Heterogeneous Accelerator Clusters with CUDA Unified Memory
HPDC '21: Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing,
pp. 107—120, Stockholm, Sweden, June 2021.
-
SOFF: An OpenCL High-Level Synthesis Framework for FPGAs
ISCA '20: Proceedings of the 47th Annual International Symposium on Computer Architecture,
pp. 295—308, Valencia, Spain, June 2020.
-
PIPSEA: A Practical IPsec Gateway on Embedded APUs
CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security,
pp. 1255—1267, Vienna, Austria, October 2016.
-
A Distributed OpenCL Framework using Redundant Computation and Data Replication
PLDI '16: Proceedings of the 37th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation,
pp. 553—569, Santa Barbara, California, USA, June 2016.
-
Automatic OpenCL Work-Group Size Selection for Multicore CPUs
PACT '13: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques,
pp. 387—397, Edinburgh, Scotland (UK), September 2013.
-
SnuCL: an OpenCL Framework for Heterogeneous CPU/GPU Clusters
ICS '12: Proceedings of the 26th ACM International Conference on Supercomputing,
pp. 341—352, San Servolo Island, Venice, Italy, June 2012.
-
Performance Characterization of the NAS Parallel Benchmarks in OpenCL
IISWC '11: Proceedings of the 2011 IEEE International Symposium on Workload Characterization,
pp. 137—148, Austin, Texas, USA, November 2011.
Journal Articles
-
Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes
IEEE Transactions on Parallel and Distributed Systems (TPDS),
vol. 26, no. 7, pp. 1814—1825, July 2015.
Workshop Papers
-
Memory-Access-Pattern Analysis Techniques for OpenCL Kernels
LCPC '17: Proceedings of the 30th International Workshop on Languages and Compilers for Parallel Computing,
pp. 109—126, College Station, Texas, USA, October 2017.
-
OpenCL Framework for ARM Processors with NEON Support
WPMVP '14: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing,
pp. 33—40, Orlando, Florida, USA, February 2014.
-
OpenCL as a Programming Model for GPU Clusters
LCPC '11: Proceedings of the 24th International Workshop on Languages and Compilers for Parallel Computing,
pp. 76—90, Fort Collins, Colorado, USA, September 2011.
Posters
-
MAPA: An Automatic Memory Access Pattern Analyzer for GPU Applications
Poster presentation in PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
pp. 443—444, Austin, Texas, USA, February 2017.
-
OpenCL as a Unified Programming Model for Heterogeneous CPU/GPU Clusters
Poster presentation in PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
pp. 299—300, New Orleans, Louisiana, USA, February 2012.
Book Chapters
-
SnuCL: A unified OpenCL framework for heterogeneous clusters
Advances in GPU Research and Practice,
pp. 23—56, Morgan Kaufmann, September 2016.
ISBN: 9780128037386
Papers in Korean
-
Techniques to Modify and Execute GPU Code in CUDA Application Binaries
Korea Computer Congress 2019 (KCC),
pp. 1101—1103, 2019.
RISS: 106573136
-
Universal Heterogeneous Programming Environment
Communications of KIISE,
vol. 35, no. 10, pp. 18—31, 2017.
RISS: 103625426
-
Automatic Optimization Methods for Image Processing Programs Using OpenCL
KIISE Transactions on Computing Practices,
vol. 23, no. 3, pp. 188—193, 2017.
RISS: 102918274
-
Hardware and Software Support for Deep Learning
Communications of KIISE,
vol. 34, no. 9, pp. 10—20, 2016.
RISS: 102075344
-
Automatic Optimization Methods for Image Processing Programs Using OpenCL
Korea Computer Congress 2016 (KCC),
pp. 1494—1496, 2016.
RISS: 102087847
-
HPC Technology Trends of Big Data Analyses with Supercomputers
Communications of KIISE,
vol. 34, no. 2, pp. 31—42, 2016.
RISS: 101755688
-
LRC: A Lightweight Communication Library for High Performance Computing
Korea Computer Congress 2015 (KCC),
pp. 33—35, 2015.
RISS: 100640266
-
SnuCL: OpenCL Programming Environment for Heterogeneous Manycore Clusters
Communications of KIISE,
vol. 32, no. 5, pp. 66—76, 2014.
RISS: 100034842
-
Trends on Heterogeneous Supercomputers and a Case Study on the Development of a Supercomputer Chundoong
Communications of KIISE,
vol. 31, no. 4, pp. 34—41, 2013.
RISS: 99606073
-
Design and Implementation of Virtual Machines as an Aid in Teaching Computer Concepts
Korea Computer Congress 2012 (KCC),
vol. 39, no. 1A, pp. 131—133, 2012.
RISS: 60197567
-
Current Status and Development Prospects of High Performance Computing Technology
The Journal of Korean Institute of Next Generation Computing,
vol. 8, no. 2, pp. 99—117, 2012.
RISS: 60098652
-
Implementation of Register Allocator for JavaScript JIT Compiler
2011 KIISE Fall Conference,
vol. 38, no. 2A, pp. 194—197, 2011.
RISS: 82732085
-
Measuring JavaScript Performance with a Real World Web Application
2011 KIISE Fall Conference,
vol. 38, no. 2A, pp. 131—134, 2011.
RISS: 82732048
-
Alias Analysis for JavaScript Program Optimization
Korea Computer Congress 2011 (KCC),
vol. 38, no. 1C, pp. 462—465, 2011.
RISS: 82666651