Harim Kim
I received my M.S. degree in Computer Science from Handong Global University, where I worked in the Handong Artificial Intelligence Lab (HAIL) under the supervision of Prof. Charmgil Hong. I am currently seeking Ph.D. opportunities.
My current research interests lie in developing intent-driven deep learning frameworks to address fundamental challenges in medical AI.
In particular, I am interested in:
- Exploring multimodal data fusion strategies for robust and informative representation learning
- Constructing anomaly detection techniques informed by latent space understanding
For more details about my academic background, publications, and research experience, please refer to the Resume page.
Selected Projects
This section introduces a selection of representative research projects. For each topic, I provide a brief overview and a description of the most recent publication.
Integrating Multimodal Medical Data
• Most Recent Publication •

Unsupervised anomaly detection (UAD) in medical imaging is crucial for identifying pathological abnormalities without requiring extensive labeled data. However, existing diffusion-based UAD models rely solely on imaging features, limiting their ability to distinguish between normal anatomical variations and pathological anomalies. To address this, we propose Diff3M, a multi-modal diffusion-based framework that integrates chest X-rays and structured Electronic Health Records (EHRs) for enhanced anomaly detection. Specifically, we introduce a novel Image-EHR Cross-Attention module to incorporate structured clinical context into the image generation process, improving the model’s ability to differentiate normal from abnormal features. Additionally, we develop a static masking strategy to enhance the reconstruction of normal-like images from anomalies. Extensive evaluations on CheXpert and MIMIC-CXR/IV demonstrate that Diff3M achieves state-of-the-art performance, outperforming existing UAD methods in medical imaging. Our implementation is available at https://github.com/nth221/Diff3M.
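To give a rough sense of how cross-attention can let image features draw on clinical context, here is a minimal pure-Python sketch of generic scaled dot-product cross-attention. It is not the Diff3M implementation (see the repository above for that). The function name `cross_attention`, the toy token values, and the choice of image tokens as queries with EHR tokens as keys and values are assumptions made for illustration, and the learned projection matrices of a real model are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, ehr_tokens):
    """Generic scaled dot-product cross-attention (illustrative only).

    Assumption: image tokens act as queries and EHR tokens act as both
    keys and values. Learned query/key/value projections are omitted.
    """
    d = len(ehr_tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in image_tokens:
        # Similarity of this image token to every EHR token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in ehr_tokens]
        w = softmax(scores)
        # Weighted sum of EHR tokens: the clinical context for this image token.
        out = [sum(wj * v[c] for wj, v in zip(w, ehr_tokens))
               for c in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy inputs: three image tokens and two EHR tokens, dimension 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ehr_tokens = [[2.0, 0.0], [0.0, 2.0]]
attended, attn = cross_attention(image_tokens, ehr_tokens)
```

Each row of `attn` sums to one and shows how strongly a given image token attends to each EHR token. `attended` holds the resulting context vectors, which a real model would typically combine with the image features.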
Reinterpreting for Enhanced Anomaly Detection
• Most Recent Publication •

In data analysis, unsupervised anomaly detection plays an important role in identifying statistical outliers that signify atypical behavior, erroneous readings, or interesting patterns within data. The Transformer model, known for its ability to capture dependencies within sequences, has revolutionized areas such as text and image data analysis. However, its potential for tabular data, where sequence dependencies are not inherently present, remains underexplored. This paper introduces Transformer for Point Anomaly Detection (TransPAD), a novel Transformer-based AutoEncoder framework specifically designed for point anomaly detection. Our method captures interdependencies across entire datasets, addressing the challenges posed by non-sequential, tabular data. It incorporates unique random and criteria sampling strategies for effective training and anomaly identification, and avoids the common pitfall of trivial generalization that affects many conventional methods. By leveraging an attention weight-based anomaly scoring system, TransPAD offers a more precise approach to detecting anomalies. Extensive testing on a range of benchmark tabular datasets shows that TransPAD consistently outperforms existing methods. Our source code is available at https://github.com/nth221/TransPAD.
Development for System-level Application
• Most Recent Publication •

As car-sharing services evolve, there is a growing effort to analyze users’ safe driving behaviors and effectively manage shared vehicles. Unlike previous research that focuses on simple situations such as sudden acceleration and lane departure using cameras with additional sensors, we introduce a new approach that detects more complex traffic rule violations, especially red-light violations, using only monocular dashcam videos. The proposed framework employs the attention mechanism of the Transformer and effectively encodes the traffic signal objects and contextual information within the video. It utilizes a novel method, POISE (Positional Object Information by Spatial Encoding), to handle the positional information of traffic signal objects. Our quantitative and qualitative evaluations demonstrate the effectiveness of our proposed framework in detecting red-light violations compared to existing methods.
