Conference Paper ICDAR Image Classification Zero-Shot Learning Document Analysis

CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification

Sankalp Sinha, Muhammad Saif Ullah Khan, Talha Uddin Sheikh, Didier Stricker, Muhammad Zeshan Afzal


Abstract

Zero-shot learning has been extensively investigated in the broader field of visual recognition and has recently attracted significant interest. However, work on zero-shot learning in document image classification remains scarce. Existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria for zero-shot evaluation in the visual recognition domain. To address this gap, we provide a comprehensive analysis of document image classification in the Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) settings. Our methodology and evaluation align with the established practices of this domain. Additionally, we propose zero-shot splits for the RVL-CDIP dataset. Furthermore, we introduce CICA (pronounced 'ki-ka'), a framework that enhances the zero-shot learning capabilities of CLIP. CICA consists of a novel 'content module' designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP's text and image features using a novel 'coupled-contrastive' loss. Our module improves CLIP's ZSL top-1 accuracy by 6.7% and GZSL harmonic mean by 24% on the RVL-CDIP dataset. The module is lightweight, adding only 3.3% more parameters to CLIP. Our work sets the direction for future research in zero-shot document classification.
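For readers unfamiliar with the GZSL harmonic mean mentioned above, the following is a minimal sketch of how the metric is commonly computed in the zero-shot literature: as the harmonic mean of the accuracy on seen classes and the accuracy on unseen classes. The function name `gzsl_harmonic_mean` and the input values are illustrative assumptions, not taken from the paper or its code.

```python
def gzsl_harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy.

    Assumes the common GZSL convention H = 2 * S * U / (S + U);
    the paper's exact evaluation protocol may differ in detail.
    """
    total = acc_seen + acc_unseen
    if total == 0:
        # Both accuracies are zero, so the harmonic mean is defined as zero.
        return 0.0
    return 2 * acc_seen * acc_unseen / total


if __name__ == "__main__":
    # Hypothetical accuracies for illustration only (not results from the paper).
    print(gzsl_harmonic_mean(0.8, 0.2))
```

The harmonic mean penalizes imbalance: a model that is accurate on seen classes but weak on unseen classes scores much closer to the lower of the two accuracies than an arithmetic mean would suggest.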

TL;DR

We propose CICA, a novel framework for zero-shot document image classification.
CICA enhances CLIP’s zero-shot learning capabilities by leveraging a novel content module.
We set a new benchmark for zero-shot document classification on the RVL-CDIP dataset.

@inproceedings{sinha2024cica, 
  author={Sinha, Sankalp and Khan, Muhammad Saif Ullah and Sheikh, Talha Uddin and Stricker, Didier and Afzal, Muhammad Zeshan}, 
  editor={Barney Smith, Elisa H. and Liwicki, Marcus and Peng, Liangrui}, 
  title={CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification}, 
  booktitle={Document Analysis and Recognition - ICDAR 2024}, 
  year={2024}, 
  publisher={Springer Nature Switzerland}, 
  address={Cham}, 
  pages={124--141}, 
  isbn={978-3-031-70546-5}, 
  doi={10.1007/978-3-031-70546-5_8} 
}
