The financial industry faces a substantial workload in verifying document images. Existing methods based on visual features struggle to identify fraudulent document images due to the lack of visual clues in the tampered regions. This paper proposes CSIAD (Cross-Sample Image Anomaly Detection), a novel framework that leverages LLMs to identify logical inconsistencies in semantically similar images. CSIAD accurately detects forged images with subtle tampering and provides human-interpretable explanations for each anomaly. Furthermore, we introduce CrossCred, a new benchmark of real-world fraudulent document images with fine-grained manual annotations. Experiments demonstrate that CSIAD outperforms state-of-the-art image fraud detection methods by 79.6% (F1) on CrossCred, and exceeds deployed industrial solutions by 21.7% (F1) on real business data.
Comparison between TTD (Tampered Text Detection) and CSIAD (Cross-Sample Image Anomaly Detection). The forged image is a mobile payment record.
To better address the limitations of existing image fraud detection datasets in the financial sector, we introduce a new benchmark named CrossCred. This dataset is constructed entirely from real-world data collected in complex financial fraud scenarios, where document images are carefully forged to evade current detection technologies. CrossCred includes detailed human annotations, covering both tampered regions and corresponding explanations.
Unlike previous datasets that provide only a single sample per case, CrossCred supports cross-sample inference: each fraudulent case is linked to a small group of semantically similar images, enabling the evaluation of reasoning-based detection methods. Specifically, from a collection of 400K business images, we mined 109 cross-sample anomaly cases (396 total samples, averaging 3.63 samples per case) and 109 randomly selected anomaly-free images. In total, the benchmark consists of 505 images spanning 61 distinct document types.
Examples of the Human Annotation Process. Given the LLM's pre-annotated conclusions, human annotators need to determine whether anomalies exist in the actual images and whether the LLM's pre-annotated conclusions align with the ground truth. They then select one of five predefined labels (TP, FP-A, FN, FP-N, TN) to evaluate the LLM's pre-annotated results. If the selected label is FP-A or FN (marked with *), it indicates that the LLM failed to identify the correct anomalous elements, and annotators are required to correct or supplement the anomalous elements along with the associated images. Figures (a), (b), and (c) demonstrate examples of LLM pre-annotated results as TP, FP-A, and FP-N, respectively.
Framework for CSIAD. The diagram outlines the key modules: (a) Retrieval Ensemble, (b) Sample-Specific Rule Generation, (c) Fact-Driven Verification, and (d) Rule-Based Anomaly Analysis. To determine whether a document image is fraudulent, a natural workflow is to "consult" relevant images to identify potential anomalies. Following this intuition, CSIAD consists of four core modules:
(a) Retrieval Ensemble: given a query image Iq, this module retrieves a batch of relevant images, forming a suspicious set Sq. (b) Sample-Specific Rule Generation: this module derives an initial rule set Rinit that normal samples are expected to follow. (c) Fact-Driven Verification: this module validates the initial rules against factual evidence, producing a verified rule set Rverified. (d) Rule-Based Anomaly Analysis: this module applies Rverified to detect and explain anomalies within Sq.
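The four-module workflow can be sketched as a simple pipeline. The sketch below is purely illustrative: the function names, the dictionary-based image representation, and the "common fields" rule format are assumptions for demonstration, not the paper's actual LLM-based implementation (which generates and verifies rules via language-model reasoning rather than set intersection).

```python
# Hypothetical sketch of the CSIAD pipeline. All names and data structures
# are illustrative assumptions; the real system uses LLM reasoning for
# rule generation, verification, and anomaly analysis.

def retrieval_ensemble(query_image, corpus):
    """(a) Retrieve semantically similar images to form the suspicious set Sq."""
    # Placeholder similarity: images sharing the query's document type.
    return [img for img in corpus if img["doc_type"] == query_image["doc_type"]]

def generate_rules(suspicious_set):
    """(b) Derive initial rules Rinit that normal samples should satisfy."""
    # Toy rule format: fields every sample in the set is expected to contain.
    common_fields = set.intersection(*(set(img["fields"]) for img in suspicious_set))
    return [f"must contain field '{f}'" for f in sorted(common_fields)]

def verify_rules(rules, suspicious_set):
    """(c) Keep only rules supported by facts across the set (Rverified)."""
    # In this toy version, common-field rules hold by construction.
    return [r for r in rules
            if all(r.split("'")[1] in img["fields"] for img in suspicious_set)]

def analyze_anomalies(query_image, rules):
    """(d) Flag the query image if it violates any verified rule."""
    return [r for r in rules if r.split("'")[1] not in query_image["fields"]]

if __name__ == "__main__":
    corpus = [
        {"doc_type": "payment", "fields": ["amount", "timestamp", "order_id"]},
        {"doc_type": "payment", "fields": ["amount", "timestamp", "order_id"]},
        {"doc_type": "invoice", "fields": ["amount", "tax_id"]},
    ]
    # Forged payment record missing a field that peers consistently have.
    query = {"doc_type": "payment", "fields": ["amount", "timestamp"]}
    s_q = retrieval_ensemble(query, corpus)
    r_verified = verify_rules(generate_rules(s_q), s_q)
    print(analyze_anomalies(query, r_verified))
```

The toy example flags the query because it lacks a field that all semantically similar samples share, mirroring the cross-sample logic: anomalies are defined relative to what peer documents consistently exhibit, not by visual tampering traces.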
@inproceedings{wang-etal-2025-innovative,
title = "Innovative Image Fraud Detection with Cross-Sample Anomaly Analysis: The Power of {LLM}s",
author = "Wang, QiWen and
Yang, Junqi and
Lin, Zhenghao and
Ying, Zhenzhe and
Wang, Weiqiang and
Lin, Chen",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-long.687/",
doi = "10.18653/v1/2025.acl-long.687",
pages = "14058--14078",
ISBN = "979-8-89176-251-0",
abstract = "The financial industry faces a substantial workload in verifying document images. Existing methods based on visual features struggle to identify fraudulent document images due to the lack of visual clues on the tampering region. This paper proposes CSIAD (Cross-Sample Image Anomaly Detection) by leveraging LLMs to identify logical inconsistencies in similar images. This novel framework accurately detects forged images with slight tampering traces and explains anomaly detection results. Furthermore, we introduce CrossCred, a new benchmark of real-world fraudulent images with fine-grained manual annotations. Experiments demonstrate that CSIAD outperforms state-of-the-art image fraud detection methods by 79.6{\%} (F1) on CrossCred and deployed industrial solutions by 21.7{\%} (F1) on business data. The benchmark is available at https://github.com/XMUDM/CSIAD."
}