Innovative Image Fraud Detection with Cross-Sample Anomaly Analysis: The Power of LLMs

Xiamen University, Ant Group
*Corresponding Author

Abstract

The financial industry faces a substantial workload in verifying document images. Existing methods based on visual features struggle to identify fraudulent document images due to the lack of visual clues in the tampered regions. This paper proposes CSIAD (Cross-Sample Image Anomaly Detection), a novel framework that leverages LLMs to identify logical inconsistencies in semantically similar images. CSIAD accurately detects forged images with subtle tampering and provides human-interpretable explanations for each anomaly. Furthermore, we introduce CrossCred, a new benchmark of real-world fraudulent document images with fine-grained manual annotations. Experiments demonstrate that CSIAD outperforms state-of-the-art image fraud detection methods by 79.6% (F1) on CrossCred, and exceeds deployed industrial solutions by 21.7% (F1) on real business data.

Introduction


Comparison between TTD (Tampered Text Detection) and CSIAD (Cross-Sample Image Anomaly Detection). The forged image is a mobile payment record.

  • Left: Existing TTD relies on the visual features of a single sample and therefore struggles with subtle tampering traces.
  • Right: Although image fraud detection based on single-image visual features is limited, human reviewers can often spot anomalous images during manual verification through cross-sample logical inconsistencies. As illustrated in Figure (b), the tampered region includes the account and address information, because the forged payment record copies the account number from other images while altering the account name and address. CSIAD incorporates cross-sample analysis based on textual information, improving both the accuracy and the explainability of image fraud detection.

To better address the limitations of existing image fraud detection datasets in the financial sector, we introduce a new benchmark named CrossCred. This dataset is constructed entirely from real-world data collected in complex financial fraud scenarios, where document images are carefully forged to evade current detection technologies. CrossCred includes detailed human annotations, covering both tampered regions and corresponding explanations. Unlike previous datasets that provide only a single sample per case, CrossCred supports cross-sample inference: each fraudulent case is linked to a small group of semantically similar images, enabling the evaluation of reasoning-based detection methods. Specifically, from a collection of 400K business images, we mined 109 cross-sample anomaly cases (396 total samples, averaging 3.63 samples per case) and 109 randomly selected anomaly-free images. In total, the benchmark consists of 505 images spanning 61 distinct document types.
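The benchmark counts quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the figures in the text; the variable names are illustrative and do not come from the benchmark's code.

```python
# Figures quoted in the benchmark description above.
anomaly_cases = 109      # cross-sample anomaly cases
anomaly_samples = 396    # images belonging to those cases
normal_images = 109      # randomly selected anomaly-free images

avg_samples_per_case = round(anomaly_samples / anomaly_cases, 2)
total_images = anomaly_samples + normal_images

print(avg_samples_per_case)  # 3.63
print(total_images)          # 505
```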


Examples of the Human Annotation Process. Given the LLM's pre-annotated conclusions, human annotators determine whether anomalies exist in the actual images and whether the LLM's conclusions align with the ground truth. They then select one of five predefined labels (TP, FP-A, FN, FP-N, TN) to evaluate the LLM's pre-annotated results. A label of FP-A or FN (marked with *) indicates that the LLM failed to identify the correct anomalous elements; in that case, annotators must correct or supplement the anomalous elements along with the associated images. Figures (a), (b), and (c) show examples in which the LLM's pre-annotated results are labeled TP, FP-A, and FP-N, respectively.
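The labeling decision can be sketched as a small function. The text does not define the five labels explicitly, so the mapping below is one plausible reading and an assumption: FP-A is taken to mean the LLM flagged an anomalous image but pointed at the wrong elements, and FP-N to mean the LLM flagged an image that is actually normal. The function name and its parameters are hypothetical.

```python
def annotation_label(has_anomaly: bool, llm_flagged: bool, elements_match: bool) -> str:
    """Pick one of the five labels for an LLM pre-annotation (assumed semantics)."""
    if has_anomaly:
        if not llm_flagged:
            return "FN"                            # anomaly missed entirely
        return "TP" if elements_match else "FP-A"  # flagged, right vs. wrong elements
    # No real anomaly: elements_match is irrelevant here.
    return "FP-N" if llm_flagged else "TN"


# Labels after which the annotator must correct or supplement the anomalous elements.
NEEDS_CORRECTION = {"FP-A", "FN"}

label = annotation_label(has_anomaly=True, llm_flagged=True, elements_match=False)
print(label, label in NEEDS_CORRECTION)  # FP-A True
```

Under this reading, FP-N and TN need no correction because the image contains no anomalous elements to annotate.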


Framework for CSIAD. The diagram outlines the key modules: (a) Retrieval Ensemble, (b) Sample-Specific Rule Generation, (c) Fact-Driven Verification, and (d) Rule-Based Anomaly Analysis. To determine whether a document image is fraudulent, a natural workflow is to "consult" relevant images and look for potential anomalies. CSIAD follows this workflow with four core modules:

  • (a) Retrieval Ensemble: Given a query image I_q, this module retrieves a batch of relevant images, forming a suspicious set S_q.
  • (b) Sample-Specific Rule Generation: Generates a set of rules R_init that normal samples are expected to follow.
  • (c) Fact-Driven Verification: Validates and filters the rules using factual evidence to yield a refined set R_verified.
  • (d) Rule-Based Anomaly Detection: Uses the verified rules to detect and explain anomalies in S_q.
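The data flow through modules (a) to (d) can be illustrated with a toy sketch. It is not the authors' implementation: CSIAD uses retrieval over images and LLM-generated rules, whereas this sketch assumes each image has already been reduced to a dictionary of text fields, replaces retrieval with an exact match on one field, and replaces the LLM with a hard-coded rule template ("records sharing a key field should share a dependent field"). All function names, field names, and records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = Dict[str, str]    # text fields extracted from one image (assumed representation)
Rule = Tuple[str, str]  # (key, dependent): same key value => same dependent value


def retrieve(query: Doc, corpus: List[Doc], key: str) -> List[Doc]:
    # (a) Toy stand-in for retrieval: records sharing the query's value for `key`.
    return [d for d in corpus if d[key] == query[key]]


def generate_rules(fields: List[str], key: str) -> List[Rule]:
    # (b) Stand-in for LLM rule generation: one candidate rule per other field.
    return [(key, f) for f in fields if f != key]


def verify_rules(rules: List[Rule], trusted: List[Doc]) -> List[Rule]:
    # (c) Keep only the rules that hold on trusted records (the "facts").
    verified = []
    for key, dep in rules:
        seen: Dict[str, str] = {}
        if all(seen.setdefault(d[key], d[dep]) == d[dep] for d in trusted):
            verified.append((key, dep))
    return verified


def analyze(suspicious: List[Doc], rules: List[Rule]) -> List[str]:
    # (d) Flag records whose dependent value differs from the set's majority value.
    findings = []
    for key, dep in rules:
        majority, _ = Counter(d[dep] for d in suspicious).most_common(1)[0]
        for d in suspicious:
            if d[dep] != majority:
                findings.append(
                    f"{d['id']}: {dep}={d[dep]!r} conflicts with {majority!r} "
                    f"for {key}={d[key]!r}"
                )
    return findings


# Made-up records for illustration only.
trusted = [
    {"id": "t1", "account": "111", "name": "Alice", "amount": "10"},
    {"id": "t2", "account": "111", "name": "Alice", "amount": "25"},
    {"id": "t3", "account": "222", "name": "Bob", "amount": "10"},
]
corpus = [
    {"id": "a", "account": "333", "name": "Carol", "amount": "5"},
    {"id": "b", "account": "333", "name": "Carol", "amount": "7"},
    {"id": "c", "account": "333", "name": "Dave", "amount": "9"},
    {"id": "d", "account": "444", "name": "Erin", "amount": "5"},
]

suspicious = retrieve(corpus[2], corpus, key="account")
candidates = generate_rules(["account", "name", "amount"], key="account")
verified = verify_rules(candidates, trusted)
findings = analyze(suspicious, verified)
print(findings)
```

In this toy run, the candidate rule tying the amount to the account is discarded during verification because trusted records for the same account carry different amounts, while the account-to-name rule survives and flags record "c". This mirrors the payment-record example, in which a forged image reuses an account number with an altered account name.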

Demonstrative Example of the Complete CSIAD Workflow.

Complete Workflow of CSIAD Analysis (1/3).
Complete Workflow of CSIAD Analysis (2/3).
Complete Workflow of CSIAD Analysis (3/3).

BibTeX


@inproceedings{wang-etal-2025-innovative,
    title = "Innovative Image Fraud Detection with Cross-Sample Anomaly Analysis: The Power of {LLM}s",
    author = "Wang, QiWen  and
      Yang, Junqi  and
      Lin, Zhenghao  and
      Ying, Zhenzhe  and
      Wang, Weiqiang  and
      Lin, Chen",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.687/",
    doi = "10.18653/v1/2025.acl-long.687",
    pages = "14058--14078",
    ISBN = "979-8-89176-251-0",
    abstract = "The financial industry faces a substantial workload in verifying document images. Existing methods based on visual features struggle to identify fraudulent document images due to the lack of visual clues on the tampering region. This paper proposes CSIAD (Cross-Sample Image Anomaly Detection) by leveraging LLMs to identify logical inconsistencies in similar images. This novel framework accurately detects forged images with slight tampering traces and explains anomaly detection results. Furthermore, we introduce CrossCred, a new benchmark of real-world fraudulent images with fine-grained manual annotations. Experiments demonstrate that CSIAD outperforms state-of-the-art image fraud detection methods by 79.6{\%} (F1) on CrossCred and deployed industrial solutions by 21.7{\%} (F1) on business data. The benchmark is available at https://github.com/XMUDM/CSIAD."
}