Microsoft Idea

Enhance OCR Duplicate Detection Upon Receiving the File by Using Content-Based Validation in Addition to Checksum

LOVELY GIE LOPEZ on 8/11/2025 6:04:17 AM

💡 Idea Description:

Currently, Dynamics 365 OCR relies on checksum hash comparison to detect duplicate invoice files. This approach fails when the same invoice is generated or exported more than once: the content is identical, but the files differ at the byte level and therefore produce different hashes. As a result, duplicate invoices can pass the check undetected and may be processed a second time, which then requires manual correction.
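
To illustrate the limitation, here is a minimal sketch in Python. The hash algorithm D365 actually uses is not stated here, so SHA-256 is an assumption, and the two byte strings are hypothetical stand-ins for two exports of the same invoice that differ only in an embedded export timestamp.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical exports of the same invoice: only the export timestamp differs.
export_a = b"ExportedAt: 2025-08-01T09:00:00\nInvoice INV-1001\nTotal: 1,250.00 USD\n"
export_b = b"ExportedAt: 2025-08-04T14:30:00\nInvoice INV-1001\nTotal: 1,250.00 USD\n"

# The invoice data is the same, but the checksums do not match,
# so a checksum-only check treats the second file as a new invoice.
print(checksum(export_a) == checksum(export_b))  # False
```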

Proposal: Introduce a secondary layer of duplicate detection based on file content analysis. This enhancement would allow D365 to:

Detect duplicates even when files are regenerated and have different checksums
Reduce manual intervention and invoice reconciliation errors
Improve overall accuracy and reliability of OCR processing

Why It Matters: In real-world scenarios, invoices are often regenerated or re-exported from ERP systems, especially during corrections or reprocessing. Despite having the same content, these files are treated as unique by D365 due to checksum differences. A content-aware duplicate check would significantly improve invoice automation and reduce operational risks.

Suggested Implementation:

Use text extraction or semantic comparison to identify content-level duplicates
Provide a configurable threshold for similarity detection
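
As a rough illustration of the two points above, the sketch below compares extracted invoice text with Python's standard `difflib.SequenceMatcher` and applies a configurable threshold. It is only one possible approach, not how D365 works today: the function names, the sample invoice text, and the default threshold of 0.95 are all hypothetical, and the text is assumed to have already been extracted by OCR.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity(text_a: str, text_b: str) -> float:
    """Return a character-level similarity ratio between 0.0 and 1.0."""
    matcher = SequenceMatcher(None, normalize(text_a), normalize(text_b), autojunk=False)
    return matcher.ratio()


def is_probable_duplicate(text_a: str, text_b: str, threshold: float = 0.95) -> bool:
    """Flag two documents as probable duplicates when similarity meets the threshold.

    The threshold is the configurable value; 0.95 is an arbitrary example default.
    """
    return similarity(text_a, text_b) >= threshold


# Hypothetical OCR output for an invoice, a regenerated copy, and a different invoice.
original = "Invoice INV-1001\nVendor: Contoso Ltd\nTotal: 1,250.00 USD"
regenerated = "Invoice  INV-1001\nVendor: Contoso Ltd\n\nTotal: 1,250.00 USD\n"
other = "Invoice INV-2002\nVendor: Fabrikam Inc\nTotal: 310.75 USD"

print(is_probable_duplicate(original, regenerated))  # True
print(is_probable_duplicate(original, other))        # False
```

A purely character-level score can also rate two genuinely different invoices as very similar when they differ in only one or two fields, such as the invoice number. The threshold would therefore need tuning, and a comparison of key extracted fields may be a useful complement.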

If this feature would benefit your organization, please vote and share!

STATUS DETAILS

New