💡 Idea Description:
Currently, Dynamics 365 OCR relies on checksum (hash) comparison to detect duplicate invoice files. This approach fails when two files contain identical content but are regenerated separately, which results in different hashes. As a result, duplicate invoices can slip through undetected, leading to potential processing errors and inefficiencies.
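To illustrate the problem, here is a minimal sketch in Python. It assumes a SHA-256 checksum and a made-up file layout with a metadata line on top; neither is taken from D365, whose actual hashing and file handling are not known to me. The byte strings and the helper names `checksum` and `visible_text` are hypothetical.

```python
import hashlib

# Two hypothetical exports of the same invoice. The visible text is identical,
# but the embedded creation timestamp differs because the file was regenerated.
export_1 = b"%META created=2024-05-01T10:00:00\nInvoice INV-1001 Total 1250.00"
export_2 = b"%META created=2024-05-02T09:30:00\nInvoice INV-1001 Total 1250.00"


def checksum(data: bytes) -> str:
    """Hash the raw bytes (SHA-256 is an assumption, used only for illustration)."""
    return hashlib.sha256(data).hexdigest()


def visible_text(data: bytes) -> str:
    """Drop the first (metadata) line and return the remaining invoice text."""
    return data.decode("utf-8").split("\n", 1)[1]


# The checksums differ even though the invoice text is the same.
print(checksum(export_1) == checksum(export_2))          # False
print(visible_text(export_1) == visible_text(export_2))  # True
```

A single changed byte in the metadata is enough to change the checksum, so a hash-only check treats the second export as a new invoice.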
Proposal: Introduce a secondary layer of duplicate detection based on file content analysis. This enhancement would allow D365 to:
- Detect duplicates even when files are regenerated and have different checksums
- Reduce manual intervention and invoice reconciliation errors
- Improve overall accuracy and reliability of OCR processing
Why It Matters: In real-world scenarios, invoices are often regenerated or re-exported from ERP systems, especially during corrections or reprocessing. Despite having the same content, these files are treated as unique by D365 due to checksum differences. A content-aware duplicate check would significantly improve invoice automation and reduce operational risks.
Suggested Implementation:
- Use text extraction or semantic comparison to identify content-level duplicates
- Provide a configurable threshold for similarity detection
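As a rough sketch of what the two suggestions above could look like, the Python example below compares already-extracted invoice text and applies a configurable threshold. It is only an illustration under stated assumptions: the function names (`normalize`, `similarity`, `is_content_duplicate`), the default threshold of 0.95, and the sample invoices are all hypothetical, and it uses character-level matching from Python's standard `difflib` rather than true semantic comparison or anything D365 provides.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def similarity(text_a: str, text_b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    matcher = SequenceMatcher(None, normalize(text_a), normalize(text_b), autojunk=False)
    return matcher.ratio()


def is_content_duplicate(text_a: str, text_b: str, threshold: float = 0.95) -> bool:
    """Flag a duplicate when similarity meets the configurable threshold (hypothetical default)."""
    return similarity(text_a, text_b) >= threshold


# Hypothetical text extracted from three invoice files.
original = "Invoice INV-1001\nVendor: Contoso Ltd\nTotal: 1,250.00 USD"
regenerated = "Invoice  INV-1001\nVendor: Contoso Ltd\n\nTotal: 1,250.00 USD"
different = "Invoice INV-2002\nVendor: Fabrikam Inc\nTotal: 87.40 EUR"

print(is_content_duplicate(original, regenerated))  # True
print(is_content_duplicate(original, different))    # False
```

One design caveat: two genuinely different invoices from the same vendor can share most of their text, so a pure text-similarity score may need to be combined with key fields such as invoice number and amount, and the threshold should be tunable per organization.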
If this feature would benefit your organization, please vote and share!
