Intellix - Intelligent Adaptive Indexing of heterogeneous document collections
In the Intellix project, staff of the Technische Universität Dresden work together with the DocuWare company on concepts and solutions for automatically indexing electronic document collections within document management systems (DMS).
The project concentrates on greatly reducing the effort of indexing document collections, in order to make this technology a viable solution for small companies and home users as well as for big companies. For this purpose it is necessary to apply automatic classifiers and extractors to these collections, for example to separate invoices from reminders and to extract important information such as sender, invoice number and invoice amount. However, such information retrieval processes are not perfectly precise. It is therefore necessary to improve them with user feedback, which should be shared among all end users participating in the system. A SaaS approach is taken to satisfy this requirement. The vision is an automatic classification and extraction service for documents from small companies and home users that draws on the large datasets provided by big companies. In return, the big companies benefit from the feedback on the large and varied collection of documents contributed by the small companies and home users. For this approach to succeed, it is of course necessary to protect the data of different users from each other.
Prof. Dr. rer. nat. habil. Dr. h. c. Alexander Schill
Dipl.-Medieninf. Klemens Muthmann
Dr.-Ing. Daniel Schuster
Dipl.-Medieninf. David Urbansky
1. Daniel Schuster, Klemens Muthmann, Daniel Esser, Alexander Schill, Michael Berger, Christoph Weidling, Kamil Aliyev, Andreas Hofmeier:
Intellix - End-User Trained Information Extraction for Document Archiving;
The 12th International Conference on Document Analysis and Recognition (ICDAR); Washington, DC, USA; 2013
2. Daniel Esser, Daniel Schuster, Klemens Muthmann, Michael Berger, Alexander Schill:
Automatic Indexing of Scanned Documents - a Layout-based Approach;
IS&T/SPIE Document Recognition and Retrieval XIX (DRR 2012); San Francisco, CA, USA; 2012
3. Marcel Hanke, Klemens Muthmann, Daniel Schuster, Alexander Schill, Kamil Aliyev, Michael Berger:
Continuous User Feedback Learning for Data Capture from Business Documents;
Workshop on Nonstationary Models of Pattern Recognition and Classifier Combinations under the framework of HAIS 2012; Salamanca, Spain; 2012