![Mining similarity between text-documents using Apache Lucene](http://res.cloudinary.com/dfg89e6oo/image/upload/c_thumb,f_auto,g_faces,h_320,w_600/v1497621002/ytqceu7cbekcnvyyylg4.png)
Mining similarity between text-documents using Apache Lucene
One of the main challenges in Big Data environments is finding all the similar documents that share common information. Handling this for free-text documents calls for a structured text-mining process that performs two tasks: (1) profile each document to extract its descriptive metadata, and (2) compare the profiles of pairs of documents to measure their overall similarity. Both tasks can be handled with an open-source text-mining project such as Apache Lucene.
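To make the two tasks concrete, here is a minimal sketch in plain Python. It does not use Lucene's API. It assumes that a document "profile" is simply a bag of term frequencies and that similarity is measured with the cosine of the two term vectors. The function names (`profile`, `cosine_similarity`, `pairwise_similarities`) and the sample documents are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def profile(text):
    """Task 1 (assumed form): reduce a document to a term-frequency profile."""
    # Crude tokenizer: lowercase, then keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def cosine_similarity(p, q):
    """Task 2 (assumed measure): cosine of the angle between two profiles."""
    # Only terms present in both profiles contribute to the dot product.
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_p * norm_q)


def pairwise_similarities(docs):
    """Profile every document once, then score each unordered pair."""
    profiles = {name: profile(text) for name, text in docs.items()}
    return {
        (a, b): cosine_similarity(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical sample documents.
docs = {
    "a": "Lucene indexes text documents",
    "b": "Lucene indexes text files",
    "c": "Bananas are yellow",
}
scores = pairwise_similarities(docs)
```

Here `scores` maps each pair of document names to a value between 0 and 1, where higher means more shared vocabulary. A real pipeline would typically add stop-word removal, stemming, and term weighting on top of raw counts, and comparing every pair grows quadratically with the number of documents, which is where an index becomes useful.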