Blog

Sharing our knowledge

Visualising the data lake using proximity graphs in Gephi

Gephi is a free graph analysis and visualisation tool built on the NetBeans Platform for Java. It supports computing statistics over graphs, applying algorithms to analyse and visualise them, and filtering and querying them, all through an intuitive graphical user interface. It has become popular thanks to its strong capabilities and the support of its user community. In this article, we demonstrate how to use it to visualise a proximity graph for the data lake.

Mining similarity between text-documents using Apache Lucene

One of the main challenges in Big Data environments is finding all similar documents that share common information. Finding similar free-text documents requires a structured text-mining process that carries out two tasks: (1) profiling the documents to extract their descriptive metadata, and (2) comparing the profiles of pairs of documents to detect their overall similarity. Both tasks can be handled by an open-source text-mining project such as Apache Lucene.
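
To make the two tasks concrete, here is a minimal sketch in plain Python. It does not use Lucene or its API: as a simplifying assumption, the "profile" is just a bag of word counts, and profiles are compared with cosine similarity. The function names and the sample documents are hypothetical and serve only as illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def profile(text):
    """Task 1: profile a document as lower-cased word counts (an assumed, simplified profile)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(p, q):
    """Task 2: cosine similarity between two word-count profiles, between 0 and 1."""
    dot = sum(count * q[term] for term, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_p * norm_q)


def pairwise_similarities(documents):
    """Profile every document once, then compare every pair of profiles."""
    profiles = {name: profile(text) for name, text in documents.items()}
    return {
        (a, b): cosine_similarity(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical sample documents, for illustration only.
documents = {
    "doc_a": "customer orders and invoices for 2015",
    "doc_b": "invoices and customer payments for 2015",
    "doc_c": "weather sensor readings",
}
similarities = pairwise_similarities(documents)

for pair, score in sorted(similarities.items()):
    print(pair, round(score, 3))
```

In this sketch, documents that share many words score close to 1 and documents with no words in common score 0. A real text-mining library would typically add steps such as stop-word removal and term weighting on top of this basic idea.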