BSc Thesis in 2011
Building cubes with MapReduce
In recent years, the problems of using generic storage techniques for very specific applications have been detected and outlined. As a result, alternatives to relational DBMSs (e.g., BigTable) are flourishing. Meanwhile, cloud computing is already a reality that helps save money by replacing fixed hardware and software costs with a pay-per-use model. Specific software tools to exploit a cloud are also available, and the trend here is toward tools based on the MapReduce paradigm developed by Google. In this paper, we explore the possibility of keeping data in a cloud, using BigTable to store the corporate historical data and MapReduce as an agile mechanism to deploy cubes in ad-hoc Data Marts. Our main contribution is the comparison of three different approaches to retrieving data cubes from BigTable by means of MapReduce, together with the definition of criteria for choosing among them.
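
As a rough illustration of what computing a cube in MapReduce style can look like, the sketch below simulates the map, shuffle, and reduce phases in memory with plain Python. It is a generic naive cubing scheme, not necessarily any of the three approaches compared in the thesis, and it uses neither BigTable nor a real MapReduce framework. The function names (`map_row`, `reduce_cell`, `build_cube`), the `"*"` marker for an aggregated dimension, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker meaning "this dimension is aggregated away"


def map_row(row, dimensions, measure):
    """Map phase: emit one (cell key, value) pair per subset of dimensions."""
    for k in range(len(dimensions) + 1):
        for kept in combinations(dimensions, k):
            # Keep the value of retained dimensions, replace the rest by ALL.
            key = tuple(row[d] if d in kept else ALL for d in dimensions)
            yield key, row[measure]


def reduce_cell(key, values):
    """Reduce phase: aggregate all values that landed in the same cube cell."""
    return key, sum(values)


def build_cube(rows, dimensions, measure):
    """Run map, an in-memory shuffle (group by key), and reduce."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_row(row, dimensions, measure):
            groups[key].append(value)
    return dict(reduce_cell(key, values) for key, values in groups.items())


# Made-up fact rows standing in for data read from the historical store.
rows = [
    {"region": "EU", "year": 2010, "sales": 10},
    {"region": "EU", "year": 2011, "sales": 5},
    {"region": "US", "year": 2011, "sales": 7},
]
cube = build_cube(rows, ["region", "year"], "sales")
```

Here `cube[("EU", "*")]` holds the total for the EU across all years, and `cube[("*", "*")]` holds the grand total. Each input row is emitted once per group-by combination, so the intermediate data grows as 2^d for d dimensions; that cost is the kind of trade-off that selection criteria among alternative approaches would need to weigh.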