Data-Processing-on-Large-Clusters

Category: Microcontroller development
Development tool: TEXT
File size: 149KB
Downloads: 6
Upload date: 2015-07-10 15:35:55
Uploader: Maddy619
Description: Abstract: Map-Reduce is a programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. Through a simple interface with two functions, map and reduce, this model facilitates parallel implementation of many real-world tasks such as data processing for search engines and machine learning. However, this model does not directly support processing multiple related heterogeneous datasets. While processing relational data is a common need, this limitation causes difficulties and/or inefficiency when Map-Reduce is applied to relational operations like joins. We improve Map-Reduce into a new model called Map-Reduce-Merge. It adds to Map-Reduce a Merge phase that can efficiently merge data already partitioned and sorted (or hashed) by map and reduce modules. We also demonstrate that this new model can express relational algebra operators as well as implement several join algorithms.
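
The abstract's central idea, a Merge phase that joins two outputs already keyed and sorted by their own map and reduce passes, can be illustrated with a minimal single-process sketch. This is not the paper's API or implementation: the function names `map_reduce` and `merge_join`, and the employee and department records, are hypothetical and chosen only for illustration. The merge step here is an ordinary sort-merge equi-join, and it assumes each reduced output has unique keys.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process stand-in for one map pass and one reduce pass.

    Returns (key, reduced_value) pairs sorted by key, mimicking output
    that is already partitioned and sorted before the merge phase.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return sorted((key, reducer(key, values)) for key, values in groups.items())


def merge_join(left, right):
    """Sort-merge equi-join of two key-sorted lists (keys assumed unique)."""
    i = j = 0
    joined = []
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            joined.append((left_key, left_value, right_value))
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1  # left key has no partner on the right
        else:
            j += 1  # right key has no partner on the left
    return joined


# Hypothetical heterogeneous datasets: (name, dept_id, bonus) and (dept_id, rate).
employees = [("alice", "d1", 100), ("bob", "d2", 50), ("carol", "d1", 70)]
departments = [("d1", 0.1), ("d2", 0.2), ("d3", 0.3)]

# Each dataset gets its own map and reduce functions.
bonus_by_dept = map_reduce(
    employees,
    mapper=lambda r: [(r[1], r[2])],   # key by dept_id, emit bonus
    reducer=lambda key, values: sum(values),
)
rate_by_dept = map_reduce(
    departments,
    mapper=lambda r: [(r[0], r[1])],   # key by dept_id, emit rate
    reducer=lambda key, values: values[0],
)

# The merge phase joins the two reduced, sorted outputs on dept_id.
joined = merge_join(bonus_by_dept, rate_by_dept)
print(joined)
```

Because both inputs to `merge_join` arrive sorted by key, the join is a single linear pass over each list, which is the kind of efficiency the abstract attributes to merging data that map and reduce have already organized.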

File list:
Map-Reduce-Merge Simplified Relational Data Processing on Large Clusters.docx (232212, 2015-07-07)
