ProgrammingWithScalding
Category: collect
Development tool: Scala
Upload date: 2015-12-05 15:51:39
Uploader: sh-1993
Description: Programming MapReduce with Scalding
File list:
LICENSE (0, 2015-05-13)
chapter1/ (0, 2015-05-13)
chapter1/pom.xml (2139, 2015-05-13)
chapter1/run.bat (121, 2015-05-13)
chapter1/run.sh (65, 2015-05-13)
chapter1/src/ (0, 2015-05-13)
chapter1/src/main/ (0, 2015-05-13)
chapter1/src/main/java/ (0, 2015-05-13)
chapter1/src/main/java/CascadingExample.java (1108, 2015-05-13)
chapter1/src/main/resources/ (0, 2015-05-13)
chapter1/src/main/resources/products.tsv (131, 2015-05-13)
chapter2/ (0, 2015-05-13)
chapter2/pom.xml (658, 2015-05-13)
chapter2/runHDFS.sh (327, 2015-05-13)
chapter2/runLocal.bat (333, 2015-05-13)
chapter2/runLocal.sh (268, 2015-05-13)
chapter2/src/ (0, 2015-05-13)
chapter2/src/main/ (0, 2015-05-13)
chapter2/src/main/scala/ (0, 2015-05-13)
chapter2/src/main/scala/HelloWorld.scala (88, 2015-05-13)
chapter2/src/main/scala/WordCountJob.scala (296, 2015-05-13)
chapter3/ (0, 2015-05-13)
chapter3/createJan2014.sh (297, 2015-05-13)
chapter3/data/ (0, 2015-05-13)
chapter3/data/avro/ (0, 2015-05-13)
chapter3/data/avro/part-00000.avro (1894403, 2015-05-13)
chapter3/data/input.parquet (353, 2015-05-13)
chapter3/flatMap.bat (261, 2015-05-13)
chapter3/flatMapHDFS.bat (339, 2015-05-13)
chapter3/pom.xml (4997, 2015-05-13)
chapter3/readAvro.sh (420, 2015-05-13)
chapter3/runHDFS.sh (317, 2015-05-13)
chapter3/runHdfsInputTest.sh (174, 2015-05-13)
chapter3/runLocal.sh (217, 2015-05-13)
chapter3/src/ (0, 2015-05-13)
chapter3/src/main/ (0, 2015-05-13)
chapter3/src/main/resources/ (0, 2015-05-13)
... ...
Source code for the Packt book '**Programming MapReduce With Scalding**'
Find more information at http://scalding.io/
The book consists of 9 chapters:
* **Introduction to Map-Reduce** -
Introduction to Hadoop, MapReduce, pipelining, Cascading, Pig and Hive.
The chapter presents the benefits of higher-level abstractions over MapReduce (concepts and capabilities).
* **Get ready for Scalding** -
Theory behind Scalding, the Scala domain-specific language built on Cascading.
Setting up a development environment, including a local Hadoop cluster.
Running the first `Hello World` Scalding example.
* **Scalding by example** -
The core capabilities of Scalding: i) map-like functions, ii) grouping/reducing functions, iii) join operations.
* **Intermediate examples** -
A Scalding log-processing flow for a news company that aggregates multiple sources.
An example with multiple pipelines introduces some more advanced concepts.
* **Scalding Design Patterns** -
Design patterns applicable to Scalding data processing applications. The 'External Operations' pattern enables unit testing and a modular application structure.
* **Testing & TDD** -
Best practices for first defining behaviour (_Behaviour Driven Development_), then writing tests (_Test Driven Development_), and then completing the implementation. How to write unit and integration tests and apply black-box testing methodologies in the context of Big Data.
* **Running Scalding in Production** -
Tips and tricks on how to execute and schedule jobs, and how to coordinate the execution of Scalding/Scala/Java and even external system processes. Finally, how to configure Scalding jobs using property files or Hadoop parameters, how to monitor and optimize jobs, and other useful tips.
* **Using external data stores** -
Interaction with external SQL, NoSQL and in-memory applications such as HBase, SQL databases, ElasticSearch, etc.
* **Matrix Calculations and Machine Learning** -
Matrix calculations using the Matrix API and Algebird to calculate text similarity (TF-IDF)
and set similarity (Jaccard), followed by an example of Mahout K-Means clustering and outlier detection.
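
The two similarity measures named in the last chapter can be tried out without a cluster. The following is a minimal plain-Scala sketch, not the book's code: it does not use Scalding's Matrix API or Algebird, and the names `SimilaritySketch`, `jaccard`, `termFrequency`, `inverseDocumentFrequency` and `tfIdf` are hypothetical. It assumes the common textbook definitions: Jaccard similarity is |A ∩ B| / |A ∪ B|, and TF-IDF is the term frequency multiplied by log(N / document frequency).

```scala
// Plain-Scala sketch (assumption: no Scalding or Algebird on the classpath).
// All names below are illustrative and do not come from the book's source code.
object SimilaritySketch {

  // Jaccard similarity: |A ∩ B| / |A ∪ B|.
  // Assumption: two empty sets are given a similarity of 0.0.
  def jaccard[T](a: Set[T], b: Set[T]): Double = {
    val unionSize = (a | b).size
    if (unionSize == 0) 0.0 else (a & b).size.toDouble / unionSize
  }

  // Term frequency: occurrences of each term divided by the document length.
  def termFrequency(doc: Seq[String]): Map[String, Double] =
    doc.groupBy(identity).map { case (term, occurrences) =>
      term -> occurrences.size.toDouble / doc.size
    }

  // Inverse document frequency: log(N / number of documents containing the term).
  def inverseDocumentFrequency(docs: Seq[Seq[String]]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct).groupBy(identity).map { case (term, hits) =>
      term -> math.log(n / hits.size)
    }
  }

  // TF-IDF weight of every term in every document, in input order.
  def tfIdf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val idf = inverseDocumentFrequency(docs)
    docs.map { doc =>
      termFrequency(doc).map { case (term, tf) => term -> tf * idf(term) }
    }
  }
}
```

For instance, `SimilaritySketch.jaccard(Set(1, 2, 3), Set(2, 3, 4))` evaluates to `0.5`, since two of the four distinct elements are shared. A term that occurs in every document gets a TF-IDF weight of zero, because its inverse document frequency is log(1).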