ProgrammingWithScalding

Category: collect
Development tool: Scala
File size: 0KB
Downloads: 0
Upload date: 2015-12-05 15:51:39
Uploader: sh-1993
Description: Programming MapReduce with Scalding

File list:
LICENSE (0, 2015-05-13)
chapter1/ (0, 2015-05-13)
chapter1/pom.xml (2139, 2015-05-13)
chapter1/run.bat (121, 2015-05-13)
chapter1/run.sh (65, 2015-05-13)
chapter1/src/ (0, 2015-05-13)
chapter1/src/main/ (0, 2015-05-13)
chapter1/src/main/java/ (0, 2015-05-13)
chapter1/src/main/java/CascadingExample.java (1108, 2015-05-13)
chapter1/src/main/resources/ (0, 2015-05-13)
chapter1/src/main/resources/products.tsv (131, 2015-05-13)
chapter2/ (0, 2015-05-13)
chapter2/pom.xml (658, 2015-05-13)
chapter2/runHDFS.sh (327, 2015-05-13)
chapter2/runLocal.bat (333, 2015-05-13)
chapter2/runLocal.sh (268, 2015-05-13)
chapter2/src/ (0, 2015-05-13)
chapter2/src/main/ (0, 2015-05-13)
chapter2/src/main/scala/ (0, 2015-05-13)
chapter2/src/main/scala/HelloWorld.scala (88, 2015-05-13)
chapter2/src/main/scala/WordCountJob.scala (296, 2015-05-13)
chapter3/ (0, 2015-05-13)
chapter3/createJan2014.sh (297, 2015-05-13)
chapter3/data/ (0, 2015-05-13)
chapter3/data/avro/ (0, 2015-05-13)
chapter3/data/avro/part-00000.avro (1894403, 2015-05-13)
chapter3/data/input.parquet (353, 2015-05-13)
chapter3/flatMap.bat (261, 2015-05-13)
chapter3/flatMapHDFS.bat (339, 2015-05-13)
chapter3/pom.xml (4997, 2015-05-13)
chapter3/readAvro.sh (420, 2015-05-13)
chapter3/runHDFS.sh (317, 2015-05-13)
chapter3/runHdfsInputTest.sh (174, 2015-05-13)
chapter3/runLocal.sh (217, 2015-05-13)
chapter3/src/ (0, 2015-05-13)
chapter3/src/main/ (0, 2015-05-13)
chapter3/src/main/resources/ (0, 2015-05-13)
... ...

Source code for PACKT Book '**Programming MapReduce With Scalding**'

Find more information at http://scalding.io/

The book consists of 9 chapters:

* **Introduction to Map-Reduce** - Introduction to Hadoop, MapReduce, pipelining, Cascading, Pig and Hive. The chapter presents the benefits (concepts and capabilities) of higher-level abstractions over MapReduce.
* **Get ready for Scalding** - Theory about Scalding, the Scala domain-specific language built on Cascading. Development environment setup, including a local Hadoop cluster for development. Execute the first `Hello World` Scalding example.
* **Scalding by example** - The core capabilities of Scalding: i) map-like functions, ii) grouping/reducing functions, iii) join operations.
* **Intermediate examples** - A Scalding log-processing flow for a news company that aggregates multiple sources. An example with multiple pipelines introduces some more advanced concepts.
* **Scalding Design Patterns** - Interesting design patterns applicable to Scalding data processing applications. The 'External Operations' pattern makes it possible to unit test applications and to structure them in a modular way.
* **Testing & TDD** - Best practices of first defining behaviour (_Behaviour Driven Development_), then tests (_Test Driven Development_), and then completing the implementation. How to write unit and integration tests and apply black-box testing methodologies in the context of Big Data.
* **Running Scalding in Production** - Tips and tricks on how to execute and schedule jobs, and how to coordinate the execution of Scalding/Scala/Java and even external system processes. Also how to configure Scalding jobs using property files or Hadoop parameters, how to monitor and optimize jobs, and other useful tips.
* **Using external data stores** - Interaction with external SQL, NoSQL and in-memory data stores such as HBase, SQL databases, ElasticSearch etc.
* **Matrix Calculations and Machine Learning** - Matrix calculations using the Matrix API and Algebird to calculate text similarity (TF-IDF) and set similarity (Jaccard). Then another example on Mahout K-Means clustering and outlier detection.
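
The three core capabilities named under "Scalding by example" (map-like functions, grouping/reducing functions, join operations) can be illustrated without a Hadoop or Scalding setup. The following is a minimal sketch that uses only the Scala standard library on in-memory collections; it is not code from the book or from this repository, and the object name, method names and sample data are all hypothetical. In Scalding's Fields API the corresponding steps would be expressed on pipes with operations such as `flatMap`, `groupBy` and `joinWithSmaller`.

```scala
object CoreOperationsSketch {
  // Hypothetical sample data, invented for this illustration.
  val lines: Seq[String] = Seq("hello scalding", "hello hadoop")
  val categories: Map[String, String] =
    Map("hello" -> "greeting", "hadoop" -> "framework")

  // i) Map-like function: split every line into lower-case words.
  def words(input: Seq[String]): Seq[String] =
    input.flatMap(_.toLowerCase.split("\\s+")).filter(_.nonEmpty)

  // ii) Grouping/reducing function: count how often each word occurs.
  def wordCounts(input: Seq[String]): Map[String, Int] =
    words(input).groupBy(identity).map { case (word, group) => (word, group.size) }

  // iii) Join operation: inner join of the counts with a lookup table,
  // keeping only the words that appear on both sides.
  def joined(counts: Map[String, Int],
             lookup: Map[String, String]): Map[String, (Int, String)] =
    counts.collect {
      case (word, count) if lookup.contains(word) => (word, (count, lookup(word)))
    }

  def main(args: Array[String]): Unit = {
    val counts = wordCounts(lines)
    println(counts)
    println(joined(counts, categories))
  }
}
```

Running `CoreOperationsSketch.main` prints the word counts and then the joined result. Words without a category (here `scalding`) are dropped by the inner join, which mirrors how an inner join between two pipes behaves.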
