Hadoop MapReduce V2 Reference Manual (2nd Edition, Photocopy Reprint, English Edition)

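  • Publisher: Southeast University Press
  • ISBN: 9787564160890
  • Edition: 2
  • Product code: 11845851
  • Brand: Nanjing Southeast University Press
  • Binding: Paperback
  • Foreign-language title: Hadoop MapReduce v2 Cookbook, Second Edition
  • Format: 16mo
  • Publication date: 2016-01-01
  • Paper: Offset paper
  • Pages: 304
  • Word count: 392,000
  • Text language: English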
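  • Book description

      Hadoop MapReduce V2 Reference Manual (2nd Edition, Photocopy Reprint, English Edition) opens with the installation of Hadoop YARN, MapReduce, HDFS, and other Hadoop ecosystem components. With this book as a guide, you will quickly learn about many exciting topics, such as MapReduce patterns and using Hadoop for analytics, classification, online marketing, recommendations, and data indexing and searching. You will also learn how to use Hadoop ecosystem projects including Hive, HBase, Pig, Mahout, Nutch, and Giraph, and how to deploy in cloud environments.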
  • Preface
    Chapter 1: Getting Started with Hadoop v2
    Introduction
    Setting up Hadoop v2 on your local machine
    Writing a WordCount MapReduce application, bundling it
    and running it using the Hadoop local mode
    Adding a combiner step to the WordCount MapReduce program
    Setting up HDFS
    Setting up Hadoop YARN in a distributed cluster environment
    using Hadoop v2
    Setting up Hadoop ecosystem in a distributed cluster environment
    using a Hadoop distribution
    HDFS command-line file operations
    Running the WordCount program in a distributed cluster environment
    Benchmarking HDFS using DFSIO
    Benchmarking Hadoop MapReduce using TeraSort
    Chapter 2: Cloud Deployments - Using Hadoop YARN on
    Cloud Environments
    Introduction
    Running Hadoop MapReduce v2 computations using Amazon
    Elastic MapReduce
    Saving money using Amazon EC2 Spot Instances to execute EMR job flows
    Executing a Pig script using EMR
    Executing a Hive script using EMR
    Creating an Amazon EMR job flow using the AWS Command Line Interface
    Deploying an Apache HBase cluster on Amazon EC2 using EMR
    Using EMR bootstrap actions to configure VMs for the Amazon EMR jobs
    Using Apache Whirr to deploy an Apache Hadoop cluster in a
    cloud environment
    Chapter 3: Hadoop Essentials - Configurations, Unit Tests, and Other APIs
    Introduction
    Optimizing Hadoop YARN and MapReduce configurations for
    cluster deployments
    Shared user Hadoop clusters - using Fair and Capacity schedulers
    Setting classpath precedence to user-provided JARs
    Speculative execution of straggling tasks
    Unit testing Hadoop MapReduce applications using MRUnit
    Integration testing Hadoop MapReduce applications using
    MiniYarnCluster
    Adding a new DataNode
    Decommissioning DataNodes
    Using multiple disks/volumes and limiting HDFS disk usage
    Setting the HDFS block size
    Setting the file replication factor
    Using the HDFS Java API
    Chapter 4: Developing Complex Hadoop MapReduce Applications
    Introduction
    Choosing appropriate Hadoop data types
    Implementing a custom Hadoop Writable data type
    Implementing a custom Hadoop key type
    Emitting data of different value types from a Mapper
    Choosing a suitable Hadoop InputFormat for your input data format
    Adding support for new input data formats - implementing
    a custom InputFormat
    Formatting the results of MapReduce computations - using
    Hadoop OutputFormats
    Writing multiple outputs from a MapReduce computation
    Hadoop intermediate data partitioning
    Secondary sorting - sorting Reduce input values
    Broadcasting and distributing shared resources to tasks in a
    MapReduce job - Hadoop DistributedCache
    Using Hadoop with legacy applications - Hadoop streaming
    Adding dependencies between MapReduce jobs
    Hadoop counters to report custom metrics
    Chapter 5: Analytics
    Introduction
    Simple analytics using MapReduce
    Performing GROUP BY using MapReduce
    Calculating frequency distributions and sorting using MapReduce
    Plotting the Hadoop MapReduce results using gnuplot
    Calculating histograms using MapReduce
    Calculating Scatter plots using MapReduce
    Parsing a complex dataset with Hadoop
    Joining two datasets using MapReduce
    Chapter 6: Hadoop Ecosystem - Apache Hive
    Introduction
    Getting started with Apache Hive
    Creating databases and tables using Hive CLI
    Simple SQL-style data querying using Apache Hive
    Creating and populating Hive tables and views using Hive query results
    Utilizing different storage formats in Hive - storing table data
    using ORC files
    Using Hive built-in functions
    Hive batch mode - using a query file
    Performing a join with Hive
    Creating partitioned Hive tables
    Writing Hive User-defined Functions (UDF)
    HCatalog - performing Java MapReduce computations on
    data mapped to Hive tables
    HCatalog - writing data to Hive tables from Java
    MapReduce computations
    Chapter 7: Hadoop Ecosystem II - Pig, HBase, Mahout, and Sqoop
    Introduction
    Getting started with Apache Pig
    Joining two datasets using Pig
    Accessing a Hive table data in Pig using HCatalog
    Getting started with Apache HBase
    Data random access using Java client APIs
    Running MapReduce jobs on HBase
    Using Hive to insert data into HBase tables
    Getting started with Apache Mahout
    Running K-means with Mahout
    Importing data to HDFS from a relational database using Apache Sqoop
    Exporting data from HDFS to a relational database using Apache Sqoop
    Chapter 8: Searching and Indexing
    Introduction
    Generating an inverted index using Hadoop MapReduce
    Intradomain web crawling using Apache Nutch
    Indexing and searching web documents using Apache Solr
    Configuring Apache HBase as the backend data store for Apache Nutch
    Whole web crawling with Apache Nutch using a Hadoop/HBase cluster
    Elasticsearch for indexing and searching
    Generating the in-links graph for crawled web pages
    Chapter 9: Classifications, Recommendations, and Finding Relationships
    Introduction
    Performing content-based recommendations
    Classification using the naive Bayes classifier
    Assigning advertisements to keywords using the Adwords
    balance algorithm
    Chapter 10: Mass Text Data Processing
    Introduction
    Data preprocessing using Hadoop streaming and Python
    De-duplicating data using Hadoop streaming
    Loading large datasets to an Apache HBase data store - importtsv
    and bulkload
    Creating TF and TF-IDF vectors for the text data
    Clustering text data using Apache Mahout
    Topic discovery using Latent Dirichlet Allocation (LDA)
    Document classification using Mahout Naive Bayes Classifier
    Index