[Reposted from] http://douglee.iteye.com/blog/698773
Hadoop includes the following subprojects:
- HDFS: A distributed file system that provides high-throughput access to application data. Its design is derived from Google's paper The Google File System (GFS).
- MapReduce: A software framework for distributed processing of large data sets on compute clusters. Its design is derived from Google's paper MapReduce: Simplified Data Processing on Large Clusters.
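I have bought this book and read some of its chapters. The translation is noticeably awkward in places, but it is still very helpful for readers who are new to Hadoop. Judging from the content of the Chinese edition, the English original is of very high quality. I therefore recommend studying it together with the English edition (the e-book is enough; the download address is given below, and the file is also attached) and the official Hadoop documentation, and putting what you read into practice. This is probably a reasonable compromise, since classic Chinese-language books on Hadoop are very scarce.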
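Hadoop: The Definitive Guide
Judging from the description of the Chinese edition, the book covers the implementation details of Hadoop's HDFS and MapReduce thoroughly. In my opinion it is on a par with Thinking in Java (《Java 编程思想》). Download of the English original: Oreilly.Hadoop.The.Definitive.Guide.Jun.2009.rar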
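I read selected chapters of this book and found that its explanations of cloud computing (both the concepts and the related technologies) have real depth. It is rare to see such difficult material explained in plain, accessible language, and it shows that the author understands cloud computing deeply.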
The Google File System
Sanjay Ghemawat , Howard Gobioff , and Shun-Tak Leung
Abstract
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.
The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.
In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.
Appeared in:
19th ACM Symposium on Operating Systems Principles,
Lake George, NY, October, 2003.
Download: PDF Version
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
Abstract
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
Appeared in:
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.
Download: PDF Version
Slides: HTML Slides
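To make the programming model in the abstract above more concrete, here is a minimal single-process sketch in Python. It is not Google's implementation and not the Hadoop API: the names `map_fn`, `reduce_fn`, and `run_mapreduce`, as well as the sample documents, are hypothetical and chosen only for illustration. It shows the two user-supplied functions (a map that emits intermediate key/value pairs and a reduce that merges all values sharing a key), with an in-memory grouping step standing in for the partitioning, scheduling, and communication that the real run-time system handles across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in the input."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Toy driver (assumption: everything fits in one process and in memory)."""
    # Group intermediate values by key; a real system does this across machines.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for intermediate_key, intermediate_value in mapper(key, value):
            groups[intermediate_key].append(intermediate_value)
    # Apply the reduce function once per distinct intermediate key.
    return {k: reducer(k, v) for k, v in sorted(groups.items())}


# Hypothetical sample input: (document name, document contents) pairs.
docs = [
    ("a.txt", "the quick brown fox"),
    ("b.txt", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

Because the user writes only `map_fn` and `reduce_fn`, the driver can be swapped for a distributed one without changing them, which is the separation of concerns the abstract describes.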
If you want to learn about Google's technology, it is worth visiting this site regularly: the Google Research technical papers hub.