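It has been a long time since I last used Hadoop.

In the blink of an eye, Hadoop has gone from 0.13 to the current 0.20 XD

To pair it with HBase, though, the version I installed this time is Hadoop 0.19.1.

The environment is Ubuntu 8.10.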

There are four machines:

hadoop0: 192.168.1.1

hadoop1: 192.168.1.2

hadoop2: 192.168.1.3

hadoop3: 192.168.1.4
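For the host names above to resolve on every node, each machine needs some mapping from name to address. A minimal sketch, assuming you use /etc/hosts for this rather than a NIS hosts map (print_hosts_entries is a made-up helper name, and it only prints the lines instead of touching any file):

```shell
# Hypothetical helper: print /etc/hosts-style lines for hadoop0..hadoop3.
# Assumes the addresses are 192.168.1.1 through 192.168.1.4.
print_hosts_entries() {
    i=0
    while [ "$i" -le 3 ]; do
        printf '192.168.1.%d\thadoop%d\n' "$((i + 1))" "$i"
        i=$((i + 1))
    done
}

print_hosts_entries
```

If the output looks right, it can be appended to /etc/hosts on each node by hand.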

Everything is installed under /usr/local.

 

Prerequisites:

Set up NFS and NIS: see here

 

Installing Hadoop:

1. Download hadoop-0.19.1.tar.gz from the official Hadoop website into /usr/local and extract it to /usr/local/hadoop
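Step 1 could look roughly like the sketch below. It assumes the tarball has already been downloaded into the target directory and that it unpacks into a hadoop-0.19.1/ directory; install_hadoop is a made-up helper name, not something that ships with Hadoop.

```shell
# Hypothetical helper: unpack the tarball and rename the result to "hadoop".
# $1 is the install prefix (defaults to /usr/local).
# Assumes $prefix/hadoop-0.19.1.tar.gz exists and unpacks to hadoop-0.19.1/.
install_hadoop() {
    prefix="${1:-/usr/local}"
    tar -xzf "$prefix/hadoop-0.19.1.tar.gz" -C "$prefix" &&
    mv "$prefix/hadoop-0.19.1" "$prefix/hadoop"
}
```

Running install_hadoop /usr/local as a user who can write to /usr/local should leave the tree at /usr/local/hadoop.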

 

2. Configure $HADOOP_HOME/conf/hadoop-site.xml as follows (the property blocks go inside the file's <configuration> element)

<property>
  <name>hadoop.system.dir</name> 
  <value>/usr/local/hadoopsys</value>
  <description>
          A base for other temporary directories. Please make sure that this 
          directory is readable/writable for the hadoop-related processes
  </description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop0:50040</value>
  <description>
  The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.
  </description>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop0:50020</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. 
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<property>
  <name>dfs.hosts</name>
  <value></value>
  <description>Names a file that contains a list of hosts that are
  permitted to connect to the namenode. The full pathname of the file
  must be specified.  If the value is empty, all hosts are
  permitted.</description>
</property>
<property>
  <name>mapred.hosts</name>
  <value></value>
  <description>Names a file that contains the list of nodes that may
  connect to the jobtracker.  If the value is empty, all hosts are
  permitted.</description>
</property>
<property>
  <name>mapred.system.dir</name> 
  <value>/mapred/system</value>
  <description>
          The shared directory where MapReduce stores control files in the HDFS file system.
  </description>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/mapred/temp</value>
  <description>A shared directory for temporary files in the HDFS file system.
  </description>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>${hadoop.system.dir}/dfs/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary
      name node should store the temporary images and edits to merge.  
  </description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.system.dir}/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
      should store the name table.  If this is a comma-delimited list
      of directories then the name table is replicated in all of the
      directories, for redundancy. </description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.system.dir}/dfs/data,${hadoop.system.dir}/dfs/data1</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>
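A few things to watch out for:
hadoop.system.dir
This is where the HDFS data is physically stored, so be careful not to put it under an NFS directory.
Otherwise every node accesses the same disk, and nothing is distributed anymore
(and if that one disk dies, everything dies with it XD)
fs.default.name
This sets the host and port of HDFS.
The value here matters later for the HBase configuration.
dfs.replication
This is how many copies HDFS keeps of each piece of data.
More copies naturally means better fault tolerance, but efficiency drops accordingly, since every block is stored that many times.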
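One way to double-check the NFS warning above on each node is to look at the filesystem type behind the directory. This is only a sketch: it assumes GNU coreutils stat (which is what Ubuntu ships), and fs_type / is_on_nfs are made-up helper names.

```shell
# Print the filesystem type backing a path (GNU coreutils stat).
fs_type() {
    stat -f -c %T "$1"
}

# Exit status 0 when the path lives on an NFS mount, 1 otherwise.
is_on_nfs() {
    case "$(fs_type "$1")" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Check the directory used for hadoop.system.dir in hadoop-site.xml.
HADOOP_SYS_DIR="${HADOOP_SYS_DIR:-/usr/local/hadoopsys}"
if [ -d "$HADOOP_SYS_DIR" ] && is_on_nfs "$HADOOP_SYS_DIR"; then
    echo "WARNING: $HADOOP_SYS_DIR is on NFS; use a local disk instead"
fi
```

No output means the directory either does not exist yet or is not on NFS.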
3. In $HADOOP_HOME/conf/hadoop-env.sh, set JAVA_HOME and point the log directory (HADOOP_LOG_DIR) to a location under /tmp
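For step 3, the two lines in hadoop-env.sh might look like this. Both paths are assumptions for illustration: the JAVA_HOME shown is where the Sun Java 6 package usually lands on Ubuntu, and /tmp/hadoop-logs is just an example directory under /tmp. HADOOP_LOG_DIR is the log directory variable that the stock hadoop-env.sh provides.

```shell
# Assumed JDK location; adjust to wherever Java is installed on your nodes.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Example log directory under /tmp (hypothetical path).
export HADOOP_LOG_DIR=/tmp/hadoop-logs
```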
4. Create conf/masters and conf/slaves with the following contents
conf/masters: 
hadoop0
conf/slaves:
hadoop1
hadoop2 
hadoop3
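If you would rather script step 4 than edit the files by hand, something like the following should do. write_cluster_files is a made-up helper name; it simply writes the host names listed above into the given conf directory.

```shell
# Hypothetical helper: write conf/masters and conf/slaves.
# $1 is the Hadoop conf directory, e.g. /usr/local/hadoop/conf.
write_cluster_files() {
    conf_dir="$1"
    printf '%s\n' hadoop0 > "$conf_dir/masters"
    printf '%s\n' hadoop1 hadoop2 hadoop3 > "$conf_dir/slaves"
}
```

For the layout in this post the call would be write_cluster_files /usr/local/hadoop/conf.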
5. Format HDFS
$HADOOP_HOME/bin/hadoop namenode -format
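6. Start Hadoop; the procedure is the same as here
The cluster setup is almost identical to the local setup; only the slaves file and conf/hadoop-site.xml need slight changes.
I already wrote that up before, so I am too lazy to write it again orz
If anything above is wrong, or if you have better suggestions or any questions,
feel free to leave a comment!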
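Since the earlier write-up is not repeated here, a rough sketch of step 6 as run on hadoop0: bin/start-all.sh and the dfsadmin -report subcommand both ship with Hadoop 0.19, while start_cluster is a made-up wrapper name and the default path is the install location used in this post.

```shell
# Hypothetical wrapper: start all daemons, then print an HDFS status report.
# $1 is the Hadoop install directory (defaults to /usr/local/hadoop).
start_cluster() {
    hadoop_home="${1:-/usr/local/hadoop}"
    "$hadoop_home/bin/start-all.sh" &&
    "$hadoop_home/bin/hadoop" dfsadmin -report
}
```

If everything came up, the report should list the datanodes from conf/slaves.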

Posted by austintodo on PIXNET