I. HBase Database Overview; II. HBase Architecture; III. HBase Data Model; IV. Summary of HBase Characteristics; V. Case Study: Building a Distributed HBase Database System

I. HBase Database Overview

Overview: HBase is a column-oriented distributed database built on HDFS. It is modeled on Google's BigTable, which stores its data on GFS in the same way. As noted in an earlier article, HDFS is designed for streaming data access and is a poor fit for low-latency data access, so when you need real-time random access to very large datasets, HBase is the better choice.

Role: HBase is a typical non-relational (NoSQL) database. NoSQL databases fall mainly into the following types:
- key-value stores;
- document stores;
- column stores;
- graph stores.

Within the NoSQL field HBase is not the strongest product in its own right, but its integration with Hadoop gives it a great deal of room to scale. At bottom, HBase has only an insert operation; updates and deletes are carried out as inserts. This follows from the streaming access model of the underlying HDFS (write once, read many). Every inserted value carries a timestamp, so one piece of data can exist in several versions. HBase keeps a fixed number of versions of each value, and a query returns the version closest to the current time, that is, the newest one.

II. HBase Architecture

Architecture analysis: An HBase cluster consists of a single HMaster server and multiple HRegionServer servers. All of these servers are coordinated through ZooKeeper, which also deals with problems the servers may run into while they are running.

Component analysis:
- HStore: Multiple HStores make up one HRegion. An HStore itself has two parts, the MemStore and the StoreFile. Data written by a user goes into the MemStore first; when the MemStore is full, it is flushed to a StoreFile.
- HRegion: Made up of multiple HStores. HBase stores datasets in tables, and a table consists of rows and columns. Unlike a traditional relational database, once a table grows beyond a configured size HBase automatically divides it into separate regions, the HRegions (this operation is called an HRegion split). The HRegion is the smallest unit of distributed storage and load balancing in an HBase cluster, much like the relationship between files and blocks in HDFS.
- HLog: Stores the write log. A write that reaches an HRegion is first appended to the log and only then loaded into the MemStore. Its main purpose is failure recovery: when an HRegionServer fails, the new HRegionServer can use the HLog to recover data while it loads the HRegion.
- HRegionServer: Made up of multiple HRegions. A cluster may contain many nodes, and each node can run only one HRegionServer. It is responsible for reading and writing data in HDFS and for managing its HRegions and HLog.
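- HMaster: Every HRegionServer communicates with the HMaster. The HMaster's main task is to tell each HRegionServer which HRegions it has to maintain. In detail, it:
  1. manages users' create, delete, modify and query operations on tables (table-level administration; reads and writes of the data itself go to the HRegionServers);
  2. manages load balancing across HRegionServers and dynamically adjusts the distribution of HRegions;
  3. assigns the new HRegions after an HRegion split;
  4. migrates the HRegions of a failed HRegionServer after that server goes down.
- ZooKeeper: Holds the entry point to HBase's ROOT table (root data table) and META table (metadata table). Strictly speaking, ZooKeeper records where the ROOT table is located rather than the tables themselves, and from HBase 0.96 onward (which includes the 2.0.1 release used in the case study) the ROOT table no longer exists and ZooKeeper points directly at the META table (hbase:meta). The META table stores the HRegion identifiers of ordinary user tables. The identifier format is: table name + start row key + unique ID. As HRegions split, the identifier information changes as well; once the META table itself is divided into multiple HRegions, a root table is needed to tie the META regions together.
  In addition, when an HRegionServer fails, ZooKeeper notifies the HMaster so that it can migrate the HRegions. If the HMaster fails, ZooKeeper takes care of bringing an HMaster back and guarantees that exactly one HMaster is running at any time.
- Client: The unit through which HBase is accessed. The lookup path is ZooKeeper, then ROOT, then META, then the table (ZooKeeper, META, table in HBase 0.96 and later).

III. HBase Data Model

1. Data model:
Table: Does not store null values. It is indexed by row key, column key and timestamp.
Row key: The primary key of a row; it uniquely identifies one row of data.
Column family: The columns of a row are grouped into column families. All members of a column family share the same family prefix, and the column families of a table must be defined in advance when the table is created.
Column key: The key of a column, written as column family:qualifier (for example chengji:math).
Cell: In HBase a value is stored as one unit in a cell. Locating a cell takes three elements: row key + column key + timestamp.
Timestamp: The time at which the cell was inserted; by default it serves as the cell's version number.

2. Storage:
Relational database: With the name column as the primary key, looking up a student by name is easy. Now consider the following questions:
- If a new course is added, how can its grades be stored without changing the table structure?
- If tom retook the math exam, how can both of his math grades be recorded?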
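- If tom has no math grade, the value in the table is null, and even an empty value takes up storage space.

HBase database: When different data is inserted at different times, a timestamp is generated and a data record is created inside the column family. In HBase's actual storage, empty values in a table take up no storage space.

IV. Summary of HBase Characteristics

HBase, then, is a column-oriented mapping database that can express only simple key-to-value mappings. Compared with a relational database it has the following characteristics:
- Data types: HBase has only one simple type: every value is stored as an uninterpreted byte string. A relational database offers a rich choice of types and storage formats.
- Data operations: HBase has only simple operations such as insert, query, delete and truncate. Tables are independent of one another and have no complex relationships between them, so joins across tables cannot be implemented and are not needed. A relational database offers many kinds of join.
- Storage model: HBase stores data by column. Each column family is kept in a handful of files, and the files of different column families are separate. A relational database stores data by table structure and by row.
- Data maintenance: An update in HBase actually inserts new data and the old version is still kept, whereas a relational database replaces the value in place.
- Scalability: Distributed databases such as HBase were built for exactly this purpose, so hardware can be added or removed easily and tolerance of faults is high. A relational database usually needs an additional middle layer to achieve something similar.

V. Case Study: Building a Fully Distributed HBase Database System

Case environment: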
Version compatibility:
Download location: http://www./index.html#projects-list

HBase deployment modes:
Standalone mode: HBase runs on a single host.
Pseudo-distributed mode: HBase runs only on Hadoop's namenode. This is similar to standalone mode, except that the data files can be stored on the datanodes.
Fully distributed mode: HBase runs on multiple nodes of the Hadoop cluster. Usually the HMaster runs on the namenode and the HRegionServers run on the datanodes.

Case steps (make sure the clocks of all nodes are synchronized):
- Build the Hadoop distributed storage cluster (namenode and datanodes);
- Install and deploy HBase on the master node;
- Configure HBase on the master node;
- Copy HBase from the master node to the slave nodes;
- Start the HBase processes on the master node and check them;
- Verify the process status on the slave nodes;
- Open the web page and check the HBase running status;
- Log in to the HBase database on the master node and check its status;
- Basic management operations in the HBase database;
- Use MapReduce with HBase to count the rows of a table.

- Build the Hadoop distributed storage cluster (namenode and datanodes);

- Install and deploy HBase on the master node;
    [root@master ~]# ls hbase-2.0.1-bin.tar.gz
    hbase-2.0.1-bin.tar.gz
    [root@master ~]# tar zxvf hbase-2.0.1-bin.tar.gz
    [root@master ~]# mv hbase-2.0.1 /usr/local/hbase
    [root@master ~]# ls /usr/local/hbase
    bin  conf  hbase-webapps  lib  NOTICE.txt  RELEASENOTES.md  CHANGES.md  docs  LEGAL  LICENSE.txt  README.txt
    [root@master ~]# chown hadoop:hadoop /usr/local/hbase/ -R

- Configure HBase on the master node;
    [root@master ~]# su - hadoop
    [hadoop@master ~]$ vi /usr/local/hbase/conf/hbase-site.xml    ## HBase site configuration file
    [hadoop@master ~]$ vi /usr/local/hbase/conf/hbase-env.sh    ## HBase environment variable file
    export JAVA_HOME=/usr/local/java
    export HADOOP_HOME=/usr/local/hadoop
    export HBASE_HOME=/usr/local/hbase
    export HBASE_MANAGES_ZK=true
Note: export HBASE_MANAGES_ZK=true turns on the ZooKeeper process bundled with HBase, so that it starts together with the HBase processes.
    [hadoop@master ~]$ vi /usr/local/hbase/conf/regionservers    ## the HBase nodes
    slave1
    slave2

- Copy HBase from the master node to the slave nodes;
    [root@slave1 ~]# mkdir /usr/local/hbase
    [root@slave1 ~]# chown hadoop:hadoop /usr/local/hbase/
    [root@slave2 ~]# mkdir /usr/local/hbase
    [root@slave2 ~]# chown hadoop:hadoop /usr/local/hbase/
    [hadoop@master ~]$ scp -r /usr/local/hbase/* hadoop@slave1:/usr/local/hbase
    [hadoop@master ~]$ scp -r /usr/local/hbase/* hadoop@slave2:/usr/local/hbase

- Start the HBase processes on the master node and check them;
Note: If starting HBase fails with "Error: Could not find or load main class", the cause is either a mismatch between the HBase and Hadoop versions or a problem with the environment variables.

- Verify the process status on the slave nodes;

- Open the web page and check the HBase running status;
    http://192.168.100.101:16010

- Log in to the HBase database on the master node and check its status;

- On the master node, look at the data in Hadoop storage and verify the state of the data files;

- Basic management operations in the HBase database;
    [hadoop@master ~]$ /usr/local/hbase/bin/hbase shell
    hbase(main):001:0> status    ## show the cluster status
    1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
    Took 0.8818 seconds
    hbase(main):002:0> create 'class','age','chengji'    ## create a table; syntax: create <table>, <column family> [, <column family> ...] (here 'age' and 'chengji' are both column families)
    Created table class
    Took 1.5186 seconds
    => Hbase::Table - class
    hbase(main):003:0> list    ## list all tables
    TABLE
    class
    1 row(s)
    Took 0.0940 seconds
    => ["class"]
    hbase(main):004:0> describe 'class'    ## show the details of a table
    Table class is ENABLED
    class
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'age', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    {NAME => 'chengji', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
    2 row(s)
    Took 0.1701 seconds
    hbase(main):012:0> put 'class','tom','age','18'    ## add data; syntax: put <table>, <row key>, <column key>, <value>
    Took 0.1784 seconds
    hbase(main):013:0> put 'class','marry','age','20'
    Took 0.0262 seconds
    hbase(main):014:0> scan 'class'    ## scan the data in the class table
    ROW      COLUMN+CELL
     marry   column=age:, timestamp=1535528846020, value=20
     tom     column=age:, timestamp=1535528825217, value=18
    2 row(s)
    Took 0.0628 seconds
    hbase(main):017:0> put 'class','tom','chengji:math','95'    ## insert data
    Took 0.0217 seconds
    hbase(main):018:0> put 'class','tom','chengji:english','90'
    Took 0.0100 seconds
    hbase(main):019:0> put 'class','marry','chengji:math','85'
    Took 0.0130 seconds
    hbase(main):020:0> put 'class','marry','chengji:english','90'
    Took 0.0085 seconds
    hbase(main):021:0> scan 'class'
    ROW      COLUMN+CELL
     marry   column=age:, timestamp=1535528846020, value=20
     marry   column=chengji:english, timestamp=1535529132585, value=90
     marry   column=chengji:math, timestamp=1535529119078, value=85
     tom     column=age:, timestamp=1535528825217, value=18
     tom     column=chengji:english, timestamp=1535529101465, value=90
     tom     column=chengji:math, timestamp=1535529089638, value=95
    2 row(s)
    Took 0.0120 seconds
    hbase(main):033:0> scan 'class',{COLUMN=>'chengji:math',LIMIT=>1}    ## scan with conditions, showing one row
    ROW      COLUMN+CELL
     marry   column=age:, timestamp=1535528846020, value=20
     marry   column=chengji:english, timestamp=1535529132585, value=90
     marry   column=chengji:math, timestamp=1535529119078, value=85
    1 row(s)
    Took 0.0456 seconds
    hbase(main):038:0> get 'class','tom'    ## fetch data from the table; syntax: get <table>, <row key>
    COLUMN             CELL
     age:              timestamp=1535528825217, value=18
     chengji:english   timestamp=1535529101465, value=90
     chengji:math      timestamp=1535529089638, value=95
    1 row(s)
    Took 0.0125 seconds
    hbase(main):042:0> get 'class','tom',{COLUMN=>'age:'}
    ## fetch data from the table by condition; syntax: get <table>, <row key>, {COLUMN => <column family>}
    COLUMN             CELL
     age:              timestamp=1535528825217, value=18
    1 row(s)
    Took 0.0188 seconds
    hbase(main):043:0> get 'class','tom','age:'    ## fetch data by condition, same as above
    COLUMN             CELL
     age:              timestamp=1535528825217, value=18
    1 row(s)
    Took 0.0171 seconds
    hbase(main):044:0> get 'class','tom','chengji:english'
    COLUMN             CELL
     chengji:english   timestamp=1535529101465, value=90
    1 row(s)
    Took 0.0162 seconds
    hbase(main):045:0> delete 'class','tom','chengji:english'    ## delete a record from the table; syntax: delete <table>, <row key>, <column key>
    Took 0.0367 seconds
    hbase(main):046:0> get 'class','tom','chengji:english'    ## fetch the record again; nothing is returned
    COLUMN             CELL
    0 row(s)
    Took 0.0226 seconds
    hbase(main):047:0> get 'class','tom'    ## fetch everything stored under the row key tom
    COLUMN             CELL
     age:              timestamp=1535528825217, value=18
     chengji:math      timestamp=1535529089638, value=95
    1 row(s)
    Took 0.0106 seconds
    hbase(main):048:0> disable 'class'    ## a table must be disabled before it can be dropped
    Took 0.8495 seconds
    hbase(main):049:0> drop 'class'    ## drop the table
    Took 0.4907 seconds
    hbase(main):050:0> list    ## list all tables
    TABLE
    0 row(s)
    Took 0.0086 seconds
    => []
    hbase(main):051:0> exit

- Use MapReduce with HBase to count the rows of a table;
    [hadoop@master ~]$ cp /usr/local/hbase/conf/hbase-site.xml /usr/local/hadoop/etc/hadoop/
    [hadoop@master ~]$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hbase/lib/*
    [hadoop@master ~]$ scp -r /usr/local/hadoop/etc/hadoop/hadoop-env.sh hadoop@slave1:/usr/local/hadoop/etc/hadoop/
    [hadoop@master ~]$ scp -r /usr/local/hbase/conf/hbase-site.xml hadoop@slave1:/usr/local/hbase/conf/
    [hadoop@master ~]$ scp -r /usr/local/hadoop/etc/hadoop/hadoop-env.sh hadoop@slave2:/usr/local/hadoop/etc/hadoop/
    [hadoop@master ~]$ scp -r /usr/local/hbase/conf/hbase-site.xml hadoop@slave2:/usr/local/hbase/conf/
    [hadoop@master ~]$ hadoop jar /usr/local/hbase/lib/hbase-server-2.0.1.jar
    RunJar jarFile [mainClass] args...
    [hadoop@master ~]$ /usr/local/hbase/bin/hbase shell
    [hadoop@master ~]$ /usr/local/hbase/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'haha1'
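
To make the row-count step less of a black box: the idea behind a MapReduce row count is that the table is scanned in parallel, roughly one task per region, each task adds one to a counter for every row it sees, and the per-task counters are then totalled. The following is a minimal in-memory sketch of that idea in Python. It does not talk to HBase at all; the `regions` layout, the counter name `ROWS` and both function names are assumptions made up for illustration, and the sample rows simply reuse the 'class' table from the shell session.

```python
from collections import Counter

# Hypothetical stand-in for a table that has been split into two regions.
# Each region maps row key -> {column key: value}. This is illustration data
# taken from the shell session, not how HBase lays out its files.
regions = [
    {"marry": {"age:": "20", "chengji:math": "85", "chengji:english": "90"}},
    {"tom": {"age:": "18", "chengji:math": "95"}},
]


def count_region(region):
    """Map step: one task per region, one counter increment per row key."""
    counters = Counter()
    for _row_key in region:
        counters["ROWS"] += 1
    return counters


def count_rows(all_regions):
    """Aggregation step: add up the counters reported by every task."""
    total = Counter()
    for region in all_regions:
        total.update(count_region(region))
    return total["ROWS"]


print(count_rows(regions))  # prints 2
```

Note that the count is per row key, not per cell: marry has three cells and tom has two, but the result is 2, which matches the "2 row(s)" line printed by scan in the shell session.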
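
Looking back at the put, get and delete commands in the shell session, it can help to picture a table as a map from (row key, column key, timestamp) to value, where an update adds a newer version and a delete adds a marker that hides older versions. The sketch below is a toy Python model under that assumption. It is not the HBase client API: the class `ToyTable`, its methods, the integer clock used in place of real timestamps and the retake score '98' are all hypothetical, and real HBase delete markers and version retention have more rules than this.

```python
import itertools


class ToyTable:
    """In-memory illustration of versioned cells (not the HBase API)."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.cells = {}                    # (row, column) -> [(ts, value), ...], newest first
        self._clock = itertools.count(1)   # stand-in for wall-clock timestamps

    def put(self, row, column, value):
        """Every write inserts a new (timestamp, value) version."""
        versions = self.cells.setdefault((row, column), [])
        versions.insert(0, (next(self._clock), value))
        del versions[self.max_versions:]   # keep only the newest N versions

    def delete(self, row, column):
        """A delete is also an insert: it writes a marker (None) as the newest version."""
        self.put(row, column, None)

    def get(self, row):
        """Return the newest value of each column in the row, skipping deleted ones."""
        result = {}
        for (r, column), versions in self.cells.items():
            if r == row and versions[0][1] is not None:
                result[column] = versions[0][1]
        return result

    def versions(self, row, column):
        """Return all retained versions of one cell, newest first."""
        return list(self.cells.get((row, column), []))


table = ToyTable(max_versions=2)
table.put("tom", "age:", "18")
table.put("tom", "chengji:math", "95")
table.put("tom", "chengji:english", "90")
table.delete("tom", "chengji:english")
print(table.get("tom"))   # {'age:': '18', 'chengji:math': '95'}

# Hypothetical retake: the new score becomes the visible one, the old one is kept.
table.put("tom", "chengji:math", "98")
print(table.versions("tom", "chengji:math"))   # [(5, '98'), (2, '95')]
```

The first print mirrors what `get 'class','tom'` returned after the delete in the session. The second shows how a retaken exam could be recorded without changing the table structure, provided the column family keeps more than one version (the table in the session was created with VERSIONS => '1').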