[Python Data Mining] Part 3

highoo 2019-03-20

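I. NumPy

An array is a collection of elements of the same data type. Its elements are indexed by non-negative integers, it can be initialized from a list, and it can be sliced by index.

NumPy covers most of the basic numerical operations needed for scientific computing.

# Importing numpy:

import numpy as np

① Creating arrays:

1. Simple one- and two-dimensional arrays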

np.array( [1,2,3,4] ) # 1-D array

np.array( ['1',5,True] ) # every element is converted to a string

np.array( [True,True] ) # boolean array

np.array( [[1,2,3,4] , [5,6,7,8]] ) # 2-D array
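As a quick check of the comments above, the following sketch prints the dtype and shape that NumPy infers for each list. The variable names are illustrative and not part of the original.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])                   # 1-D integer array
mixed = np.array(['1', 5, True])                # mixed input: everything becomes a string
flags = np.array([True, True])                  # boolean array
grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])   # 2-D array, 2 rows by 4 columns

for name, arr in [("ints", ints), ("mixed", mixed), ("flags", flags), ("grid", grid)]:
    print(name, arr.dtype, arr.shape)
```

Because an array holds a single data type, mixing strings with numbers silently turns the numbers into text, which is worth checking before doing arithmetic.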

2. Generating a 1-D array with a range function:

np.arange([start,] stop[, step,], dtype=None)

np.arange(1,10)

# array([1, 2, 3, 4, 5, 6, 7, 8, 9])

3. Generating a 1-D array of evenly spaced values (an arithmetic sequence):

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

start : first value

stop : last value

num : number of samples to generate; must be non-negative

endpoint : defaults to True, in which case stop is the last element of the array

# How the step is computed:

When endpoint = True, step = (stop - start) / (num - 1)

When endpoint = False, step = (stop - start) / num

np.linspace(1,10,num=5,endpoint=False)

# array([ 1. , 2.8, 4.6, 6.4, 8.2])
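The two step formulas can be checked directly, since retstep=True makes np.linspace return the step it used. A minimal sketch, with variable names chosen here for illustration:

```python
import numpy as np

start, stop, num = 1, 10, 5

# endpoint=True: stop is the last sample, so there are num - 1 gaps
closed, step_closed = np.linspace(start, stop, num=num, endpoint=True, retstep=True)

# endpoint=False: stop is left out, so the interval is cut into num gaps
half_open, step_open = np.linspace(start, stop, num=num, endpoint=False, retstep=True)

print(closed, step_closed)
print(half_open, step_open)
```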

4. Creating an array filled with ones

np.ones(4) # 1-D array: array([ 1., 1., 1., 1.])

np.ones([4,5]) # 2-D array, 4 rows by 5 columns

5. Creating an array filled with zeros

np.zeros(4) # 1-D array: array([ 0., 0., 0., 0.])

np.zeros([4,5]) # 2-D array, 4 rows by 5 columns

6. Creating an array of a given shape

numpy.empty(shape, dtype=float, order='C')

np.empty([2,3]) # creates a 2-row, 3-column array; its values are uninitialized

7. Creating a square matrix (equal numbers of rows and columns) with ones on the diagonal and zeros elsewhere

np.eye(4) # 4 rows by 4 columns, zeros everywhere except ones on the diagonal

array([[ 1., 0., 0., 0.],

[ 0., 1., 0., 0.],

[ 0., 0., 1., 0.],

[ 0., 0., 0., 1.]])

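8. Creating an array with the same shape as an existing array, filled with zeros

arr1 = np.eye(4) # 4 rows by 4 columns

arr2 = np.zeros_like(arr1) # 4 rows by 4 columns, all zeros (np.empty_like(arr1) gives the same shape but leaves the values uninitialized)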
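Note that np.empty_like only copies the shape and dtype and does not initialize the contents, whereas np.zeros_like guarantees zeros. A short sketch of the difference, with illustrative names:

```python
import numpy as np

arr1 = np.eye(4)

zeros = np.zeros_like(arr1)   # same shape and dtype, every element is 0
blank = np.empty_like(arr1)   # same shape and dtype, contents are arbitrary

print(zeros.shape, blank.shape)
print(zeros.sum())            # always 0.0
```

Use np.empty_like only when every element will be overwritten before it is read.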

9. Converting a Series to an array

np.array(series) # series is a pandas Series

② Creating random arrays with NumPy's random module

1. Creating an array of samples drawn uniformly from [0, 1)

np.random.rand(d0, d1, ..., dn)

np.random.rand(4,5) # 4-row, 5-column array

2. Creating an array of samples from the standard normal distribution

np.random.randn(d0, d1, ..., dn)

np.random.randn(4,5) # 4-row, 5-column array

3. Creating an array of random integers (upper bound excluded)

np.random.randint(low, high=None, size=None, dtype='l')

np.random.randint(5, size=(2, 4)) # a 2 x 4 array of integers from 0 to 4

array([[4, 0, 2, 1],

[3, 2, 2, 0]])

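4. Creating an array of random integers (upper bound included)

np.random.random_integers(low, high=None, size=None)

np.random.random_integers(5, size=(2, 4)) # a 2 x 4 array of integers from 1 to 5 (deprecated; np.random.randint(1, 6, size=(2, 4)) is the equivalent)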

array([[3, 3, 4, 3],

[3, 4, 1, 5]])
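np.random.random_integers has been deprecated since NumPy 1.11 in favor of np.random.randint. The sketch below shows how the inclusive range can be reproduced by passing high + 1 to randint; the seed and variable names are assumptions made for a repeatable illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

exclusive = np.random.randint(5, size=(2, 4))         # integers from 0 to 4
inclusive = np.random.randint(1, 5 + 1, size=(2, 4))  # integers from 1 to 5

print(exclusive)
print(inclusive)
```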

5. Creating random floats in [0.0, 1.0)

np.random.random(size=None)

np.random.random_sample(size=None)

np.random.ranf(size=None)

np.random.sample(size=None)

np.random.random( (5,) )

np.random.random_sample( (4,5) ) # 4-row, 5-column array of floats

6. Generating a random sample from a given 1-D array

np.random.choice(a, size=None, replace=True, p=None)

p : 1-D array-like, optional. The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

np.random.choice(5, 3) # a uniform random sample of size 3 from np.arange(5)

np.random.choice(5, 3, replace=False) # a uniform random sample of size 3 from np.arange(5), without repeats

aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']

np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
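To see that p really controls how often each entry is drawn, one can take a large sample and compare the observed frequencies with the requested probabilities. This is a sketch under assumed values: the seed and the sample size of 10000 are not from the original.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

names = ['pooh', 'rabbit', 'piglet', 'Christopher']
probs = [0.5, 0.1, 0.1, 0.3]   # one probability per entry; they must sum to 1

draws = np.random.choice(names, 10000, p=probs)

# Count how often each name was drawn and turn the counts into frequencies
values, counts = np.unique(draws, return_counts=True)
freq = dict(zip(values.tolist(), (counts / draws.size).tolist()))
print(freq)
```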

7. Returning random bytes

np.random.bytes(length)

np.random.bytes(10)

# b'u\x1e\xd6\x8d\xf5]\xab6\xed\x0c'

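③ Important attributes

arr.shape

# Shape of the array, e.g. (4,) has a single entry and means a 1-D array; (5,6) means a 2-D array with 5 rows and 6 columns, and so on

arr.size # total number of elements

arr.ndim # number of dimensions

len(arr) # number of rows (length of the first axis)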
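A minimal sketch of these attributes on two made-up arrays (the names arr and vec are illustrative):

```python
import numpy as np

arr = np.zeros((5, 6))   # 2-D array, 5 rows by 6 columns

print(arr.shape)  # (5, 6)
print(arr.size)   # 30
print(arr.ndim)   # 2
print(len(arr))   # 5

vec = np.arange(4)       # 1-D array
print(vec.shape)  # (4,)
print(vec.ndim)   # 1
```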

④ Important methods

1. Selecting elements by a condition

numpy.where(condition[, x, y]) # returns elements chosen from x or y depending on condition

np.where(arr1 > 0 , True , False )

array([[False, True, True, False],

[ True, False, False, False],

[ True, True, True, False]], dtype=bool)
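The original does not show the contents of arr1, so the sketch below uses made-up values chosen to reproduce the mask above. It also shows two other common uses of np.where: substituting values, and the one-argument form that returns indices.

```python
import numpy as np

# Hypothetical input; only the sign pattern matters for the mask
arr1 = np.array([[-1, 2, 3, 0],
                 [4, -5, 0, -2],
                 [1, 6, 7, -3]])

mask = np.where(arr1 > 0, True, False)   # same result as arr1 > 0
clipped = np.where(arr1 > 0, arr1, 0)    # keep positives, replace the rest with 0
rows, cols = np.where(arr1 > 0)          # one-argument form: indices of the True entries

print(mask)
print(clipped)
print(rows, cols)
```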

2. Finding the unique elements of an array

np.unique(ar, return_index=False, return_inverse=False, return_counts=False)

return_counts = True # also return how many times each unique value occurs

np.unique([1, 1, 2, 2, 3, 3])

# array([1, 2, 3])

a = np.array([[1, 1], [2, 3]])

np.unique(a)

# array([1, 2, 3])
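With return_counts=True, np.unique returns a second array holding the number of occurrences of each unique value. A short sketch on made-up input:

```python
import numpy as np

values, counts = np.unique([1, 1, 2, 2, 3, 3, 3], return_counts=True)

print(values)  # [1 2 3]
print(counts)  # [2 2 3]
```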

3. Concatenating two arrays

np.concatenate((a1, a2, ...), axis=0) # joins a sequence of arrays along an existing axis

a = np.array([[1, 2], [3, 4]])

b = np.array([[5, 6]])

np.concatenate((a, b), axis=0)

array([[1, 2],

[3, 4],

[5, 6]])

np.concatenate((a, b.T), axis=1)
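The arrays must agree on every axis except the one being joined, which is why b has to be transposed before it can be attached as a column. A sketch that prints the shapes involved:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

rows = np.concatenate((a, b), axis=0)     # stacks b under a, shape (3, 2)
cols = np.concatenate((a, b.T), axis=1)   # b.T has shape (2, 1), result has shape (2, 3)

print(rows.shape, cols.shape)
print(cols)
# [[1 2 5]
#  [3 4 6]]
```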

⑤ Indexing and slicing
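A minimal sketch of basic indexing and slicing on a made-up 3 x 4 array; the array and variable names are illustrative only.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_row = arr[0]          # [0 1 2 3]
element = arr[1, 2]         # 6 (row 1, column 2)
last_col = arr[:, -1]       # [ 3  7 11]
block = arr[0:2, 1:3]       # rows 0-1, columns 1-2: [[1 2] [5 6]]
evens = arr[arr % 2 == 0]   # boolean mask: [ 0  2  4  6  8 10]

print(first_row, element, last_col)
print(block)
print(evens)
```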

⑥ Array computations

1. Addition

a = np.array([1,2,3])

b = np.array([-1,2,-4])

np.add(x1, x2[, out]) = <ufunc 'add'>

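np.add(a,b) # equivalent to a + b

# array([ 0, 4, -1])

2. Subtraction

np.subtract(x1, x2[, out]) = <ufunc 'subtract'>

np.subtract(a,b) # a - b

3. Multiplication

np.multiply(x1, x2[, out]) = <ufunc 'multiply'>

np.multiply(a,b) # a * b

4. Division

np.divide(x1, x2[, out]) = <ufunc 'divide'>

np.divide(a,b) # a / b

5. Dot product (multiply corresponding elements, then sum the products)

For two matrices, the dot product requires the number of columns of the left matrix to equal the number of rows of the right matrix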

np.dot(a, b, out=None)

np.dot(a,b)

np.dot(a,b.T)
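For the 1-D arrays a and b defined earlier, np.dot returns a single number, and transposing a 1-D array changes nothing, so both calls above give the same result. The 2-D case, where the shape rule matters, is sketched with two made-up matrices m and n:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([-1, 2, -4])

print(np.dot(a, b))     # -9, since 1*(-1) + 2*2 + 3*(-4) = -9
print(np.dot(a, b.T))   # -9 again: .T is a no-op on a 1-D array

m = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
n = np.array([[1, 0],
              [0, 1],
              [2, 2]])      # shape (3, 2): 3 rows match the 3 columns of m

product = np.dot(m, n)      # shape (2, 2)
print(product)
# [[ 7  8]
#  [16 17]]
```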

6. Broadcasting

When two arrays with different shapes are added, NumPy broadcasts the smaller one automatically; here it is applied to every element of every row

arr1 = np.random.randint(1,10,size=(3,4))

array([[3, 3, 4, 1],

[8, 4, 8, 2],

[6, 4, 4, 9]])

arr2 = np.array([2,2,2,2])

array([2, 2, 2, 2])

arr1 + arr2

array([[ 5, 5, 6, 3],

[10, 6, 10, 4],

[ 8, 6, 6, 11]])

# Second form:

arr1 + 6 # adds 6 to every element
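Broadcasting also works along the other axis: an array of shape (3, 1) is stretched across the columns instead of the rows. A sketch reusing the arr1 values shown above, with a made-up column vector col:

```python
import numpy as np

arr1 = np.array([[3, 3, 4, 1],
                 [8, 4, 8, 2],
                 [6, 4, 4, 9]])       # shape (3, 4)

row = np.array([2, 2, 2, 2])          # shape (4,): added to every row
col = np.array([[10], [20], [30]])    # shape (3, 1): added to every column

by_row = arr1 + row
by_col = arr1 + col

print(by_row)
print(by_col)
# [[13 13 14 11]
#  [28 24 28 22]
#  [36 34 34 39]]
```

Shapes that cannot be aligned, such as (3, 4) and (3,), raise a ValueError instead of being broadcast.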

7. Sum

np.sum(a, axis=None, dtype=None, out=None, keepdims=<class numpy._globals._NoValue>)

# Sum of array elements over a given axis.

np.sum([0.5, 1.5])

# 2.0

np.sum([[0, 1], [0, 5]], axis=0)

# array([0, 6])

np.sum([[0, 1], [0, 5]], axis=1)

# array([1, 5])

8. Mean

np.mean(a, axis=None, dtype=None, out=None, keepdims=<class numpy._globals._NoValue>)

# Computes the arithmetic mean along the specified axis.

a = np.array([[1, 2], [3, 4]])

np.mean(a)

# 2.5

np.mean(a, axis=0)

# array([ 2., 3.])
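The axis argument names the axis that is collapsed: axis=0 reduces down the rows and leaves one value per column, axis=1 leaves one value per row. The sketch below also shows keepdims=True, which keeps the reduced axis with length 1 so the result broadcasts back against the input. The row-normalization step is an added illustration, not part of the original.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

col_means = np.mean(a, axis=0)                # one value per column: [2. 3.]
row_means = np.mean(a, axis=1)                # one value per row: [1.5 3.5]
row_sums = np.sum(a, axis=1, keepdims=True)   # shape (2, 1): [[3] [7]]
shares = a / row_sums                         # each row now sums to 1

print(col_means, row_means)
print(shares)
```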

9. Square root

np.sqrt(x[, out]) = <ufunc 'sqrt'>

# Returns the non-negative square root of the array, element-wise.

np.sqrt([1,4,9])

# array([ 1., 2., 3.])

10. Exponential

np.exp(x[, out]) = <ufunc 'exp'>

# Computes the exponential of every element in the input array.

11. Absolute value

np.absolute(x[, out]) = <ufunc 'absolute'>

# Computes the absolute value element-wise.

x = np.array([-1.2, 1.2])

np.absolute(x)

# array([ 1.2, 1.2])

12. Natural logarithm

np.log(x[, out]) = <ufunc 'log'>

# Natural logarithm, element-wise.

⑦ Linear algebra

1. Transposing an array

arr1 = np.random.randint(0,10,size=(4,4))

np.transpose(arr1) # same as arr1.T

2. Matrix inverse

a = np.array([[1,2],[4,7]])

np.linalg.inv(a)

array([[-7., 2.],

[ 4., -1.]])
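A simple way to sanity-check an inverse is to multiply it back against the original matrix and compare with the identity. The sketch also shows that a singular matrix (the made-up example below has determinant 0) raises np.linalg.LinAlgError rather than returning a result.

```python
import numpy as np

a = np.array([[1, 2], [4, 7]])
a_inv = np.linalg.inv(a)

# a times its inverse should be the identity, up to floating-point error
print(np.allclose(a @ a_inv, np.eye(2)))  # True

singular = np.array([[1, 2], [2, 4]])     # second row is twice the first
try:
    np.linalg.inv(singular)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
print(invertible)  # False
```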

3. Sum along the diagonal of an array

a = np.array([[1,2],[4,7],[5,2]])

np.trace(a)

# 8

4. Eigenvalues and right eigenvectors of a square array

w, v = np.linalg.eig(np.array([ [1, -1], [1, 1] ]))

w; v

array([ 1. + 1.j, 1. - 1.j])

array([[ 0.70710678+0.j , 0.70710678+0.j ],

[ 0.00000000-0.70710678j, 0.00000000+0.70710678j]])
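The eigenvectors are returned as the columns of v, so column v[:, i] pairs with eigenvalue w[i]. A short sketch that verifies the defining relation A v = w v for each pair:

```python
import numpy as np

A = np.array([[1, -1], [1, 1]])
w, v = np.linalg.eig(A)

# Column v[:, i] is the eigenvector paired with eigenvalue w[i]
for i in range(len(w)):
    print(np.allclose(A @ v[:, i], w[i] * v[:, i]))  # True
```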

II. Visualization

① Importing matplotlib:

import matplotlib.pyplot as plt

② Bar charts

plt.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)

y = np.array([10,20,30,50])

x = np.arange(len(y))

plt.bar(x,y,color='r') # vertical bars

plt.show()

plt.barh(x,y,color='r') # horizontal bars

plt.show()

③ Grouped bars

y = np.random.randint(10,100,size=(4,4))

x = np.arange(4)

plt.bar(x,y[0],color='r',width=0.25)

plt.bar(x+0.25,y[1],color='b',width=0.25)

plt.bar(x+0.5,y[2],color='g',width=0.25)

plt.show()

④ Stacked bars

plt.bar(x,y[0],color='r')

plt.bar(x,y[1],color='y',bottom=y[0])

plt.bar(x,y[2],color='g',bottom=y[0] + y[1])
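The stacking works because each layer's bottom is the running total of the layers beneath it. The sketch below writes that as a loop so it extends to any number of series. The seed, the Agg backend, and the output file name are assumptions made so the script runs without a display; in an interactive session plt.show() can be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)
y = np.random.randint(10, 100, size=(3, 4))   # 3 series, 4 bars each
x = np.arange(4)

fig, ax = plt.subplots()
bottom = np.zeros(4)
for series, color in zip(y, ['r', 'y', 'g']):
    ax.bar(x, series, color=color, bottom=bottom)
    bottom = bottom + series   # the next layer starts where this one ends

fig.savefig("stacked_bars.png")   # hypothetical output file name
```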

⑤ Scatter plot

data = np.random.rand(1024,2)

plt.scatter(data[:,0],data[:,1])

plt.show()

⑥ Histogram

x = np.random.rand(1000)

plt.hist(x,bins=50) # 50 bins

plt.show()

⑦ Box plot

x = np.random.randn(100,5)

plt.boxplot(x)

plt.show()

Note:

In NumPy array computations, * and dot behave very differently.

1. In NumPy, "*" multiplies arrays element by element.

2. In NumPy, dot follows the rules of matrix multiplication.
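The difference shows up clearly on a pair of small matrices. The values below are made up for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b       # multiplies matching positions: [[ 5 12] [21 32]]
matrix = np.dot(a, b)     # rows of a times columns of b: [[19 22] [43 50]]

print(elementwise)
print(matrix)
print(np.array_equal(matrix, a @ b))  # True: the @ operator is the matrix product
```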
