A Quant Analyst's Python Diary [Day 3: A Wave of Financial Libraries Arrives, NumPy Edition]

• numpy
• scipy
• pandas
• matplotlib

Introduction to NumPy

1. What Is NumPy?

``````import numpy
numpy.version.full_version

'1.8.0'
``````

``````import numpy as np
np.version.full_version

'1.8.0'
``````

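2. A First Look at the NumPy Object: Arrays

The basic object in NumPy is the homogeneous multidimensional array. As with arrays in C++, every element shares one type, so characters and numbers, for example, cannot coexist in the same array. An example first: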

``````a = np.arange(20)
``````

``````print type(a)

<type 'numpy.ndarray'>
``````

``````a = a.reshape(4, 5)
print a

[[ 0  1  2  3  4]
[ 5  6  7  8  9]
[10 11 12 13 14]
[15 16 17 18 19]]
``````

``````a = a.reshape(2, 2, 5)
print a

[[[ 0  1  2  3  4]
[ 5  6  7  8  9]]

[[10 11 12 13 14]
[15 16 17 18 19]]]
``````

``````a.ndim

3
``````
``````a.shape

(2, 2, 5)
``````
``````a.size

20
``````
``````a.dtype

dtype('int64')
``````
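
To make the "one type per array" rule from the start of this section concrete, here is a minimal sketch of what happens when the input list mixes types. The variable names are made up for illustration, and the exact string dtype that gets printed depends on the Python and NumPy versions in use.

```python
import numpy as np

# Mixing ints and floats: every element is upcast to float64.
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)

# Mixing numbers and strings: every element is stored as a string.
mixed_text = np.array([1, 'a'])
print(mixed_text.dtype)

# An explicit dtype forces a conversion (the fractional part is dropped here).
as_int = np.array([1.9, 2.1], dtype=int)
print(as_int)
```

NumPy does not reject mixed input. It picks one common type for the whole array, which is worth keeping in mind when loading messy data.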

3. Creating Arrays

``````raw = [0,1,2,3,4]
a = np.array(raw)
a

array([0, 1, 2, 3, 4])
``````
``````raw = [[0,1,2,3,4], [5,6,7,8,9]]
b = np.array(raw)
b

array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
``````

``````d = (4, 5)
np.zeros(d)

array([[ 0.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.,  0.]])
``````

``````d = (4, 5)
np.ones(d, dtype=int)

array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
``````

An array of random numbers drawn from the interval `[0, 1)`:

``````np.random.rand(5)

array([ 0.93807818,  0.45307847,  0.90732828,  0.36099623,  0.71981451])
``````
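
The random values above will differ on every run. If a reproducible sequence is needed, for example when re-running a backtest, one option is to seed the global generator first. This is a small sketch; the seed value 42 is an arbitrary choice.

```python
import numpy as np

np.random.seed(42)            # fix the state of the global generator
first = np.random.rand(5)

np.random.seed(42)            # same seed, so the same sequence is replayed
second = np.random.rand(5)

print(np.array_equal(first, second))
# Every value falls in the half-open interval [0, 1).
print(((first >= 0) & (first < 1)).all())
```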

4. Array Operations

``````a = np.array([[1.0, 2], [2, 4]])
print "a:"
print a
b = np.array([[3.2, 1.5], [2.5, 4]])
print "b:"
print b
print "a+b:"
print a+b

a:
[[ 1.  2.]
[ 2.  4.]]
b:
[[ 3.2  1.5]
[ 2.5  4. ]]
a+b:
[[ 4.2  3.5]
[ 4.5  8. ]]
``````

``````print "3 * a:"
print 3 * a
print "b + 1.8:"
print b + 1.8

3 * a:
[[  3.   6.]
[  6.  12.]]
b + 1.8:
[[ 5.   3.3]
[ 4.3  5.8]]
``````

``````a /= 2
print a

[[ 0.5  1. ]
[ 1.   2. ]]
``````

``````print "a:"
print a
print "np.exp(a):"
print np.exp(a)
print "np.sqrt(a):"
print np.sqrt(a)
print "np.square(a):"
print np.square(a)
print "np.power(a, 3):"
print np.power(a, 3)

a:
[[ 0.5  1. ]
[ 1.   2. ]]
np.exp(a):
[[ 1.64872127  2.71828183]
[ 2.71828183  7.3890561 ]]
np.sqrt(a):
[[ 0.70710678  1.        ]
[ 1.          1.41421356]]
np.square(a):
[[ 0.25  1.  ]
[ 1.    4.  ]]
np.power(a, 3):
[[ 0.125  1.   ]
[ 1.     8.   ]]
``````

``````a = np.arange(20).reshape(4,5)
print "a:"
print a
print "sum of all elements in a: " + str(a.sum())
print "maximum element in a: " + str(a.max())
print "minimum element in a: " + str(a.min())
print "maximum element in each row of a: " + str(a.max(axis=1))
print "minimum element in each column of a: " + str(a.min(axis=0))

a:
[[ 0  1  2  3  4]
[ 5  6  7  8  9]
[10 11 12 13 14]
[15 16 17 18 19]]
sum of all elements in a: 190
maximum element in a: 19
minimum element in a: 0
maximum element in each row of a: [ 4  9 14 19]
minimum element in each column of a: [0 1 2 3 4]
``````
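
The `axis` argument used above is easy to get backwards. One way to remember it: `axis=0` collapses the rows and leaves one value per column, while `axis=1` collapses the columns and leaves one value per row. A short sketch on the same 4x5 array:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

col_sums = a.sum(axis=0)     # one value per column (5 values)
row_sums = a.sum(axis=1)     # one value per row (4 values)
row_means = a.mean(axis=1)   # the mean is computed as a float

print(col_sums)
print(row_sums)
print(row_means)
```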

``````a = np.arange(20).reshape(4, 5)
a = np.asmatrix(a)
print type(a)

b = np.matrix('1.0 2.0; 3.0 4.0')
print type(b)

<class 'numpy.matrixlib.defmatrix.matrix'>
<class 'numpy.matrixlib.defmatrix.matrix'>
``````

``````b = np.arange(2, 45, 3).reshape(5, 3)
b = np.mat(b)
print b

[[ 2  5  8]
[11 14 17]
[20 23 26]
[29 32 35]
[38 41 44]]
``````

``````np.linspace(0, 2, 9)

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])
``````
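
For comparison with `arange`, which appeared earlier: `linspace` takes the number of points and includes both endpoints, whereas `arange` takes a step and excludes the stop value. A minimal sketch that builds the same grid both ways:

```python
import numpy as np

by_count = np.linspace(0, 2, 9)       # 9 points, both endpoints included
by_step = np.arange(0, 2.25, 0.25)    # step 0.25, stop value excluded

print(len(by_count))
print(np.allclose(by_count, by_step))
```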

``````print "matrix a:"
print a
print "matrix b:"
print b
c = a * b
print "matrix c:"
print c

matrix a:
[[ 0  1  2  3  4]
[ 5  6  7  8  9]
[10 11 12 13 14]
[15 16 17 18 19]]
matrix b:
[[ 2  5  8]
[11 14 17]
[20 23 26]
[29 32 35]
[38 41 44]]
matrix c:
[[ 290  320  350]
[ 790  895 1000]
[1290 1470 1650]
[1790 2045 2300]]
``````
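
With the matrix class, `*` means the matrix product. The same product can be computed on plain arrays with `np.dot`, while `*` on plain arrays stays element-wise. Newer NumPy documentation recommends plain arrays over the matrix class, so this sketch may be the safer habit. It rebuilds the same `a` and `b` as above.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)
b = np.arange(2, 45, 3).reshape(5, 3)

c = np.dot(a, b)      # matrix product: (4, 5) times (5, 3) gives (4, 3)
print(c)

# On plain arrays, `*` multiplies element by element, so shapes must match.
squared = a * a
print(squared[1, 1])
```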

5. Accessing Array Elements

``````a = np.array([[3.2, 1.5], [2.5, 4]])
print a[0][1]
print a[0, 1]

1.5
1.5
``````

``````b = a
a[0][1] = 2.0
print "a:"
print a
print "b:"
print b

a:
[[ 3.2  2. ]
[ 2.5  4. ]]
b:
[[ 3.2  2. ]
[ 2.5  4. ]]
``````

``````a = np.array([[3.2, 1.5], [2.5, 4]])
b = a.copy()
a[0][1] = 2.0
print "a:"
print a
print "b:"
print b

a:
[[ 3.2  2. ]
[ 2.5  4. ]]
b:
[[ 3.2  1.5]
[ 2.5  4. ]]
``````

``````a = np.array([[3.2, 1.5], [2.5, 4]])
b = a
a = np.array([[2, 1], [9, 3]])
print "a:"
print a
print "b:"
print b

a:
[[2 1]
[9 3]]
b:
[[ 3.2  1.5]
[ 2.5  4. ]]
``````
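
The three cases above come down to whether two names point at the same underlying data. Here is a small sketch for checking that directly. The names `alias`, `copied` and `row_view` are made up for illustration. Note that a basic index such as `a[0]` also returns a view rather than a copy.

```python
import numpy as np

a = np.array([[3.2, 1.5], [2.5, 4]])
alias = a            # a second name for the same array object
copied = a.copy()    # independent data
row_view = a[0]      # basic indexing returns a view onto a's data

print(alias is a)
print(np.may_share_memory(a, copied))
print(np.may_share_memory(a, row_view))

row_view[0] = 99.0   # writes through to a (and alias), but not to copied
print(a[0, 0])
print(copied[0, 0])
```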

``````a = np.arange(20).reshape(4, 5)
print "a:"
print a
print "the 2nd and 4th column of a:"
print a[:,[1,3]]

a:
[[ 0  1  2  3  4]
[ 5  6  7  8  9]
[10 11 12 13 14]
[15 16 17 18 19]]
the 2nd and 4th column of a:
[[ 1  3]
[ 6  8]
[11 13]
[16 18]]
``````

``````a[:, 2][a[:, 0] > 5]

array([12, 17])
``````
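
The one-liner above packs two steps together: build a boolean mask from the first column, then use it to filter the third column. Unpacked, it could look like this sketch (`mask`, `picked` and `same` are illustrative names):

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

mask = a[:, 0] > 5        # first column is [0, 5, 10, 15]
third_col = a[:, 2]       # third column is [2, 7, 12, 17]
picked = third_col[mask]  # keep entries where the mask is True

# The row mask and the column index can also be combined in one step.
same = a[mask, 2]

print(mask)
print(picked)
print(same)
```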

``````loc = np.where(a==11)
print loc
print a[loc[0][0], loc[1][0]]

(array([2]), array([1]))
11
``````
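
When more than one element matches, `where` returns one index array per dimension, and the two arrays line up position by position. A short sketch; the condition `a % 7 == 0` is just an arbitrary example with several hits.

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

rows, cols = np.where(a % 7 == 0)                  # matches 0, 7 and 14
coords = list(zip(rows.tolist(), cols.tolist()))   # (row, column) pairs
values = a[rows, cols]                             # the matching elements

print(coords)
print(values)
```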

6. Array Operations, Continued

``````a = np.random.rand(2,4)
print "a:"
print a
a = np.transpose(a)
print "a is an array, by using transpose(a):"
print a
b = np.random.rand(2,4)
b = np.mat(b)
print "b:"
print b
print "b is a matrix, by using b.T:"
print b.T

a:
[[ 0.17571282  0.98510461  0.94864387  0.50078988]
[ 0.09457965  0.70251658  0.07134875  0.43780173]]
a is an array, by using transpose(a):
[[ 0.17571282  0.09457965]
[ 0.98510461  0.70251658]
[ 0.94864387  0.07134875]
[ 0.50078988  0.43780173]]
b:
[[ 0.09653644  0.46123468  0.50117363  0.69752578]
[ 0.60756723  0.44492537  0.05946373  0.4858369 ]]
b is a matrix, by using b.T:
[[ 0.09653644  0.60756723]
[ 0.46123468  0.44492537]
[ 0.50117363  0.05946373]
[ 0.69752578  0.4858369 ]]
``````

``````import numpy.linalg as nlg
a = np.random.rand(2,2)
a = np.mat(a)
print "a:"
print a
ia = nlg.inv(a)
print "inverse of a:"
print ia
print "a * inv(a)"
print a * ia

a:
[[ 0.86211266  0.6885563 ]
[ 0.28798536  0.70810425]]
inverse of a:
[[ 1.71798445 -1.6705577 ]
[-0.69870271  2.09163573]]
a * inv(a)
[[ 1.  0.]
[ 0.  1.]]
``````
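
An explicit inverse is often computed only to solve a linear system `A x = rhs`. In that case `solve` from the same `numpy.linalg` module gets there without forming the inverse. A minimal sketch with a hand-picked 2x2 system (the values of `A` and `rhs` are made up for illustration):

```python
import numpy as np
import numpy.linalg as nlg

A = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])

x_solve = nlg.solve(A, rhs)          # solve A x = rhs directly
x_inv = np.dot(nlg.inv(A), rhs)      # same answer via the explicit inverse

print(x_solve)
print(np.allclose(x_solve, x_inv))
print(np.allclose(np.dot(A, x_solve), rhs))
```

Floating-point results such as `a * inv(a)` are rarely exactly the identity, so comparing with `np.allclose` is more robust than `==`.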

``````a = np.random.rand(3,3)
eig_value, eig_vector = nlg.eig(a)
print "eigen value:"
print eig_value
print "eigen vector:"
print eig_vector

eigen value:
[ 1.35760609  0.43205379 -0.53470662]
eigen vector:
[[-0.76595379 -0.88231952 -0.07390831]
[-0.55170557  0.21659887 -0.74213622]
[-0.33005418  0.41784829  0.66616169]]
``````
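
The eigenvectors returned by `eig` are stored as columns: column `i` of `eig_vector` goes with `eig_value[i]`. The sketch below checks `A v = lambda v` for each pair. It uses a small symmetric matrix chosen for illustration so that the eigenvalues are real; a random matrix like the one above can produce complex ones.

```python
import numpy as np
import numpy.linalg as nlg

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric, eigenvalues are 1 and 3
eig_value, eig_vector = nlg.eig(A)

# Check the defining relation column by column.
for i in range(len(eig_value)):
    v = eig_vector[:, i]
    print(np.allclose(np.dot(A, v), eig_value[i] * v))
```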

``````a = np.array((1,2,3))
b = np.array((2,3,4))
print np.column_stack((a,b))

[[1 2]
[2 3]
[3 4]]
``````

``````a = np.random.rand(2,2)
b = np.random.rand(2,2)
print "a:"
print a
print "b:"
print b
c = np.hstack([a,b])
d = np.vstack([a,b])
print "horizontal stacking a and b:"
print c
print "vertical stacking a and b:"
print d

a:
[[ 0.6738195   0.4944045 ]
[ 0.25702675  0.15422012]]
b:
[[ 0.28058267  0.0967197 ]
[ 0.55191041  0.04694485]]
horizontal stacking a and b:
[[ 0.6738195   0.4944045   0.28058267  0.0967197 ]
[ 0.25702675  0.15422012  0.55191041  0.04694485]]
vertical stacking a and b:
[[ 0.6738195   0.4944045 ]
[ 0.25702675  0.15422012]
[ 0.28058267  0.0967197 ]
[ 0.55191041  0.04694485]]
``````
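
With random values it is hard to see at a glance which block ended up where. The sketch below repeats the stacking with an all-zeros and an all-ones array and checks the resulting shapes. It also shows that `hstack` and `vstack` on 2-D arrays match `np.concatenate` along axis 1 and axis 0.

```python
import numpy as np

a = np.zeros((2, 2))
b = np.ones((2, 2))

h = np.hstack([a, b])    # side by side: shape (2, 4)
v = np.vstack([a, b])    # one on top of the other: shape (4, 2)
print(h.shape)
print(v.shape)

same_h = np.array_equal(h, np.concatenate([a, b], axis=1))
same_v = np.array_equal(v, np.concatenate([a, b], axis=0))
print(same_h)
print(same_v)
```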

7. Missing Values

``````a = np.random.rand(2,2)
a[0, 1] = np.nan
print np.isnan(a)

[[False  True]
[False False]]
``````

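`nan_to_num` replaces `nan` with 0. When we get to the higher-level `pandas` module later on, we will see that `pandas` provides functions that let you choose the replacement value for `nan`.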

``````print np.nan_to_num(a)

[[ 0.58144238  0.        ]
[ 0.26789784  0.48664306]]
``````
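
Until `pandas` comes along, a custom replacement value can also be set in plain NumPy with a boolean mask from `isnan`. This is only a sketch: `fill_value` is a made-up name and -1.0 an arbitrary choice. It also shows why `isnan` is needed at all: `nan` does not compare equal to itself.

```python
import numpy as np

a = np.array([[0.5, np.nan], [0.25, 1.0]])
print(np.nan == np.nan)               # nan never equals nan, so == cannot find it

fill_value = -1.0                     # hypothetical replacement value
filled = a.copy()                     # leave the original untouched
filled[np.isnan(filled)] = fill_value
print(filled)

# nanmean skips the missing entries instead of returning nan.
mean_ignoring_nan = np.nanmean(a)
print(mean_ignoring_nan)
```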

NumPy has many more functions than the ones covered here. For details, see http://wiki.scipy.org/Numpy_Example_List and http://docs.scipy.org/doc/numpy