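Learn Pandas, Lesson 4
English original: 04 - Lesson
In this lesson, we go back to basics. We will work with a small dataset so that the concepts I am trying to explain are easy to follow. We will add columns, delete columns, and slice the data in several different ways. Enjoy!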
import pandas as pd
import sys
print('Python version ' + sys.version)
print('Pandas version: ' + pd.__version__)
Python version 3.6.1 | packaged by conda-forge | (default, Mar 23 2017, 21:57:00)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]
Pandas version: 0.19.2
d = [0,1,2,3,4,5,6,7,8,9]
df = pd.DataFrame(d)
df
   0
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
df.columns = ['Rev']
df
   Rev
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
df['NewCol'] = 5
df
   Rev  NewCol
0    0       5
1    1       5
2    2       5
3    3       5
4    4       5
5    5       5
6    6       5
7    7       5
8    8       5
9    9       5
df['NewCol'] = df['NewCol'] + 1
df
   Rev  NewCol
0    0       6
1    1       6
2    2       6
3    3       6
4    4       6
5    5       6
6    6       6
7    7       6
8    8       6
9    9       6
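The update above works because arithmetic on a column is applied to every row at once. Here is a minimal sketch of the same idea on a fresh frame. The names `df_demo`, `Doubled`, and `Total` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Rebuild a small frame like the one used in this lesson
df_demo = pd.DataFrame({'Rev': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

# A scalar assigned to a column is repeated for every row
df_demo['NewCol'] = 5

# Arithmetic on a column is applied element by element
df_demo['Doubled'] = df_demo['Rev'] * 2

# Two columns can be combined row by row
df_demo['Total'] = df_demo['Rev'] + df_demo['NewCol']

print(df_demo.head(3))
```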
del df['NewCol']
df
   Rev
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
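`del` removes the column from the dataframe in place. If you would rather keep the original frame untouched, `DataFrame.drop` is one alternative, since it returns a new frame by default. This is a sketch on a small hypothetical frame (`df_demo` and `trimmed` are illustrative names), not the approach used in this lesson.

```python
import pandas as pd

# A small stand-in frame with a column we want to remove
df_demo = pd.DataFrame({'Rev': [0, 1, 2], 'NewCol': [6, 6, 6]})

# axis=1 tells drop to look for the label among the columns;
# the result is a new frame and df_demo keeps both columns
trimmed = df_demo.drop('NewCol', axis=1)

print(list(trimmed.columns))
print(list(df_demo.columns))
```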
df['test'] = 3
df['col'] = df['Rev']
df
   Rev  test  col
0    0     3    0
1    1     3    1
2    2     3    2
3    3     3    3
4    4     3    4
5    5     3    5
6    6     3    6
7    7     3    7
8    8     3    8
9    9     3    9
i = ['a','b','c','d','e','f','g','h','i','j']
df.index = i
df
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
e    4     3    4
f    5     3    5
g    6     3    6
h    7     3    7
i    8     3    8
j    9     3    9
Using loc, we can select part of the data in a dataframe.
df.loc['a']
Rev 0
test 3
col 0
Name: a, dtype: int64
df.loc['a':'d']
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
df.iloc[0:3]
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
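The two slices above behave differently at the end point: `loc` slices by label and includes the last label, while `iloc` slices by position and excludes the stop position. A minimal sketch on a hypothetical one-column frame (`df_demo`, `by_label`, and `by_position` are illustrative names):

```python
import pandas as pd

# Same letter index as in the lesson, with a single column
df_demo = pd.DataFrame({'Rev': list(range(10))}, index=list('abcdefghij'))

# Label-based slice: the end label 'd' is included
by_label = df_demo.loc['a':'d']

# Position-based slice: position 3 is excluded, as with Python lists
by_position = df_demo.iloc[0:3]

print(list(by_label.index))
print(list(by_position.index))
```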
We can also select the values of a single column by its name.
df['Rev']
a 0
b 1
c 2
d 3
e 4
f 5
g 6
h 7
i 8
j 9
Name: Rev, dtype: int64
df[['Rev', 'test']]
   Rev  test
a    0     3
b    1     3
c    2     3
d    3     3
e    4     3
f    5     3
g    6     3
h    7     3
i    8     3
j    9     3
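Note the difference between the two selections above: a single column name returns a Series, while a list of names returns a DataFrame, even when the list holds only one name. A short sketch on a hypothetical three-row frame (`df_demo`, `as_series`, and `as_frame` are illustrative names):

```python
import pandas as pd

df_demo = pd.DataFrame({'Rev': [0, 1, 2], 'test': [3, 3, 3]},
                       index=['a', 'b', 'c'])

# A single name gives back a one-dimensional Series
as_series = df_demo['Rev']

# A list of names gives back a DataFrame, even with one name in it
as_frame = df_demo[['Rev']]

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```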
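# Originally df.ix[0:3,'Rev']; .ix was deprecated in pandas 0.20 and removed in 1.0.
# Select the column by name, then slice the rows by position.
df['Rev'].iloc[0:3]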
a 0
b 1
c 2
Name: Rev, dtype: int64
# Originally df.ix[5:,'col']; rows from position 5 onward of column 'col'
df['col'].iloc[5:]
f 5
g 6
h 7
i 8
j 9
Name: col, dtype: int64
# Originally df.ix[:3,['col', 'test']]; first three rows of two columns
df[['col', 'test']].iloc[:3]
   col  test
a    0     3
b    1     3
c    2     3
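The selections above combine row positions with column labels. Since `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, the following sketch shows two ways to write that kind of mixed selection in a single `loc` or `iloc` call. The frame `df_demo` and the variable names are hypothetical, for illustration only.

```python
import pandas as pd

# Same shape as the lesson's frame: letter index, three columns
df_demo = pd.DataFrame(
    {'Rev': list(range(10)), 'test': 3, 'col': list(range(10))},
    index=list('abcdefghij'),
)

# Option 1: turn row positions into labels, then use loc
via_loc = df_demo.loc[df_demo.index[0:3], 'Rev']

# Option 2: turn the column label into a position, then use iloc
via_iloc = df_demo.iloc[0:3, df_demo.columns.get_loc('Rev')]

print(via_loc.equals(via_iloc))
```

Both options return the same Series, so the choice mostly comes down to whether the rest of your code thinks in labels or in positions.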
There are also convenient methods for selecting the first or last few records.
df.head()
   Rev  test  col
a    0     3    0
b    1     3    1
c    2     3    2
d    3     3    3
e    4     3    4
df.tail()
   Rev  test  col
f    5     3    5
g    6     3    6
h    7     3    7
i    8     3    8
j    9     3    9
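Both methods return five rows by default, and both accept a row count as their first argument. A brief sketch on a hypothetical one-column frame (`df_demo`, `top3`, and `last2` are illustrative names):

```python
import pandas as pd

df_demo = pd.DataFrame({'Rev': list(range(10))}, index=list('abcdefghij'))

# First three rows instead of the default five
top3 = df_demo.head(3)

# Last two rows instead of the default five
last2 = df_demo.tail(2)

print(list(top3.index))
print(list(last2.index))
```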
This tutorial was created by HEDARO
Chinese translation by 派兰数据.
These tutorials are also available through an email course. Visit http://www.hedaro.com/pandas-tutorial to sign up.