# 8. Randomness

``````
from datascience import *
import numpy as np

two_groups = make_array('treatment', 'control')
np.random.choice(two_groups)
# 'treatment'  (one possible result)
``````

``````
np.random.choice(two_groups, 10)
# array(['treatment', 'control', 'treatment', 'control', 'control',
#        'treatment', 'treatment', 'control', 'control', 'control'],
#       dtype='<U9')
``````

• Is an individual assigned to the treatment group?
• Will a gambler win money?
• Has a poll made an accurate prediction?

## Booleans and Comparison

``````
3 > 1 + 1
# True
``````

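The value `True` means that the comparison holds: Python has confirmed this simple fact about the relationship between `3` and `1 + 1`. The common comparison operators are `<`, `>`, `<=`, `>=`, `==`, and `!=`.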

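``````
# A single = means assignment, so this line is a syntax error:
# 5 = 10/2
#   File "<ipython-input-4-5c7d3e808777>", line 1
#     5 = 10/2
#     ^
# SyntaxError: can't assign to literal

# Equality is tested with two equals signs.
5 == 10/2
# True
``````

``````
1 < 1 + 1 < 3
# True
``````

``````
x = 12
y = 5
min(x, y) <= (x+y)/2 <= max(x, y)
# True
``````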
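As a small illustration of the comparison operators, the sketch below evaluates one expression per operator and then checks that a chained comparison agrees with its spelled-out form. The dictionary name `comparisons` and the particular expressions are chosen only for this example.

```python
# Each comparison below evaluates to a Boolean value (True or False).
comparisons = {
    '3 > 1 + 1': 3 > 1 + 1,        # greater than
    '2 < 1': 2 < 1,                # less than
    '2 <= 2': 2 <= 2,              # less than or equal to
    '5 >= 10 / 2': 5 >= 10 / 2,    # greater than or equal to
    '5 == 10 / 2': 5 == 10 / 2,    # equal to (two equals signs)
    '5 != 10 / 2': 5 != 10 / 2,    # not equal to
}

for expression, value in comparisons.items():
    print(expression, 'is', value)

# A chained comparison is True only if every adjacent pair holds.
chained = 1 < 1 + 1 < 3
spelled_out = (1 < 1 + 1) and (1 + 1 < 3)
print(chained == spelled_out)
```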

### Comparing Strings

``````
'Dog' > 'Catastrophe' > 'Cat'
# True
``````
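The short sketch below spells out why the chained string comparison holds. Python compares strings character by character using character codes, so one consequence (not discussed in the text, and worth treating as a side note) is that uppercase ASCII letters sort before lowercase ones. The example words other than those in the comparison above are illustrative.

```python
# Strings are compared character by character, like words in a dictionary.
first_pair = 'Dog' > 'Catastrophe'    # 'D' comes after 'C'
second_pair = 'Catastrophe' > 'Cat'   # same first three letters; the longer string is greater
print(first_pair, second_pair)

# The ordering follows character codes, so 'Z' (uppercase) sorts before 'a' (lowercase).
uppercase_first = 'Zebra' < 'apple'
print(uppercase_first)

# Sorting uses the same ordering.
ordered = sorted(['cat', 'Dog', 'Cat', 'Catastrophe'])
print(ordered)
```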

``````
np.random.choice(two_groups) == 'treatment'
# False  (one possible result)
``````

## Comparing an Array and a Value

``````
tosses = make_array('Tails', 'Heads', 'Tails', 'Heads', 'Heads')
tosses == 'Heads'
# array([False,  True, False,  True,  True], dtype=bool)
``````

The `numpy` function `count_nonzero` counts the number of non-zero (that is, `True`) elements of an array.

``````
np.count_nonzero(tosses == 'Heads')
# 3
``````
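The two steps above, comparing an array to a value and then counting the `True` entries, can be sketched with plain NumPy. This version assumes `np.array` in place of `make_array` so that it runs without the `datascience` package; the names `is_heads` and `proportion_heads` are illustrative.

```python
import numpy as np

tosses = np.array(['Tails', 'Heads', 'Tails', 'Heads', 'Heads'])

# Comparing an array to a single value compares every element.
is_heads = tosses == 'Heads'

# Each True counts as one non-zero element.
num_heads = np.count_nonzero(is_heads)

# Dividing by the array length turns the count into a proportion.
proportion_heads = num_heads / len(tosses)

print(is_heads)
print(num_heads, proportion_heads)
```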

## Conditional Statements

``````
def sign(x):
    if x > 0:
        return 'Positive'

sign(3)
# 'Positive'
``````

``````
sign(-3)
# No output: no branch matched, so the function returned None.
``````

``````
def sign(x):
    if x > 0:
        return 'Positive'
    elif x < 0:
        return 'Negative'
``````

``````
sign(-3)
# 'Negative'
``````

``````
def sign(x):
    if x > 0:
        return 'Positive'
    elif x < 0:
        return 'Negative'
    elif x == 0:
        return 'Neither positive nor negative'

sign(0)
# 'Neither positive nor negative'
``````

``````
def sign(x):
    if x > 0:
        return 'Positive'
    elif x < 0:
        return 'Negative'
    else:
        return 'Neither positive nor negative'

sign(0)
# 'Neither positive nor negative'
``````

### The General Form

``````
if <if expression>:
    <if body>
elif <elif expression 0>:
    <elif body 0>
elif <elif expression 1>:
    <elif body 1>
...
else:
    <else body>
``````
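To make the general form concrete, here is a minimal sketch with two `elif` clauses. The function `describe_temperature` and its thresholds are hypothetical, chosen only to show that the expressions are tested in order and that only the body of the first true one runs.

```python
def describe_temperature(celsius):
    """Return a label for a temperature; only the first true branch runs."""
    if celsius < 0:
        return 'Freezing'
    elif celsius < 15:
        # Reached only when celsius >= 0, because the first test failed.
        return 'Cold'
    elif celsius < 25:
        return 'Mild'
    else:
        # Runs when none of the expressions above is true.
        return 'Hot'

for value in [-5, 10, 15, 30]:
    print(value, describe_temperature(value))
```

Note that `10` satisfies both `celsius < 15` and `celsius < 25`, yet only the first matching body is executed.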

### Example: "The Other One"

``````
def other_one(x, a_b):
    """Compare x with the two elements of a_b;
    if it is equal to one of them, return the other one;
    if it is not equal to either of them, return an error message.
    """
    if x == a_b.item(0):
        return a_b.item(1)
    elif x == a_b.item(1):
        return a_b.item(0)
    else:
        return 'The input is not valid.'

colors = make_array('red', 'blue')
other_one('red', colors)
# 'blue'
other_one('blue', colors)
# 'red'
other_one('potato', colors)
# 'The input is not valid.'
``````

## Iteration

``````
np.random.choice(make_array('Heads', 'Tails'))
``````

``````
for i in np.arange(3):
    print(i)
# 0
# 1
# 2
``````

``````
i = np.arange(3).item(0)
print(i)
i = np.arange(3).item(1)
print(i)
i = np.arange(3).item(2)
print(i)
# 0
# 1
# 2
``````

``````
coin = make_array('Heads', 'Tails')

for i in np.arange(5):
    print(np.random.choice(coin))
# Prints five lines, each 'Heads' or 'Tails', for example:
# Tails
``````

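### Augmenting Arrays

The `append` function in `numpy` helps us augment an array. The call `np.append(array_name, value)` evaluates to a new array that is `array_name` augmented by `value`. When using `append`, keep in mind that all the entries of an array must have the same type.

``````
pets = make_array('Cat', 'Dog')
np.append(pets, 'Another Pet')
# array(['Cat', 'Dog', 'Another Pet'],
#       dtype='<U11')
``````

``````
# np.append returned a new array; pets itself is unchanged.
pets
# array(['Cat', 'Dog'],
#       dtype='<U3')
``````

``````
# Reassign the name to keep the augmented array.
pets = np.append(pets, 'Another Pet')
pets
# array(['Cat', 'Dog', 'Another Pet'],
#       dtype='<U11')
``````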

### Example: Counting the Number of Heads

``````
coin = make_array('Heads', 'Tails')

tosses = make_array()

for i in np.arange(5):
    tosses = np.append(tosses, np.random.choice(coin))

tosses
# An array of five strings, each 'Heads' or 'Tails', with dtype='<U32'
``````

``````
coin = make_array('Heads', 'Tails')

tosses = make_array()

i = np.arange(5).item(0)
tosses = np.append(tosses, np.random.choice(coin))
i = np.arange(5).item(1)
tosses = np.append(tosses, np.random.choice(coin))
i = np.arange(5).item(2)
tosses = np.append(tosses, np.random.choice(coin))
i = np.arange(5).item(3)
tosses = np.append(tosses, np.random.choice(coin))
i = np.arange(5).item(4)
tosses = np.append(tosses, np.random.choice(coin))

tosses
# An array of five strings, each 'Heads' or 'Tails', with dtype='<U32'
``````

``````
np.count_nonzero(tosses == 'Heads')
# 2
``````

``````
tosses = make_array()

for i in np.arange(1000):
    tosses = np.append(tosses, np.random.choice(coin))

np.count_nonzero(tosses == 'Heads')
# 481
``````

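## Example: Number of Heads in 100 Tosses

• Toss a coin 100 times and record the number of heads.

``````
np.random.choice(coin, 10)
# An array of 10 strings, each 'Heads' or 'Tails', with dtype='<U5'
``````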

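``````
N = 10000

heads = make_array()   # one count of heads per repetition

for i in np.arange(N):
    tosses = np.random.choice(coin, 100)
    heads = np.append(heads, np.count_nonzero(tosses == 'Heads'))

heads
# array([ 46.,  64.,  59., ...,  56.,  54.,  56.])
``````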

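``````
results = Table().with_columns(
    'Repetition', np.arange(1, N+1),
    'Number of Heads', heads
)

results
``````

| Repetition | Number of Heads |
| --- | --- |
| 1 | 46 |
| 2 | 64 |
| 3 | 59 |
| 4 | 57 |
| 5 | 54 |
| 6 | 47 |
| 7 | 45 |
| 8 | 50 |
| 9 | 44 |
| 10 | 57 |

(9990 rows omitted)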

``````
results.select('Number of Heads').hist(bins=np.arange(30.5, 69.6, 1))
``````
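The same simulation can also be sketched without a `for` loop by drawing all the tosses at once. This is an alternative under stated assumptions, not the chapter's method: it uses NumPy's `default_rng` generator with an arbitrary seed (for reproducibility only) and plain `np.array` rather than `make_array`.

```python
import numpy as np

rng = np.random.default_rng(8)   # assumed seed, only so the run is repeatable
coin = np.array(['Heads', 'Tails'])
N = 10000

# One row per repetition, 100 tosses per row.
all_tosses = rng.choice(coin, size=(N, 100))

# Count the heads in each row (axis=1 counts across the 100 tosses).
heads = np.count_nonzero(all_tosses == 'Heads', axis=1)

print(heads[:10])
print(heads.mean())
```

The counts should cluster around 50, which is what the histogram of the looped version displays.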

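## The Monty Hall Problem

• The contestant makes an initial choice, but that door isn't opened.
• At least one of the other two doors must have a goat behind it. Monty opens one of these doors to reveal a goat.
• There are two doors left, one of which was the contestant's original choice. One of the doors has the car behind it, and the other one has a goat. The contestant now gets to choose which of the two doors to open.

### The Solution

• The chance that the car is behind the originally chosen door is 1/3.
• The car is behind either the originally chosen door or the door that remains. It can't be anywhere else.
• Therefore, the chance that the car is behind the door that remains is 2/3.
• Therefore, the contestant should switch.
• That's it. End of story.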
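The argument above can be checked by exact enumeration rather than simulation. The sketch below assumes the three items are equally likely to be the original choice; the helper `remaining_item` is a hypothetical name introduced only for this check.

```python
from fractions import Fraction

doors = ['Car', 'Goat 1', 'Goat 2']

def remaining_item(original):
    """What is behind the door left over after Monty reveals a goat."""
    if original == 'Car':
        # Monty reveals one goat; the other goat remains either way.
        return 'Goat'
    else:
        # Monty must reveal the other goat, so the car remains.
        return 'Car'

# Each original choice has chance 1/3.
chance_stay_wins = sum(Fraction(1, 3) for door in doors if door == 'Car')
chance_switch_wins = sum(Fraction(1, 3) for door in doors
                         if remaining_item(door) == 'Car')

print(chance_stay_wins, chance_switch_wins)
```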

### Simulation

``````
doors = make_array('Car', 'Goat 1', 'Goat 2')
goats = make_array('Goat 1', 'Goat 2')
``````

• what the contestant's original choice is
• what Monty throws out
• what remains behind the remaining door

``````
def other_one(x, a_b):
    if x == a_b.item(0):
        return a_b.item(1)
    elif x == a_b.item(1):
        return a_b.item(0)
    else:
        return 'Input Not Valid'
``````

``````
original = 'Goat 1'
make_array(original, other_one(original, goats), 'Car')
# array(['Goat 1', 'Goat 2', 'Car'],
#       dtype='<U6')
original = 'Goat 2'
make_array(original, other_one(original, goats), 'Car')
# array(['Goat 2', 'Goat 1', 'Car'],
#       dtype='<U6')
``````

``````
def is_goat(door_name):
    """ Check whether the name of a door (a string) is a Goat.

    Examples:
    =========

    >>> is_goat('Goat 1')
    True
    >>> is_goat('Goat 2')
    True
    >>> is_goat('Car')
    False
    """
    if door_name == "Goat 1":
        return True
    elif door_name == "Goat 2":
        return True
    else:
        return False

def monty_hall():
    """ Play the Monty Hall game once
    and return an array of three strings:

    original choice, what Monty throws out, what remains
    """
    original = np.random.choice(doors)

    if is_goat(original):
        return make_array(original, other_one(original, goats), 'Car')
    else:
        throw_out = np.random.choice(goats)
        return make_array(original, throw_out, other_one(throw_out, goats))
``````

``````
monty_hall()
# array(['Car', 'Goat 2', 'Goat 1'],
#       dtype='<U6')
``````
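For readers without the `datascience` package, the game can be sketched with the standard library alone. This is a re-implementation under assumptions, not the chapter's code: `play_once` is a hypothetical name, lists stand in for arrays, and the seed is arbitrary.

```python
import random

goats = ['Goat 1', 'Goat 2']
doors = ['Car'] + goats

def play_once(rng):
    """Play one game; return (original choice, what Monty throws out, what remains)."""
    original = rng.choice(doors)
    if original in goats:
        # Monty has to throw out the other goat, so the car remains.
        throw_out = goats[1 - goats.index(original)]
        remains = 'Car'
    else:
        # The contestant picked the car; Monty throws out a goat at random.
        throw_out = rng.choice(goats)
        remains = goats[1 - goats.index(throw_out)]
    return original, throw_out, remains

rng = random.Random(0)   # assumed seed, only so the run is repeatable
games = [play_once(rng) for _ in range(10000)]

prop_stay_wins = sum(original == 'Car' for original, _, _ in games) / len(games)
prop_switch_wins = sum(remains == 'Car' for _, _, remains in games) / len(games)
print(prop_stay_wins, prop_switch_wins)
```

The two proportions should land near 1/3 and 2/3.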

``````
# Number of times we'll play the game
N = 10000

original = make_array()     # original choice
throw_out = make_array()    # what Monty throws out
remains = make_array()      # what remains

for i in np.arange(N):
    result = monty_hall()    # the result of one game

    # Collect the results in the appropriate arrays
    original = np.append(original, result.item(0))
    throw_out = np.append(throw_out, result.item(1))
    remains = np.append(remains, result.item(2))

# The for-loop is done! Now put all the arrays together in a table.
results = Table().with_columns(
    'Original Door Choice', original,
    'Monty Throws Out', throw_out,
    'Remaining Door', remains
)
results
``````

| Original Door Choice | Monty Throws Out | Remaining Door |
| --- | --- | --- |
| Car | Goat 1 | Goat 2 |
| Goat 1 | Goat 2 | Car |
| Goat 2 | Goat 1 | Car |
| Car | Goat 2 | Goat 1 |
| Car | Goat 2 | Goat 1 |
| Goat 1 | Goat 2 | Car |
| Goat 1 | Goat 2 | Car |
| Goat 1 | Goat 2 | Car |
| Goat 2 | Goat 1 | Car |
| Goat 1 | Goat 2 | Car |

(9990 rows omitted)

``````
results.group('Original Door Choice')
``````

| Original Door Choice | count |
| --- | --- |
| Car | 3312 |
| Goat 1 | 3382 |
| Goat 2 | 3306 |

``````
results.group('Remaining Door')
``````

| Remaining Door | count |
| --- | --- |
| Car | 6688 |
| Goat 1 | 1640 |
| Goat 2 | 1672 |
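Turning the counts above into proportions makes them easier to compare with 1/3 and 2/3. The sketch below simply copies the counts from the two grouped tables into dictionaries; the variable names are illustrative.

```python
N = 10000  # number of simulated games

# Counts copied from the two grouped tables.
original_counts = {'Car': 3312, 'Goat 1': 3382, 'Goat 2': 3306}
remaining_counts = {'Car': 6688, 'Goat 1': 1640, 'Goat 2': 1672}

# Proportion of games in which the car was behind each kind of door.
prop_car_original = original_counts['Car'] / N
prop_car_remaining = remaining_counts['Car'] / N

print(prop_car_original, prop_car_remaining)

# In every game the car is behind exactly one of the two doors left,
# so the two car counts add up to N.
print(original_counts['Car'] + remaining_counts['Car'] == N)
```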

``````
results_o = results.group('Original Door Choice')
results_r = results.group('Remaining Door')
joined = results_o.join('Original Door Choice', results_r, 'Remaining Door')
combined = joined.relabeled(0, 'Item').relabeled(1, 'Original Door').relabeled(2, 'Remaining Door')
combined
``````

| Item | Original Door | Remaining Door |
| --- | --- | --- |
| Car | 3312 | 6688 |
| Goat 1 | 3382 | 1640 |
| Goat 2 | 3306 | 1672 |

``````
combined.barh(0)
``````

## When an Event Can Happen in Two Different Ways

### At Least One Success

``````
rolls = np.arange(1, 51, 1)
results = Table().with_columns(
    'Rolls', rolls,
    'Chance of at least one 6', 1 - (5/6)**rolls
)
results
``````

| Rolls | Chance of at least one 6 |
| --- | --- |
| 1 | 0.166667 |
| 2 | 0.305556 |
| 3 | 0.421296 |
| 4 | 0.517747 |
| 5 | 0.598122 |
| 6 | 0.665102 |
| 7 | 0.720918 |
| 8 | 0.767432 |
| 9 | 0.806193 |
| 10 | 0.838494 |

(40 rows omitted)

``````
results.scatter('Rolls')
``````

``````
results.where('Rolls', are.equal_to(50))
``````

| Rolls | Chance of at least one 6 |
| --- | --- |
| 50 | 0.99989 |
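The column `1 - (5/6)**rolls` is the chance of the complement of "no 6 on any roll": each roll misses 6 with chance 5/6, and the rolls are independent. As a rough check, the sketch below compares that formula with a simulation using only the standard library. The function names, the number of trials, and the seed are assumptions made for this example.

```python
import random

def chance_of_at_least_one_six(rolls):
    """Exact chance: 1 minus the chance that every roll misses 6."""
    return 1 - (5 / 6) ** rolls

def simulated_chance(rolls, trials=20000, seed=0):
    """Estimate the same chance by rolling a die `rolls` times, `trials` times over."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        if any(rng.randint(1, 6) == 6 for _ in range(rolls)):
            successes += 1
    return successes / trials

for n in [1, 4, 10]:
    print(n, chance_of_at_least_one_six(n), simulated_chance(n))
```

The simulated values should be close to the exact ones, with small differences due to chance.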

## Sampling

``````
top1 = Table.read_table('top_movies.csv')
top2 = top1.with_column('Row Index', np.arange(top1.num_rows))
top = top2.move_to_start('Row Index')

top.set_format(make_array(3, 4), NumberFormatter)
``````

| Row Index | Title | Studio | Gross | Gross (Adjusted) | Year |
| --- | --- | --- | --- | --- | --- |
| 0 | Star Wars: The Force Awakens | Buena Vista (Disney) | 906,723,418 | 906,723,400 | 2015 |
| 1 | Avatar | Fox | 760,507,625 | 846,120,800 | 2009 |
| 2 | Titanic | Paramount | 658,672,302 | 1,178,627,900 | 1997 |
| 3 | Jurassic World | Universal | 652,270,625 | 687,728,000 | 2015 |
| 4 | Marvel's The Avengers | Buena Vista (Disney) | 623,357,910 | 668,866,600 | 2012 |
| 5 | The Dark Knight | Warner Bros. | 534,858,444 | 647,761,600 | 2008 |
| 6 | Star Wars: Episode I - The Phantom Menace | Fox | 474,544,677 | 785,715,000 | 1999 |
| 7 | Star Wars | Fox | 460,998,007 | 1,549,640,500 | 1977 |
| 8 | Avengers: Age of Ultron | Buena Vista (Disney) | 459,005,868 | 465,684,200 | 2015 |
| 9 | The Dark Knight Rises | Warner Bros. | 448,139,099 | 500,961,700 | 2012 |

(190 rows omitted)

## Deterministic Samples

``````
top.take(make_array(3, 18, 100))
``````

| Row Index | Title | Studio | Gross | Gross (Adjusted) | Year |
| --- | --- | --- | --- | --- | --- |
| 3 | Jurassic World | Universal | 652,270,625 | 687,728,000 | 2015 |
| 18 | Spider-Man | Sony | 403,706,375 | 604,517,300 | 2002 |
| 100 | Gone with the Wind | MGM | 198,676,459 | 1,757,788,200 | 1939 |

``````
top.where('Title', are.containing('Harry Potter'))
``````

| Row Index | Title | Studio | Gross | Gross (Adjusted) | Year |
| --- | --- | --- | --- | --- | --- |
| 22 | Harry Potter and the Deathly Hallows Part 2 | Warner Bros. | 381,011,219 | 417,512,200 | 2011 |
| 43 | Harry Potter and the Sorcerer's Stone | Warner Bros. | 317,575,550 | 486,442,900 | 2001 |
| 54 | Harry Potter and the Half-Blood Prince | Warner Bros. | 301,959,197 | 352,098,800 | 2009 |
| 59 | Harry Potter and the Order of the Phoenix | Warner Bros. | 292,004,738 | 369,250,200 | 2007 |
| 62 | Harry Potter and the Goblet of Fire | Warner Bros. | 290,013,036 | 393,024,800 | 2005 |
| 69 | Harry Potter and the Chamber of Secrets | Warner Bros. | 261,988,482 | 390,768,100 | 2002 |
| 76 | Harry Potter and the Prisoner of Azkaban | Warner Bros. | 249,541,069 | 349,598,600 | 2004 |

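### A Random Sampling Scheme

• Individual A is chosen with probability 1.
• Either individual B or individual C is chosen according to the toss of a coin: if the coin lands heads, choose B; otherwise, choose C.

``````
A: 1
B: 1/2
C: 1/2
AB: 1/2
AC: 1/2
BC: 0
ABC: 0
``````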
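The probabilities listed above can be checked by simulating the scheme many times. The following sketch uses only the standard library; `draw_sample`, the number of trials, and the seed are assumptions made for this example.

```python
import random
from collections import Counter

def draw_sample(rng):
    """A is always chosen; a fair coin decides between B and C."""
    if rng.random() < 0.5:      # treat this outcome as heads
        return ('A', 'B')
    else:
        return ('A', 'C')

rng = random.Random(0)   # assumed seed, only so the run is repeatable
trials = 10000
samples = [draw_sample(rng) for _ in range(trials)]

# How often each whole sample occurs: only AB and AC are possible.
sample_counts = Counter(samples)

# How often each individual is included in the sample.
inclusion = {person: sum(person in sample for sample in samples) / trials
             for person in 'ABC'}

print(sample_counts)
print(inclusion)
```

Every individual has a known chance of selection, but not every group of individuals can occur: B and C never appear together.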

### A Systematic Sample

``````
"""Choose a random start among rows 0 through 9;
then take every 10th row."""

start = np.random.choice(np.arange(10))
top.take(np.arange(start, top.num_rows, 10))
``````

| Row Index | Title | Studio | Gross | Gross (Adjusted) | Year |
| --- | --- | --- | --- | --- | --- |
| 6 | Star Wars: Episode I - The Phantom Menace | Fox | 474,544,677 | 785,715,000 | 1999 |
| 16 | Iron Man 3 | Buena Vista (Disney) | 409,013,994 | 424,632,700 | 2013 |
| 26 | Spider-Man 2 | Sony | 373,585,825 | 523,381,100 | 2004 |
| 36 | Minions | Universal | 336,045,770 | 354,213,900 | 2015 |
| 46 | Iron Man 2 | Paramount | 312,433,331 | 341,908,200 | 2010 |
| 56 | The Twilight Saga: New Moon | Sum. | 296,623,634 | 338,517,700 | 2009 |
| 66 | Meet the Fockers | Universal | 279,261,160 | 384,305,300 | 2004 |
| 76 | Harry Potter and the Prisoner of Azkaban | Warner Bros. | 249,541,069 | 349,598,600 | 2004 |
| 86 | The Exorcist | Warner Bros. | 232,906,145 | 962,212,800 | 1973 |
| 96 | Back to the Future | Universal | 210,609,762 | 513,740,700 | 1985 |

(10 rows omitted)
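The row selection behind a systematic sample can be sketched without a table at all: pick a random start, then step through the row indices. The helper `systematic_sample_indices` is a hypothetical name, and the row count of 200 is inferred from the movie table (10 rows shown plus 190 omitted).

```python
import random

def systematic_sample_indices(num_rows, step=10, rng=None):
    """Pick a random start among rows 0 through step - 1, then take every step-th row."""
    rng = rng or random.Random()
    start = rng.randrange(step)
    return list(range(start, num_rows, step))

# 200 rows, matching the size of the movie table.
indices = systematic_sample_indices(200, step=10, rng=random.Random(1))
print(indices)
```

Only the start is random. Once it is chosen, every other row in the sample is determined, which is why the sample always has 20 rows spaced exactly 10 apart.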