Natural Language Processing using Artificial Neural Networks

“In God we trust. All others must bring data.” – W. Edwards Deming, statistician

Word Embeddings

What?

Convert words to vectors in a high-dimensional space. Each dimension can loosely be thought of as capturing one aspect of a word, such as gender or the type of object it names.

"Word embeddings" are a family of natural language processing techniques that aim to map semantic meaning into a geometric space. This is done by associating a numeric vector with every word in a dictionary, such that the distance (e.g. L2 distance or, more commonly, cosine distance) between any two vectors captures part of the semantic relationship between the two associated words. The geometric space formed by these vectors is called an embedding space.

Why?

By converting words to vectors, we can express relations between words numerically. The more similar two words are along a dimension, the closer their values in that dimension.
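
Closeness between two word vectors is usually measured with a distance function such as the L2 or cosine distance mentioned earlier. The sketch below implements both in plain Python. The three vectors (king, queen, apple) are hypothetical values invented for illustration, not the output of any trained model.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings, made up for this example only.
king = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.1]
apple = [0.1, 0.0, 0.9]

# Related words should end up closer together than unrelated ones.
print("cosine:", cosine_distance(king, queen), cosine_distance(king, apple))
print("L2:    ", l2_distance(king, queen), l2_distance(king, apple))
```

With these made-up values, king is closer to queen than to apple under both measures. Cosine distance ignores vector length and compares direction only, which is one reason it is often preferred for embeddings.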

Example

W(green) = (1.2, 0.98, 0.05, ...)

W(red) = (1.1, 0.2, 0.5, ...)

Here the values for green and red are very similar in the first dimension because both are colours. The values in the second dimension differ widely, perhaps because red is associated with something negative in the training data while green is associated with something positive.

By vectorizing words, we indirectly capture different kinds of relations between them.
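
The per-dimension comparison in the example can be checked with a few lines of Python. This is only a sketch: it uses the three components shown for W(green) and W(red), leaves the elided components out, and the names w_green, w_red, gaps and closest_dim are chosen here for illustration.

```python
# First three components of the example vectors; the remaining
# components are elided in the text and are not filled in here.
w_green = [1.2, 0.98, 0.05]
w_red = [1.1, 0.2, 0.5]

# Absolute gap per dimension: a small gap means the two words behave
# alike along that (learned, unnamed) axis.
gaps = [abs(g - r) for g, r in zip(w_green, w_red)]
for i, gap in enumerate(gaps, start=1):
    print(f"dimension {i}: |green - red| = {gap:.2f}")

# 1-based index of the dimension where the two words agree most.
closest_dim = gaps.index(min(gaps)) + 1
print("most similar dimension:", closest_dim)
```

The smallest gap is in the first dimension and the largest in the second, which matches the reading of the example given in the text.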

Example of word2vec using gensim

from gensim.models import word2vec
from gensim.models.word2vec import Word2Vec

Reading blog posts from the data directory

import os
import pickle

# The pickled post lists live in ../data/word_embeddings, relative to the working directory.
DATA_DIRECTORY = os.path.join(os.path.abspath(os.path.curdir), '..',
                              'data', 'word_embeddings')
male_posts = []
female_posts = []

# Despite the .txt extension, both files are pickles, so they are opened in binary mode.
with open(os.path.join(DATA_DIRECTORY, "male_blog_list.txt"), "rb") as male_file:
    male_posts = pickle.load(male_file)

with open(os.path.join(DATA_DIRECTORY, "female_blog_list.txt"), "rb") as female_file:
    female_posts = pickle.load(female_file)
print(len(female_posts))
print(len(male_posts))
2252
2611
# Drop empty posts before combining the two lists.
filtered_male_posts = [p for p in male_posts if len(p) > 0]
filtered_female_posts = [p for p in female_posts if len(p) > 0]
posts = filtered_female_posts + filtered_male_posts
print(len(filtered_female_posts), len(filtered_male_posts), len(posts))
2247 2595 4842
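
Before posts can be fed to a model, each one has to be turned into a list of tokens, and this notebook does that with a plain whitespace split. The sketch below shows what that does to punctuation, which is why entries such as 'see.' and 'school,' show up in the vocabulary listing further down. The sample sentence and the cleanup step are assumptions made for illustration and are not part of the original notebook.

```python
import string

# Hypothetical post text, made up for illustration.
post = "Well, I can't see. School, though, was boring."

# str.split() splits on whitespace only, so punctuation stays
# attached to the neighbouring word.
raw_tokens = post.split()

# One possible cleanup (an assumption, not the notebook's method):
# lowercase each token and strip leading/trailing punctuation,
# then drop anything that ends up empty.
clean_tokens = [t.strip(string.punctuation).lower() for t in raw_tokens]
clean_tokens = [t for t in clean_tokens if t]

print(raw_tokens)
print(clean_tokens)
```

Stripping only the ends keeps inner apostrophes, so "can't" survives as a single token. Whether to lowercase or strip at all depends on the task, since case and punctuation can carry signal in informal blog text.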

Word2Vec

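# size= and w2v.vocab are names from older gensim releases;
# gensim 4.x uses vector_size= and w2v.wv.key_to_index instead.
w2v = Word2Vec(size=200, min_count=1)
# Build the vocabulary from the first 100 posts, tokenized by whitespace.
w2v.build_vocab([post.split() for post in posts[:100]])
w2v.vocab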
{'see.': <gensim.models.word2vec.Vocab at 0x7f61aa4f1908>,
 'never.': <gensim.models.word2vec.Vocab at 0x7f61aa4f1dd8>,
 'driving': <gensim.models.word2vec.Vocab at 0x7f61aa4f1e48>,
 'buddy': <gensim.models.word2vec.Vocab at 0x7f61aa4f0240>,
 'DEFENSE': <gensim.models.word2vec.Vocab at 0x7f61aa4f0438>,
 'interval': <gensim.models.word2vec.Vocab at 0x7f61aa4f04e0>,
 'Right': <gensim.models.word2vec.Vocab at 0x7f61aa4f06a0>,
 'minds,': <gensim.models.word2vec.Vocab at 0x7f61aa4f06d8>,
 'earth.': <gensim.models.word2vec.Vocab at 0x7f61aa4f0710>,
 'pleasure': <gensim.models.word2vec.Vocab at 0x7f61aa4f08d0>,
 'school,': <gensim.models.word2vec.Vocab at 0x7f61aa4f0cc0>,
 'someone': <gensim.models.word2vec.Vocab at 0x7f61aa4f0ef0>,
 'dangit...': <gensim.models.word2vec.Vocab at 0x7f61aa4f23c8>,
 'one!': <gensim.models.word2vec.Vocab at 0x7f61aa4f2c88>,
 'hard.': <gensim.models.word2vec.Vocab at 0x7f61aa4e25c0>,
 'programs,': <gensim.models.word2vec.Vocab at 0x7f61aa4e27b8>,
 'SEEEENNNIIIOOORS!!!': <gensim.models.word2vec.Vocab at 0x7f61aa4e27f0>,
 'two)': <gensim.models.word2vec.Vocab at 0x7f61aa4e2828>,
 "o'": <gensim.models.word2vec.Vocab at 0x7f61aa4e28d0>,
 '--': <gensim.models.word2vec.Vocab at 0x7f61aa4e2a58>,
 'this-actually': <gensim.models.word2vec.Vocab at 0x7f61aa4e2b70>,
 'swimming.': <gensim.models.word2vec.Vocab at 0x7f61aa4e2c50>,
 'people.': <gensim.models.word2vec.Vocab at 0x7f61aa4e2cc0>,
 'turn': <gensim.models.word2vec.Vocab at 0x7f61aa4e2e48>,
 'happened': <gensim.models.word2vec.Vocab at 0x7f61aa4e2fd0>,
 'clothing:': <gensim.models.word2vec.Vocab at 0x7f61aa4e22e8>,
 'it!': <gensim.models.word2vec.Vocab at 0x7f61aa4e2048>,
 'church': <gensim.models.word2vec.Vocab at 0x7f61aa4e21d0>,
 'boring.': <gensim.models.word2vec.Vocab at 0x7f61aa4e2240>,
 'freaky': <gensim.models.word2vec.Vocab at 0x7f61aa4ea278>,
 'Democrats,': <gensim.models.word2vec.Vocab at 0x7f61aa4ea320>,
 '*kick': <gensim.models.word2vec.Vocab at 0x7f61aa4ea358>,
 '"It': <gensim.models.word2vec.Vocab at 0x7f61aa4ea550>,
 'wet': <gensim.models.word2vec.Vocab at 0x7f61aa4ea6d8>,
 'snooze': <gensim.models.word2vec.Vocab at 0x7f61aa4ea7b8>,
 'points': <gensim.models.word2vec.Vocab at 0x7f61aa4ea978>,
 'Sen.': <gensim.models.word2vec.Vocab at 0x7f61aa4ea9b0>,
 'although': <gensim.models.word2vec.Vocab at 0x7f61aa4eaac8>,
 'Charlotte': <gensim.models.word2vec.Vocab at 0x7f61aa4eab00>,
 'lil...but': <gensim.models.word2vec.Vocab at 0x7f61aa4eab38>,
 'oneo': <gensim.models.word2vec.Vocab at 0x7f61aa4eac50>,
 'course;': <gensim.models.word2vec.Vocab at 0x7f61aa4eada0>,
 'Bring': <gensim.models.word2vec.Vocab at 0x7f61aa4eadd8>,
 '(compared': <gensim.models.word2vec.Vocab at 0x7f61aa4eae48>,
 'ugh.': <gensim.models.word2vec.Vocab at 0x7f61aa4eaef0>,
 'sit': <gensim.models.word2vec.Vocab at 0x7f61aa553a20>,
 'dipped?': <gensim.models.word2vec.Vocab at 0x7f61aa4eafd0>,
 'based': <gensim.models.word2vec.Vocab at 0x7f61aa4ec978>,
 'A.I.': <gensim.models.word2vec.Vocab at 0x7f61aa4ec080>,
 'breathing.': <gensim.models.word2vec.Vocab at 0x7f61aa4ec128>,
 'multi-millionaire': <gensim.models.word2vec.Vocab at 0x7f61aa4ec208>,
 'groups': <gensim.models.word2vec.Vocab at 0x7f61aa4ec278>,
 'on': <gensim.models.word2vec.Vocab at 0x7f61aa4ec2b0>,
 'animals),': <gensim.models.word2vec.Vocab at 0x7f61aa4d8630>,
 'Manners?': <gensim.models.word2vec.Vocab at 0x7f61aa4ec320>,
 'you?]:': <gensim.models.word2vec.Vocab at 0x7f61aa445f60>,
 'redistribute': <gensim.models.word2vec.Vocab at 0x7f61aa4dbba8>,
 'omg.': <gensim.models.word2vec.Vocab at 0x7f61aa4ec470>,
 'dance?:': <gensim.models.word2vec.Vocab at 0x7f61aa4ec4a8>,
 'Canada)': <gensim.models.word2vec.Vocab at 0x7f61aa553b00>,
 'came': <gensim.models.word2vec.Vocab at 0x7f61aa4ec550>,
 'poof': <gensim.models.word2vec.Vocab at 0x7f61aa4ec588>,
 'brownies.': <gensim.models.word2vec.Vocab at 0x7f61aa4ec630>,
 'Not': <gensim.models.word2vec.Vocab at 0x7f61aa4ec710>,
 'spaces': <gensim.models.word2vec.Vocab at 0x7f61aa4ec780>,
 'destroy': <gensim.models.word2vec.Vocab at 0x7f61aa4ec860>,
 'maybe.': <gensim.models.word2vec.Vocab at 0x7f61aa4ec898>,
 'Industrial': <gensim.models.word2vec.Vocab at 0x7f61aa4ec9e8>,
 'boring': <gensim.models.word2vec.Vocab at 0x7f61aa4ecb00>,
 'is:': <gensim.models.word2vec.Vocab at 0x7f61aa4ecd30>,
 'question.': <gensim.models.word2vec.Vocab at 0x7f61aa4ecd68>,
 'long-lasting': <gensim.models.word2vec.Vocab at 0x7f61aa4ecda0>,
 'sun': <gensim.models.word2vec.Vocab at 0x7f61aa5dc1d0>,
 'CrAp*': <gensim.models.word2vec.Vocab at 0x7f61aa4ed080>,
 'irresistable': <gensim.models.word2vec.Vocab at 0x7f61aa4ed0f0>,
 'dont...i': <gensim.models.word2vec.Vocab at 0x7f61aa4ed128>,
 'loss.': <gensim.models.word2vec.Vocab at 0x7f61aa4ed160>,
 'easy': <gensim.models.word2vec.Vocab at 0x7f61aa4ed2b0>,
 'wanna': <gensim.models.word2vec.Vocab at 0x7f61aa4635c0>,
 'Gaviota': <gensim.models.word2vec.Vocab at 0x7f61aa4ed4a8>,
 'nose': <gensim.models.word2vec.Vocab at 0x7f61aa4ed518>,
 'slept': <gensim.models.word2vec.Vocab at 0x7f61aa4ed5c0>,
 'hahahahah': <gensim.models.word2vec.Vocab at 0x7f61aa4ed5f8>,
 'halloween': <gensim.models.word2vec.Vocab at 0x7f61aa4ed630>,
 'shes': <gensim.models.word2vec.Vocab at 0x7f61aa553c50>,
 'realize': <gensim.models.word2vec.Vocab at 0x7f61aa4ed860>,
 'twice': <gensim.models.word2vec.Vocab at 0x7f61aa4ed908>,
 'lift': <gensim.models.word2vec.Vocab at 0x7f61aa4eda90>,
 'china,': <gensim.models.word2vec.Vocab at 0x7f61aa4edc88>,
 'Standard.)': <gensim.models.word2vec.Vocab at 0x7f61aa4edcc0>,
 'worried': <gensim.models.word2vec.Vocab at 0x7f61aa4edda0>,
 'Opposite': <gensim.models.word2vec.Vocab at 0x7f61aa4eddd8>,
 'chin.': <gensim.models.word2vec.Vocab at 0x7f61aa4edef0>,
 'Garden': <gensim.models.word2vec.Vocab at 0x7f61aa4ebcc0>,
 'guy': <gensim.models.word2vec.Vocab at 0x7f61aa4ebd68>,
 'remmeber': <gensim.models.word2vec.Vocab at 0x7f61aa4ebef0>,
 'fence,': <gensim.models.word2vec.Vocab at 0x7f61aa4eb128>,
 'apologizing': <gensim.models.word2vec.Vocab at 0x7f61aa4eb160>,
 'next.': <gensim.models.word2vec.Vocab at 0x7f61aa4eb2b0>,
 'MATTERS': <gensim.models.word2vec.Vocab at 0x7f61aa4eb2e8>,
 'rugs': <gensim.models.word2vec.Vocab at 0x7f61aa4eb320>,
 'her...': <gensim.models.word2vec.Vocab at 0x7f61aa4eb438>,
 'energy,': <gensim.models.word2vec.Vocab at 0x7f61aa4eb4a8>,
 'recorded,': <gensim.models.word2vec.Vocab at 0x7f61aa4eb588>,
 'pepsi.': <gensim.models.word2vec.Vocab at 0x7f61aa4eb710>,
 'r': <gensim.models.word2vec.Vocab at 0x7f61aa4eb860>,
 '13': <gensim.models.word2vec.Vocab at 0x7f61aa4eb898>,
 'at:': <gensim.models.word2vec.Vocab at 0x7f61aa5dc390>,
 'cheaper': <gensim.models.word2vec.Vocab at 0x7f61aa4ee9b0>,
 'children!': <gensim.models.word2vec.Vocab at 0x7f61aa5b6c88>,
 'tree': <gensim.models.word2vec.Vocab at 0x7f61aa4eecc0>,
 'met': <gensim.models.word2vec.Vocab at 0x7f61aa4eecf8>,
 'one,': <gensim.models.word2vec.Vocab at 0x7f61aa4eeda0>,
 'rejected?': <gensim.models.word2vec.Vocab at 0x7f61aa4eee48>,
 'Marianne’s': <gensim.models.word2vec.Vocab at 0x7f61aa4eee80>,
 'Icenhower': <gensim.models.word2vec.Vocab at 0x7f61aa4ee978>,
 'day!': <gensim.models.word2vec.Vocab at 0x7f61aa4ee1d0>,
 'leaving': <gensim.models.word2vec.Vocab at 0x7f61aa4ee240>,
 '2110': <gensim.models.word2vec.Vocab at 0x7f61aa4ee2b0>,
 'kiss:': <gensim.models.word2vec.Vocab at 0x7f61aa4ee748>,
 'nearest': <gensim.models.word2vec.Vocab at 0x7f61aa4ee780>,
 'aimlessly': <gensim.models.word2vec.Vocab at 0x7f61aa4ee7b8>,
 'sprint': <gensim.models.word2vec.Vocab at 0x7f61aa4ee898>,
 'kids!)': <gensim.models.word2vec.Vocab at 0x7f61aa536048>,
 'canteen': <gensim.models.word2vec.Vocab at 0x7f61aa5360f0>,
 'weekend!': <gensim.models.word2vec.Vocab at 0x7f61aa536160>,
 'him': <gensim.models.word2vec.Vocab at 0x7f61aa536198>,
 'scariest': <gensim.models.word2vec.Vocab at 0x7f61aa5361d0>,
 'this?': <gensim.models.word2vec.Vocab at 0x7f61aa536208>,
 '"choosing': <gensim.models.word2vec.Vocab at 0x7f61aa536240>,
 'Talk': <gensim.models.word2vec.Vocab at 0x7f61aa5362b0>,
 'weeks': <gensim.models.word2vec.Vocab at 0x7f61aa5362e8>,
 "You'll": <gensim.models.word2vec.Vocab at 0x7f61aa536390>,
 'goodnight': <gensim.models.word2vec.Vocab at 0x7f61aa5363c8>,
 'skiing.': <gensim.models.word2vec.Vocab at 0x7f61aa536438>,
 'KeEp': <gensim.models.word2vec.Vocab at 0x7f61aa5535f8>,
 'week': <gensim.models.word2vec.Vocab at 0x7f61aa536550>,
 'norwegian': <gensim.models.word2vec.Vocab at 0x7f61aa5366a0>,
 'HAND:': <gensim.models.word2vec.Vocab at 0x7f61aa553780>,
 'fact,': <gensim.models.word2vec.Vocab at 0x7f61aa5367f0>,
 'thanksgiving': <gensim.models.word2vec.Vocab at 0x7f61aa536828>,
 'me..argh...': <gensim.models.word2vec.Vocab at 0x7f61aa536860>,
 'she': <gensim.models.word2vec.Vocab at 0x7f61aa536898>,
 'Tree': <gensim.models.word2vec.Vocab at 0x7f61aa536908>,
 'combat.': <gensim.models.word2vec.Vocab at 0x7f61aa536940>,
 'mitosis': <gensim.models.word2vec.Vocab at 0x7f61aa536978>,
 'offered': <gensim.models.word2vec.Vocab at 0x7f61aa5369e8>,
 'no..': <gensim.models.word2vec.Vocab at 0x7f61aa536a20>,
 '(there': <gensim.models.word2vec.Vocab at 0x7f61aa536a58>,
 'aspirations': <gensim.models.word2vec.Vocab at 0x7f61aa536a90>,
 'page': <gensim.models.word2vec.Vocab at 0x7f61aa536ac8>,
 'Least': <gensim.models.word2vec.Vocab at 0x7f61aa536b38>,
 'each': <gensim.models.word2vec.Vocab at 0x7f61aa536b70>,
 'ride...': <gensim.models.word2vec.Vocab at 0x7f61aa536ba8>,
 'doesn’t': <gensim.models.word2vec.Vocab at 0x7f61aa536c18>,
 'FUCK': <gensim.models.word2vec.Vocab at 0x7f61aa536c50>,
 'gona': <gensim.models.word2vec.Vocab at 0x7f61aa536dd8>,
 'window': <gensim.models.word2vec.Vocab at 0x7f61aa536e10>,
 'end': <gensim.models.word2vec.Vocab at 0x7f61aa536e48>,
 'expected': <gensim.models.word2vec.Vocab at 0x7f61aa536eb8>,
 'well.': <gensim.models.word2vec.Vocab at 0x7f61aa536ef0>,
 'called': <gensim.models.word2vec.Vocab at 0x7f61aa460748>,
 "needn't": <gensim.models.word2vec.Vocab at 0x7f61aa536f28>,
 'doesnt': <gensim.models.word2vec.Vocab at 0x7f61aa536f60>,
 'venturing': <gensim.models.word2vec.Vocab at 0x7f61aa440a90>,
 'alex': <gensim.models.word2vec.Vocab at 0x7f61aa536fd0>,
 'here:': <gensim.models.word2vec.Vocab at 0x7f61aa53b048>,
 'ewWw': <gensim.models.word2vec.Vocab at 0x7f61aa53b0b8>,
 'pole?': <gensim.models.word2vec.Vocab at 0x7f61aa53b0f0>,
 'melody,': <gensim.models.word2vec.Vocab at 0x7f61aa5b6eb8>,
 'motivated': <gensim.models.word2vec.Vocab at 0x7f61aa53b128>,
 'Well,': <gensim.models.word2vec.Vocab at 0x7f61aa53b160>,
 'says:': <gensim.models.word2vec.Vocab at 0x7f61aa53b198>,
 'worm': <gensim.models.word2vec.Vocab at 0x7f61aa53b1d0>,
 '[some': <gensim.models.word2vec.Vocab at 0x7f61aa553f98>,
 'name': <gensim.models.word2vec.Vocab at 0x7f61aa53b320>,
 'Leave"': <gensim.models.word2vec.Vocab at 0x7f61aa53b358>,
 '4th': <gensim.models.word2vec.Vocab at 0x7f61aa404ba8>,
 "It's...": <gensim.models.word2vec.Vocab at 0x7f61aa53b390>,
 'problem??': <gensim.models.word2vec.Vocab at 0x7f61aa553fd0>,
 'remember': <gensim.models.word2vec.Vocab at 0x7f61aa53b470>,
 'o': <gensim.models.word2vec.Vocab at 0x7f61aa4e32b0>,
 'letters.': <gensim.models.word2vec.Vocab at 0x7f61aa53b4a8>,
 'jean': <gensim.models.word2vec.Vocab at 0x7f61aa53b4e0>,
 'thing.': <gensim.models.word2vec.Vocab at 0x7f61aa53b518>,
 'friend?]:': <gensim.models.word2vec.Vocab at 0x7f61aa53b588>,
 'am!': <gensim.models.word2vec.Vocab at 0x7f61aa53b5c0>,
 'side...': <gensim.models.word2vec.Vocab at 0x7f61aa53b6a0>,
 'Yet': <gensim.models.word2vec.Vocab at 0x7f61aa53b6d8>,
 'easier': <gensim.models.word2vec.Vocab at 0x7f61aa53b828>,
 'babies': <gensim.models.word2vec.Vocab at 0x7f61aa53b860>,
 'You?': <gensim.models.word2vec.Vocab at 0x7f61aa53b898>,
 'wedding:': <gensim.models.word2vec.Vocab at 0x7f61aa53b8d0>,
 '2.)': <gensim.models.word2vec.Vocab at 0x7f61aa53b908>,
 'first...then': <gensim.models.word2vec.Vocab at 0x7f61aa53b940>,
 'LA:': <gensim.models.word2vec.Vocab at 0x7f61aa53b978>,
 'but,)': <gensim.models.word2vec.Vocab at 0x7f61aa53b9b0>,
 'not,': <gensim.models.word2vec.Vocab at 0x7f61aa53ba20>,
 'possession': <gensim.models.word2vec.Vocab at 0x7f61aa53ba58>,
 'its': <gensim.models.word2vec.Vocab at 0x7f61aa53ba90>,
 'stop': <gensim.models.word2vec.Vocab at 0x7f61aa53bac8>,
 'Thanks': <gensim.models.word2vec.Vocab at 0x7f61aa53bb00>,
 'durin': <gensim.models.word2vec.Vocab at 0x7f61aa53bb38>,
 'rings': <gensim.models.word2vec.Vocab at 0x7f61aa53bb70>,
 'Specifics': <gensim.models.word2vec.Vocab at 0x7f61aa53bba8>,
 'http://www.kingsofchaos.com/recruit.php?uniqid=jm8bja2z': <gensim.models.word2vec.Vocab at 0x7f61aa53bbe0>,
 'lace': <gensim.models.word2vec.Vocab at 0x7f61aa53bc18>,
 'pretended': <gensim.models.word2vec.Vocab at 0x7f61aa53bc50>,
 'clothes': <gensim.models.word2vec.Vocab at 0x7f61aa53bd30>,
 'wong': <gensim.models.word2vec.Vocab at 0x7f61aa53bd68>,
 '38': <gensim.models.word2vec.Vocab at 0x7f61aa5ce390>,
 'country.': <gensim.models.word2vec.Vocab at 0x7f61aa53bda0>,
 'criticism': <gensim.models.word2vec.Vocab at 0x7f61aa53bdd8>,
 'NATIONAL': <gensim.models.word2vec.Vocab at 0x7f61aa53be48>,
 "that's": <gensim.models.word2vec.Vocab at 0x7f61aa53beb8>,
 'conclusively': <gensim.models.word2vec.Vocab at 0x7f61aa53bef0>,
 'cartoons,': <gensim.models.word2vec.Vocab at 0x7f61aa53bf28>,
 'chest/lungs': <gensim.models.word2vec.Vocab at 0x7f61aa53bf60>,
 'whilst': <gensim.models.word2vec.Vocab at 0x7f61aa5dc7b8>,
 "I'm,": <gensim.models.word2vec.Vocab at 0x7f61aa3feb38>,
 'Tata.': <gensim.models.word2vec.Vocab at 0x7f61aa53bfd0>,
 'mix': <gensim.models.word2vec.Vocab at 0x7f61aa533160>,
 'popularity': <gensim.models.word2vec.Vocab at 0x7f61aa533390>,
 'park)': <gensim.models.word2vec.Vocab at 0x7f61aa5333c8>,
 '(trampled': <gensim.models.word2vec.Vocab at 0x7f61aa5336a0>,
 'reminded': <gensim.models.word2vec.Vocab at 0x7f61aa5339b0>,
 'says.': <gensim.models.word2vec.Vocab at 0x7f61aa533a58>,
 'repetition,': <gensim.models.word2vec.Vocab at 0x7f61aa533ac8>,
 'Size?': <gensim.models.word2vec.Vocab at 0x7f61aa533c18>,
 "hm...i'm": <gensim.models.word2vec.Vocab at 0x7f61aa533e10>,
 'interesting,': <gensim.models.word2vec.Vocab at 0x7f61aa560160>,
 'exams': <gensim.models.word2vec.Vocab at 0x7f61aa533f28>,
 'crusts.': <gensim.models.word2vec.Vocab at 0x7f61aa533f60>,
 'filling': <gensim.models.word2vec.Vocab at 0x7f61aa533fd0>,
 'gets': <gensim.models.word2vec.Vocab at 0x7f61aa4e51d0>,
 'his': <gensim.models.word2vec.Vocab at 0x7f61aa4e5208>,
 'Friday,': <gensim.models.word2vec.Vocab at 0x7f61aa4e5240>,
 'f': <gensim.models.word2vec.Vocab at 0x7f61aa4e5278>,
 'too!': <gensim.models.word2vec.Vocab at 0x7f61aa4e52e8>,
 'Made': <gensim.models.word2vec.Vocab at 0x7f61aa4e5400>,
 'accidentally': <gensim.models.word2vec.Vocab at 0x7f61aa4e5438>,
 '"New': <gensim.models.word2vec.Vocab at 0x7f61aa4e5470>,
 'COURSE.': <gensim.models.word2vec.Vocab at 0x7f61aa4e54a8>,
 '[please': <gensim.models.word2vec.Vocab at 0x7f61aa572240>,
 'this...': <gensim.models.word2vec.Vocab at 0x7f61aa4e5630>,
 'soon': <gensim.models.word2vec.Vocab at 0x7f61aa4e5710>,
 'worry': <gensim.models.word2vec.Vocab at 0x7f61aa4e57b8>,
 'Job]:': <gensim.models.word2vec.Vocab at 0x7f61aa4e58d0>,
 'deal': <gensim.models.word2vec.Vocab at 0x7f61aa4e59b0>,
 'pounding': <gensim.models.word2vec.Vocab at 0x7f61aa4e59e8>,
 '[Are': <gensim.models.word2vec.Vocab at 0x7f61aa4e5a90>,
 'begin': <gensim.models.word2vec.Vocab at 0x7f61aa4e5b00>,
 'isolated': <gensim.models.word2vec.Vocab at 0x7f61aa4e5c18>,
 'anyways': <gensim.models.word2vec.Vocab at 0x7f61aa4e5c50>,
 'garbage': <gensim.models.word2vec.Vocab at 0x7f61aa4e5c88>,
 'awww': <gensim.models.word2vec.Vocab at 0x7f61aa4e5cf8>,
 'intelligence': <gensim.models.word2vec.Vocab at 0x7f61aa4e5d68>,
 'being': <gensim.models.word2vec.Vocab at 0x7f61aa4e5e48>,
 'married?]:': <gensim.models.word2vec.Vocab at 0x7f61aa4e5eb8>,
 'omg': <gensim.models.word2vec.Vocab at 0x7f61aa440dd8>,
 '...': <gensim.models.word2vec.Vocab at 0x7f61aa4e5f28>,
 'highlight': <gensim.models.word2vec.Vocab at 0x7f61aa4e5fd0>,
 'to': <gensim.models.word2vec.Vocab at 0x7f61aa4e8978>,
 'AHH': <gensim.models.word2vec.Vocab at 0x7f61aa4e8b38>,
 'OVER!!!!!!!!!': <gensim.models.word2vec.Vocab at 0x7f61aa4e8b70>,
 'Cried': <gensim.models.word2vec.Vocab at 0x7f61aa4e8c18>,
 'SAYING?!?!?': <gensim.models.word2vec.Vocab at 0x7f61aa4e8c50>,
 'olivia.': <gensim.models.word2vec.Vocab at 0x7f61aa4e8da0>,
 "she'll": <gensim.models.word2vec.Vocab at 0x7f61aa4e8f60>,
 'community,': <gensim.models.word2vec.Vocab at 0x7f61aa4e8f98>,
 'cold.': <gensim.models.word2vec.Vocab at 0x7f61aa5dc978>,
 'not': <gensim.models.word2vec.Vocab at 0x7f61aa4e8898>,
 'transcripts': <gensim.models.word2vec.Vocab at 0x7f61aa4e8160>,
 'promises...i': <gensim.models.word2vec.Vocab at 0x7f61aa5c7ba8>,
 'totem': <gensim.models.word2vec.Vocab at 0x7f61aa4e82e8>,
 'naked,': <gensim.models.word2vec.Vocab at 0x7f61aa554320>,
 'hate': <gensim.models.word2vec.Vocab at 0x7f61aa4e8358>,
 'gas': <gensim.models.word2vec.Vocab at 0x7f61aa4e85c0>,
 'beat': <gensim.models.word2vec.Vocab at 0x7f61aa4e85f8>,
 'Jungle': <gensim.models.word2vec.Vocab at 0x7f61aa4e8748>,
 'band': <gensim.models.word2vec.Vocab at 0x7f61aa5697b8>,
 'ought': <gensim.models.word2vec.Vocab at 0x7f61aa4e8828>,
 'ishouldnt': <gensim.models.word2vec.Vocab at 0x7f61aa4e7128>,
 'funni': <gensim.models.word2vec.Vocab at 0x7f61aa4e7208>,
 'camera': <gensim.models.word2vec.Vocab at 0x7f61aa4e7278>,
 "Mom's": <gensim.models.word2vec.Vocab at 0x7f61aa4e7400>,
 'invitations': <gensim.models.word2vec.Vocab at 0x7f61aa4e7438>,
 'sheets,': <gensim.models.word2vec.Vocab at 0x7f61aa4e7470>,
 'sony': <gensim.models.word2vec.Vocab at 0x7f61aa4e74a8>,
 'Could': <gensim.models.word2vec.Vocab at 0x7f61aa4e7588>,
 '"goodness"': <gensim.models.word2vec.Vocab at 0x7f61aa4e75c0>,
 'commentators': <gensim.models.word2vec.Vocab at 0x7f61aa4e7668>,
 'learned': <gensim.models.word2vec.Vocab at 0x7f61aa4e7710>,
 'quit': <gensim.models.word2vec.Vocab at 0x7f61aa4e7748>,
 "mother's": <gensim.models.word2vec.Vocab at 0x7f61aa5dc9e8>,
 'Hussein,': <gensim.models.word2vec.Vocab at 0x7f61aa5b9320>,
 'Funny,': <gensim.models.word2vec.Vocab at 0x7f61aa4e7860>,
 'Actually': <gensim.models.word2vec.Vocab at 0x7f61aa4e7898>,
 'upsetting.': <gensim.models.word2vec.Vocab at 0x7f61aa4e7a90>,
 'ring!)': <gensim.models.word2vec.Vocab at 0x7f61aa4e7b00>,
 'material': <gensim.models.word2vec.Vocab at 0x7f61aa4e7b38>,
 '…': <gensim.models.word2vec.Vocab at 0x7f61aa4e7be0>,
 'kind': <gensim.models.word2vec.Vocab at 0x7f61aa4e7c18>,
 'Moon"': <gensim.models.word2vec.Vocab at 0x7f61aa4e7cc0>,
 'james,': <gensim.models.word2vec.Vocab at 0x7f61aa4e7d30>,
 'regardless': <gensim.models.word2vec.Vocab at 0x7f61aa4e7d68>,
 'WATCHED': <gensim.models.word2vec.Vocab at 0x7f61aa4e7da0>,
 'possibly': <gensim.models.word2vec.Vocab at 0x7f61aa5bbe10>,
 'Make': <gensim.models.word2vec.Vocab at 0x7f61aa4e7ef0>,
 'airplanes,': <gensim.models.word2vec.Vocab at 0x7f61aa463f60>,
 'Exaggerated,': <gensim.models.word2vec.Vocab at 0x7f61aa4e7f98>,
 'head,': <gensim.models.word2vec.Vocab at 0x7f61aa4e7fd0>,
 'graceful': <gensim.models.word2vec.Vocab at 0x7f61aa4d9128>,
 'but': <gensim.models.word2vec.Vocab at 0x7f61aa4d9550>,
 'low': <gensim.models.word2vec.Vocab at 0x7f61aa4d95f8>,
 'it!!!': <gensim.models.word2vec.Vocab at 0x7f61aa4d9710>,
 'usual)': <gensim.models.word2vec.Vocab at 0x7f61aa4d97f0>,
 'doing?:': <gensim.models.word2vec.Vocab at 0x7f61aa4d9908>,
 "wat's": <gensim.models.word2vec.Vocab at 0x7f61aa4d99b0>,
 'disadvantages': <gensim.models.word2vec.Vocab at 0x7f61aa5dcb00>,
 'breaks': <gensim.models.word2vec.Vocab at 0x7f61aa4d9cf8>,
 'partner,': <gensim.models.word2vec.Vocab at 0x7f61aa4d8048>,
 'totally': <gensim.models.word2vec.Vocab at 0x7f61aa4d80f0>,
 'break?!': <gensim.models.word2vec.Vocab at 0x7f61aa4d81d0>,
 'remember,': <gensim.models.word2vec.Vocab at 0x7f61aa3fedd8>,
 'nose.': <gensim.models.word2vec.Vocab at 0x7f61aa4d82b0>,
 '...gets': <gensim.models.word2vec.Vocab at 0x7f61aa4d82e8>,
 'circles': <gensim.models.word2vec.Vocab at 0x7f61aa4d8320>,
 'list?': <gensim.models.word2vec.Vocab at 0x7f61aa4d84a8>,
 'babble.': <gensim.models.word2vec.Vocab at 0x7f61aa4ec2e8>,
 'Those': <gensim.models.word2vec.Vocab at 0x7f61aa5c19b0>,
 'hers,': <gensim.models.word2vec.Vocab at 0x7f61aa554518>,
 'Kucinich).': <gensim.models.word2vec.Vocab at 0x7f61aa4d8a90>,
 'toxic,': <gensim.models.word2vec.Vocab at 0x7f61aa4d8ac8>,
 'mates.': <gensim.models.word2vec.Vocab at 0x7f61aa4d8be0>,
 'rock!': <gensim.models.word2vec.Vocab at 0x7f61aa4d8d68>,
 'birthday': <gensim.models.word2vec.Vocab at 0x7f61aa4d8e48>,
 'okay-': <gensim.models.word2vec.Vocab at 0x7f61aa4d8ef0>,
 'Twenty-six': <gensim.models.word2vec.Vocab at 0x7f61aa4d8f60>,
 'Molly': <gensim.models.word2vec.Vocab at 0x7f61aa4d8f98>,
 'everyone.i': <gensim.models.word2vec.Vocab at 0x7f61aa4d8fd0>,
 'brought': <gensim.models.word2vec.Vocab at 0x7f61aa4db320>,
 'rusty.': <gensim.models.word2vec.Vocab at 0x7f61aa4db358>,
 "Let's": <gensim.models.word2vec.Vocab at 0x7f61aa4db390>,
 'soon?': <gensim.models.word2vec.Vocab at 0x7f61aa4db400>,
 '19.': <gensim.models.word2vec.Vocab at 0x7f61aa4db4e0>,
 'shuffle': <gensim.models.word2vec.Vocab at 0x7f61aa4db8d0>,
 "you're": <gensim.models.word2vec.Vocab at 0x7f61aa4dbac8>,
 'somehow?': <gensim.models.word2vec.Vocab at 0x7f61aa4ec400>,
 'naked?]:': <gensim.models.word2vec.Vocab at 0x7f61aa5d4c18>,
 '...i': <gensim.models.word2vec.Vocab at 0x7f61aa4dbd68>,
 'friend': <gensim.models.word2vec.Vocab at 0x7f61aa4da048>,
 'away;': <gensim.models.word2vec.Vocab at 0x7f61aa4da320>,
 'tending': <gensim.models.word2vec.Vocab at 0x7f61aa4da358>,
 'creates': <gensim.models.word2vec.Vocab at 0x7f61aa4da5f8>,
 'certitude,': <gensim.models.word2vec.Vocab at 0x7f61aa4da668>,
 'job...some': <gensim.models.word2vec.Vocab at 0x7f61aa4da6a0>,
 'room.': <gensim.models.word2vec.Vocab at 0x7f61aa4da748>,
 '...will': <gensim.models.word2vec.Vocab at 0x7f61aa4da7b8>,
 'mincing': <gensim.models.word2vec.Vocab at 0x7f61aa4da8d0>,
 'dog/cat/bird/fish,': <gensim.models.word2vec.Vocab at 0x7f61aa4da908>,
 'way,': <gensim.models.word2vec.Vocab at 0x7f61aa5bf0b8>,
 'nvm...': <gensim.models.word2vec.Vocab at 0x7f61aa4daba8>,
 'illness,': <gensim.models.word2vec.Vocab at 0x7f61aa4dac18>,
 'good.': <gensim.models.word2vec.Vocab at 0x7f61aa4dacc0>,
 'bother??': <gensim.models.word2vec.Vocab at 0x7f61aa4dada0>,
 'curse': <gensim.models.word2vec.Vocab at 0x7f61aa4dadd8>,
 "daughter's": <gensim.models.word2vec.Vocab at 0x7f61aa4daf60>,
 '(albeit,': <gensim.models.word2vec.Vocab at 0x7f61aa4dc3c8>,
 'okay.': <gensim.models.word2vec.Vocab at 0x7f61aa4ede48>,
 'boxers': <gensim.models.word2vec.Vocab at 0x7f61aa4dc588>,
 'Calculus,': <gensim.models.word2vec.Vocab at 0x7f61aa4dc6a0>,
 'MEAN': <gensim.models.word2vec.Vocab at 0x7f61aa4dc7b8>,
 'rosie.': <gensim.models.word2vec.Vocab at 0x7f61aa4dc9e8>,
 'hard': <gensim.models.word2vec.Vocab at 0x7f61aa4dca90>,
 'life...think': <gensim.models.word2vec.Vocab at 0x7f61aa4dcba8>,
 'takes': <gensim.models.word2vec.Vocab at 0x7f61aa4dce48>,
 'pretty.': <gensim.models.word2vec.Vocab at 0x7f61aa5c1c88>,
 'award': <gensim.models.word2vec.Vocab at 0x7f61aa4dceb8>,
 'their': <gensim.models.word2vec.Vocab at 0x7f61aa4dcf28>,
 'plainly.': <gensim.models.word2vec.Vocab at 0x7f61aa4dcfd0>,
 'noone': <gensim.models.word2vec.Vocab at 0x7f61aa4ca1d0>,
 'say...no': <gensim.models.word2vec.Vocab at 0x7f61aa438978>,
 'thats': <gensim.models.word2vec.Vocab at 0x7f61aa4ca2e8>,
 'learning': <gensim.models.word2vec.Vocab at 0x7f61aa5dcd30>,
 'sleep': <gensim.models.word2vec.Vocab at 0x7f61aa4ca4a8>,
 'against': <gensim.models.word2vec.Vocab at 0x7f61aa4ca5c0>,
 'rubbish': <gensim.models.word2vec.Vocab at 0x7f61aa565358>,
 'years,': <gensim.models.word2vec.Vocab at 0x7f61aa4ca9b0>,
 'theatre)': <gensim.models.word2vec.Vocab at 0x7f61aa4caa20>,
 '[Kissed': <gensim.models.word2vec.Vocab at 0x7f61aa4caa90>,
 'love?': <gensim.models.word2vec.Vocab at 0x7f61aa4caba8>,
 'Forgetting': <gensim.models.word2vec.Vocab at 0x7f61aa4cada0>,
 'Whoever': <gensim.models.word2vec.Vocab at 0x7f61aa4cb358>,
 'bacon': <gensim.models.word2vec.Vocab at 0x7f61aa4cb438>,
 'wishing': <gensim.models.word2vec.Vocab at 0x7f61aa5ce940>,
 'fantastic.': <gensim.models.word2vec.Vocab at 0x7f61aa4cb898>,
 'rosalie...': <gensim.models.word2vec.Vocab at 0x7f61aa4cbc18>,
 'souned': <gensim.models.word2vec.Vocab at 0x7f61aa5dce48>,
 'bulbous': <gensim.models.word2vec.Vocab at 0x7f61aa4cbeb8>,
 'in-depth': <gensim.models.word2vec.Vocab at 0x7f61aa4cbef0>,
 'proof': <gensim.models.word2vec.Vocab at 0x7f61aa4cd240>,
 'however,': <gensim.models.word2vec.Vocab at 0x7f61aa4cd278>,
 'at': <gensim.models.word2vec.Vocab at 0x7f61aa4cd390>,
 "you'll": <gensim.models.word2vec.Vocab at 0x7f61aa4cd438>,
 'Will': <gensim.models.word2vec.Vocab at 0x7f61aa4cd780>,
 'Chotky': <gensim.models.word2vec.Vocab at 0x7f61aa4cda90>,
 'o0o!': <gensim.models.word2vec.Vocab at 0x7f61aa4cf2b0>,
 'overnight,': <gensim.models.word2vec.Vocab at 0x7f61aa442208>,
 '6.': <gensim.models.word2vec.Vocab at 0x7f61aa4cf400>,
 'expensive': <gensim.models.word2vec.Vocab at 0x7f61aa4cf518>,
 'employers': <gensim.models.word2vec.Vocab at 0x7f61aa4cf550>,
 'especially': <gensim.models.word2vec.Vocab at 0x7f61aa4cf828>,
 'lives,': <gensim.models.word2vec.Vocab at 0x7f61aa4cf860>,
 'dumb': <gensim.models.word2vec.Vocab at 0x7f61aa4cf898>,
 'EVERYONE!!!': <gensim.models.word2vec.Vocab at 0x7f61aa4cfc88>,
 'mind,': <gensim.models.word2vec.Vocab at 0x7f61aa4cfd68>,
 'terms': <gensim.models.word2vec.Vocab at 0x7f61aa4cffd0>,
 'deception': <gensim.models.word2vec.Vocab at 0x7f61aa4ce390>,
 'glad.': <gensim.models.word2vec.Vocab at 0x7f61aa4ce5c0>,
 '20:': <gensim.models.word2vec.Vocab at 0x7f61aa453e48>,
 'disappeared!!!!!!!!': <gensim.models.word2vec.Vocab at 0x7f61aa4ce978>,
 'candy:': <gensim.models.word2vec.Vocab at 0x7f61aa4ceba8>,
 'PRODUCTIVE!!': <gensim.models.word2vec.Vocab at 0x7f61aa5d7048>,
 'Goals': <gensim.models.word2vec.Vocab at 0x7f61aa4cec18>,
 'like,': <gensim.models.word2vec.Vocab at 0x7f61aa4cecf8>,
 'Carter': <gensim.models.word2vec.Vocab at 0x7f61aa4cedd8>,
 'So': <gensim.models.word2vec.Vocab at 0x7f61aa4cef60>,
 '5:': <gensim.models.word2vec.Vocab at 0x7f61aa4d2048>,
 'stalled.': <gensim.models.word2vec.Vocab at 0x7f61aa4d2208>,
 'fewer': <gensim.models.word2vec.Vocab at 0x7f61aa4d26d8>,
 'lies': <gensim.models.word2vec.Vocab at 0x7f61aa4d27b8>,
 'faces': <gensim.models.word2vec.Vocab at 0x7f61aa5b9828>,
 'im': <gensim.models.word2vec.Vocab at 0x7f61aa4d2898>,
 'kina': <gensim.models.word2vec.Vocab at 0x7f61aa4d2ba8>,
 'Each': <gensim.models.word2vec.Vocab at 0x7f61aa4d2e10>,
 'know...even': <gensim.models.word2vec.Vocab at 0x7f61aa4d2e48>,
 'thrown': <gensim.models.word2vec.Vocab at 0x7f61aa4d2eb8>,
 "can't": <gensim.models.word2vec.Vocab at 0x7f61aa4d3128>,
 'close-minded.': <gensim.models.word2vec.Vocab at 0x7f61aa4d31d0>,
 'aint': <gensim.models.word2vec.Vocab at 0x7f61aa4d3240>,
 'the': <gensim.models.word2vec.Vocab at 0x7f61aa4d36d8>,
 'Ikea': <gensim.models.word2vec.Vocab at 0x7f61aa4d37b8>,
 'trying': <gensim.models.word2vec.Vocab at 0x7f61aa4d38d0>,
 'Coulter': <gensim.models.word2vec.Vocab at 0x7f61aa4d3940>,
 'cleaner,': <gensim.models.word2vec.Vocab at 0x7f61aa4d3b00>,
 'Mix]"': <gensim.models.word2vec.Vocab at 0x7f61aa4d3ba8>,
 'surface,': <gensim.models.word2vec.Vocab at 0x7f61aa4d3c50>,
 'mean,': <gensim.models.word2vec.Vocab at 0x7f61aa4d50b8>,
 'Graham),': <gensim.models.word2vec.Vocab at 0x7f61aa4d5160>,
 'Congress,': <gensim.models.word2vec.Vocab at 0x7f61aa4d5198>,
 'animals': <gensim.models.word2vec.Vocab at 0x7f61aa4d5208>,
 'small': <gensim.models.word2vec.Vocab at 0x7f61aa4d5278>,
 'steps.': <gensim.models.word2vec.Vocab at 0x7f61aa4d5390>,
 '[relationship]': <gensim.models.word2vec.Vocab at 0x7f61aa4d5438>,
 '[Wanted': <gensim.models.word2vec.Vocab at 0x7f61aa4d55c0>,
 'finals...too': <gensim.models.word2vec.Vocab at 0x7f61aa4d55f8>,
 'definitely.': <gensim.models.word2vec.Vocab at 0x7f61aa554eb8>,
 'I:': <gensim.models.word2vec.Vocab at 0x7f61aa4d56a0>,
 'what...even': <gensim.models.word2vec.Vocab at 0x7f61aa4eef98>,
 '......': <gensim.models.word2vec.Vocab at 0x7f61aa4d5978>,
 'lies).': <gensim.models.word2vec.Vocab at 0x7f61aa4d59e8>,
 'longer': <gensim.models.word2vec.Vocab at 0x7f61aa4d5a90>,
 'animals.': <gensim.models.word2vec.Vocab at 0x7f61aa5b9908>,
 'mindless': <gensim.models.word2vec.Vocab at 0x7f61aa4d5b38>,
 'disappear….': <gensim.models.word2vec.Vocab at 0x7f61aa4d5cc0>,
 'places': <gensim.models.word2vec.Vocab at 0x7f61aa4d6c88>,
 'sheets.': <gensim.models.word2vec.Vocab at 0x7f61aa4d6cf8>,
 'here.': <gensim.models.word2vec.Vocab at 0x7f61aa4d6d68>,
 'both,': <gensim.models.word2vec.Vocab at 0x7f61aa4d6e10>,
 'xela': <gensim.models.word2vec.Vocab at 0x7f61aa4d6e80>,
 'creeping': <gensim.models.word2vec.Vocab at 0x7f61aa4d6be0>,
 'dressy': <gensim.models.word2vec.Vocab at 0x7f61aa4d6048>,
 'melting': <gensim.models.word2vec.Vocab at 0x7f61aa4d6198>,
 '30': <gensim.models.word2vec.Vocab at 0x7f61aa4d6240>,
 'Questions': <gensim.models.word2vec.Vocab at 0x7f61aa5b99e8>,
 'indicates': <gensim.models.word2vec.Vocab at 0x7f61aa4d6390>,
 'guess': <gensim.models.word2vec.Vocab at 0x7f61aa4d63c8>,
 '37': <gensim.models.word2vec.Vocab at 0x7f61aa4d6400>,
 'strong,': <gensim.models.word2vec.Vocab at 0x7f61aa4d6588>,
 "I'd": <gensim.models.word2vec.Vocab at 0x7f61aa4d6668>,
 'Band': <gensim.models.word2vec.Vocab at 0x7f61aa4d6940>,
 'portly.': <gensim.models.word2vec.Vocab at 0x7f61aa4d6a20>,
 'dere': <gensim.models.word2vec.Vocab at 0x7f61aa4d6ac8>,
 'weeee': <gensim.models.word2vec.Vocab at 0x7f61aa5b9ef0>,
 'reason': <gensim.models.word2vec.Vocab at 0x7f61aa4d6b00>,
 'az': <gensim.models.word2vec.Vocab at 0x7f61aa4d7208>,
 'pond..': <gensim.models.word2vec.Vocab at 0x7f61aa5e1ef0>,
 'anyway).': <gensim.models.word2vec.Vocab at 0x7f61aa4d77b8>,
 'adventurous': <gensim.models.word2vec.Vocab at 0x7f61aa4d7828>,
 'supply': <gensim.models.word2vec.Vocab at 0x7f61aa4d70b8>,
 'Bored': <gensim.models.word2vec.Vocab at 0x7f61aa4d7240>,
 'black': <gensim.models.word2vec.Vocab at 0x7f61aa4d7278>,
 'cambridge?': <gensim.models.word2vec.Vocab at 0x7f61aa4d7358>,
 'noise': <gensim.models.word2vec.Vocab at 0x7f61aa4d7438>,
 'Winnipeg.': <gensim.models.word2vec.Vocab at 0x7f61aa4d7470>,
 'There': <gensim.models.word2vec.Vocab at 0x7f61aa4d74e0>,
 'chat': <gensim.models.word2vec.Vocab at 0x7f61aa4d7588>,
 'HERE': <gensim.models.word2vec.Vocab at 0x7f61aa4d76a0>,
 'choose': <gensim.models.word2vec.Vocab at 0x7f61aa4d78d0>,
 'morality,': <gensim.models.word2vec.Vocab at 0x7f61aa4d7a90>,
 'favors': <gensim.models.word2vec.Vocab at 0x7f61aa4d7b38>,
 '[If': <gensim.models.word2vec.Vocab at 0x7f61aa4d7c50>,
 'nvm,': <gensim.models.word2vec.Vocab at 0x7f61aa4d7cc0>,
 'tragedy': <gensim.models.word2vec.Vocab at 0x7f61aa4d7d30>,
 'japanese': <gensim.models.word2vec.Vocab at 0x7f61aa4d7da0>,
 'invite': <gensim.models.word2vec.Vocab at 0x7f61aa54b780>,
 'way.': <gensim.models.word2vec.Vocab at 0x7f61aa4d7e10>,
 'HAPPY': <gensim.models.word2vec.Vocab at 0x7f61aa4d7f98>,
 'fierce': <gensim.models.word2vec.Vocab at 0x7f61aa5e78d0>,
 'fools': <gensim.models.word2vec.Vocab at 0x7f61aa4d13c8>,
 'goes': <gensim.models.word2vec.Vocab at 0x7f61aa4d1400>,
 'wafers': <gensim.models.word2vec.Vocab at 0x7f61aa4d1470>,
 ':-D': <gensim.models.word2vec.Vocab at 0x7f61aa5e7c88>,
 'feathers': <gensim.models.word2vec.Vocab at 0x7f61aa5e7e10>,
 'still...': <gensim.models.word2vec.Vocab at 0x7f61aa4425f8>,
 'selene': <gensim.models.word2vec.Vocab at 0x7f61aa4d15c0>,
 'dinner"': <gensim.models.word2vec.Vocab at 0x7f61aa4d16a0>,
 'EVERY': <gensim.models.word2vec.Vocab at 0x7f61aa4d16d8>,
 '(2)': <gensim.models.word2vec.Vocab at 0x7f61aa4d1710>,
 'hormones': <gensim.models.word2vec.Vocab at 0x7f61aa4d1860>,
 'singing': <gensim.models.word2vec.Vocab at 0x7f61aa4d1898>,
 'carry': <gensim.models.word2vec.Vocab at 0x7f61aa4d1a58>,
 'bestfriend': <gensim.models.word2vec.Vocab at 0x7f61aa4d1ac8>,
 'AmeriCorps': <gensim.models.word2vec.Vocab at 0x7f61aa4d1b00>,
 'tuesday': <gensim.models.word2vec.Vocab at 0x7f61aa4d1c50>,
 'plants.': <gensim.models.word2vec.Vocab at 0x7f61aa4d1fd0>,
 'Presidential': <gensim.models.word2vec.Vocab at 0x7f61aa4d1048>,
 'dunno...i': <gensim.models.word2vec.Vocab at 0x7f61aa4d1278>,
 '[few': <gensim.models.word2vec.Vocab at 0x7f61aa4dd048>,
 'exercise.': <gensim.models.word2vec.Vocab at 0x7f61aa4dd080>,
 'WITH': <gensim.models.word2vec.Vocab at 0x7f61aa4dd198>,
 'Figueroa': <gensim.models.word2vec.Vocab at 0x7f61aa4dd1d0>,
 'softens': <gensim.models.word2vec.Vocab at 0x7f61aa4dd320>,
 'true.': <gensim.models.word2vec.Vocab at 0x7f61aa466eb8>,
 'ballpark': <gensim.models.word2vec.Vocab at 0x7f61aa4dd588>,
 'sleep,': <gensim.models.word2vec.Vocab at 0x7f61aa4dd5f8>,
 'names.': <gensim.models.word2vec.Vocab at 0x7f61aa4dd6d8>,
 'you’re': <gensim.models.word2vec.Vocab at 0x7f61aa4dd710>,
 'price': <gensim.models.word2vec.Vocab at 0x7f61aa4dd7f0>,
 'pig': <gensim.models.word2vec.Vocab at 0x7f61aa4dd940>,
 'time:': <gensim.models.word2vec.Vocab at 0x7f61aa4dda20>,
 'Colella': <gensim.models.word2vec.Vocab at 0x7f61aa4dda90>,
 'gift': <gensim.models.word2vec.Vocab at 0x7f61aa4ddb70>,
 'american': <gensim.models.word2vec.Vocab at 0x7f61aa4ddba8>,
 'poopie': <gensim.models.word2vec.Vocab at 0x7f61aa4ddcf8>,
 'floor': <gensim.models.word2vec.Vocab at 0x7f61aa4ddd68>,
 'talked': <gensim.models.word2vec.Vocab at 0x7f61aa4dddd8>,
 'age': <gensim.models.word2vec.Vocab at 0x7f61aa4dde10>,
 'sad.': <gensim.models.word2vec.Vocab at 0x7f61aa4dde48>,
 'usually': <gensim.models.word2vec.Vocab at 0x7f61aa4dde80>,
 "i'd": <gensim.models.word2vec.Vocab at 0x7f61aa4040f0>,
 'New]:': <gensim.models.word2vec.Vocab at 0x7f61aa4ddeb8>,
 'out,': <gensim.models.word2vec.Vocab at 0x7f61aa4ddef0>,
 'Secondly,': <gensim.models.word2vec.Vocab at 0x7f61aa4ddf28>,
 'kicked': <gensim.models.word2vec.Vocab at 0x7f61aa4e0048>,
 'stuff': <gensim.models.word2vec.Vocab at 0x7f61aa4e0080>,
 'essences': <gensim.models.word2vec.Vocab at 0x7f61aa4e0128>,
 'live': <gensim.models.word2vec.Vocab at 0x7f61aa4e0198>,
 'aditi.': <gensim.models.word2vec.Vocab at 0x7f61aa4e01d0>,
 'prepare,': <gensim.models.word2vec.Vocab at 0x7f61aa4e0320>,
 'Ave': <gensim.models.word2vec.Vocab at 0x7f61aa4e0358>,
 'Given': <gensim.models.word2vec.Vocab at 0x7f61aa4e0438>,
 'C"': <gensim.models.word2vec.Vocab at 0x7f61aa4e04e0>,
 'touching': <gensim.models.word2vec.Vocab at 0x7f61aa4e0588>,
 'Jeep),': <gensim.models.word2vec.Vocab at 0x7f61aa5df438>,
 'Los': <gensim.models.word2vec.Vocab at 0x7f61aa4e0668>,
 'wide.': <gensim.models.word2vec.Vocab at 0x7f61aa4e06d8>,
 'though.': <gensim.models.word2vec.Vocab at 0x7f61aa4e0748>,
 'sometime,': <gensim.models.word2vec.Vocab at 0x7f61aa554dd8>,
 'had.': <gensim.models.word2vec.Vocab at 0x7f61aa4e07f0>,
 'dreams': <gensim.models.word2vec.Vocab at 0x7f61aa4e0908>,
 'jobs': <gensim.models.word2vec.Vocab at 0x7f61aa4e0940>,
 'bike': <gensim.models.word2vec.Vocab at 0x7f61aa4e09b0>,
 'waterfall': <gensim.models.word2vec.Vocab at 0x7f61aa4e09e8>,
 'uhh....': <gensim.models.word2vec.Vocab at 0x7f61aa4e0a58>,
 'strenuous': <gensim.models.word2vec.Vocab at 0x7f61aa4e0a90>,
 'overly-perky': <gensim.models.word2vec.Vocab at 0x7f61aa554e48>,
 '....that': <gensim.models.word2vec.Vocab at 0x7f61aa4e0b70>,
 'fraud': <gensim.models.word2vec.Vocab at 0x7f61aa4e0be0>,
 'ahaha': <gensim.models.word2vec.Vocab at 0x7f61aa4e0c88>,
 'New': <gensim.models.word2vec.Vocab at 0x7f61aa4e0cc0>,
 'shopping': <gensim.models.word2vec.Vocab at 0x7f61aa4e0d30>,
 'extra': <gensim.models.word2vec.Vocab at 0x7f61aa4e0d68>,
 'use.': <gensim.models.word2vec.Vocab at 0x7f61aa4e0e10>,
 'running--while': <gensim.models.word2vec.Vocab at 0x7f61aa4e0e80>,
 "won't": <gensim.models.word2vec.Vocab at 0x7f61aa4e0ef0>,
 'no:': <gensim.models.word2vec.Vocab at 0x7f61aa4e0f28>,
 'verb,': <gensim.models.word2vec.Vocab at 0x7f61aa4e0f60>,
 'punch': <gensim.models.word2vec.Vocab at 0x7f61aa4e0f98>,
 'tamar.': <gensim.models.word2vec.Vocab at 0x7f61aa4e0fd0>,
 'summer': <gensim.models.word2vec.Vocab at 0x7f61aa4e3080>,
 'got': <gensim.models.word2vec.Vocab at 0x7f61aa4e30f0>,
 'breath,': <gensim.models.word2vec.Vocab at 0x7f61aa4e3240>,
 'answer': <gensim.models.word2vec.Vocab at 0x7f61aa4e3278>,
 'selves': <gensim.models.word2vec.Vocab at 0x7f61aa5d4eb8>,
 'everthing': <gensim.models.word2vec.Vocab at 0x7f61aa4dd908>,
 'nap,': <gensim.models.word2vec.Vocab at 0x7f61aa4e3470>,
 'CBC': <gensim.models.word2vec.Vocab at 0x7f61aa4e34a8>,
 'argument': <gensim.models.word2vec.Vocab at 0x7f61aa4e34e0>,
 'if': <gensim.models.word2vec.Vocab at 0x7f61aa4e3550>,
 'sorts': <gensim.models.word2vec.Vocab at 0x7f61aa5edd30>,
 'fields,': <gensim.models.word2vec.Vocab at 0x7f61aa4e35c0>,
 'canning': <gensim.models.word2vec.Vocab at 0x7f61aa4e36a0>,
 'worry..': <gensim.models.word2vec.Vocab at 0x7f61aa4e36d8>,
 'curtains!': <gensim.models.word2vec.Vocab at 0x7f61aa4e3828>,
 'why…': <gensim.models.word2vec.Vocab at 0x7f61aa4e38d0>,
 'fainting': <gensim.models.word2vec.Vocab at 0x7f61aa4e3940>,
 'ONLY': <gensim.models.word2vec.Vocab at 0x7f61aa4e3978>,
 'no-one': <gensim.models.word2vec.Vocab at 0x7f61aa4e3a58>,
 'floating': <gensim.models.word2vec.Vocab at 0x7f61aa4e3a90>,
 'messy,': <gensim.models.word2vec.Vocab at 0x7f61aa4e3b38>,
 'third': <gensim.models.word2vec.Vocab at 0x7f61aa4e3ba8>,
 'stood,': <gensim.models.word2vec.Vocab at 0x7f61aa4e3c18>,
 'fishing?': <gensim.models.word2vec.Vocab at 0x7f61aa4e3cc0>,
 'shall': <gensim.models.word2vec.Vocab at 0x7f61aa4e3d30>,
 'everything': <gensim.models.word2vec.Vocab at 0x7f61aa4e3d68>,
 'dog': <gensim.models.word2vec.Vocab at 0x7f61aa4e3da0>,
 'semester!': <gensim.models.word2vec.Vocab at 0x7f61aa53ca20>,
 'hurts': <gensim.models.word2vec.Vocab at 0x7f61aa4e3dd8>,
 'blab': <gensim.models.word2vec.Vocab at 0x7f61aa4e3e10>,
 'Cyan425:': <gensim.models.word2vec.Vocab at 0x7f61aa4e3e80>,
 'kid': <gensim.models.word2vec.Vocab at 0x7f61aa4e3eb8>,
 'Rumsfeld': <gensim.models.word2vec.Vocab at 0x7f61aa4e3ef0>,
 'be:': <gensim.models.word2vec.Vocab at 0x7f61aa4e3f60>,
 'character': <gensim.models.word2vec.Vocab at 0x7f61aa4e3f98>,
 'too;': <gensim.models.word2vec.Vocab at 0x7f61aa4e3fd0>,
 'cheese.': <gensim.models.word2vec.Vocab at 0x7f61aa4e4048>,
 'showin': <gensim.models.word2vec.Vocab at 0x7f61aa4e4080>,
 'DiFranco.': <gensim.models.word2vec.Vocab at 0x7f61aa4e40b8>,
 'weeks.': <gensim.models.word2vec.Vocab at 0x7f61aa4e40f0>,
 'authorized': <gensim.models.word2vec.Vocab at 0x7f61aa4e4128>,
 'Or': <gensim.models.word2vec.Vocab at 0x7f61aa4e4208>,
 'easier.': <gensim.models.word2vec.Vocab at 0x7f61aa4e4278>,
 'deserve': <gensim.models.word2vec.Vocab at 0x7f61aa4e42b0>,
 'reads': <gensim.models.word2vec.Vocab at 0x7f61aa4e42e8>,
 'beautiful': <gensim.models.word2vec.Vocab at 0x7f61aa4e4390>,
 'avril': <gensim.models.word2vec.Vocab at 0x7f61aa4e43c8>,
 'days.': <gensim.models.word2vec.Vocab at 0x7f61aa5b9e80>,
 '"can': <gensim.models.word2vec.Vocab at 0x7f61aa4e4400>,
 'player:': <gensim.models.word2vec.Vocab at 0x7f61aa4e4470>,
 'american??': <gensim.models.word2vec.Vocab at 0x7f61aa4e44a8>,
 'Michelle': <gensim.models.word2vec.Vocab at 0x7f61aa4e44e0>,
 'confusing,': <gensim.models.word2vec.Vocab at 0x7f61aa4e4550>,
 'YoUr': <gensim.models.word2vec.Vocab at 0x7f61aa4e4588>,
 'away...': <gensim.models.word2vec.Vocab at 0x7f61aa4e5358>,
 'handed': <gensim.models.word2vec.Vocab at 0x7f61aa4e45f8>,
 'casual': <gensim.models.word2vec.Vocab at 0x7f61aa4e46a0>,
 'colorful': <gensim.models.word2vec.Vocab at 0x7f61aa4e4828>,
 'lives.': <gensim.models.word2vec.Vocab at 0x7f61aa4e4898>,
 'selfishness...busying': <gensim.models.word2vec.Vocab at 0x7f61aa4e4978>,
 'shakes': <gensim.models.word2vec.Vocab at 0x7f61aa4e4a20>,
 'workouts.': <gensim.models.word2vec.Vocab at 0x7f61aa4e4b70>,
 'upon': <gensim.models.word2vec.Vocab at 0x7f61aa4e4cc0>,
 'BACK': <gensim.models.word2vec.Vocab at 0x7f61aa4e4d68>,
 'Radio': <gensim.models.word2vec.Vocab at 0x7f61aa4e4e48>,
 '"Truly,': <gensim.models.word2vec.Vocab at 0x7f61aa4e4e80>,
 'lord': <gensim.models.word2vec.Vocab at 0x7f61aa4e4ef0>,
 'Opening': <gensim.models.word2vec.Vocab at 0x7f61aa4e4f28>,
 'counts?': <gensim.models.word2vec.Vocab at 0x7f61aa4e4f60>,
 'sorry?': <gensim.models.word2vec.Vocab at 0x7f61aa5df780>,
 'His': <gensim.models.word2vec.Vocab at 0x7f61aa4e1710>,
 'article': <gensim.models.word2vec.Vocab at 0x7f61aa4e1860>,
 '(Dear': <gensim.models.word2vec.Vocab at 0x7f61aa4e1898>,
 'FAITH': <gensim.models.word2vec.Vocab at 0x7f61aa4e1b70>,
 'Girl**': <gensim.models.word2vec.Vocab at 0x7f61aa4e1c18>,
 'school': <gensim.models.word2vec.Vocab at 0x7f61aa5df7b8>,
 'hheeh.': <gensim.models.word2vec.Vocab at 0x7f61aa4e1dd8>,
 'done,': <gensim.models.word2vec.Vocab at 0x7f61aa4e1e48>,
 'foot': <gensim.models.word2vec.Vocab at 0x7f61aa4e1eb8>,
 'change...ppl': <gensim.models.word2vec.Vocab at 0x7f61aa4e1f28>,
 'lungs': <gensim.models.word2vec.Vocab at 0x7f61aa4e1fd0>,
 "didn't": <gensim.models.word2vec.Vocab at 0x7f61aa4e1630>,
 ']': <gensim.models.word2vec.Vocab at 0x7f61aa4e1048>,
 'summer.': <gensim.models.word2vec.Vocab at 0x7f61aa4e1080>,
 'side,': <gensim.models.word2vec.Vocab at 0x7f61aa4e10b8>,
 'this': <gensim.models.word2vec.Vocab at 0x7f61aa4e1128>,
 'step': <gensim.models.word2vec.Vocab at 0x7f61aa4e1160>,
 'sloth': <gensim.models.word2vec.Vocab at 0x7f61aa4e11d0>,
 'essences,': <gensim.models.word2vec.Vocab at 0x7f61aa4e1438>,
 'spice': <gensim.models.word2vec.Vocab at 0x7f61aa4e14e0>,
 'Interesting:': <gensim.models.word2vec.Vocab at 0x7f61aa4e1518>,
 'survive': <gensim.models.word2vec.Vocab at 0x7f61aa4e1588>,
 'intelligence"': <gensim.models.word2vec.Vocab at 0x7f61aa4e15c0>,
 'cliff': <gensim.models.word2vec.Vocab at 0x7f61aa53c048>,
 'dragging': <gensim.models.word2vec.Vocab at 0x7f61aa53c080>,
 'Worst': <gensim.models.word2vec.Vocab at 0x7f61aa5c1ba8>,
 '"L"': <gensim.models.word2vec.Vocab at 0x7f61aa53c160>,
 'columnists': <gensim.models.word2vec.Vocab at 0x7f61aa53c198>,
 'shopping.': <gensim.models.word2vec.Vocab at 0x7f61aa53c1d0>,
 'have...satisfied': <gensim.models.word2vec.Vocab at 0x7f61aa53c208>,
 'lie.': <gensim.models.word2vec.Vocab at 0x7f61aa53c278>,
 'flying': <gensim.models.word2vec.Vocab at 0x7f61aa53c320>,
 'perhaps': <gensim.models.word2vec.Vocab at 0x7f61aa53c358>,
 'myself..': <gensim.models.word2vec.Vocab at 0x7f61aa53c390>,
 'thing.)': <gensim.models.word2vec.Vocab at 0x7f61aa53c3c8>,
 'shattered': <gensim.models.word2vec.Vocab at 0x7f61aa53c400>,
 'ACL': <gensim.models.word2vec.Vocab at 0x7f61aa53c438>,
 'dressed,': <gensim.models.word2vec.Vocab at 0x7f61aa53c4a8>,
 'someone...and': <gensim.models.word2vec.Vocab at 0x7f61aa53c588>,
 'Random': <gensim.models.word2vec.Vocab at 0x7f61aa558a20>,
 'painful': <gensim.models.word2vec.Vocab at 0x7f61aa53c5f8>,
 'Florida?]:': <gensim.models.word2vec.Vocab at 0x7f61aa53c630>,
 'Gulf': <gensim.models.word2vec.Vocab at 0x7f61aa53c668>,
 'stupid': <gensim.models.word2vec.Vocab at 0x7f61aa53c6a0>,
 'kneecap': <gensim.models.word2vec.Vocab at 0x7f61aa53c6d8>,
 '26th': <gensim.models.word2vec.Vocab at 0x7f61aa53c710>,
 'recently': <gensim.models.word2vec.Vocab at 0x7f61aa53c748>,
 'Eye': <gensim.models.word2vec.Vocab at 0x7f61aa53c780>,
 'Insecure:': <gensim.models.word2vec.Vocab at 0x7f61aa53c7b8>,
 'Organized:': <gensim.models.word2vec.Vocab at 0x7f61aa53c7f0>,
 'school...*sigh*': <gensim.models.word2vec.Vocab at 0x7f61aa53c828>,
 'shoulders': <gensim.models.word2vec.Vocab at 0x7f61aa53c860>,
 'MoO': <gensim.models.word2vec.Vocab at 0x7f61aa53c898>,
 'following': <gensim.models.word2vec.Vocab at 0x7f61aa53c8d0>,
 'on,': <gensim.models.word2vec.Vocab at 0x7f61aa53c908>,
 'pollution,': <gensim.models.word2vec.Vocab at 0x7f61aa53c940>,
 'rosalie': <gensim.models.word2vec.Vocab at 0x7f61aa53c978>,
 'law': <gensim.models.word2vec.Vocab at 0x7f61aa53c9b0>,
 'norway,': <gensim.models.word2vec.Vocab at 0x7f61aa53c9e8>,
 'have]': <gensim.models.word2vec.Vocab at 0x7f61aa5b6438>,
 '...cheers': <gensim.models.word2vec.Vocab at 0x7f61aa53ca58>,
 'DrAmA': <gensim.models.word2vec.Vocab at 0x7f61aa53ca90>,
 'searching': <gensim.models.word2vec.Vocab at 0x7f61aa5b4240>,
 'people!': <gensim.models.word2vec.Vocab at 0x7f61aa53cb00>,
 'fun!': <gensim.models.word2vec.Vocab at 0x7f61aa53cb38>,
 'Yellowcard': <gensim.models.word2vec.Vocab at 0x7f61aa53cb70>,
 'terminally': <gensim.models.word2vec.Vocab at 0x7f61aa53cba8>,
 'right.': <gensim.models.word2vec.Vocab at 0x7f61aa53cbe0>,
 'feet': <gensim.models.word2vec.Vocab at 0x7f61aa53cc18>,
 'person.': <gensim.models.word2vec.Vocab at 0x7f61aa53cc50>,
 "they're": <gensim.models.word2vec.Vocab at 0x7f61aa53cc88>,
 'Opposition': <gensim.models.word2vec.Vocab at 0x7f61aa53ccc0>,
 "veterans'": <gensim.models.word2vec.Vocab at 0x7f61aa53ccf8>,
 'Quiz': <gensim.models.word2vec.Vocab at 0x7f61aa53cd30>,
 'lying,': <gensim.models.word2vec.Vocab at 0x7f61aa53cd68>,
 '7.': <gensim.models.word2vec.Vocab at 0x7f61aa53cda0>,
 'mention': <gensim.models.word2vec.Vocab at 0x7f61aa53cdd8>,
 'weirdest': <gensim.models.word2vec.Vocab at 0x7f61aa53ce10>,
 '"Stay': <gensim.models.word2vec.Vocab at 0x7f61aa5d7b00>,
 'rear': <gensim.models.word2vec.Vocab at 0x7f61aa53ce48>,
 'clairol': <gensim.models.word2vec.Vocab at 0x7f61aa53ce80>,
 'nvm': <gensim.models.word2vec.Vocab at 0x7f61aa53ceb8>,
 'minute': <gensim.models.word2vec.Vocab at 0x7f61aa558358>,
 'getting': <gensim.models.word2vec.Vocab at 0x7f61aa53cf60>,
 'prefer': <gensim.models.word2vec.Vocab at 0x7f61aa53cf98>,
 'open': <gensim.models.word2vec.Vocab at 0x7f61aa53cfd0>,
 'feeble': <gensim.models.word2vec.Vocab at 0x7f61aa54b048>,
 'October': <gensim.models.word2vec.Vocab at 0x7f61aa5bb160>,
 'LIKE': <gensim.models.word2vec.Vocab at 0x7f61aa54b0b8>,
 'do': <gensim.models.word2vec.Vocab at 0x7f61aa54b0f0>,
 'amount': <gensim.models.word2vec.Vocab at 0x7f61aa54b128>,
 'gerbils': <gensim.models.word2vec.Vocab at 0x7f61aa54b160>,
 'nasty': <gensim.models.word2vec.Vocab at 0x7f61aa558400>,
 'Responsible:': <gensim.models.word2vec.Vocab at 0x7f61aa54b1d0>,
 'America.': <gensim.models.word2vec.Vocab at 0x7f61aa54b208>,
 '"I\'d': <gensim.models.word2vec.Vocab at 0x7f61aa54b240>,
 'game': <gensim.models.word2vec.Vocab at 0x7f61aa54b278>,
 'behind"': <gensim.models.word2vec.Vocab at 0x7f61aa54b2b0>,
 'Free': <gensim.models.word2vec.Vocab at 0x7f61aa54b2e8>,
 '6:30.': <gensim.models.word2vec.Vocab at 0x7f61aa54b320>,
 'doom,': <gensim.models.word2vec.Vocab at 0x7f61aa54b358>,
 'family,': <gensim.models.word2vec.Vocab at 0x7f61aa54b390>,
 'odd': <gensim.models.word2vec.Vocab at 0x7f61aa54b3c8>,
 'bio': <gensim.models.word2vec.Vocab at 0x7f61aa54b400>,
 'going...': <gensim.models.word2vec.Vocab at 0x7f61aa54b438>,
 'post-its,': <gensim.models.word2vec.Vocab at 0x7f61aa54b470>,
 'teachers': <gensim.models.word2vec.Vocab at 0x7f61aa54b4a8>,
 'Time': <gensim.models.word2vec.Vocab at 0x7f61aa54b4e0>,
 '11:10': <gensim.models.word2vec.Vocab at 0x7f61aa54b518>,
 'orchestra...': <gensim.models.word2vec.Vocab at 0x7f61aa569b38>,
 'jacket': <gensim.models.word2vec.Vocab at 0x7f61aa54b588>,
 'Talkative:': <gensim.models.word2vec.Vocab at 0x7f61aa54b5c0>,
 'left-middle': <gensim.models.word2vec.Vocab at 0x7f61aa54b5f8>,
 'radical': <gensim.models.word2vec.Vocab at 0x7f61aa54b630>,
 'forever.': <gensim.models.word2vec.Vocab at 0x7f61aa54b668>,
 'Guess': <gensim.models.word2vec.Vocab at 0x7f61aa560b38>,
 'them,': <gensim.models.word2vec.Vocab at 0x7f61aa54b6d8>,
 'normal,': <gensim.models.word2vec.Vocab at 0x7f61aa5dfc18>,
 "lavigne's": <gensim.models.word2vec.Vocab at 0x7f61aa54b748>,
 'places.': <gensim.models.word2vec.Vocab at 0x7f61aa54b7b8>,
 'laugh': <gensim.models.word2vec.Vocab at 0x7f61aa54b7f0>,
 'vik': <gensim.models.word2vec.Vocab at 0x7f61aa54b860>,
 'yet...or': <gensim.models.word2vec.Vocab at 0x7f61aa5cec50>,
 'night..': <gensim.models.word2vec.Vocab at 0x7f61aa54b898>,
 'states': <gensim.models.word2vec.Vocab at 0x7f61aa54b8d0>,
 'done)': <gensim.models.word2vec.Vocab at 0x7f61aa54b908>,
 'excuses': <gensim.models.word2vec.Vocab at 0x7f61aa5dfcc0>,
 'treason.': <gensim.models.word2vec.Vocab at 0x7f61aa54b978>,
 'Gold': <gensim.models.word2vec.Vocab at 0x7f61aa54b9b0>,
 'words?': <gensim.models.word2vec.Vocab at 0x7f61aa54b9e8>,
 'fall': <gensim.models.word2vec.Vocab at 0x7f61aa54ba20>,
 'online': <gensim.models.word2vec.Vocab at 0x7f61aa54ba58>,
 'lips,': <gensim.models.word2vec.Vocab at 0x7f61aa54ba90>,
 'PLEAAAASSSSSSEEEEEEE': <gensim.models.word2vec.Vocab at 0x7f61aa54bac8>,
 'God': <gensim.models.word2vec.Vocab at 0x7f61aa54bb00>,
 'b/c': <gensim.models.word2vec.Vocab at 0x7f61aa54bb38>,
 'worst': <gensim.models.word2vec.Vocab at 0x7f61aa54bb70>,
 'cancelling': <gensim.models.word2vec.Vocab at 0x7f61aa54bba8>,
 'by': <gensim.models.word2vec.Vocab at 0x7f61aa54bbe0>,
 'BS': <gensim.models.word2vec.Vocab at 0x7f61aa54bc18>,
 'bugs': <gensim.models.word2vec.Vocab at 0x7f61aa54bc50>,
 'succumb': <gensim.models.word2vec.Vocab at 0x7f61aa54bc88>,
 'baby...': <gensim.models.word2vec.Vocab at 0x7f61aa54bcc0>,
 'seems': <gensim.models.word2vec.Vocab at 0x7f61aa54bcf8>,
 'color(s):': <gensim.models.word2vec.Vocab at 0x7f61aa54bd30>,
 'Washington-based': <gensim.models.word2vec.Vocab at 0x7f61aa54bd68>,
 'support': <gensim.models.word2vec.Vocab at 0x7f61aa54bda0>,
 'never)': <gensim.models.word2vec.Vocab at 0x7f61aa54bdd8>,
 'afternoon': <gensim.models.word2vec.Vocab at 0x7f61aa54be10>,
 'sprints.': <gensim.models.word2vec.Vocab at 0x7f61aa54be48>,
 'tank': <gensim.models.word2vec.Vocab at 0x7f61aa54be80>,
 'center': <gensim.models.word2vec.Vocab at 0x7f61aa445080>,
 'repetition': <gensim.models.word2vec.Vocab at 0x7f61aa54bef0>,
 'loneliness': <gensim.models.word2vec.Vocab at 0x7f61aa5e5a90>,
 '"Fast': <gensim.models.word2vec.Vocab at 0x7f61aa54bf60>,
 'UNDERWORLD': <gensim.models.word2vec.Vocab at 0x7f61aa438208>,
 '(hmm,': <gensim.models.word2vec.Vocab at 0x7f61aa54bfd0>,
 'shoes.': <gensim.models.word2vec.Vocab at 0x7f61aa54f048>,
 '(chocolate': <gensim.models.word2vec.Vocab at 0x7f61aa54f080>,
 'THE': <gensim.models.word2vec.Vocab at 0x7f61aa54f0b8>,
 'bakin': <gensim.models.word2vec.Vocab at 0x7f61aa54f0f0>,
 'those': <gensim.models.word2vec.Vocab at 0x7f61aa54f128>,
 'post...my': <gensim.models.word2vec.Vocab at 0x7f61aa5dfe48>,
 'about.': <gensim.models.word2vec.Vocab at 0x7f61aa54f198>,
 'helped': <gensim.models.word2vec.Vocab at 0x7f61aa54f1d0>,
 'hit': <gensim.models.word2vec.Vocab at 0x7f61aa54f240>,
 'unlike': <gensim.models.word2vec.Vocab at 0x7f61aa54f278>,
 'comments,': <gensim.models.word2vec.Vocab at 0x7f61aa54f2b0>,
 'yellow.': <gensim.models.word2vec.Vocab at 0x7f61aa54f2e8>,
 'youll': <gensim.models.word2vec.Vocab at 0x7f61aa54f320>,
 'Finally': <gensim.models.word2vec.Vocab at 0x7f61aa54f358>,
 'David': <gensim.models.word2vec.Vocab at 0x7f61aa54f390>,
 'cover': <gensim.models.word2vec.Vocab at 0x7f61aa54f3c8>,
 'Colin': <gensim.models.word2vec.Vocab at 0x7f61aa54f400>,
 'complain': <gensim.models.word2vec.Vocab at 0x7f61aa54f438>,
 'sometime': <gensim.models.word2vec.Vocab at 0x7f61aa54f470>,
 'shore,': <gensim.models.word2vec.Vocab at 0x7f61aa54f4a8>,
 'be?]:': <gensim.models.word2vec.Vocab at 0x7f61aa4420b8>,
 'lee': <gensim.models.word2vec.Vocab at 0x7f61aa54f4e0>,
 'Lonely': <gensim.models.word2vec.Vocab at 0x7f61aa54f518>,
 'starred': <gensim.models.word2vec.Vocab at 0x7f61aa54f550>,
 'sumtin': <gensim.models.word2vec.Vocab at 0x7f61aa54f588>,
 'tints?': <gensim.models.word2vec.Vocab at 0x7f61aa54f5c0>,
 'homework': <gensim.models.word2vec.Vocab at 0x7f61aa54f5f8>,
 'towers': <gensim.models.word2vec.Vocab at 0x7f61aa54f630>,
 'saddest': <gensim.models.word2vec.Vocab at 0x7f61aa46bcf8>,
 'Garden,': <gensim.models.word2vec.Vocab at 0x7f61aa54f668>,
 'green,': <gensim.models.word2vec.Vocab at 0x7f61aa54f6a0>,
 'you:': <gensim.models.word2vec.Vocab at 0x7f61aa54f6d8>,
 'sex?': <gensim.models.word2vec.Vocab at 0x7f61aa54f710>,
 'black,': <gensim.models.word2vec.Vocab at 0x7f61aa54f748>,
 'feasible,': <gensim.models.word2vec.Vocab at 0x7f61aa54f780>,
 'YOU...': <gensim.models.word2vec.Vocab at 0x7f61aa54f7b8>,
 'trouble?': <gensim.models.word2vec.Vocab at 0x7f61aa54f7f0>,
 'me...appreciative': <gensim.models.word2vec.Vocab at 0x7f61aa4609e8>,
 'learner': <gensim.models.word2vec.Vocab at 0x7f61aa54f860>,
 'hours': <gensim.models.word2vec.Vocab at 0x7f61aa54f898>,
 'feast': <gensim.models.word2vec.Vocab at 0x7f61aa54f8d0>,
 'again!': <gensim.models.word2vec.Vocab at 0x7f61aa54f908>,
 'tip': <gensim.models.word2vec.Vocab at 0x7f61aa54f940>,
 'You...': <gensim.models.word2vec.Vocab at 0x7f61aa54f978>,
 'KNOW': <gensim.models.word2vec.Vocab at 0x7f61aa54f9b0>,
 'purple': <gensim.models.word2vec.Vocab at 0x7f61aa54f9e8>,
 'Dreams': <gensim.models.word2vec.Vocab at 0x7f61aa54fa20>,
 'here': <gensim.models.word2vec.Vocab at 0x7f61aa54fa58>,
 'accused': <gensim.models.word2vec.Vocab at 0x7f61aa54fa90>,
 'since': <gensim.models.word2vec.Vocab at 0x7f61aa54fac8>,
 'HATE': <gensim.models.word2vec.Vocab at 0x7f61aa54fb00>,
 'walk': <gensim.models.word2vec.Vocab at 0x7f61aa54fb38>,
 'outta': <gensim.models.word2vec.Vocab at 0x7f61aa54fb70>,
 'yet,': <gensim.models.word2vec.Vocab at 0x7f61aa54fba8>,
 "other...we're": <gensim.models.word2vec.Vocab at 0x7f61aa54fbe0>,
 'look': <gensim.models.word2vec.Vocab at 0x7f61aa54fc18>,
 ':-/': <gensim.models.word2vec.Vocab at 0x7f61aa54fc50>,
 'yet': <gensim.models.word2vec.Vocab at 0x7f61aa54fc88>,
 'background': <gensim.models.word2vec.Vocab at 0x7f61aa54fcc0>,
 'is.': <gensim.models.word2vec.Vocab at 0x7f61aa54fcf8>,
 'now...': <gensim.models.word2vec.Vocab at 0x7f61aa54fd68>,
 'grow': <gensim.models.word2vec.Vocab at 0x7f61aa54fda0>,
 'dough': <gensim.models.word2vec.Vocab at 0x7f61aa54fdd8>,
 'government,': <gensim.models.word2vec.Vocab at 0x7f61aa54fe10>,
 'okie...that': <gensim.models.word2vec.Vocab at 0x7f61aa54fe48>,
 'plan': <gensim.models.word2vec.Vocab at 0x7f61aa54fe80>,
 'ummm...': <gensim.models.word2vec.Vocab at 0x7f61aa54feb8>,
 'king....': <gensim.models.word2vec.Vocab at 0x7f61aa54fef0>,
 'Marianne': <gensim.models.word2vec.Vocab at 0x7f61aa54ff60>,
 'until': <gensim.models.word2vec.Vocab at 0x7f61aa54ff98>,
 'mashed': <gensim.models.word2vec.Vocab at 0x7f61aa5e1208>,
 'rain': <gensim.models.word2vec.Vocab at 0x7f61aa553048>,
 'freshman': <gensim.models.word2vec.Vocab at 0x7f61aa553080>,
 'calls': <gensim.models.word2vec.Vocab at 0x7f61aa5530b8>,
 "us...we're": <gensim.models.word2vec.Vocab at 0x7f61aa5530f0>,
 'Soviet': <gensim.models.word2vec.Vocab at 0x7f61aa553128>,
 'gears,': <gensim.models.word2vec.Vocab at 0x7f61aa553160>,
 'knife': <gensim.models.word2vec.Vocab at 0x7f61aa553198>,
 'Floods,': <gensim.models.word2vec.Vocab at 0x7f61aa5531d0>,
 '(and': <gensim.models.word2vec.Vocab at 0x7f61aa553208>,
 'America': <gensim.models.word2vec.Vocab at 0x7f61aa553240>,
 'shi,': <gensim.models.word2vec.Vocab at 0x7f61aa553278>,
 'considering': <gensim.models.word2vec.Vocab at 0x7f61aa5532b0>,
 'committed': <gensim.models.word2vec.Vocab at 0x7f61aa5532e8>,
 'situation,': <gensim.models.word2vec.Vocab at 0x7f61aa553320>,
 'stole': <gensim.models.word2vec.Vocab at 0x7f61aa553358>,
 'brushing': <gensim.models.word2vec.Vocab at 0x7f61aa553390>,
 'happily': <gensim.models.word2vec.Vocab at 0x7f61aa5533c8>,
 'hand': <gensim.models.word2vec.Vocab at 0x7f61aa553400>,
 'problem': <gensim.models.word2vec.Vocab at 0x7f61aa553438>,
 'us': <gensim.models.word2vec.Vocab at 0x7f61aa553470>,
 'color': <gensim.models.word2vec.Vocab at 0x7f61aa5534a8>,
 'barely': <gensim.models.word2vec.Vocab at 0x7f61aa558a90>,
 '2:': <gensim.models.word2vec.Vocab at 0x7f61aa553518>,
 'repetition.': <gensim.models.word2vec.Vocab at 0x7f61aa553550>,
 'ready': <gensim.models.word2vec.Vocab at 0x7f61aa553588>,
 'everynight,': <gensim.models.word2vec.Vocab at 0x7f61aa5535c0>,
 'brownies': <gensim.models.word2vec.Vocab at 0x7f61aa5364a8>,
 'freaked': <gensim.models.word2vec.Vocab at 0x7f61aa553630>,
 'medium.': <gensim.models.word2vec.Vocab at 0x7f61aa447048>,
 'IS': <gensim.models.word2vec.Vocab at 0x7f61aa5536a0>,
 'helps': <gensim.models.word2vec.Vocab at 0x7f61aa5536d8>,
 'sophie?': <gensim.models.word2vec.Vocab at 0x7f61aa553710>,
 '"Trust': <gensim.models.word2vec.Vocab at 0x7f61aa553748>,
 'Now,': <gensim.models.word2vec.Vocab at 0x7f61aa536748>,
 'tact': <gensim.models.word2vec.Vocab at 0x7f61aa5537b8>,
 'needs': <gensim.models.word2vec.Vocab at 0x7f61aa5537f0>,
 'uniter,': <gensim.models.word2vec.Vocab at 0x7f61aa553828>,
 'He': <gensim.models.word2vec.Vocab at 0x7f61aa553860>,
 'family)': <gensim.models.word2vec.Vocab at 0x7f61aa553898>,
 'again...or': <gensim.models.word2vec.Vocab at 0x7f61aa5538d0>,
 'hearts': <gensim.models.word2vec.Vocab at 0x7f61aa553908>,
 'react': <gensim.models.word2vec.Vocab at 0x7f61aa5e1400>,
 'Flogging': <gensim.models.word2vec.Vocab at 0x7f61aa553978>,
 'running,': <gensim.models.word2vec.Vocab at 0x7f61aa5539e8>,
 'razors': <gensim.models.word2vec.Vocab at 0x7f61aa4eaf28>,
 'rarely': <gensim.models.word2vec.Vocab at 0x7f61aa445630>,
 'daunted:': <gensim.models.word2vec.Vocab at 0x7f61aa553a90>,
 'very': <gensim.models.word2vec.Vocab at 0x7f61aa553ac8>,
 'around': <gensim.models.word2vec.Vocab at 0x7f61aa4ec4e0>,
 'except': <gensim.models.word2vec.Vocab at 0x7f61aa553b38>,
 'war,"': <gensim.models.word2vec.Vocab at 0x7f61aa553b70>,
 'become': <gensim.models.word2vec.Vocab at 0x7f61aa553ba8>,
 'know,,,': <gensim.models.word2vec.Vocab at 0x7f61aa553be0>,
 'asleep': <gensim.models.word2vec.Vocab at 0x7f61aa553c18>,
 'sad...that': <gensim.models.word2vec.Vocab at 0x7f61aa4ed668>,
 'of,': <gensim.models.word2vec.Vocab at 0x7f61aa553c88>,
 'week,': <gensim.models.word2vec.Vocab at 0x7f61aa553cc0>,
 'SATs...fuun...sux...but': <gensim.models.word2vec.Vocab at 0x7f61aa553cf8>,
 '...[should': <gensim.models.word2vec.Vocab at 0x7f61aa553d30>,
 'dropped': <gensim.models.word2vec.Vocab at 0x7f61aa4ee0b8>,
 'sure,': <gensim.models.word2vec.Vocab at 0x7f61aa553da0>,
 'cool.': <gensim.models.word2vec.Vocab at 0x7f61aa553dd8>,
 'jetlag': <gensim.models.word2vec.Vocab at 0x7f61aa553e10>,
 'fit.': <gensim.models.word2vec.Vocab at 0x7f61aa54b828>,
 'Arrogant:': <gensim.models.word2vec.Vocab at 0x7f61aa553e80>,
 'now?]:': <gensim.models.word2vec.Vocab at 0x7f61aa553eb8>,
 'objectives': <gensim.models.word2vec.Vocab at 0x7f61aa553ef0>,
 'me...they': <gensim.models.word2vec.Vocab at 0x7f61aa553f28>,
 'call': <gensim.models.word2vec.Vocab at 0x7f61aa553f60>,
 'Today': <gensim.models.word2vec.Vocab at 0x7f61aa53b240>,
 'checking': <gensim.models.word2vec.Vocab at 0x7f61aa53b400>,
 'tried': <gensim.models.word2vec.Vocab at 0x7f61aa554048>,
 'old,': <gensim.models.word2vec.Vocab at 0x7f61aa554080>,
 'glasses': <gensim.models.word2vec.Vocab at 0x7f61aa5540b8>,
 'bill': <gensim.models.word2vec.Vocab at 0x7f61aa5540f0>,
 'fourth,': <gensim.models.word2vec.Vocab at 0x7f61aa554128>,
 'better': <gensim.models.word2vec.Vocab at 0x7f61aa554160>,
 'ground': <gensim.models.word2vec.Vocab at 0x7f61aa554198>,
 'More': <gensim.models.word2vec.Vocab at 0x7f61aa5541d0>,
 'gameroom': <gensim.models.word2vec.Vocab at 0x7f61aa4e5588>,
 'above': <gensim.models.word2vec.Vocab at 0x7f61aa554240>,
 'eventful.': <gensim.models.word2vec.Vocab at 0x7f61aa554278>,
 'happen': <gensim.models.word2vec.Vocab at 0x7f61aa5542b0>,
 'Lazy': <gensim.models.word2vec.Vocab at 0x7f61aa5542e8>,
 'license': <gensim.models.word2vec.Vocab at 0x7f61aa4e8320>,
 'bleating': <gensim.models.word2vec.Vocab at 0x7f61aa554358>,
 'start.': <gensim.models.word2vec.Vocab at 0x7f61aa554390>,
 'will': <gensim.models.word2vec.Vocab at 0x7f61aa5543c8>,
 '?': <gensim.models.word2vec.Vocab at 0x7f61aa554400>,
 'napping': <gensim.models.word2vec.Vocab at 0x7f61aa554438>,
 'Better?': <gensim.models.word2vec.Vocab at 0x7f61aa554470>,
 'linoleum': <gensim.models.word2vec.Vocab at 0x7f61aa5544a8>,
 'SOMETHING!': <gensim.models.word2vec.Vocab at 0x7f61aa5544e0>,
 'sophie': <gensim.models.word2vec.Vocab at 0x7f61aa4d8828>,
 'reacts,': <gensim.models.word2vec.Vocab at 0x7f61aa554550>,
 'Car"': <gensim.models.word2vec.Vocab at 0x7f61aa554588>,
 'extinct.': <gensim.models.word2vec.Vocab at 0x7f61aa5e1550>,
 'knowin': <gensim.models.word2vec.Vocab at 0x7f61aa5545f8>,
 'looks': <gensim.models.word2vec.Vocab at 0x7f61aa554630>,
 'alex!': <gensim.models.word2vec.Vocab at 0x7f61aa554668>,
 'analyze': <gensim.models.word2vec.Vocab at 0x7f61aa5546a0>,
 'internet': <gensim.models.word2vec.Vocab at 0x7f61aa5546d8>,
 'am,': <gensim.models.word2vec.Vocab at 0x7f61aa554710>,
 "I'll": <gensim.models.word2vec.Vocab at 0x7f61aa554748>,
 'go:': <gensim.models.word2vec.Vocab at 0x7f61aa554780>,
 'hardest': <gensim.models.word2vec.Vocab at 0x7f61aa5547b8>,
 'bed:': <gensim.models.word2vec.Vocab at 0x7f61aa5547f0>,
 'tower!!': <gensim.models.word2vec.Vocab at 0x7f61aa554828>,
 '(analyze': <gensim.models.word2vec.Vocab at 0x7f61aa554860>,
 'Rice': <gensim.models.word2vec.Vocab at 0x7f61aa554898>,
 'bravest': <gensim.models.word2vec.Vocab at 0x7f61aa5548d0>,
 ...}
w2v.similarity('I', 'My')
0.082851942583535218
print(posts[5])
I've tried starting blog after blog and it just never feels right.  Then I read today that it feels strange to most people, but the more you do it the better it gets (hmm, sounds suspiciously like something else!) so I decided to give it another try.    My husband bought me a notepad at  urlLink McNally  (the best bookstore in Western Canada) with that title and a picture of a 50s housewife grinning desperately.  Each page has something funny like "New curtains!  Hurrah!".  For some reason it struck me as absolutely hilarious and has stuck in my head ever since.  What were those women thinking?

w2v.similarity('ring', 'husband')
0.037229111896779618
w2v.similarity('ring', 'housewife')
0.11547398696865138
w2v.similarity('women', 'housewife')  # Diversity friendly
-0.14627530812290576
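
For context, `similarity` in gensim's word2vec returns the cosine similarity between the two word vectors, which lies between -1 and 1. Below is a minimal sketch of that computation on small made-up vectors rather than the trained model; the `cosine_similarity` helper and the vectors are illustrative assumptions.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy 3-dimensional "embeddings"; real word2vec vectors have many more dimensions.
vec_a = [1.0, 2.0, 0.0]
vec_b = [2.0, 4.0, 0.0]   # same direction as vec_a
vec_c = [-2.0, 1.0, 0.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

Values near 0, like the ones above, suggest the model found little relation between the two words in this corpus. A negative value means the angle between the two vectors exceeds 90 degrees.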

Doc2Vec

Doc2Vec extends the word2vec technique to whole documents: in addition to learning a vector for every word, it learns a vector for every document.

import numpy as np
# 0 for male, 1 for female
y_posts = np.concatenate((np.zeros(len(filtered_male_posts)),
                          np.ones(len(filtered_female_posts))))
len(y_posts)
4842

Convolutional Neural Networks for Sentence Classification

Train a convolutional network for sentiment analysis.

Based on "Convolutional Neural Networks for Sentence Classification" by Yoon Kim http://arxiv.org/pdf/1408.5882v2.pdf

'CNN-non-static' reaches 82.1% accuracy after 61 epochs with the following settings: embedding_dim = 20, filter_sizes = (3, 4), num_filters = 3, dropout_prob = (0.7, 0.8), hidden_dims = 100

'CNN-rand' reaches 78-79% accuracy after 7-8 epochs with the following settings: embedding_dim = 20, filter_sizes = (3, 4), num_filters = 150, dropout_prob = (0.25, 0.5), hidden_dims = 150

'CNN-static' reaches 75.4% accuracy after 7 epochs with the following settings: embedding_dim = 100, filter_sizes = (3, 4), num_filters = 150, dropout_prob = (0.25, 0.5), hidden_dims = 150

It turns out that such a small data set as "Movie reviews with one sentence per review" (Pang and Lee, 2005) requires a much smaller network than the one introduced in the original article:
  • embedding dimension is only 20 (instead of 300; 'CNN-static' still requires ~100)
  • 2 filter sizes (instead of 3)
  • higher dropout probabilities
  • 3 filters per filter size are enough for 'CNN-non-static' (instead of 100)
  • embedding initialization does not require prebuilt Google Word2Vec data. Training Word2Vec on the same "Movie reviews" data set is enough to achieve the performance reported in the article (81.6%)

** Another distinct difference is a sliding MaxPooling window of length=2 instead of MaxPooling over the whole feature map as in the article
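
The difference between the two pooling schemes can be sketched in plain Python on a single toy feature map. This assumes a pooling window of 2 with a stride equal to the window (the Keras `MaxPooling1D` default) and no padding; the helper names and the activation values are illustrative.

```python
def sliding_max_pool(feature_map, pool_size=2):
    """Max over non-overlapping windows; a trailing partial window is dropped."""
    n_windows = len(feature_map) // pool_size
    return [max(feature_map[i * pool_size:(i + 1) * pool_size])
            for i in range(n_windows)]


def global_max_pool(feature_map):
    """Max over the whole feature map (max-over-time pooling in the article)."""
    return max(feature_map)


# Toy activations of one filter along the sentence.
feature_map = [0.1, 0.7, 0.3, 0.2, 0.9, 0.4, 0.0]

local = sliding_max_pool(feature_map)
overall = global_max_pool(feature_map)
print(local)    # one value per window of 2 positions
print(overall)  # a single value per filter
```

The sliding variant keeps coarse position information at the cost of a larger output, which is then flattened before the dense layer.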

import numpy as np
import word_embedding
from word2vec import train_word2vec

from keras.models import Sequential, Model
from keras.layers import (Activation, Dense, Dropout, Embedding, 
                          Flatten, Input, 
                          Conv1D, MaxPooling1D)
from keras.layers.merge import Concatenate

np.random.seed(2)
Using gpu device 0: GeForce GTX 760 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 4007)
Using Theano backend.

Parameters

Model variations. See Section 3 of Yoon Kim's "Convolutional Neural Networks for Sentence Classification" for details.

model_variation = 'CNN-rand'  #  CNN-rand | CNN-non-static | CNN-static
print('Model variation is %s' % model_variation)
Model variation is CNN-rand
# Model Hyperparameters
sequence_length = 56
embedding_dim = 20          
filter_sizes = (3, 4)
num_filters = 150
dropout_prob = (0.25, 0.5)
hidden_dims = 150
# Training parameters
batch_size = 32
num_epochs = 100
val_split = 0.1
# Word2Vec parameters, see train_word2vec
min_word_count = 1  # Minimum word count                        
context = 10        # Context window size

Data Preparation

# Load data
print("Loading data...")
x, y, vocabulary, vocabulary_inv = word_embedding.load_data()

if model_variation=='CNN-non-static' or model_variation=='CNN-static':
    embedding_weights = train_word2vec(x, vocabulary_inv, 
                                       embedding_dim, min_word_count, 
                                       context)
    if model_variation=='CNN-static':
        x = embedding_weights[0][x]
elif model_variation=='CNN-rand':
    embedding_weights = None
else:
    raise ValueError('Unknown model variation')
Loading data...
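
For 'CNN-static', the line `x = embedding_weights[0][x]` replaces every word index with its vector through NumPy integer-array indexing. Here is a small sketch with made-up numbers; the array sizes are assumptions chosen for readability, not the real data.

```python
import numpy as np

# Hypothetical embedding matrix: 5 words in the vocabulary, 3 dimensions each.
embedding_matrix = np.arange(15, dtype=float).reshape(5, 3)

# Two padded "sentences" of 4 word indices each.
x_indices = np.array([[0, 2, 2, 4],
                      [1, 0, 3, 3]])

# Indexing with an integer array looks up one row per index.
x_vectors = embedding_matrix[x_indices]

print(x_indices.shape)  # (sentences, words)
print(x_vectors.shape)  # (sentences, words, embedding dimensions)
```

This is why the 'CNN-static' model can take input of shape (sequence_length, embedding_dim) directly and does not need an Embedding layer.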
# Shuffle data
shuffle_indices = np.random.permutation(np.arange(len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices].argmax(axis=1)
print("Vocabulary Size: {:d}".format(len(vocabulary)))
Vocabulary Size: 18765
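
The shuffling step applies one random permutation to both inputs and labels so that each sample stays paired with its label, and `argmax(axis=1)` turns one-hot label rows into class indices. Here is a toy sketch, assuming `y` is one-hot encoded as that call suggests; the toy arrays and the label layout are made up for illustration.

```python
import numpy as np

rng = np.random.RandomState(2)

# Toy data: 4 samples, each tagged with its own row number in column 0.
x_toy = np.array([[0, 10], [1, 11], [2, 12], [3, 13]])
# One-hot labels for the toy data: even rows are class 0, odd rows are class 1.
y_toy = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

# One permutation, applied to both arrays.
order = rng.permutation(len(y_toy))
x_shuf = x_toy[order]
y_shuf = y_toy[order].argmax(axis=1)

print(order)
print(x_shuf[:, 0])
print(y_shuf)
```

Whatever order is drawn, each shuffled label still matches the parity of its sample's row number, which confirms the pairs were kept together.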

Building CNN Model

# Convolutional feature extractor: one Conv1D + MaxPooling1D branch per filter size
graph_in = Input(shape=(sequence_length, embedding_dim))
convs = []
for fsz in filter_sizes:
    conv = Conv1D(filters=num_filters,
                  kernel_size=fsz,
                  padding='valid',
                  activation='relu',
                  strides=1)(graph_in)
    pool = MaxPooling1D(pool_size=2)(conv)
    flatten = Flatten()(pool)
    convs.append(flatten)

# Join the flattened branches into a single feature vector
if len(filter_sizes) > 1:
    out = Concatenate()(convs)
else:
    out = convs[0]

graph = Model(inputs=graph_in, outputs=out)

# main sequential model
model = Sequential()
# 'CNN-static' inputs are already word vectors, so only the other variants need an Embedding layer
if model_variation != 'CNN-static':
    model.add(Embedding(len(vocabulary), embedding_dim, input_length=sequence_length,
                        weights=embedding_weights))
model.add(Dropout(dropout_prob[0], input_shape=(sequence_length, embedding_dim)))
model.add(graph)
model.add(Dense(hidden_dims))
model.add(Dropout(dropout_prob[1]))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])
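
To see how large the flattened feature vector feeding `Dense(hidden_dims)` is, the layer arithmetic can be worked through by hand. The sketch below assumes 'valid' convolution with stride 1 and non-overlapping pooling windows of 2, matching the settings above; `branch_features` is a helper introduced here for illustration.

```python
def branch_features(sequence_length, filter_size, num_filters, pool_size=2):
    """Flattened output size of one Conv1D -> MaxPooling1D -> Flatten branch."""
    conv_len = sequence_length - filter_size + 1  # 'valid' padding, stride 1
    pooled_len = conv_len // pool_size            # partial window is dropped
    return pooled_len * num_filters


sequence_length = 56
num_filters = 150
filter_sizes = (3, 4)

sizes = [branch_features(sequence_length, fsz, num_filters) for fsz in filter_sizes]
total = sum(sizes)  # Concatenate joins the flattened branches
print(sizes, total)
```

With the 'CNN-rand' settings this gives 4,050 + 3,900 = 7,950 inputs to the 150-unit hidden layer, or roughly 1.2 million weights in that layer alone.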

# Training model
# ==================================================
model.fit(x_shuffled, y_shuffled, batch_size=batch_size,
          epochs=num_epochs, validation_split=val_split, verbose=2)
Train on 9595 samples, validate on 1067 samples
Epoch 1/100
1s - loss: 0.6516 - acc: 0.6005 - val_loss: 0.5692 - val_acc: 0.7151
Epoch 2/100
1s - loss: 0.4556 - acc: 0.7896 - val_loss: 0.5154 - val_acc: 0.7573
Epoch 3/100
1s - loss: 0.3556 - acc: 0.8532 - val_loss: 0.5050 - val_acc: 0.7816
Epoch 4/100
1s - loss: 0.2978 - acc: 0.8779 - val_loss: 0.5335 - val_acc: 0.7901
Epoch 5/100
1s - loss: 0.2599 - acc: 0.8972 - val_loss: 0.5592 - val_acc: 0.7769
Epoch 6/100
1s - loss: 0.2248 - acc: 0.9112 - val_loss: 0.5559 - val_acc: 0.7685
Epoch 7/100
1s - loss: 0.1994 - acc: 0.9219 - val_loss: 0.5760 - val_acc: 0.7704
Epoch 8/100
1s - loss: 0.1801 - acc: 0.9326 - val_loss: 0.6014 - val_acc: 0.7788
Epoch 9/100
1s - loss: 0.1472 - acc: 0.9449 - val_loss: 0.6637 - val_acc: 0.7751
Epoch 10/100
1s - loss: 0.1269 - acc: 0.9537 - val_loss: 0.7281 - val_acc: 0.7563
Epoch 11/100
1s - loss: 0.1123 - acc: 0.9592 - val_loss: 0.7452 - val_acc: 0.7788
Epoch 12/100
1s - loss: 0.0897 - acc: 0.9658 - val_loss: 0.8504 - val_acc: 0.7591
Epoch 13/100
1s - loss: 0.0811 - acc: 0.9723 - val_loss: 0.8935 - val_acc: 0.7573
Epoch 14/100
1s - loss: 0.0651 - acc: 0.9764 - val_loss: 0.8738 - val_acc: 0.7685
Epoch 15/100
1s - loss: 0.0540 - acc: 0.9809 - val_loss: 0.9407 - val_acc: 0.7648
Epoch 16/100
1s - loss: 0.0408 - acc: 0.9857 - val_loss: 1.1880 - val_acc: 0.7638
Epoch 17/100
1s - loss: 0.0341 - acc: 0.9886 - val_loss: 1.2878 - val_acc: 0.7638
Epoch 18/100
1s - loss: 0.0306 - acc: 0.9901 - val_loss: 1.4448 - val_acc: 0.7573
Epoch 19/100
1s - loss: 0.0276 - acc: 0.9917 - val_loss: 1.5300 - val_acc: 0.7591
Epoch 20/100
1s - loss: 0.0249 - acc: 0.9917 - val_loss: 1.4825 - val_acc: 0.7666
Epoch 21/100
1s - loss: 0.0220 - acc: 0.9937 - val_loss: 1.4357 - val_acc: 0.7601
Epoch 22/100
1s - loss: 0.0188 - acc: 0.9945 - val_loss: 1.4081 - val_acc: 0.7657
Epoch 23/100
1s - loss: 0.0182 - acc: 0.9954 - val_loss: 1.7145 - val_acc: 0.7610
Epoch 24/100
1s - loss: 0.0129 - acc: 0.9964 - val_loss: 1.7047 - val_acc: 0.7704
Epoch 25/100
1s - loss: 0.0064 - acc: 0.9981 - val_loss: 1.9119 - val_acc: 0.7629
Epoch 26/100
1s - loss: 0.0108 - acc: 0.9969 - val_loss: 1.8306 - val_acc: 0.7704
Epoch 27/100
1s - loss: 0.0105 - acc: 0.9973 - val_loss: 1.9624 - val_acc: 0.7619
Epoch 28/100
1s - loss: 0.0112 - acc: 0.9973 - val_loss: 1.8552 - val_acc: 0.7694
Epoch 29/100
1s - loss: 0.0110 - acc: 0.9968 - val_loss: 1.8585 - val_acc: 0.7657
Epoch 30/100
1s - loss: 0.0071 - acc: 0.9983 - val_loss: 2.0571 - val_acc: 0.7694
Epoch 31/100
1s - loss: 0.0089 - acc: 0.9975 - val_loss: 2.0361 - val_acc: 0.7629
Epoch 32/100
1s - loss: 0.0074 - acc: 0.9978 - val_loss: 2.0010 - val_acc: 0.7648
Epoch 33/100
1s - loss: 0.0074 - acc: 0.9981 - val_loss: 2.0995 - val_acc: 0.7498
Epoch 34/100
1s - loss: 0.0125 - acc: 0.9971 - val_loss: 2.2003 - val_acc: 0.7610
Epoch 35/100
1s - loss: 0.0074 - acc: 0.9981 - val_loss: 2.1526 - val_acc: 0.7582
Epoch 36/100
1s - loss: 0.0068 - acc: 0.9984 - val_loss: 2.1754 - val_acc: 0.7648
Epoch 37/100
1s - loss: 0.0065 - acc: 0.9979 - val_loss: 2.0810 - val_acc: 0.7498
Epoch 38/100
1s - loss: 0.0078 - acc: 0.9980 - val_loss: 2.3443 - val_acc: 0.7460
Epoch 39/100
1s - loss: 0.0038 - acc: 0.9991 - val_loss: 2.1696 - val_acc: 0.7629
Epoch 40/100
1s - loss: 0.0062 - acc: 0.9985 - val_loss: 2.2752 - val_acc: 0.7545
Epoch 41/100
1s - loss: 0.0044 - acc: 0.9985 - val_loss: 2.3457 - val_acc: 0.7535
Epoch 42/100
1s - loss: 0.0066 - acc: 0.9985 - val_loss: 2.1172 - val_acc: 0.7629
Epoch 43/100
1s - loss: 0.0052 - acc: 0.9987 - val_loss: 2.3550 - val_acc: 0.7619
Epoch 44/100
1s - loss: 0.0024 - acc: 0.9993 - val_loss: 2.3832 - val_acc: 0.7610
Epoch 45/100
1s - loss: 0.0042 - acc: 0.9989 - val_loss: 2.4242 - val_acc: 0.7648
Epoch 46/100
1s - loss: 0.0048 - acc: 0.9990 - val_loss: 2.4529 - val_acc: 0.7563
Epoch 47/100
1s - loss: 0.0036 - acc: 0.9994 - val_loss: 2.8412 - val_acc: 0.7282
Epoch 48/100
1s - loss: 0.0037 - acc: 0.9991 - val_loss: 2.4515 - val_acc: 0.7619
Epoch 49/100
1s - loss: 0.0031 - acc: 0.9991 - val_loss: 2.4849 - val_acc: 0.7676
Epoch 50/100
1s - loss: 0.0078 - acc: 0.9990 - val_loss: 2.5083 - val_acc: 0.7563
Epoch 51/100
1s - loss: 0.0105 - acc: 0.9981 - val_loss: 2.3538 - val_acc: 0.7601
Epoch 52/100
1s - loss: 0.0076 - acc: 0.9986 - val_loss: 2.4405 - val_acc: 0.7685
Epoch 53/100
1s - loss: 0.0043 - acc: 0.9991 - val_loss: 2.5753 - val_acc: 0.7591
Epoch 54/100
1s - loss: 0.0044 - acc: 0.9989 - val_loss: 2.5550 - val_acc: 0.7582
Epoch 55/100
1s - loss: 0.0034 - acc: 0.9994 - val_loss: 2.6361 - val_acc: 0.7591
Epoch 56/100
1s - loss: 0.0041 - acc: 0.9994 - val_loss: 2.6753 - val_acc: 0.7563
Epoch 57/100
1s - loss: 0.0042 - acc: 0.9990 - val_loss: 2.6464 - val_acc: 0.7601
Epoch 58/100
1s - loss: 0.0037 - acc: 0.9992 - val_loss: 2.6616 - val_acc: 0.7582
Epoch 59/100
1s - loss: 0.0060 - acc: 0.9990 - val_loss: 2.6052 - val_acc: 0.7619
Epoch 60/100
1s - loss: 0.0051 - acc: 0.9990 - val_loss: 2.7033 - val_acc: 0.7498
Epoch 61/100
1s - loss: 0.0034 - acc: 0.9994 - val_loss: 2.7142 - val_acc: 0.7526
Epoch 62/100
1s - loss: 0.0047 - acc: 0.9994 - val_loss: 2.7656 - val_acc: 0.7591
Epoch 63/100
1s - loss: 0.0083 - acc: 0.9990 - val_loss: 2.7971 - val_acc: 0.7526
Epoch 64/100
1s - loss: 0.0046 - acc: 0.9992 - val_loss: 2.6585 - val_acc: 0.7545
Epoch 65/100
1s - loss: 0.0062 - acc: 0.9989 - val_loss: 2.6194 - val_acc: 0.7535
Epoch 66/100
1s - loss: 0.0062 - acc: 0.9993 - val_loss: 2.6255 - val_acc: 0.7694
Epoch 67/100
1s - loss: 0.0036 - acc: 0.9990 - val_loss: 2.6384 - val_acc: 0.7582
Epoch 68/100
1s - loss: 0.0066 - acc: 0.9991 - val_loss: 2.6743 - val_acc: 0.7648
Epoch 69/100
1s - loss: 0.0030 - acc: 0.9995 - val_loss: 2.8236 - val_acc: 0.7535
Epoch 70/100
1s - loss: 0.0048 - acc: 0.9993 - val_loss: 2.7829 - val_acc: 0.7610
Epoch 71/100
1s - loss: 0.0062 - acc: 0.9990 - val_loss: 2.6402 - val_acc: 0.7573
Epoch 72/100
1s - loss: 0.0037 - acc: 0.9992 - val_loss: 2.9089 - val_acc: 0.7526
Epoch 73/100
1s - loss: 0.0069 - acc: 0.9985 - val_loss: 2.7071 - val_acc: 0.7535
Epoch 74/100
1s - loss: 0.0033 - acc: 0.9995 - val_loss: 2.6727 - val_acc: 0.7601
Epoch 75/100
1s - loss: 0.0069 - acc: 0.9990 - val_loss: 2.6967 - val_acc: 0.7601
Epoch 76/100
1s - loss: 0.0089 - acc: 0.9989 - val_loss: 2.7479 - val_acc: 0.7666
Epoch 77/100
1s - loss: 0.0046 - acc: 0.9994 - val_loss: 2.7192 - val_acc: 0.7629
Epoch 78/100
1s - loss: 0.0069 - acc: 0.9989 - val_loss: 2.7173 - val_acc: 0.7629
Epoch 79/100
1s - loss: 8.6550e-04 - acc: 0.9998 - val_loss: 2.7283 - val_acc: 0.7601
Epoch 80/100
1s - loss: 0.0011 - acc: 0.9995 - val_loss: 2.8405 - val_acc: 0.7629
Epoch 81/100
1s - loss: 0.0040 - acc: 0.9994 - val_loss: 2.8725 - val_acc: 0.7619
Epoch 82/100
1s - loss: 0.0055 - acc: 0.9992 - val_loss: 2.8490 - val_acc: 0.7601
Epoch 83/100
1s - loss: 0.0059 - acc: 0.9989 - val_loss: 2.7838 - val_acc: 0.7545
Epoch 84/100
1s - loss: 0.0054 - acc: 0.9994 - val_loss: 2.8706 - val_acc: 0.7526
Epoch 85/100
1s - loss: 0.0060 - acc: 0.9992 - val_loss: 2.9374 - val_acc: 0.7516
Epoch 86/100
1s - loss: 0.0087 - acc: 0.9982 - val_loss: 2.7966 - val_acc: 0.7573
Epoch 87/100
1s - loss: 0.0084 - acc: 0.9991 - val_loss: 2.8620 - val_acc: 0.7619
Epoch 88/100
1s - loss: 0.0053 - acc: 0.9990 - val_loss: 2.8450 - val_acc: 0.7601
Epoch 89/100
1s - loss: 0.0054 - acc: 0.9990 - val_loss: 2.8303 - val_acc: 0.7629
Epoch 90/100
1s - loss: 0.0073 - acc: 0.9991 - val_loss: 2.8474 - val_acc: 0.7657
Epoch 91/100
1s - loss: 0.0037 - acc: 0.9994 - val_loss: 3.0151 - val_acc: 0.7432
Epoch 92/100
1s - loss: 0.0017 - acc: 0.9999 - val_loss: 2.9555 - val_acc: 0.7582
Epoch 93/100
1s - loss: 0.0080 - acc: 0.9991 - val_loss: 2.9178 - val_acc: 0.7554
Epoch 94/100
1s - loss: 0.0078 - acc: 0.9991 - val_loss: 2.8724 - val_acc: 0.7582
Epoch 95/100
1s - loss: 0.0012 - acc: 0.9997 - val_loss: 2.9582 - val_acc: 0.7545
Epoch 96/100
1s - loss: 0.0058 - acc: 0.9989 - val_loss: 2.8944 - val_acc: 0.7479
Epoch 97/100
1s - loss: 0.0094 - acc: 0.9985 - val_loss: 2.7146 - val_acc: 0.7516
Epoch 98/100
1s - loss: 0.0044 - acc: 0.9993 - val_loss: 2.9052 - val_acc: 0.7498
Epoch 99/100
1s - loss: 0.0030 - acc: 0.9995 - val_loss: 3.1474 - val_acc: 0.7470
Epoch 100/100
1s - loss: 0.0051 - acc: 0.9990 - val_loss: 3.1746 - val_acc: 0.7451





<keras.callbacks.History at 0x7f78362ae400>
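
In the log above, validation loss is lowest at epoch 3 (0.5050) and climbs afterwards while training accuracy approaches 1.0, which is the usual signature of overfitting. Keras ships a `keras.callbacks.EarlyStopping` callback for halting training in such a case. The sketch below is not that callback. It is a simplified, framework-free patience rule (the `early_stop_epoch` helper is hypothetical), applied to the first seven validation losses from the log.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far. stop_epoch is None if the rule
    never triggers.
    """
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, None


# val_loss for epochs 1-7, copied from the training log.
val_losses = [0.5692, 0.5154, 0.5050, 0.5335, 0.5592, 0.5559, 0.5760]
result = early_stop_epoch(val_losses, patience=2)
print(result)
```

With a patience of 2, this rule would have stopped training after epoch 5 instead of running all 100 epochs.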

Another Example

Using Keras + GloVe - Global Vectors for Word Representation
