In this Python NLP lesson we are going to learn about bigrams. A bigram is a pair of consecutive words that occur in a text, or more generally a sequence of two adjacent elements from a string of tokens, which are typically letters or words, for example "rain bow", "john doe", "heavy rain".
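As a quick illustration (a minimal sketch using a made-up sentence), nltk.bigrams turns a list of tokens into its consecutive pairs:

```python
from nltk import bigrams

# a small, made-up token list just to illustrate the idea
tokens = ['heavy', 'rain', 'is', 'expected', 'tonight']

# bigrams() yields every pair of adjacent tokens
print(list(bigrams(tokens)))
# [('heavy', 'rain'), ('rain', 'is'), ('is', 'expected'), ('expected', 'tonight')]
```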
Now let’s create an example.
```python
from nltk.corpus import webtext, stopwords
from nltk import bigrams
from nltk.probability import FreqDist

# load the tokenized Monty Python and the Holy Grail script from the webtext corpus
text_data = webtext.words('grail.txt')
stop_words = set(stopwords.words('english'))

# keep only the words that are not stopwords and are longer than 3 characters
filtered_words = []
for word in text_data:
    if word not in stop_words:
        if len(word) > 3:
            filtered_words.append(word)

# now we are going to use bigrams for this
bigram = bigrams(filtered_words)
freq_dist = FreqDist(bigram)
print(freq_dist)
```
If you run this code, you will see the result below; notice that each entry is a pair of consecutive words.
```
FreqDist({('BLACK', 'KNIGHT'): 32, ('HEAD', 'KNIGHT'): 29, ('clop', 'clop'): 26, ('Hello', 'Hello'): 22, ('FRENCH', 'GUARD'): 21, ('mumble', 'mumble'): 20, ('ARTHUR', 'What'): 19, ('witch', 'witch'): 19, ('Burn', 'Burn'): 19, ('Holy', 'Grail'): 19, ...})
```
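Because freq_dist is an ordinary NLTK FreqDist, you can also look up the count of a single bigram directly. This is a small sketch, assuming the code above has already been run:

```python
# dictionary-style lookup returns the count of a specific bigram (0 if it never occurs)
print(freq_dist[('BLACK', 'KNIGHT')])   # 32 with this corpus and filtering

# total number of bigram samples that were counted
print(freq_dist.N())
```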
You can print the 10 most common bigrams.
```python
print(freq_dist.most_common(10))
```
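most_common returns a list of (bigram, count) tuples, so you can loop over it to print the pairs more readably (a minimal sketch):

```python
# each entry is ((word1, word2), count)
for (first, second), count in freq_dist.most_common(10):
    print(f'{first} {second}: {count}')
```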
You can also plot the frequency distribution using this code.
```python
freq_dist.plot(10)
```
This will be the result: a plot of the 10 most common bigrams and their frequencies.
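If you prefer a cumulative view, FreqDist.plot also accepts a cumulative flag; this is a small variation on the call above:

```python
# plot the same 10 bigrams as a cumulative frequency curve
freq_dist.plot(10, cumulative=True)
```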