classickerop.blogg.se - Word cloud python jupyter notebook

While there are more effective ways to visualize word frequencies, word clouds can be beautiful.

To achieve the custom font coloring, imshow is passed the return value of the recolor method of the wordcloud object. The remaining statements set the title and footer text, turn off the axis grid and finally show the image.

    plt.imshow(wordcloud.recolor(color_func=grey_color, random_state=3))
    plt.title(title, color=fontcolor, size=30, y=1.01)
    plt.annotate(footer, xy=(0, -.025), xycoords='axes fraction', fontsize=infosize, color=fontcolor)

Thanks to the plethora of awesome Python packages, it requires just a few lines of code to generate a shaped word cloud like the one above.
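The plotting statements can also be collected in a small helper that draws on a matplotlib Axes object instead of the global pyplot state. This is only a sketch: the name draw_cloud and the default values for fontcolor and infosize are assumptions, since the text does not show how those variables are set.

```python
def draw_cloud(ax, cloud, title, footer, color_func,
               fontcolor="#333333", infosize=12):
    """Render a word cloud on a matplotlib Axes with a title and a footer.

    fontcolor and infosize are placeholder defaults (assumptions).
    """
    # recolor() returns the cloud object itself, so it can go straight to imshow.
    ax.imshow(cloud.recolor(color_func=color_func, random_state=3))
    ax.set_title(title, color=fontcolor, size=30, y=1.01)
    # The footer sits just below the axes, in axes-fraction coordinates.
    ax.annotate(footer, xy=(0, -.025), xycoords="axes fraction",
                fontsize=infosize, color=fontcolor)
    ax.axis("off")  # hide ticks, frame and grid
    return ax
```

With matplotlib installed, creating the axes with fig, ax = plt.subplots(), calling draw_cloud(ax, wordcloud, title, footer, grey_color) and then plt.show() should display the figure.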

    with open('data/literature/complete-sherlock-holmes-canon.txt') as f:

Now we can set up the wordcloud object, passing our custom variables, which include the list of stopwords and the image to use as a mask to shape the graphic. The silhouette in the mask image must be black and the background white, not transparent.

    mask=imread('img/sherlock-holmes-silhouette.png'),

In the final code block, the matplotlib figure is set up.
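A minimal sketch of how the wordcloud object might be constructed from these pieces follows. The helper name build_wordcloud, the max_words default and the white background color are assumptions; only the mask and stopwords arguments are mentioned in the text.

```python
def build_wordcloud(text, mask, stopwords, font_path=None, max_words=200):
    """Build a mask-shaped word cloud from raw text (illustrative sketch)."""
    # Imported here so that defining the helper does not require the
    # wordcloud package to be installed.
    from wordcloud import WordCloud

    cloud = WordCloud(
        font_path=font_path,       # TrueType font used to render the words
        mask=mask,                 # array; pure white areas stay empty
        stopwords=stopwords,       # words excluded from the cloud
        max_words=max_words,       # assumption: upper limit of words drawn
        background_color="white",  # assumption: matches the white mask background
    )
    # generate() counts word frequencies in the text and lays out the cloud.
    return cloud.generate(text)
```

The text does not show where imread is imported from. One way to obtain a suitable mask array, assuming Pillow and numpy are available, is numpy.array(PIL.Image.open('img/sherlock-holmes-silhouette.png')).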

Notice that the text is transformed to lowercase so that the frequency calculation done by the wordcloud package is case insensitive.

    def grey_color(word, font_size, position, orientation, random_state=None, **kwargs):
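The effect of lowercasing on the counts can be illustrated with the standard library alone. This sketch is not the tokenizer used by the wordcloud package; the function word_frequencies, its regular expression and the sample sentence are made up for the illustration.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count words case-insensitively, skipping the given stopwords."""
    # Lowercasing first makes "Holmes", "HOLMES" and "holmes" one word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


sample = "Holmes smiled. HOLMES laughed, and holmes sat down."
counts = word_frequencies(sample, stopwords={"and"})
print(counts.most_common(1))
```

Without the call to lower(), the three spellings of the name would be counted as three separate words.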

The function grey_color is used for coloring the words; it is slightly modified from the example given in the wordcloud documentation. Decreasing the lightness value renders the words in darker shades of grey.

Next, the whole text is loaded into the variable text. This is the ASCII version of the complete works, without the table of contents at the beginning and the license note at the end of the downloaded file.
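A color function of this kind could look roughly like the sketch below. The lightness bounds are hypothetical, since the values actually used are not given in the text, and drawing from the random_state argument when one is passed is a choice made for this sketch.

```python
import random

# Hypothetical lightness range in percent; lower values give darker greys.
LIGHTNESS_MIN = 0
LIGHTNESS_MAX = 50


def grey_color(word, font_size, position, orientation, random_state=None, **kwargs):
    """Return a random shade of grey as an HSL color string."""
    # Use the generator handed in by the caller if there is one, so that
    # recoloring with a fixed random_state stays reproducible.
    rng = random_state if random_state is not None else random
    lightness = rng.randint(LIGHTNESS_MIN, LIGHTNESS_MAX)
    # Hue 0 and saturation 0% leave only the lightness, i.e. a grey tone.
    return "hsl(0, 0%%, %d%%)" % lightness
```

Lowering LIGHTNESS_MAX narrows the range toward black, which is what produces the darker shades of grey.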

In this notebook I show how you can create a word cloud from the texts of these works using Python and several libraries, most importantly the wordcloud package. The word cloud is shaped using this Sherlock Holmes silhouette as a mask image, and the words will be rendered in different shades of grey using a custom color function.

Setup

First we load the required libraries, set the plotting style, display variables and the list of stopwords to exclude from the word cloud. For the latter I combine the stopword sets provided by the scikit-learn, nltk and wordcloud packages to get a more comprehensive set.

    %matplotlib inline
    import matplotlib.pyplot as plt
    from nltk.corpus import stopwords
    # Newer scikit-learn releases no longer ship sklearn.feature_extraction.stop_words.
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
    from wordcloud import WordCloud, STOPWORDS

    title = 'Most frequent words in the canon of Sherlock Holmes'
    chartinfo = 'Author: Ramiro Gómez. Data: The Complete Sherlock Holmes - sherlock-holm.es/ascii/'
    footer = 'The '.format(limit, chartinfo)
    font = '/usr/share/fonts/truetype/ubuntu-font-family/Ubuntu-B.ttf'
    # Union of the nltk, wordcloud and scikit-learn stopword collections.
    english_stopwords = set(stopwords.words('english')) | STOPWORDS | ENGLISH_STOP_WORDS
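Combining the stopword collections is a plain set union. The sketch below uses tiny made-up collections in place of the real nltk, wordcloud and scikit-learn ones so that it runs without those packages; the helper combine_stopwords and the lowercasing step are additions for illustration.

```python
def combine_stopwords(*sources):
    """Merge any number of stopword collections into one lowercase set."""
    combined = set()
    for source in sources:
        # Accepts lists, sets and frozensets alike. Lowercasing keeps the
        # stopwords comparable with text that has been lowercased.
        combined |= {word.lower() for word in source}
    return combined


# Stand-ins for the three real collections (made-up contents).
nltk_like = ["the", "and", "I"]              # nltk returns a list
wordcloud_like = {"the", "said"}             # a set
sklearn_like = frozenset({"and", "upon"})    # a frozenset

english_stopwords = combine_stopwords(nltk_like, wordcloud_like, sklearn_like)
print(sorted(english_stopwords))
```

Words that appear in more than one source, such as "the" here, end up in the combined set only once.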

Word Cloud of the Most Frequent Words in the Canon of Sherlock Holmes

The canon of Sherlock Holmes consists of the 56 short stories and 4 novels written by Sir Arthur Conan Doyle. The full text of these works, which is in the public domain in Europe, can be downloaded from the website sherlock-holm.es.