Customer Review

Sentiment analysis

Amazon's e-commerce platform is one of the biggest online shopping platforms globally.

With its large customer base generating vast amounts of user-experience data every day, distilling the main message from the ocean of user reviews can be a daunting task. Sentiment analysis, or opinion mining, can help: it provides an analytical procedure for extracting the polarity hidden in a body of text, and is a powerful tool for detecting the nature of opinion expressed in documents, websites, social media feeds, etc.

This project considers a subset of Amazon’s Kindle Store reviews available on Kaggle. In total we used 10,000 entries with two columns: the reviewer’s numeric rating and the review text. The sentiment analysis was conducted in R, mainly with the tidytext package alongside other relevant packages.

 

Pre-processing:

First, the corpus was cleaned using standard pre-processing functions. The character vector of Kindle reviews was turned into a text source using VectorSource() and then into a corpus using VCorpus(). Cleaning functions such as removePunctuation() and stripWhitespace() removed unwanted characters, and replace_abbreviation() replaced abbreviations with their long forms. Finally, stemDocument() reduced words to their stems, so that inflected variants of the same word are counted together.
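The effect of these cleaning steps can be illustrated with a minimal, standard-library Python sketch. It is not the project's R code: clean_review() and naive_stem() are hypothetical helpers, the two reviews are made up, and the suffix stripper is only a toy stand-in for the real stemmer behind stemDocument().

```python
import re
import string


def clean_review(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def naive_stem(word):
    """Toy suffix stripper (hypothetical); a real stemmer handles far more cases."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Made-up example reviews.
reviews = ["Loved it!!   Great story...", "The ending dragged, sadly."]

cleaned = [clean_review(r) for r in reviews]
stemmed = [" ".join(naive_stem(w) for w in c.split()) for c in cleaned]
```

After stemming, variants such as "loved" and "loving" would collapse to the same stem, which is why word counts become more meaningful.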

 

Sentiment scores:

After tidying the corpus so that each row contains a single word, polarity scores were obtained for the unique words in the corpus. These were visualized by their positive and negative scores (see Figures 1 and 2). The reviews appear to be dominated by positive sentiment.
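The one-word-per-row layout and the lexicon lookup can be sketched as follows. This is a Python illustration under stated assumptions rather than the tidytext pipeline itself: BING_LIKE is a hypothetical five-word lexicon in the style of bing, and the reviews are made up.

```python
from collections import Counter

# Hypothetical mini-lexicon in the style of bing: word -> sentiment label.
BING_LIKE = {
    "love": "positive",
    "great": "positive",
    "enjoy": "positive",
    "boring": "negative",
    "slow": "negative",
}

# Made-up, already cleaned reviews.
reviews = ["love this great book", "slow and boring plot", "enjoy the story"]

# Tidy layout: one row per word, stored as (review index, word).
tidy = [(i, word) for i, text in enumerate(reviews) for word in text.split()]

# Inner join with the lexicon: keep only words that carry a sentiment label.
labelled = [(i, w, BING_LIKE[w]) for i, w in tidy if w in BING_LIKE]

# Count labels and compute an overall polarity (positive minus negative).
counts = Counter(label for _, _, label in labelled)
polarity = counts["positive"] - counts["negative"]
```

Words missing from the lexicon simply drop out of the join, so the polarity reflects only the words the lexicon knows about.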

 

[Figure 1: polarity scores of words in the corpus]

 

[Figure 2: polarity scores of words in the corpus]

 

Chronological Polarity:

To understand how customers’ sentiments change over time, the polarities of the reviews were plotted in chronological order, using the GAM smoothing method available in the ggplot2 package. As Figure 3 shows, sentiment is predominantly positive but dips at several points. Later reviews nonetheless trend toward increasingly positive sentiment.
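The idea of smoothing polarity over time can be sketched in a few lines. Note the swap: the project used a GAM smoother, whereas this Python sketch uses a centered moving average as a much simpler stand-in, and the polarity values are made up.

```python
def moving_average(values, window=3):
    """Centered moving average; near the edges, use whatever part of the window exists."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical per-review polarity scores, already sorted by review date.
polarity_by_time = [2, 3, -1, 0, 1, 4, 5]

trend = moving_average(polarity_by_time, window=3)
```

A wider window gives a smoother trend line at the cost of hiding short dips, a trade-off that a GAM handles through its own smoothness penalty.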

 

[Figure 3: chronological polarity of the reviews]

 

Lexical analyzer:

We further applied the “bing”, “afinn” and “nrc” lexicons, using filter() over the tokenized text. Bing dichotomises words into positive and negative sentiment. Words in the corpus with strong positive (green) and negative (red) polarities are visualized in a word cloud (Figure 4). Top positive words include love, read, like, good, book, great, enjoy and story.

 

[Figure 4: word cloud of positive (green) and negative (red) words]

 

The afinn lexicon scores words numerically from -5 (most negative) to 5 (most positive). Results are shown in Figure 5, with afinn scores grouped by the numeric rating of the review (1 to 5). As expected, low ratings are on average associated with negative words, while high ratings correspond to positive words.
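Grouping numeric word scores by rating can be sketched as below. This is a Python illustration, not the project's code: AFINN_LIKE is a hypothetical six-word lexicon with afinn-style integer scores, and the rated reviews are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical afinn-style lexicon: word -> integer score between -5 and 5.
AFINN_LIKE = {"love": 3, "great": 3, "good": 3, "boring": -3, "awful": -3, "bad": -3}

# Made-up (rating, review text) pairs.
reviews = [
    (5, "love this great book"),
    (4, "good story"),
    (1, "awful and boring"),
    (2, "bad pacing but good idea"),
]

# Collect the scores of all lexicon words, grouped by the review's rating.
scores_by_rating = defaultdict(list)
for rating, text in reviews:
    for word in text.split():
        if word in AFINN_LIKE:
            scores_by_rating[rating].append(AFINN_LIKE[word])

# Average word score per rating, ordered from lowest to highest rating.
mean_score = {rating: mean(s) for rating, s in sorted(scores_by_rating.items())}
```

In this toy data the 1-star review averages a negative score and the 5-star review a positive one, mirroring the pattern the analysis reports.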

 

[Figure 5: afinn scores grouped by review rating (1 to 5)]

 

The nrc lexicon labels words across multiple emotional states. Here, words in the reviews were tagged with the emotions of Plutchik’s wheel of emotions. These are visualized in Figures 6, 7 and 8. Anticipation, joy and trust were the strongest emotions in the reviews.
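Because one word can carry several emotions, counting works slightly differently than with a two-class lexicon. A minimal Python sketch, assuming a hypothetical four-word NRC_LIKE lexicon and a made-up review:

```python
from collections import Counter

# Hypothetical nrc-style lexicon: one word may map to several emotions.
NRC_LIKE = {
    "love": ["joy", "trust"],
    "wait": ["anticipation"],
    "hope": ["anticipation", "joy", "trust"],
    "lost": ["sadness"],
}

# Made-up, already cleaned review text split into words.
words = "love the story hope for more cannot wait lost track".split()

# Each matching word contributes one count to every emotion it is tagged with.
emotion_counts = Counter(e for w in words for e in NRC_LIKE.get(w, []))

# The three most frequent emotions in this toy example.
top_three = [emotion for emotion, _ in emotion_counts.most_common(3)]
```

The totals can therefore exceed the number of matched words, which is worth keeping in mind when comparing emotion counts with bing or afinn results.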

 

[Figure 6: nrc emotions in the reviews]

 

[Figure 7: nrc emotions in the reviews]

 

[Figure 8: nrc emotions in the reviews]

 

Final thoughts:

In summary, customer reviews in the Amazon Kindle Store category were strongly positive. Words such as love, like, good, enjoy and great dominated the reviews. Moreover, the strongest emotions from Plutchik’s wheel of emotions were anticipation, joy and trust. Nevertheless, the chronological polarity of reviewers’ sentiments shows some volatility over time. Finally, lower numerical ratings were associated with negative sentiments, while higher ratings were associated with positive sentiments.

You are welcome to contact DataXotic for questions or extra materials on this project.
