Sentiment Analysis of Book Reviews Using IBM Watson
(2019)

Introduction:

This project was carried out under the guidance of Dr. Yael Netzer of Ben-Gurion University, as the final project for the course 'Advanced Topics in Digital Humanities'.

The project is divided into two parts:
- Goodreads review analysis
- New York Times review analysis

Note that the number of reviews I collected from Goodreads is far higher than the number collected from the New York Times.

In this project I used IBM Watson AI to explore a number of questions:
- How accurate is IBM Watson, and can its analysis be trusted?
- How do people generally review books (in terms of sentiment), and how strongly are their ratings (represented by stars) reflected in their reviews?
- Do reviewers write what they really mean? How well do their reviews reflect their ratings?
- Sentiment analysis by genre: which genres do reviewers prefer most?
- How do NYT reviewers review books published in different decades, and books from which decade did they review most?
Note:
Because of limited technical resources and time, I could not collect all of the data (mostly reviews) from both sites (New York Times and Goodreads), so the data in this project represent only a small portion of the data available on each site:
- Goodreads: 54712 reviews
- New York Times: 1536 reviews



Goodreads Analysis



Graph Explanation and Conclusions:
We will discuss the graph above from two perspectives:
IBM Watson's analysis ability: Watson's sentiment analysis of the reviews is quite impressive. The results match human intuition: the positive-to-negative ratio keeps increasing from the 'one star' rating to the 'five stars' rating. In other words, the 'Positive' bar grows gradually while the 'Negative' bar shrinks gradually.
People's reviewing habits: these can be broken down into two main observations:
* Comparing the 'OneStar' and 'FiveStars' collections of bars, the 'FiveStars' collection is more one-sided: its 'Positive' bar outweighs its 'Negative' bar by a larger ratio than the 'OneStar' collection's 'Negative' bar outweighs its 'Positive' bar. In my opinion, the reason is that a reviewer who is delighted by a book tends to write only about what is good about it (see, for example, the 'FourStars' collection: although the number of 'Negative' reviews is nearly the same, the number of 'Positive' reviews is very different). On the other hand, a reviewer who gives a book one star may still include a few kind words, not necessarily about the book but perhaps about the author (e.g. "...I have read great books by this author, but...").
* Although the data I collected is only a sample, we can see that people tend to rate books 'five stars' and generally avoid rating them 'one star'.
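
Both perspectives reduce to a few counts over the labeled reviews. The sketch below is a minimal illustration, assuming each review has already been labeled 'positive', 'neutral', or 'negative' by Watson and stored as a (stars, label) pair; the function names and the toy sample are hypothetical and do not come from the project. It computes the positive-to-negative ratio per star rating, how one-sided the 'FiveStars' and 'OneStar' groups are (majority label divided by minority label), and the share of reviews given at each rating.

```python
from collections import Counter, defaultdict


def sentiment_counts_by_rating(reviews):
    """Count sentiment labels per star rating.

    `reviews` is an iterable of (stars, label) pairs, with stars in 1..5 and
    label in {'positive', 'neutral', 'negative'} (assumed input format).
    """
    counts = defaultdict(Counter)
    for stars, label in reviews:
        counts[stars][label] += 1
    return counts


def ratio(counter, top, bottom):
    """counter[top] / counter[bottom], or infinity if the denominator is zero."""
    return counter[top] / counter[bottom] if counter[bottom] else float("inf")


def rating_shares(reviews):
    """Fraction of all reviews given at each star rating."""
    ratings = Counter(stars for stars, _ in reviews)
    total = sum(ratings.values())
    return {stars: ratings[stars] / total for stars in sorted(ratings)}


# Hypothetical toy sample for illustration only (not the project's data).
sample = ([(1, "negative")] * 3 + [(1, "positive")] * 2
          + [(3, "positive")] * 2 + [(3, "negative")] * 2 + [(3, "neutral")]
          + [(5, "positive")] * 8 + [(5, "negative")] * 2)

counts = sentiment_counts_by_rating(sample)

# Watson check: the positive-to-negative ratio should rise with the rating.
trend = [ratio(counts[s], "positive", "negative") for s in sorted(counts)]
rising = all(a <= b for a, b in zip(trend, trend[1:]))

# Habit 1: how one-sided each extreme group is (majority label / minority label).
five_sided = ratio(counts[5], "positive", "negative")
one_sided = ratio(counts[1], "negative", "positive")

# Habit 2: how often each star rating is used.
shares = rating_shares(sample)

print(rising)                 # True
print(five_sided, one_sided)  # 4.0 1.5
print(shares)                 # {1: 0.25, 3: 0.25, 5: 0.5}
```

Measuring both extreme groups as majority over minority puts them on the same scale; a plain positive-to-negative ratio would trivially be larger for the 'FiveStars' group.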



Genre Analysis
Genre | Positive Reviews | Neutral Reviews | Negative Reviews | Reviews Count
Table Explanation:

The table above shows the genres of the reviewed books. The table is interactive: it can be sorted by any of its columns and searched. There are 62 genres; for each genre, the table gives the number of reviews and the proportions of positive, neutral, and negative reviews. If the positive proportion is higher than the negative one, it is shown in green; if the negative proportion is higher, it is shown in red; if both are equal, both are shown in yellow.
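
The per-genre proportions and the color rule described above can be sketched as follows. This is a minimal illustration, assuming the input is a list of hypothetical (genre, label) pairs, one per review; the function names and sample are illustrative and this is not the code behind the interactive table.

```python
from collections import Counter, defaultdict


def genre_table(reviews):
    """One row per genre: proportion of each sentiment label and review count.

    `reviews` is an iterable of (genre, label) pairs (assumed input format).
    """
    by_genre = defaultdict(Counter)
    for genre, label in reviews:
        by_genre[genre][label] += 1
    rows = []
    for genre, labels in sorted(by_genre.items()):
        total = sum(labels.values())
        rows.append({
            "genre": genre,
            "positive": labels["positive"] / total,
            "neutral": labels["neutral"] / total,
            "negative": labels["negative"] / total,
            "count": total,
        })
    return rows


def highlight(row):
    """Green if positive > negative, red if negative > positive, else yellow."""
    if row["positive"] > row["negative"]:
        return "green"
    if row["negative"] > row["positive"]:
        return "red"
    return "yellow"


# Hypothetical toy sample for illustration only.
sample = [("Fantasy", "positive"), ("Fantasy", "positive"), ("Fantasy", "negative"),
          ("Horror", "negative"), ("Horror", "neutral"),
          ("Poetry", "positive"), ("Poetry", "negative")]

for row in genre_table(sample):
    print(row["genre"], row["count"], highlight(row))
# Fantasy 3 green
# Horror 2 red
# Poetry 2 yellow
```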



New York Times Analysis





Graph Explanation and Conclusions:
The graph above shows NYT reviews of books published in different decades, from the 20th century onward.
From the graph we can draw two main conclusions:
- More recent decades, more reviews: NYT reviewers reviewed more books published in recent decades than books published in earlier ones.
- Recently published books are more likely not to receive a good review: in the '2010s', the ratio of negative to positive reviews is significantly higher than in any previous decade.
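
Grouping reviews by decade only requires rounding each publication year down to a multiple of ten. A minimal sketch, assuming each NYT review is available as a hypothetical (publication_year, label) pair with four-digit years; the names and the sample are illustrative and are not the project's data.

```python
from collections import Counter, defaultdict


def decade_of(year):
    """Map a four-digit publication year to a decade label, e.g. 2013 -> '2010s'."""
    return f"{year // 10 * 10}s"


def decade_summary(reviews):
    """Per decade: review count and negative-to-positive ratio.

    `reviews` is an iterable of (publication_year, label) pairs (assumed format).
    Decade labels sort correctly as strings because the years have four digits.
    """
    by_decade = defaultdict(Counter)
    for year, label in reviews:
        by_decade[decade_of(year)][label] += 1
    summary = {}
    for decade in sorted(by_decade):
        labels = by_decade[decade]
        positive = labels["positive"]
        summary[decade] = {
            "reviews": sum(labels.values()),
            "neg_to_pos": labels["negative"] / positive if positive else float("inf"),
        }
    return summary


# Hypothetical toy sample for illustration only.
sample = [(1995, "positive"), (1998, "negative"), (1999, "positive"),
          (2004, "positive"), (2007, "negative"),
          (2012, "negative"), (2013, "negative"), (2015, "positive")]

for decade, stats in decade_summary(sample).items():
    print(decade, stats["reviews"], stats["neg_to_pos"])
# 1990s 3 0.5
# 2000s 2 1.0
# 2010s 3 2.0
```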
Book Analysis
Book Name | NYT Review Sentiment | Goodreads Average Rating | Goodreads Ratings Count
Table Explanation:

The table above is more detailed: for each book it shows the NYT review sentiment, the Goodreads average rating, and the Goodreads ratings count. In addition to the interactive features of the previous table (searching and sorting), this table lets you press '+' to show links to the NYT review and the Goodreads book page. Sentiments are marked green if positive, yellow if neutral, and red if negative.
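
The text does not state how NYT reviews were matched to Goodreads book pages. The sketch below assumes a simple match on a normalized book title, which is a hypothetical choice (titles can collide or be spelled differently, so an identifier such as an ISBN would be more robust where available), and applies the green/yellow/red sentiment mapping described above. All names and data are illustrative placeholders.

```python
SENTIMENT_COLORS = {"positive": "green", "neutral": "yellow", "negative": "red"}


def normalize_title(title):
    """Hypothetical join key: lower-case the title and collapse whitespace."""
    return " ".join(title.lower().split())


def build_book_rows(nyt_sentiments, goodreads_stats):
    """Join NYT review sentiment with Goodreads statistics by normalized title.

    nyt_sentiments: dict of title -> sentiment label (assumed format)
    goodreads_stats: dict of title -> (average_rating, ratings_count) (assumed format)
    Books that cannot be matched on Goodreads are skipped.
    """
    by_key = {normalize_title(t): stats for t, stats in goodreads_stats.items()}
    rows = []
    for title, label in sorted(nyt_sentiments.items()):
        stats = by_key.get(normalize_title(title))
        if stats is None:
            continue
        average, count = stats
        rows.append({
            "book": title,
            "nyt_sentiment": label,
            "color": SENTIMENT_COLORS[label],
            "goodreads_average": average,
            "goodreads_count": count,
        })
    return rows


# Hypothetical placeholder data, not real books or ratings.
nyt = {"Book A": "negative", "Book B": "positive", "Book C": "neutral"}
goodreads = {"book a": (3.5, 120), "BOOK  B": (4.2, 80)}

for row in build_book_rows(nyt, goodreads):
    print(row["book"], row["color"], row["goodreads_average"])
# Book A red 3.5
# Book B green 4.2
```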

You can get the resource files from here:
File 1
File 2