Foundation AI helps a large appliance manufacturer analyze customer reviews to identify product issues and sought-after features

Foundation AI is an Artificial Intelligence Solutions Provider. We help organizations process, manage, and leverage their unstructured data to automate labor-intensive tasks, make better data-driven decisions, and drive real business value.


Goals

  • To analyze the topic of each review segment to classify the segment by feature or issue.
  • To identify which part of the product is being talked about.
  • To analyze the sentiment of the review segment to identify if the part, feature, or issue is being talked about positively or negatively.
  • To present aggregated results in a descriptive dashboard indicating what features customers value and what product issues drive dissatisfaction.

Approach

  • Used Natural Language Processing (NLP) techniques to divide each review into individual sections (chunks) based on topic.
  • Constructed a parts list for each SKU to aid in identifying the part being discussed in each review chunk.
  • Manually labeled review chunk data with topics and sentiment and used it to train XGBoost models.

Results

  • Topic/Issue Modeling
    • Accuracy - 86%
    • F1 score - 87%
    • Precision - 92%
    • Recall - 83%

  • Sentiment Modeling
    • Accuracy - 88%
    • F1 score - 88%
    • Precision - 89%
    • Recall - 87%

  • We are currently labeling additional data across a broader set of SKUs, which should improve the classification output.

Background

In the highly competitive world of appliance sales, customer feedback is everything. Before the internet, appliance manufacturers measured a product's popularity solely by sales and made educated guesses about which features customers sought. Now, with the availability of online reviews, these companies are drowning in a sea of data. Their customers are telling them which products they like and why. They are explaining what features they value, what issues they have, and ultimately why they make their purchasing decisions.

Processing all of these reviews manually is extremely time-consuming, and using star ratings as a shortcut discards a lot of valuable information. As a result, many appliance manufacturers aren't using most of the data available to them.

Challenge

This large appliance manufacturer approached Foundation AI to customize our Extract Language Platform to help them extract insights from reviews of their own products and competing products.

They wanted to use the Extract Language Platform to:

  • Analyze the topic of each review segment to classify the segment by feature or issue.

  • Identify which part of the product is being talked about.

  • Analyze the sentiment of the review segment to identify if the part, feature, or issue is being talked about positively or negatively.

  • Present aggregated results in a descriptive dashboard indicating what features customers value and what product issues drive dissatisfaction.

Solution

Data Used

This appliance manufacturer routinely scrapes product reviews for their own products and competing products from e-commerce sites like Lowe's and The Home Depot. To supplement this collection of reviews, we scrape additional reviews of competing products. In total, we aggregate data for six companies: Bosch, Electrolux, Haier, LG, Samsung, and Whirlpool. There are close to 4,000 SKUs across these six companies.

We used the following fields to build our NLP model:

  • Review & Chunk ID (Unique identifier) - This helps in distinguishing one record/row from another.

  • Review Chunk (Part of the review text) - Each review can contain multiple topics/issues. We split the review text into individual chunks, each of which deals with only one topic or issue.

  • Topic Bucket - Groups similar issues from the reviews into categories. Examples include build quality, performance, and spaciousness.

  • Sentiment - Sentiment (Positive/Negative) of the review chunk.

Example Review Classification:

Review & Chunk ID | Review Text | Review Chunk | Topic Bucket | Review Chunk Sentiment
12345_001 | Love how the drawers have tons of space. Opens and closes so smoothly. | Love how the drawers have tons of space. | Spaciousness | Positive
12345_002 | Love how the drawers have tons of space. Opens and closes so smoothly. | Opens and closes so smoothly. | Functioning | Positive
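For concreteness, here is a minimal sketch of how a labeled review chunk could be represented in Python. The field names mirror the table above, but the exact schema is our illustration, not the production format:

```python
from dataclasses import dataclass

@dataclass
class ReviewChunk:
    """One single-topic segment of a consumer review (illustrative schema)."""
    review_chunk_id: str  # e.g. "12345_001": review ID plus chunk index
    review_text: str      # the full original review
    chunk_text: str       # the single-topic segment extracted from the review
    topic_bucket: str     # e.g. "Spaciousness", "Functioning", "Build Quality"
    sentiment: str        # "Positive" or "Negative"

review = "Love how the drawers have tons of space. Opens and closes so smoothly."
rows = [
    ReviewChunk("12345_001", review, "Love how the drawers have tons of space.",
                "Spaciousness", "Positive"),
    ReviewChunk("12345_002", review, "Opens and closes so smoothly.",
                "Functioning", "Positive"),
]
```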

Methodology

In order for the client to draw meaningful insights from the consumer reviews, we have to divide each review into discrete chunks, identify which part of the product is being talked about, classify the topic or issue, and predict the sentiment. Once we've done this, we can conduct a variety of analyses: which products have the highest positive sentiment, what top issues consumers are talking about, where the company is doing well compared to competitors, and where it can improve.

To initially train the models, we collected a dataset of historical consumer reviews and tagged a subset of these reviews with Topic and Sentiment. Each review has a Review Date (date of posting), SKU/Item ID, Consumer Rating, Review Text, and Review ID. Since each review can contain multiple topics (e.g., the surface scratches easily and the drum has limited capacity), we split each review into multiple chunks. To do this, we first needed to identify which "part" of a product (rack, door, buttons, etc.) the customer was talking about in their review. We worked with the appliance manufacturer's product team to create an exhaustive list of parts for each SKU.

If the code finds two or more "parts" mentioned in a review, we divide the review by conjunctions (and, but), commas, and sentence stoppers ("?", "!", "."). We then check which sections contain a "part". Segments that contain a "part" are extracted as Review Chunks, each paired with a unique Chunk ID. Example - "The dryer works well but the drum has less capacity" would become two chunks: "The dryer works well" and "The drum has less capacity."
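A simplified Python sketch of this splitting heuristic follows. The parts list, regular expression, and function names here are ours for illustration; the production pipeline is more involved:

```python
import re

# Illustrative parts list for one SKU; the real lists were built with the
# manufacturer's product team and vary by SKU.
PARTS = {"dryer", "drum", "door", "rack", "buttons", "ice maker", "dispenser"}

# Split on conjunctions (and, but), commas, and sentence stoppers (. ! ?).
SPLIT_RE = re.compile(r"\s*(?:\band\b|\bbut\b|[,.!?])\s*", flags=re.IGNORECASE)

def contains_part(segment: str) -> bool:
    seg = segment.lower()
    return any(part in seg for part in PARTS)

def split_into_chunks(review: str, review_id: str) -> dict[str, str]:
    """Return {chunk_id: chunk_text} for segments that mention a known part."""
    parts_found = [p for p in PARTS if p in review.lower()]
    if len(parts_found) < 2:  # only split reviews that mention multiple parts
        return {f"{review_id}_001": review}
    segments = [s for s in SPLIT_RE.split(review) if s]
    chunks = [s for s in segments if contains_part(s)]
    return {f"{review_id}_{i:03d}": c for i, c in enumerate(chunks, start=1)}

print(split_into_chunks("The dryer works well but the drum has less capacity", "12345"))
# {'12345_001': 'The dryer works well', '12345_002': 'the drum has less capacity'}
```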

There are instances, however, where this approach breaks down. The first is when two "parts" are modified by a single description. Example - "The ice maker and dispenser are bad." Using our default approach, this would be split into two chunks: "The ice maker" and "dispenser are bad." This is less than ideal because, in chunking this way, we lose the contextual information about the ice maker. To handle cases like this, we test whether each chunk contains both a "part" and a description. If a chunk contains a "part" but no description, we attach the shared description to it. As a result, we would get the following two chunks: "The ice maker is bad" and "the dispenser is bad." The other edge case is when a "part" is referred to with a pronoun in a subsequent chunk. For example: "The Door looks classy. It also looks premium." In this case, we scan for impersonal pronouns and use Natural Language Processing to substitute the correct subject or object for the pronoun. Here, the chunks would be: "The Door looks classy" and "The Door also looks premium."
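Both edge cases can be approximated with light post-processing on the chunks. The sketch below continues the one above; the heuristics (borrowing a description from the adjacent chunk, carrying forward the most recently mentioned part) are simplified stand-ins for the NLP-based resolution used in production:

```python
# Continues the sketch above (PARTS as defined there).

IMPERSONAL_PRONOUNS = {"it", "this", "that", "they"}

def share_description(chunks: list[str]) -> list[str]:
    """A chunk that is only a part mention ('The ice maker') borrows the
    description from the next chunk ('dispenser are bad'). Verb agreement
    ('are' vs 'is') is ignored in this sketch."""
    out = list(chunks)
    for i in range(len(out) - 1):
        bare = out[i].lower().removeprefix("the ").strip()
        if bare in PARTS:  # a part with no description of its own
            nxt = out[i + 1].lower()
            for part in PARTS:
                if nxt.startswith(part):
                    out[i] += nxt[len(part):]  # append the shared description
                    break
    return out

def resolve_pronouns(chunks: list[str]) -> list[str]:
    """Replace a leading impersonal pronoun with the most recently seen part.
    The production system resolves this with NLP; a last-mentioned-part
    heuristic stands in here."""
    last_part, resolved = None, []
    for chunk in chunks:
        words = chunk.split()
        if words and words[0].lower() in IMPERSONAL_PRONOUNS and last_part:
            chunk = " ".join([last_part] + words[1:])
        for part in PARTS:
            if part in chunk.lower():
                last_part = "The " + part
                break
        resolved.append(chunk)
    return resolved

print(share_description(["The ice maker", "dispenser are bad"]))
# ['The ice maker are bad', 'dispenser are bad']
print(resolve_pronouns(["The Door looks classy", "It also looks premium"]))
# ['The Door looks classy', 'The door also looks premium']
```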

We categorized each Review Chunk into a Topic Bucket (Build Quality, Performance, etc.) and tagged its Sentiment (Positive, Negative) to train the models. Topics are inherently neutral and can be talked about either positively or negatively; for instance, Build Quality (a neutral topic) has Flimsy (negative) and Sturdy (positive) subcategories. Separating topic from sentiment in this way reduces the number of classes each model has to predict.

Once we had finished tagging our training data, we used it to train a variety of models including Random Forest, LightGBM, Logistic Regression, Naive Bayes, and XGBoost. XGBoost outperformed the other approaches for both topic modeling and sentiment analysis. Now that the models have been trained, they can perform topic modeling and sentiment analysis on new reviews as they are scraped.
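A minimal sketch of this training setup, assuming the labeled chunks sit in a CSV with chunk_text, topic_bucket, and sentiment columns (our naming) and using TF-IDF features as a stand-in for the production feature pipeline:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# Hypothetical file of manually labeled chunks; names are illustrative.
df = pd.read_csv("labeled_chunks.csv")  # columns: chunk_text, topic_bucket, sentiment

# Vectorize the chunk text with TF-IDF (a stand-in feature pipeline).
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X = vectorizer.fit_transform(df["chunk_text"])
# For the sentiment model, replace "topic_bucket" with "sentiment".
y = LabelEncoder().fit_transform(df["topic_bucket"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# XGBoost outperformed Random Forest, LightGBM, Logistic Regression,
# and Naive Bayes on this task.
model = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```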

We present the output from processed reviews in descriptive dashboards, using visualizations like sentiment trend charts, competitor analysis, and internal product topic analysis.

Results

We tested the classification models used for Topics/Issues and Sentiment against our labeled data set to evaluate their performance:

Metric    | Topics/Issues | Sentiment
Accuracy  | 86%           | 88%
F1 score  | 87%           | 88%
Precision | 92%           | 89%
Recall    | 83%           | 87%
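For reference, these metrics can be computed with scikit-learn, continuing the training sketch in the Methodology section; macro averaging is assumed below, since the aggregation method is not specified above:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

def report(y_true, y_pred):
    """Print the four metrics from the table above (macro-averaged; assumption)."""
    print(f"Accuracy : {accuracy_score(y_true, y_pred):.0%}")
    print(f"F1 score : {f1_score(y_true, y_pred, average='macro'):.0%}")
    print(f"Precision: {precision_score(y_true, y_pred, average='macro'):.0%}")
    print(f"Recall   : {recall_score(y_true, y_pred, average='macro'):.0%}")

report(y_test, model.predict(X_test))  # continues the training sketch above
```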

For the initial deployment of this project, we only labeled data for a subset of SKUs, and some issues and topics were poorly represented across these SKUs. We are currently labeling data across a broader set of SKUs, which should improve the classification output for these topics.

We are also currently working to improve the recognition of parts based on function (drying, heating, spinning, etc.) when they aren’t explicitly mentioned.

If you are interested in deploying a solution built on our Extract Language Platform, contact us to see what Foundation AI can do for you.
Artificial Intelligence for the Real World
© 2021 Foundation AI