
                Customer Review Sentiment Analysis: A Business Case for Tokenization


                Sentiment analysis is a must-have for organizations with a business-to-consumer (B2C) business model. This natural language processing technique can be used to discern the sentiment of customer reviews, revealing valuable insights that would otherwise be lost in a sea of unstructured data. Such insights are key for B2C organizations striving to understand customer stories and remain attuned to consumer needs.

                Tokenization is a fundamental step in sentiment analysis. It is the process of splitting a single piece of text into multiple smaller units (tokens) for processing. Let’s explore how B2C organizations can use tokenization with a practical business case: performing sentiment analysis on customer reviews.

                Note: "Tokenization" can also refer to a process in which sensitive data is substituted with a unique non-sensitive equivalent, called a token. We’ll cover that kind of tokenization in a future blog post. ;)

                Inquire About Sentiment Analysis for Your Business

                Sourcing a Suitable Dataset

                Our first step is to gather data suitable for tokenization. We’ve procured an open-source dataset from the online data science community Kaggle that contains about one million customer reviews of Sephora skincare products collected via web scraping. We’ve narrowed down our dataset by filtering for customer reviews specifically related to the product "Lip Sleeping Mask Intense Hydration with Vitamin C," resulting in a subset of 199 customer reviews.

                Learn More About Web Scraping From Our Chief Data Scientist

                Tokenizing the Customer Reviews

                Our next step is to tokenize the data we’ve gathered. Tokenization entails dividing a single piece of text (in our case, a customer review) into smaller pieces, or "tokens." Tokens can range in size from whole words down to subword pieces, which are generated by more sophisticated methods such as Byte-Pair Encoding.
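To give a feel for how subword methods build their tokens, here is a minimal sketch of a single Byte-Pair Encoding merge step. The toy word counts and the function names are our own illustrative assumptions, not part of the analysis in this post, and production tokenizers add many details this sketch leaves out.

```python
from collections import Counter

def most_frequent_pair(words):
    """Return the adjacent symbol pair with the highest total frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Hypothetical toy vocabulary: each word is a tuple of characters with a count.
words = {tuple("lip"): 3, tuple("lips"): 2, tuple("lit"): 2}

pair = most_frequent_pair(words)   # the pair ("l", "i") occurs 7 times
words_after_merge = merge_pair(words, pair)
print(pair, words_after_merge)
```

Repeating this step many times grows a vocabulary of progressively longer subword units, which is the core idea behind Byte-Pair Encoding.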

                For the sake of simplicity, let’s settle on word-level tokenization. We've used the Natural Language Toolkit (NLTK), a popular Python package for natural language processing that works well with English text, to split each of our 199 customer reviews into a series of words and punctuation marks. A single review that reads, "This lip mask is awesome!" thus becomes six tokens (five words and an exclamation point): "This" "lip" "mask" "is" "awesome" "!"
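The word-level split described above can be sketched in a few lines. Note that this sketch uses a regular expression from Python's standard library as a stand-in for NLTK's tokenizer, so that it runs without extra downloads. The `tokenize` function name is our own.

```python
import re

def tokenize(text):
    """Split text into word tokens and individual punctuation-mark tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

review = "This lip mask is awesome!"
tokens = tokenize(review)
print(tokens)       # ['This', 'lip', 'mask', 'is', 'awesome', '!']
print(len(tokens))  # 6
```

A dedicated tokenizer such as NLTK's treats edge cases differently from this regex. Contractions like "don't" are one example, so the two approaches will not always agree on real reviews.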

                The histogram below displays the 10 most frequently occurring tokens within our dataset. On the x-axis, we have the tokens themselves, and on the y-axis, we have the total number of times each token appears across our collection of 199 customer reviews. We can see that two of these 10 tokens are punctuation marks, and a further seven are what we call "stopwords." Stopwords are common words with very little semantic meaning, such as articles, prepositions, and conjunctions. They are generally not useful in the context of sentiment analysis.

                10 Most Frequently Occurring Tokens in Customer Review Dataset

                Note: Reviews were converted to lowercase prior to tokenization.
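Counting token frequencies like this is straightforward with Python's `collections.Counter`. The sketch below is a simplified illustration: the three sample reviews are made up for the example (they are not rows from the Sephora dataset), and a regex tokenizer stands in for NLTK.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text, then split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

# Hypothetical sample reviews, used only for illustration.
reviews = [
    "This lip mask is awesome!",
    "My lips were dry, but this product helped.",
    "This product made my lips feel dry.",
]

token_counts = Counter()
for review in reviews:
    token_counts.update(tokenize(review))

# The 10 most frequent tokens, as (token, count) pairs.
top_tokens = token_counts.most_common(10)
for token, count in top_tokens:
    print(f"{token!r}: {count}")
```

Even in this tiny sample, the most frequent token is the stopword "this," and punctuation marks show up among the counts.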

                Removing the punctuation marks and stopwords from our dataset yields a much more interesting histogram (below). It’s no surprise to find the tokens "lips," "lip," and "product" on the x-axis (remember, we’ve tokenized reviews for a lip mask skincare product), but the other seven tokens hint at customer sentiment. For example, the token "dry" could convey a positive sentiment (e.g., "my lips were dry, but this product helped") or a negative sentiment (e.g., "this product made my lips feel dry"), depending on the context in which it appears.

                10 Most Frequently Occurring Tokens Excluding Stopwords in Customer Review Dataset
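The filtering step can be sketched as follows. The stopword set here is a small hand-picked list assumed for illustration (NLTK and similar libraries ship much longer ones), and the sample reviews are again hypothetical rather than taken from the dataset.

```python
import re
import string
from collections import Counter

# Assumed, abbreviated stopword list for illustration only.
STOPWORDS = {"this", "is", "my", "were", "but", "the", "a", "and", "it", "so", "i"}
PUNCTUATION = set(string.punctuation)

def content_tokens(text):
    """Tokenize lowercased text and drop stopwords and punctuation marks."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [t for t in tokens if t not in STOPWORDS and t not in PUNCTUATION]

# Hypothetical sample reviews, used only for illustration.
reviews = [
    "This lip mask is awesome!",
    "My lips were dry, but this product helped.",
    "This product made my lips feel dry.",
]

filtered_counts = Counter()
for review in reviews:
    filtered_counts.update(content_tokens(review))

print(filtered_counts.most_common(10))
```

With stopwords and punctuation gone, tokens such as "lips," "dry," and "product" move to the top of the counts.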

                We’ve used tokenization to turn our customer reviews into bite-sized tokens. Our final step is to perform sentiment analysis to evaluate these tokens in context (where they’re useful) and interpret the results. 

                Performing Sentiment Analysis

                Sentiment analysis is a natural language processing technique used to categorize the sentiment expressed in a piece of text as positive, negative, or neutral. We've employed lexicon-based sentiment analysis to categorize a single customer review from our dataset: "It is so moisturizing and keeps my lips super soft and hydrated." This approach involves matching each token in the review against a predefined lexicon to get its sentiment score, a process similar to "looking up" a word in the dictionary to get its definition. We’ve used a domain-specific lexicon tailored to the makeup and beauty market created by the Stanford Natural Language Processing Group.

                The figure below shows the sentiment scores assigned to each token in the customer review. We can speculate that the tokens "moisturizing," "super," "soft," and "hydrated" have a positive sentiment score because they often express desirable qualities in the context of the beauty market, whereas the tokens "keeps" and "lips" have a negative sentiment score because they often appear alongside undesirable qualities (e.g., "keeps drying out my lips").

                Sentiment Analysis Results of Customer Review #37
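The lexicon lookup can be sketched as a dictionary lookup per token. The scores in the `LEXICON` below are invented placeholders chosen only to match the signs described above. They are not the values from the domain-specific lexicon used in our analysis, and the function names and the zero threshold are likewise our own assumptions.

```python
import re

# Hypothetical scores for illustration; NOT the real lexicon values.
LEXICON = {
    "moisturizing": 2.0,
    "super": 1.0,
    "soft": 1.5,
    "hydrated": 1.5,
    "keeps": -0.5,
    "lips": -0.5,
}

def score_tokens(text, lexicon):
    """Pair each token with its lexicon score; unknown tokens score 0.0."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [(token, lexicon.get(token, 0.0)) for token in tokens]

def classify(total, threshold=0.0):
    """Map a summed score to a sentiment label (threshold is an assumption)."""
    if total > threshold:
        return "positive"
    if total < -threshold:
        return "negative"
    return "neutral"

review = "It is so moisturizing and keeps my lips super soft and hydrated."
scored = score_tokens(review, LEXICON)
total = sum(score for _, score in scored)
label = classify(total)

for token, score in scored:
    print(f"{token:>14}  {score:+.1f}")
print("total:", total, "->", label)
```

Summing per-token scores is the simplest way to aggregate them. It ignores context such as negation ("not soft"), which is one reason lexicon-based results are worth reviewing in aggregate rather than one review at a time.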

                This example shows how B2C organizations can use sentiment analysis to extract insights from customer reviews. Businesses can understand customer sentiments, identify areas of improvement, and enhance their products or services to better meet consumer needs by tokenizing and analyzing customer feedback in aggregate.


                 


                 

                Photo by Oleg Moroz on Unsplash


                 

                About Synaptiq

                Synaptiq is an AI and data science consultancy based in Portland, Oregon. We collaborate with our clients to develop human-centered products and solutions. We uphold a strong commitment to ethics and innovation. 

                Contact us if you have a problem to solve, a process to refine, or a question to ask.

                You can learn more about our story through our past projects, our blog, or our podcast.
