20 Jul
Review the attached article and answer the following questions: What were the results of the study? Note what opinion mining is and how it is used in information retrieval. Discuss the various concepts and techniques of opinion mining and their importance in transforming an organization’s NLP framework.
Length: 2 pages, with APA-7 formatting.
Artif Intell Rev (2019) 52:1495–1545 https://doi.org/10.1007/s10462-017-9599-6
A survey on classification techniques for opinion mining and sentiment analysis
Fatemeh Hemmatian¹ · Mohammad Karim Sohrabi¹
Published online: 18 December 2017 © Springer Science+Business Media B.V., part of Springer Nature 2017
Abstract Opinion mining is considered a subfield of natural language processing, information retrieval and text mining. It is the process of extracting human thoughts and perceptions from unstructured texts, which, with the emergence of online social media and the massive volume of users’ comments, has become a useful, attractive and challenging issue. There is a wide variety of research with different trends and approaches in this area, but the lack of a comprehensive study that investigates them from all aspects is tangible. In this paper we present a complete, multilateral and systematic review of opinion mining and sentiment analysis to classify the available methods and compare their advantages and drawbacks, in order to better understand the available challenges and solutions and to clarify the future direction. For this purpose, we present a framework of opinion mining along with its steps and levels, and then we comprehensively review, classify, summarize and compare the techniques proposed for aspect extraction, opinion classification, summary production and evaluation, based on the major validated scientific works. To enable a better comparison, we also propose some factors in each category, which help to clarify the advantages and disadvantages of the different methods.
Keywords Opinion mining · Sentiment analysis · Machine learning · Classification · Lexicon
¹ Department of Computer Engineering, Semnan Branch, Islamic Azad University, Semnan, Iran
1 Introduction
Owing to the continuing development of web technology, different areas of evaluation are growing in this field. The original web consisted of static pages, and users were not allowed to manipulate its contents. With the advent of new programming technologies, however, the possibilities for interaction and feedback on web pages have grown steadily. The major part of these interactions consists of users’ comments, which give the owners of web pages feedback they can use to improve future performance and to adapt products and services appropriately to their target groups. However, manual analysis of such opinions, especially in social networks with large audiences around the world, is very difficult, time consuming and in some cases impossible.
To overcome these limitations, opinion mining has been introduced as an effective way to discover knowledge from the expressed comments, especially in the context of the web. Opinion mining, or sentiment analysis, extracts users’ opinions, sentiments and demands from subjective texts in a specific domain and determines their polarity. The exponential growth of internet usage and of the exchange of public thoughts is the main motivation for research in opinion mining and sentiment analysis. Many techniques are available for building opinion mining and sentiment analysis methods, because the literature already offers several data processing approaches (Sohrabi and Azgomi 2017a,b; Sohrabi and Ghods 2015); supervised and unsupervised machine learning techniques (Sohrabi and Akbari 2016); data mining and knowledge discovery methods, including association rule mining (Sohrabi and Marzooni 2016), frequent itemset mining (Sohrabi and Barforoush 2012, 2013; Sohrabi and Ghods 2014; Sohrabi 2018), and sequential pattern mining (Sohrabi and Ghods 2016; Sohrabi and Roshani 2017), with various applications (Arab and Sohrabi 2017; Sohrabi and Tajik 2017; Sohrabi and Karimi 2018); and web mining approaches (Zhang et al. 2004; Sisodia and Verma 2012), including web structure mining (WSM) (Velásquez 2013), web usage mining (WUM) (Yin and Guo 2013), and web content mining (WCM) (Mele 2013).
Research on opinion mining began in the early 2000s, but the phrase “opinion mining” was first used in Dave et al. (2003) (Liu 2012). In the past 15 years, various studies have examined and analyzed the opinions expressed in news, articles, and product and service reviews (Subrahmanian and Reforgiato 2008). Nowadays, most people benefit from the opinions of others through a simple search on the Internet when buying a commodity or selecting a service. According to the study conducted in Li and Liu (2014), 81% of Internet users have searched for related comments at least once before buying a commodity. The rates of searching for related comments before using restaurants, hotels and a variety of other services have been reported to range from 73 to 87%. These online investigations had a significant impact on customers’ decisions. People’s sentiments and ideas can be extracted from different web resources, such as blogs (Alfaro et al. 2016; Bilal et al. 2016), review sites (Chinsha and Joseph 2015; Molina-González et al. 2014; Jeyapriya and Selvi 2015), and, more recently, micro-blogs (Balahur and Perea-Ortega 2015; Feng et al. 2015; Pandarachalil et al. 2015; Da Silva et al. 2016; Saif et al. 2016; Wu et al. 2016; Ma et al. 2017; Li et al. 2017; Keshavarz and Abadeh 2017; Huang et al. 2017). Micro-blogs such as Twitter have become very popular among users and allow them to send tweets of up to a limited number of characters (Liu 2015).
Opinion mining can take place at three levels: the document level (Sharma et al. 2014; Moraes et al. 2013; Tang et al. 2015; Sun et al. 2015; Xia et al. 2016), the sentence level (Marcheggiani et al. 2014; Yang and Cardie 2014) and the aspect level (Chinsha and Joseph 2015; Marrese-Taylor et al. 2014; Wang et al. 2017b). The techniques used for sentiment analysis can also be categorized into three main classes: machine learning techniques (Pang et al. 2002; Moraes et al. 2013; Saleh et al. 2011; Habernal et al. 2015; Riaz et al. 2017; Wang et al. 2017a), lexicon-based approaches (Kanayama and Nasukawa 2006; Dang et al. 2010; Pandarachalil et al. 2015; Saif et al. 2016; Taboada et al. 2011; Turney 2002; Molina-González et al. 2015; Qiu et al. 2011; Liao et al. 2016; Bravo-Marquez et al. 2016; Muhammad et al. 2016; Khan et al. 2017) and hybrid methods (Balahur et al. 2012; Abdul-Mageed et al. 2014; Keshavarz and Abadeh 2017). Machine learning-based opinion mining techniques, which have the benefit of using well-known machine learning algorithms, can be divided into three groups: supervised (Jeyapriya and Selvi 2015; Habernal et al. 2015; Severyn et al. 2016; Anjaria and Guddeti 2014), semi-supervised (Hajmohammadi et al. 2015; Hong et al. 2014; Gao et al. 2014; Carter and Inkpen 2015; Lu 2015) and unsupervised (Li and Liu 2014; Claypo and Jaiyen 2015; De and Kopparapu 2013) methods. The lexicon-based method relies on a dictionary of sentiments and has been highly regarded in recent studies; it can be divided into the dictionary-based method (Chinsha and Joseph 2015; Pandarachalil et al. 2015; Saif et al. 2016; Sharma et al. 2014) and the corpus-based method (Turney 2002; Molina-González et al. 2015; Keshtkar and Inkpen 2013; Vulić et al. 2015). Very few works use both corpus-based and dictionary-based methods to improve the results (Taboada et al. 2011). Some literature reviews and books on opinion mining and sentiment analysis techniques and methods have also been published, which investigate the problem from different points of view (Bouadjenek et al. 2016; Liu 2015).
Fig. 1 The position of opinion mining (diagram labels: Data Mining; Web Mining; Web Structure Mining; Web Usage Mining; Web Content Mining)
The rest of the paper is organized as follows. Section 2 explains the problem, process, tasks and applications of opinion mining. Section 3 defines the levels of opinion mining. Section 4 focuses on the extraction of aspects. Section 5 classifies and compares sentiment analysis techniques. Section 6 discusses the evaluation criteria in opinion mining, Sect. 7 presents the future direction of opinion mining, and Sect. 8 concludes the review.
2 Opinion mining: process, tasks, and applications
Opinion mining can be considered a new subfield of natural language processing (Daud et al. 2017), information retrieval (Scholer et al. 2016), and text mining (Singh and Gupta 2017). Figure 1 shows the position of opinion mining. Opinion mining is regarded as a subset of the web content mining process in the web mining research area. Since web content mining focuses on the contents of the web, and texts form a large volume of that content, text mining techniques are widely used in this area. The most important challenge in applying text mining to web content is its unstructured or semi-structured nature, which requires natural language processing techniques. Web mining itself is also considered a subset of the data mining research area. Here, data mining is used to discover knowledge from the massive data sources of the web.
2.1 Opinion mining definitions
The main goal of opinion mining is to automate the extraction of sentiments expressed by users in unstructured texts. Two major definitions of opinion mining can be found in the literature. The first, proposed in Saleh et al. (2011), is “The automatic processing of documents to detect opinion expressed therein, as a unitary body of research”. The second says: “Opinion mining is extracting people’s opinion from the web. It analyzes people’s opinions, appraisals, attitudes, and emotions toward organizations, entities, person, issues, actions, topic and their attribute” (Jeyapriya and Selvi 2015; Liu 2012; Liu and Zhang 2012).
Opinion mining covers several tasks that go by different names (Liu 2012):
• Sentiment analysis The purpose of sentiment analysis is to recognize sentiments and examine public opinion; it is considered a research area in the field of text mining.
• Opinion extraction The process of extracting users’ opinions from web documents is called opinion extraction. Its main purpose is to find out the users’ ways of thinking.
• Sentiment mining Sentiment mining has two main goals. First, it determines whether the given text contains objective or subjective sentences. A sentence is called objective (or factual) when it contains factual information about the product. Subjective sentences express individual emotions about the product. Opinion mining considers the subjective sentences. Second, it extracts opinions and classifies them into three categories: positive, negative and neutral (Farra et al. 2010).
• Subjectivity analysis Subjectivity analysis makes it possible to identify, classify, and collect subjective sentences.
• Affect or emotion analysis Many of the words in a text are emotionally positive or negative. Affect analysis uses natural language processing techniques to identify the aspects of the text that express emotions (Grefenstette et al. 2004).
• Review mining Review mining is a sub-topic of text sentiment analysis. Its main purpose is to extract aspects from the authors’ sentiments and to produce a summary of the sentiments. Most research in review mining has focused on product reviews (Zhuang et al. 2006).
2.2 Opinion mining procedure
Fig. 2 Opinion mining process (recoverable diagram labels: Data Collection; Datasets; Opinion Identification; Opinion active words or phrases)
The main objective of opinion mining is to discover all the sentiments that exist in the documents (Saleh et al. 2011); in effect, it determines the speaker’s or writer’s attitude toward the different aspects of a problem. We have modeled the opinion mining process in Fig. 2, in which each part has the following responsibilities:
1. Data collection Having a comprehensive and reliable dataset is the first step in the opinion mining process. The necessary information can be collected from various web resources, such as weblogs, micro-blogs (such as Twitter¹), social networks (such as Facebook²) and review websites (such as Amazon,³ Yelp,⁴ and Tripadvisor⁵). Tools developed for extracting data from the web, together with techniques such as web scraping (Pandarachalil et al. 2015), can be useful for collecting appropriate data. Some English datasets are available that can be used as references (Pang et al. 2002; Pang and Lee 2004; Blitzer et al. 2007), and researchers can conveniently apply their methods to them. The first dataset,⁶ prepared by Pang et al. (2002), includes 1000 positive and 1000 negative movie reviews. It is the oldest and most important dataset in this area. The second dataset,⁷ prepared by Pang and Lee (2004), includes 1250 positive, 1250 negative, and 1250 neutral reviews. The third is the Blitzer dataset (Blitzer et al. 2007),⁸ which includes 1000 positive and 1000 negative movie reviews. Table 1 shows the accuracies obtained by different studies on the benchmark datasets.
2. Opinion identification In this phase, all the comments should be separated and identified in the presented texts. The extracted comments should then be processed to filter out the inappropriate and fake ones. By opinions we mean all the phrases that express individual emotions about products, services or any other category of interest.
3. Aspect extraction In this phase, all the existing aspects are identified and extracted according to the procedures. Selecting the potential aspects can be very effective in improving the classification.
4. Opinion classification After opinion identification and aspect extraction, which can be considered the preprocessing phase, the opinions are classified in this step using the different techniques that this paper summarizes, classifies and compares.
5. Summary production Based on the results of the previous steps, a summary of the opinion results is produced, which can take different forms such as text or charts.
6. Evaluation The performance of opinion classification can be evaluated using four evaluation parameters, namely accuracy, precision, recall and F-score.
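The four evaluation parameters named in step 6 can be illustrated with a short Python sketch. It uses the standard textbook definitions for a single target class; the function name `evaluate` and the toy label lists are assumptions made for illustration and do not come from the surveyed works.

```python
def evaluate(gold, predicted, positive="positive"):
    """Compute accuracy, precision, recall and F-score for one target class.

    `gold` and `predicted` are equal-length lists of class labels.
    The names and the example data are illustrative assumptions.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives, false positives and false negatives for the target class.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Made-up labels, used only to exercise the function.
gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
print(evaluate(gold, predicted))
```

Accuracy alone can be misleading when the classes are unbalanced, which is one reason precision, recall and F-score are usually reported alongside it.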
² https://www.facebook.com/
³ http://www.amazon.com/
⁴ http://www.yelp.com/
⁵ http://www.tripadvisor.com/
⁶ http://www.cs.cornell.edu/people/pabo/movie-review-data/
⁷ http://www.cs.cornell.edu/people/pabo/movie-review-data/ (review corpus version 2.0)
⁸ http://www.cs.jhu.edu/~mdredze/datasets/sentiment/
Table 1 Obtained accuracies on the benchmark datasets
Dataset | Paper | Accuracy (%)
Pang et al. (2002) | Chen et al. (2011) | 64
Pang et al. (2002) | Li and Liu (2012) | 77
Pang and Lee (2004) | Penalver-Martinez et al. (2014) | 89.6
Pang and Lee (2004) | Fernández-Gavilanes et al. (2016) | 69.95
Pang and Lee (2004) | Fersini et al. (2016) | 81.7
Pang and Lee (2004) | Saleh et al. (2011) | 85.35
Pang and Lee (2004) | Boiy and Moens (2009) | 87.40
Blitzer et al. (2007) | Xia et al. (2016) | 80
Blitzer et al. (2007) | Xia et al. (2011) | 85.58
Blitzer et al. (2007) | Poria et al. (2014) | 87
2.3 Opinion mining applications
Sentiment analysis tries to describe and assess the sentiments expressed in textual messages about issues of interest to web users. These issues can range from brands or goods to broader topics such as social, political, economic and cultural affairs. This section notes several major applications of opinion mining.
2.3.1 Opinion mining in the commercial product areas
The use of opinion mining in the area of commercial products (Chen et al. 2014; Marrese-Taylor et al. 2014; Jeyapriya and Selvi 2015; Li et al. 2012; Luo et al. 2015) is important from three viewpoints:
1. The individual customer’s point of view: when someone wants to buy a product, a summary of others’ opinions can be more useful than studying the massive number of comments about the product. Moreover, a summary of the opinions lets the customer compare products easily.
2. The business organizations’ and producers’ point of view: opinions help organizations improve their products. This information is used not only for product marketing and evaluation but also for product design and development. Manufacturing companies can even increase, decrease or change their products based on customers’ opinions.
3. The advertising companies’ point of view: opinions are important for advertising companies because they reveal market demand. The public’s general perspective and the types of products people are interested in can be found among the items extracted by opinion mining.
The important achievements of opinion mining for commercial products are as follows (Tang et al. 2009):
• Product comparison Online sellers want their customers to comment on the purchased products. With the increasing use of online marketing and similar web services, the number of these sentiments is growing. They are useful for both product manufacturers and consumers, who can make better decisions by comparing the sentiments and ideas of others about a product. Much of the research in this area has focused on automatically classifying products into two categories: recommended and non-recommended.
• Sentiment summarization As the number of sentiments increases, it becomes difficult for both producers and consumers to take them in. With sentiment summarization, customers can more easily find out what other customers think about a product, and manufacturers can more easily understand customers’ sentiments about their products.
• Exploring the reason for an opinion The reason why a user gives an opinion can also be extracted in the opinion mining process. It is extremely important to determine why consumers like or dislike a product.
2.3.2 Opinion mining in the politics area
Beyond comments on the sale and purchase of goods, the widespread use of Internet services means that users can also comment on various political, social, religious, and cultural issues. Collecting and analyzing these comments greatly helps politicians, managers of social issues, and religious and cultural activists to take appropriate decisions for improving the social life of the community. One significant application in these areas is political elections, in which individuals can draw on the sentiments of others when deciding how to vote. The analysis of election-related opinions in social networks is addressed in Tsakalidis et al. (2015), Unankard et al. (2014), Kagan et al. (2015), Mohammad et al. (2015) and Archambault et al. (2013).
2.3.3 Opinion mining in the stock market and stock forecast
Achieving sustained, long-term economic growth requires the optimal allocation of resources at the level of the national economy, which is not easily possible without appropriate information and knowledge. Investing in stocks offered on the stock exchange is one of the profitable options in the capital market, and stock prediction, which has its own particular audience, plays an important role in helping individuals make better decisions. Studies that apply opinion mining to the stock market include Bollen et al. (2011), Nofer and Hinz (2015), Bing et al. (2014) and Fortuny et al. (2014), in which opinions are used to predict the stock market. For example, Bollen et al. (2011) analyzed daily Twitter comments using OpinionFinder and GPOMS, two important mood tracking tools, and showed their correlation with daily changes in the closing values of the Dow Jones Industrial Average.
3 Levels of opinion mining
As shown in Fig. 3, opinion mining is possible at four different levels, namely the document level, sentence level, aspect level, and concept level.
Fig. 3 Different levels of opinion mining (diagram labels: Opinion Mining Levels; Document Level; Sentence Level; Aspect Level; Concept Level)
Document level opinion mining (Moraes et al. 2013) is the most abstract level of sentiment analysis and so is not appropriate for precise evaluations. The result of this level of analysis is usually general information about the document’s polarity, which cannot be very accurate. Sentence level opinion mining (Marcheggiani et al. 2014) is a finer-grained analysis that can be more accurate. Since the polarity of the individual sentences of an opinion does not necessarily imply the same polarity for the opinion as a whole, researchers have considered aspect level opinion mining (Xia et al. 2015) as the third level of opinion mining and sentiment analysis. Concept level opinion mining is the fourth level of sentiment analysis; it focuses on the semantic analysis of the text and analyzes concepts that do not explicitly express any emotion (Poria et al. 2014). Several recent surveys and reviews on sentiment analysis consider the levels of opinion mining from this point of view (Medhat et al. 2014; Ravi and Ravi 2015; Balazs and Velasquez 2016; Yan et al. 2017; Sun et al. 2017; Lo et al. 2017).
3.1 Document level
Sentiment analysis may be applied at the document level. At this level of opinion mining, sentiments are ultimately summarized over the whole document as positive or negative (Pang et al. 2002). The purpose of categorizing comments at the document level is to classify information automatically on the basis of a single topic, expressed as a positive or negative sentiment (Moraes et al. 2013). Since this level of opinion mining does not go into details and the review takes an abstract, general view, the mining process can be done much faster. In early work, most research was conducted at the document level and focused on datasets such as news and product reviews. As social networks grew in popularity, different types of datasets were created, which increased the number of studies at this level (Habernal et al. 2015; Gupta et al. 2015). Because the entire document is treated as a single entity, document level opinion mining is not suitable for precise evaluation and comparison. Most of the techniques used for opinion classification at this level are based on supervised learning methods (Liu and Zhang 2012).
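To make the supervised, document level setting concrete, the sketch below trains a tiny multinomial Naive Bayes classifier with add-one smoothing on whole documents and assigns each new document a single polarity. Naive Bayes is chosen here only as a familiar supervised method; the text above does not prescribe it. The class name, the tokenizer and the four training reviews are illustrative assumptions, not data from the surveyed studies.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Minimal multinomial Naive Bayes for whole-document polarity (illustrative only)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(document):
                if word in self.vocabulary:        # ignore unseen words
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) /
                                      (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training reviews, each labeled as a whole document.
train_docs = [
    "A wonderful, moving film with great acting",
    "Great story and wonderful cast",
    "A boring plot and terrible acting",
    "Terrible pacing, boring and dull",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("wonderful acting and a great cast"))
print(model.predict("boring and dull"))
```

Because the classifier sees each document as one bag of words, it returns one label per document and cannot say which part of the review drove the decision, which matches the coarseness described above.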
3.2 Sentence level
Since document level sentiment analysis is too coarse, researchers have investigated approaches that focus on the sentence (Wilson et al. 2005; Marcheggiani et al. 2014; Yang and Cardie 2014; Appel et al. 2016). The goal of this level of opinion mining is to classify the opinions in each sentence. Sentiment analysis at the sentence level consists of the following two steps (Liu and Zhang 2012):
• First, it is determined whether the sentence is subjective or objective.
• Second, the polarity (positive or negative) of the sentence is determined.
Because the documents are broken into several sentences, classification of comments at the sentence level provides more accurate information on the polarity of the views, and it naturally entails more challenges than the document level.
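The two steps above can be sketched in a few lines of Python. This is a minimal lexicon-lookup illustration, assuming that a sentence with no opinion words is objective; the word lists, function names and the "neutral" label for ties are assumptions made for the example, and practical systems rely on far richer lexicons or trained classifiers.

```python
import re

# Hypothetical toy lexicons; practical resources are much larger.
POSITIVE = {"good", "great", "excellent", "love", "amazing"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "awful"}


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a crude splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def classify_sentence(sentence):
    """Step 1: subjective vs objective. Step 2: polarity of subjective sentences."""
    tokens = tokenize(sentence)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos == 0 and neg == 0:
        return "objective"   # no opinion words found
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"         # opinion words present but balanced


review = ("The phone has a 6 inch screen. The camera is excellent! "
          "The battery is terrible.")
for sentence in split_sentences(review):
    print(classify_sentence(sentence), "|", sentence)
```

Run on the sample review, the sketch labels each sentence separately, so a factual sentence, a positive one and a negative one can coexist in a single document.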
3.3 Aspect level
Although classifying text sentiments at the document and sentence levels is helpful in many cases, it does not provide all the necessary details. For example, a positive sentiment in a document about a particular entity does not imply that the author’s opinion is positive about all aspects of that entity. Similarly, negative sentiments do not mean that the author holds a negative opinion about all aspects of an entity (Liu and Zhang 2012). Classification at the document level (Moraes et al. 2013) and sentence level (Marcheggiani et al. 2014) does not provide this kind of information, so opinion mining must be performed at the aspect level (Xia et al. 2015) to obtain these details. When a comment covers more than one entity or aspect, this level of opinion mining is the appropriate option, which is an important advantage that distinguishes it from the two previous levels. Aspect level opinion mining considers the opinion itself rather than the language structures (document, sentence or phrase) (Liu 2012). The objective of this level is to identify and extract the aspects from the sentiment text and then specify their polarity. This level of sentiment analysis can produce a summary of the sentiments about different aspects of the entity of interest, and it provides a more accurate result (Chinsha and Joseph 2015).
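A small Python sketch can show why a single review may carry different polarities for different aspects. It assumes the aspects are already known and scores each one from the opinion words within a fixed window of tokens around it; the aspect list, the opinion lexicon, the window size and the function name are all hypothetical choices for illustration, not a method taken from the cited works.

```python
import re
from collections import defaultdict

# Hypothetical inputs: a known aspect list and a tiny scored opinion lexicon.
ASPECTS = {"camera", "battery", "screen"}
OPINION_WORDS = {"great": 1, "excellent": 1, "sharp": 1,
                 "poor": -1, "terrible": -1, "weak": -1}


def aspect_polarity(text, window=2):
    """Score each aspect by the opinion words within `window` tokens of it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = defaultdict(int)
    for i, token in enumerate(tokens):
        if token in ASPECTS:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            scores[token] += sum(OPINION_WORDS.get(t, 0) for t in tokens[lo:hi])
    # Turn the summed scores into polarity labels, one per aspect found.
    return {aspect: ("positive" if s > 0 else "negative" if s < 0 else "neutral")
            for aspect, s in scores.items()}


review = ("The camera is excellent and the screen is sharp, "
          "but the battery is terrible.")
print(aspect_polarity(review))
```

The per-aspect dictionary is a simple form of the aspect summary mentioned above. A fixed window is a blunt heuristic: if it is too wide, opinion words meant for a neighboring aspect leak into the score.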
3.4 Concept level
Cambria (2013) introduced concept level opinion mining as a deep understanding of natural language texts by the machine, in which opinion mining methods should go beyond surface level analysis. Cambria et al. (2013) also presented the concept level of opinion mining as a new avenue in sentiment analysis. The analysis of emotions at the concept level is based on the inference of conceptual information about the emotion and sentiment associated with natural language. Conceptual approaches focus on the semantic analysis of the text and analyze concepts that do not explicitly express any emotion (Poria et al. 2014). An enhanced version of SenticNet was proposed in Poria et al. (2013), which assigns emotion labels to carry out concept level opinion mining. Poria et al. (2014) proposed a new approach to improve the accuracy of polarity detection: an analysis of comments at the conceptual level that integrates linguistics, common-sense computing, and machine learning techniques. Their results indicate that the proposed method achieves desirable accuracy and performs better than common statistical methods. A concept level sentiment dictionary was built in Tsai et al. (2013) based on common-sense knowledge, using a two-phase method that integrates iterative regression and random walk with in-link normalization. A concept level sentiment analysis system was presented in Mudinas et al. (2012), which combines lexicon-based and learning-based approaches for concept mining from opinions. The EventSensor system is presented in Shah et al. (2016) to extract concept tags from visual contents and textual metadata in concept-level sentiment analysis.
4 Aspect extraction
One of the most important steps in sentiment classification is aspect extraction (Rana and Cheah 2016). In this section we categorize the current techniques for aspect extraction and selection. As mentioned, aspect level classification has better performance, and a prerequisite for using it is obtaining the aspects. Most research in the field of aspect extraction has focused on online reviews (Hu and Liu 2004; Li et al. 2015; Lv et al. 2017). In general, as shown in Fig. 4, the related techniques can be placed in four categories (Liu 2012): extraction based on frequent nouns and noun phrases (Jeyapriya and Selvi 2015; Hu and Liu 2004; Li et al. 2015), extraction based on exploiting opinion and aspect relations (Qiu et al. 2011; Wu et al. 2009), extraction based on supervised learning (Jin et al. 2009; Yu et al. 2011), and extraction based on topic modeling (Vulić et al. 2015; Mukherjee and Liu 2012).
Fig. 4 Classification of aspect extraction techniques (diagram labels: Aspect Extraction Techniques; Based on nouns and the frequent noun phrases; Based on exploiting opinion and aspect relations; Based on the supervised learning techniques; Based on topic modeling)
4.1 Extraction based on frequency of noun phrases and nouns
This method is known as a simple and effective approach. When people comment on the various aspects of a product, they tend to use similar words repeatedly to express their sentiments (Liu 2012). In this method, nouns and noun phrases are identified by a POS tagger, and those that occur frequently are selected as aspects. POS tags indicate the role of each word in a sentence (Wang et al. 2015a). Table 2 lists the POS tags, based on Liu (2012).
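The frequency idea can be sketched in Python as follows. To stay self-contained, the example assumes the reviews have already been tagged by some POS tagger and arrive as (word, tag) pairs with Penn Treebank style noun tags beginning with "NN"; the function name, the `min_support` threshold and the tagged reviews are illustrative assumptions. Only single nouns are counted here, so noun phrases would need an extra chunking step.

```python
from collections import Counter


def frequent_noun_aspects(tagged_reviews, min_support=2):
    """Return nouns that occur in at least `min_support` reviews.

    `tagged_reviews` is a list of reviews, each a list of (word, tag) pairs
    produced by a POS tagger that is assumed to have run beforehand.
    """
    counts = Counter()
    for review in tagged_reviews:
        # Count each noun at most once per review (review-level support).
        nouns = {word.lower() for word, tag in review if tag.startswith("NN")}
        counts.update(nouns)
    frequent = [(noun, n) for noun, n in counts.items() if n >= min_support]
    # Most frequent first; ties are broken alphabetically for stable output.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [noun for noun, _ in frequent]


# Made-up, pre-tagged reviews used only to exercise the function.
tagged_reviews = [
    [("The", "DT"), ("battery", "NN"), ("lasts", "VBZ"), ("long", "RB")],
    [("Great", "JJ"), ("screen", "NN"), ("and", "CC"), ("battery", "NN")],
    [("The", "DT"), ("screen", "NN"), ("is", "VBZ"), ("sharp", "JJ")],
    [("Fast", "JJ"), ("delivery", "NN")],
]
print(frequent_noun_aspects(tagged_reviews))
```

Raising `min_support` trades recall for precision: rare nouns such as "delivery" in the sample are dropped, which removes noise but can also discard genuine aspects that few reviewers mention.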
Li et al. (2015) suggested a method for improving feature extraction performance by online reviews. Their method which is based o