Ontology-based sentiment analysis for brand crisis detection on online social media

Social media is emerging as a popular channel for online marketing. Nowadays, more and more brands use social media to track and care for their brand health: social media is both a source of information about a brand and an important channel for managing it. On social media, things can move quickly because information spreads virally among the audience. Thus, a robust and automatic method for detecting a crisis, and even stopping it before it starts, is urgently needed.

This paper discusses the detection of brand crises on online social media, i.e. when a brand suffers from an unexpectedly high frequency of negative comments on online channels such as social networks, electronic news, blogs and forums. To do so, we combine a probabilistic model for burst detection with an ontology-based, aspect-level sentiment analysis technique for detecting negative mentions. A burst in the online environment corresponds to a trending topic that grows rapidly, whereas sentiment analysis helps to identify the audience's opinion about the brands. By incorporating domain knowledge captured in the ontology, we can focus the analysis on particular domains when needed. The ontological concepts can also improve the accuracy of sentiment analysis at the aspect level.

To evaluate the performance of our approach, we collected real data from online social media channels in Vietnam, provided by YouNet Media, a professional online data analysis company. Our experimental results show that the aspect-level sentiment analysis technique is highly useful for detecting negative mentions related to products and brands. Based on these results, building commercial products and platforms on this approach merits serious consideration.


INTRODUCTION
With the advance of information and communication technology (ICT) over the Internet, social networking has developed rapidly and become a powerful medium for dealing with social crises in real time 1. In particular, social media offers potential means to perceive and respond to emergencies 2. For example, during the terrorist attacks in Paris on Friday, November 13, 2015, social networks helped people become aware of the attacks and helped them find safe shelter a. In this article, we address a particular form of online emergency, known as a brand crisis, in which a brand suffers from an abnormally high level of negative feedback on online platforms. Toyota b and Domino's Pizza c are well-known examples in which online platforms enabled derogatory content to spread rapidly and virally. At the same time, this environment also helps brands counteract a crisis through a range of techniques: (i) early warning or prediction of a crisis 3 and (ii) gathering consumer input across a complex social media landscape 4.
a http://www.wired.com/2015/11/paris-attacks-parisians-use-porteouverte-hashtag-to-seek-offer-safe-shelter/
b http://www.bsgco.com/work/cases/toyota-reputation-management
c http://cssc.uscannenberg.org/cases/v1/v1art6/
In this paper, we propose a new approach that combines two techniques for brand crisis detection. We apply a probabilistic model to detect bursts, i.e. trending topics that are emerging rapidly.
In addition, we apply ontology-based sentiment analysis to identify negative mentions, so that a burst of such mentions signals a potential crisis for a brand.

BURST IDENTIFICATION FOR CRISIS DETECTION
A burst, or burst of activity, occurs when certain features rise sharply in frequency, corresponding to the rise of a topic 5. We briefly review this technique in the context of crisis detection on social networks. The crisis detection model can be viewed as a probabilistic automaton A with two states, C and N (i.e. crisis and normal), corresponding to whether or not a crisis is occurring. Intuitively, a brand is in crisis when the number of its negative mentions in the online environment or on social media rises suddenly and stays high for a considerable period. When A is in state N, negative mentions are emitted at a slow rate; when A is in state C, they are emitted at a faster rate. Whether A switches from N to C, or vice versa, depends on the previous emissions and the current state.
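To make the two-state automaton concrete, the following is a minimal Python sketch that decodes the cheapest N/C state sequence from daily negative-mention counts, in the spirit of Kleinberg's two-state burst automaton. It is not the authors' implementation of 5 or 6: the Poisson emission model, the use of the mean daily count as the rate of state N, the multiplier s for state C, the switching cost gamma * ln(n), and the function names are all assumptions made for illustration.

```python
import math


def poisson_cost(count: int, rate: float) -> float:
    """Negative log-likelihood of `count` under a Poisson distribution with mean `rate`."""
    return rate - count * math.log(rate) + math.lgamma(count + 1)


def decode_states(counts: list[int], s: float = 2.0, gamma: float = 1.0) -> str:
    """Cheapest N/C state sequence for a non-empty list of daily negative-mention counts.

    Assumptions (illustrative only): Poisson emissions; the N rate is the mean
    daily count and the C rate is s times higher; entering C costs
    gamma * ln(n), while staying in a state or leaving C is free.
    """
    n = len(counts)
    base = max(sum(counts) / n, 1e-9)  # keep the rate positive for an all-zero series
    rates = {"N": base, "C": s * base}
    enter_c = gamma * math.log(n)
    cost = {"N": 0.0, "C": math.inf}  # the automaton starts in the normal state
    backpointers = []
    for x in counts:
        new_cost, pointer = {}, {}
        for state in ("N", "C"):
            # Total cost of arriving in `state` from each possible previous state.
            arrive = {
                prev: cost[prev] + (enter_c if prev == "N" and state == "C" else 0.0)
                for prev in ("N", "C")
            }
            best_prev = min(arrive, key=arrive.get)
            new_cost[state] = arrive[best_prev] + poisson_cost(x, rates[state])
            pointer[state] = best_prev
        backpointers.append(pointer)
        cost = new_cost
    # Walk the back-pointers from the cheapest final state to recover the path.
    state = min(cost, key=cost.get)
    path = []
    for pointer in reversed(backpointers):
        path.append(state)
        state = pointer[state]
    return "".join(reversed(path))


print(decode_states([1] * 5 + [20] * 5 + [1] * 5))  # NNNNNCCCCCNNNNN
```

Raising gamma makes the automaton more reluctant to enter C, so short isolated spikes are more likely to remain in state N.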
To illustrate this, consider the distribution of a brand's negative mentions in Figure 1. In the first three days, the frequency of negative mentions is quite low, so A stays in state N, i.e. no crisis. On the fourth day, the number of negative mentions increases suddenly. However, A does not switch from N to C, implying that this may be an anomaly rather than a crisis. From the fifth to the seventh day, the frequency is low again, and A remains in state N. From the eighth day, the frequency increases gradually, and on the ninth day A changes from state N to C, marking the start of a crisis. A stays in the crisis state from the ninth to the fourteenth day because the frequency of negative mentions remains high on average. Note that although the frequency drops on the twelfth day, A does not change state, indicating that the crisis may only subside temporarily. From the fourteenth day onward, the frequency decreases markedly, and A returns to state N on the fifteenth day, marking the end of the crisis. Therefore, the sequence of states of A over these 15 consecutive days can be represented by the string "NNNNNNNNCCCCCCN". A traditional HMM-based model for inferring such a state sequence was developed in 5, and the model is further enhanced in 6 for better performance. Parikh et al. 7 applied this approach to time series data, and real data from electronic news and Twitter have been used to detect bursts 8. In this paper, we apply the algorithm of 6 to detect a crisis as a burst of negative mentions.
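A state string is most useful to an alerting component once it is turned into explicit crisis periods. The sketch below uses a hypothetical helper, crisis_periods, on the 15-day sequence described above (days 1 to 8 in N, days 9 to 14 in C, day 15 in N).

```python
def crisis_periods(states: str) -> list[tuple[int, int]]:
    """Return (first_day, last_day) pairs, 1-indexed, for every run of 'C' states."""
    periods = []
    start = None
    for day, state in enumerate(states, start=1):
        if state == "C" and start is None:
            start = day  # a crisis begins
        elif state != "C" and start is not None:
            periods.append((start, day - 1))  # the crisis ended the day before
            start = None
    if start is not None:  # crisis still ongoing at the end of the series
        periods.append((start, len(states)))
    return periods


print(crisis_periods("NNNNNNNNCCCCCCN"))  # [(9, 14)]
```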

SENTIMENT ANALYSIS AND COREFERENCE RESOLUTION
To deploy the burst detection model discussed above, one needs a mechanism to infer whether or not a mention is negative towards a brand. This requires sentiment analysis 9, which studies how computers can analyze users' opinions. One of the challenges of this task is to identify the objects that an opinion refers to. The difficulty lies in the fact that these objects are sometimes not mentioned directly but are implied by other means. We refer to this case as the problem of coreference. Apart from the typical anaphoric coreference in linguistics, one must consider aspect coreference, which occurs when multiple aspects refer to the same entity, or when one aspect is an attribute of another aspect 10. For instance, consider the following statement.
(S1) I consider an iPhone 6S. Unlike Samsung S7, it is unfortunately not really affordable for students. However, the design looks nice and eye-catching.
In this example, the pronoun it in the second sentence refers to iPhone 6S in the first sentence, a case of anaphoric coreference. In the third sentence, design is an attribute of iPhone 6S, a case of aspect coreference. Coreference resolution at both the anaphoric and the aspect level can be viewed as a new development trend in sentiment analysis. This problem cannot be tackled without domain knowledge that captures both aspect and sentiment relations.
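As a toy illustration of the two coreference cases in (S1), the sketch below links pronouns and aspect words to the product in discourse focus. The lexicons, the "first-named product is the focus" heuristic, and the function name are assumptions made only for this example; they are not the paper's resolution method. A naive most-recent-antecedent rule would wrongly link it to Samsung S7, which is why this sketch keeps the earliest-named product as the focus.

```python
import re

# Hypothetical lexicons covering only example (S1).
PRODUCTS = ["iPhone 6S", "Samsung S7"]
ASPECTS = {"design", "price"}  # assumed to be attributes of a product
PRONOUNS = {"it", "its"}

S1 = [
    "I consider an iPhone 6S.",
    "Unlike Samsung S7, it is unfortunately not really affordable for students.",
    "However, the design looks nice and eye-catching.",
]


def resolve_mentions(sentences: list[str]) -> list[tuple[int, str, str, str]]:
    """Link pronouns and aspect words to the product in discourse focus.

    Toy heuristic (an assumption, not the paper's method): the first product
    named in the text becomes the focus; later pronouns (anaphoric
    coreference) and aspect words (aspect coreference) are attached to it.
    """
    focus = None
    links = []
    for i, sentence in enumerate(sentences):
        if focus is None:
            found = [p for p in PRODUCTS if p in sentence]
            if found:
                focus = min(found, key=sentence.index)  # earliest-named product
        if focus is None:
            continue  # nothing to link to yet
        for token in re.findall(r"[a-z0-9]+", sentence.lower()):
            if token in PRONOUNS:
                links.append((i, token, "anaphoric", focus))
            elif token in ASPECTS:
                links.append((i, token, "aspect", focus))
    return links


print(resolve_mentions(S1))
# [(1, 'it', 'anaphoric', 'iPhone 6S'), (2, 'design', 'aspect', 'iPhone 6S')]
```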
In this paper, we develop a specific ontology, called the Aspect-oriented Sentiment Ontology, that captures relations between aspects and sentiment terms in a given domain. This ontology is combined with several lightweight NLP techniques to solve the problem of coreference for sentiment analysis.

A FRAMEWORK OF SENTIMENT ANALYSIS USING ONTOLOGY-BASED COREFERENCE RESOLUTION
Figure 2 presents our framework for crisis detection, which includes the following components.
• A Knowledge Base consists of the Aspect-Oriented Sentiment Ontology, capturing domain knowledge, and Pattern Rules, capturing lightweight NLP rules for shallow processing of textual data.
• The Sentiment Engine uses the information captured in the Knowledge Base to rate the sentiment of each mention in the User Feeds. Coreference resolution is also handled by this engine.
• The Crisis Detection Automaton detects negative bursts, which imply a potential crisis, from the results produced by the Sentiment Engine.
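The hand-off between the Sentiment Engine and the Crisis Detection Automaton can be pictured as turning labeled mentions into a daily count series. The sketch below assumes a hypothetical output format of (date, label) pairs from the Sentiment Engine; the function name and format are illustrative, not taken from the framework.

```python
from collections import Counter
from datetime import date, timedelta


def daily_negative_counts(labeled_mentions, start: date, days: int) -> list[int]:
    """Turn Sentiment Engine output into the count series fed to the automaton.

    labeled_mentions: iterable of (date, label) pairs with label in
    {"negative", "neutral", "positive"} (a hypothetical output format).
    Returns one negative-mention count per day, including zero-count days.
    """
    negatives = Counter(d for d, label in labeled_mentions if label == "negative")
    return [negatives[start + timedelta(days=i)] for i in range(days)]


mentions = [
    (date(2016, 3, 1), "negative"),
    (date(2016, 3, 1), "neutral"),
    (date(2016, 3, 3), "negative"),
    (date(2016, 3, 3), "negative"),
]
print(daily_negative_counts(mentions, date(2016, 3, 1), 4))  # [1, 0, 2, 0]
```

Keeping zero-count days explicit matters, because a quiet day is evidence for the normal state.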

Aspect-Oriented Sentiment Ontology Formal Definition
Definition 1 (Aspect-oriented Sentiment Ontology). An aspect-oriented sentiment ontology $SO$ is a pair $(C, R)$, where $C = (C_A, C_S)$ is a set of concepts consisting of two parts: $C_A$, a set of aspect concepts, and $C_S$, a set of sentiment concepts; and $R = (R_N, R_T, R_S)$ is a set of relations composed of three components: $R_N$, a set of non-taxonomic relations; $R_T$, a set of taxonomic relations; and $R_S$, a set of sentiment relations. Each concept $c_i \in C$ represents a group of objects, or instances, of the same kind, denoted $\text{instance-of}(c_i)$. Each relation $r_i(c_p, c_q) \in R$ represents a binary relation between the concepts $c_p$ and $c_q$, and the instances of that relation, denoted $\text{instance-of}(r_i)$, are pairs of instances of $c_p$ and $c_q$. In particular, an instance $r^S_i(a, s)$ of a relation in $R_S$ represents a relation between an aspect $a$ (an instance of a concept in $C_A$) and a sentiment term $s$ (an instance of a concept in $C_S$).
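One way to hold Definition 1 in memory is a small container whose fields mirror $C_A$, $C_S$, $R_N$, $R_T$, $R_S$ and instance-of. This is a sketch under our own assumptions: relations are stored at the concept level as tuples, $R_T$ is assumed acyclic, and the example fragment (concepts Phone and Design, the attribute-of relation, the terms "nice" and "eye-catching") is hypothetical, loosely built from statement (S1) rather than taken from the paper's ontology.

```python
from dataclasses import dataclass, field


@dataclass
class SentimentOntology:
    """SO = (C, R) with C = (C_A, C_S) and R = (R_N, R_T, R_S), as in Definition 1."""
    c_a: set[str]                                                # aspect concepts C_A
    c_s: set[str]                                                # sentiment concepts C_S
    r_n: set[tuple[str, str, str]] = field(default_factory=set)  # non-taxonomic: (name, c_p, c_q)
    r_t: set[tuple[str, str]] = field(default_factory=set)       # taxonomic: subconcept-of(c_p, c_q)
    r_s: set[tuple[str, str, str]] = field(default_factory=set)  # sentiment: (name, aspect, sentiment concept)
    instances: dict[str, set[str]] = field(default_factory=dict) # instance-of(c)

    def is_subconcept(self, child: str, ancestor: str) -> bool:
        """True if child == ancestor or child reaches ancestor through R_T (assumed acyclic)."""
        if child == ancestor:
            return True
        return any(c == child and self.is_subconcept(p, ancestor) for c, p in self.r_t)

    def instances_of(self, concept: str) -> set[str]:
        """instance-of(concept), including instances of its subconcepts."""
        result: set[str] = set()
        for c, members in self.instances.items():
            if self.is_subconcept(c, concept):
                result |= members
        return result


# Hypothetical fragment inspired by statement (S1); not the paper's Mobile Ontology.
so = SentimentOntology(
    c_a={"Phone", "Design"},
    c_s={"Sentiment Term", "Positive Term", "Negative Term"},
    r_n={("attribute-of", "Design", "Phone")},
    r_t={("Positive Term", "Sentiment Term"), ("Negative Term", "Sentiment Term")},
    r_s={("mentioned-by", "Design", "Sentiment Term")},
    instances={"Phone": {"iPhone 6S"}, "Positive Term": {"nice", "eye-catching"}},
)
print(sorted(so.instances_of("Sentiment Term")))  # ['eye-catching', 'nice']
```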
The Generic Ontology $G_O$ is a sentiment ontology whose components are given in Listing 1.
Listing 1 - The formal representation of Generic Ontology
C_A = {"Thing"}
C_S = {"Sentiment Term", "Negative Term", "Positive Term"}
R_N = {}
R_T = {subconcept-of("Positive Term", "Sentiment Term"), subconcept-of("Negative Term", "Sentiment Term")}
R_S = {mentioned-by("Thing", "Sentiment Term")}
instances-of("Positive Term") = {"like"}
instances-of("Negative Term") = {"hate"}
In general, $G_O$ contains a single aspect concept, Thing, whose instances may be any real-world entity. Thing can be mentioned by a Sentiment Term, which is either a Positive Term or a Negative Term. In this case, $G_O$ defines no instances of the aspect concept and no instances of non-taxonomic or sentiment relations, while the words "like" and "hate" are instances of Positive Term and Negative Term, respectively.
We use the notions of T-Box and A-Box to represent the ontology graphically: the T-Box describes the relations between concepts, and the A-Box describes the instances of those concepts. Figure 3 shows the T-Box and A-Box of the Generic Ontology $G_O$.
We also define two distinct sentiment relations for the sentiment ontology, shown in Figure 4: mentioned-by and implied-by. An aspect instance c may be mentioned-by a sentiment term s, meaning that c is evaluated positively or negatively depending on whether s belongs to the Positive Term or the Negative Term class, respectively. The relation implied-by is similar to mentioned-by but has a more specific meaning: an aspect instance c is implied-by a sentiment term s if s is relevant only to c and not to other aspects. Thus, if s appears in a textual statement ϑ, it can be inferred that c is also referred to in ϑ, even without being mentioned explicitly.
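The difference between mentioned-by and implied-by can be shown with a small sketch. Polarity comes from the class a term instantiates (Positive Term or Negative Term). An implied-by term identifies its aspect even when the aspect word never appears, while a mentioned-by term needs an explicit aspect. The term lists, the implied-by link from "eye-catching" to design, and the rule "attach to the nearest preceding aspect" are all hypothetical choices for this illustration.

```python
from typing import Optional

# Hypothetical A-Box fragment; the terms and links are illustrative only.
INSTANCE_OF = {
    "like": "Positive Term",
    "hate": "Negative Term",
    "awful": "Negative Term",
    "eye-catching": "Positive Term",
}
# implied-by: a sentiment term relevant to exactly one aspect (assumed link).
IMPLIED_BY = {"eye-catching": "design"}


def polarity(term: str) -> Optional[str]:
    """Polarity from the sentiment class the term instantiates, or None."""
    return {"Positive Term": "positive", "Negative Term": "negative"}.get(INSTANCE_OF.get(term))


def aspect_opinions(tokens: list[str], aspects: set[str]) -> list[tuple[str, str, str]]:
    """Pair each sentiment term with an aspect as (aspect, polarity, relation).

    implied-by: the term alone identifies its aspect, even if that aspect is
    never written. mentioned-by: otherwise, attach the term to the nearest
    preceding explicit aspect (a simplifying assumption of this sketch).
    """
    opinions = []
    last_aspect = None
    for tok in tokens:
        if tok in aspects:
            last_aspect = tok
        pol = polarity(tok)
        if pol is None:
            continue
        if tok in IMPLIED_BY:
            opinions.append((IMPLIED_BY[tok], pol, "implied-by"))
        elif last_aspect is not None:
            opinions.append((last_aspect, pol, "mentioned-by"))
    return opinions


tokens = ["the", "battery", "is", "awful", "but", "it", "looks", "eye-catching"]
print(aspect_opinions(tokens, {"battery", "design"}))
# [('battery', 'negative', 'mentioned-by'), ('design', 'positive', 'implied-by')]
```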

Sentiment Analysis
Using lightweight NLP techniques, the corresponding conceptual graph (CG) of this statement can be generated as shown in Figure 5. We have previously introduced a methodology for constructing such a conceptual graph from a knowledge base 12. However, to carry out sentiment analysis, we must capture more complicated linguistic patterns, such as the one in Example 3. Each such pattern in our NLP Knowledge Base is encoded as a Sentiment Phrasing Rule, which captures the phrases matching that pattern. A Sentiment Phrasing Rule is composed as follows.
Sentiment_Phrasing_Rule
#pattern: the pattern of the sentiment phrases captured by this rule
#sent_parts: the parts of the phrase expressing the sentiment
#core_part: the part expressing the main sentiment trend of the phrase
#core_word: used when the core part contains multiple words
#neg: flag indicating whether or not the phrase is negative
Example 3. Consider the following rule:
Example_Sentiment_Rule_1
#pattern: (\S+/N\s+)+(\S+/V\s+)+(\S+/A\s*)+
#sent_parts: [V,A]
#core_part: V
#neg: 0
The #pattern of the rule is a regular expression (RE) conforming to the RE convention specified at http://regexpal.com/. Roughly speaking, the rule reads as follows: "This rule applies to a sentence matching the following pattern: there is a noun N in the sentence, followed by a verb V, and then an adjective A (each possibly repeated)." The #sent_parts field specifies that only V and A are necessary to infer the sentiment (meaning N would bear no
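Example_Sentiment_Rule_1 can be exercised directly, because this simple pattern behaves the same under Python's re module as in the regexpal convention. The sketch assumes the input is a POS-tagged sentence in word/TAG form. Tags other than N, V and A (such as C below) are illustrative, and the class and method names are hypothetical. The #neg flag is carried along but not used.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentimentPhrasingRule:
    """Fields mirror #pattern, #sent_parts, #core_part and #neg of a rule."""
    pattern: str
    sent_parts: list[str]
    core_part: str
    neg: int = 0  # carried along; this sketch does not use it to flip polarity

    def apply(self, tagged: str) -> Optional[tuple[list[str], list[str]]]:
        """Match a word/TAG sentence; return (sentiment words, core words) or None."""
        m = re.search(self.pattern, tagged)
        if m is None:
            return None
        pairs = [tok.rsplit("/", 1) for tok in m.group(0).split()]
        sentiment_words = [w for w, tag in pairs if tag in self.sent_parts]
        core_words = [w for w, tag in pairs if tag == self.core_part]
        return sentiment_words, core_words


example_rule_1 = SentimentPhrasingRule(
    pattern=r"(\S+/N\s+)+(\S+/V\s+)+(\S+/A\s*)+",
    sent_parts=["V", "A"],
    core_part="V",
)
print(example_rule_1.apply("design/N looks/V nice/A and/C eye-catching/A"))
# (['looks', 'nice'], ['looks'])
```

The match stops at "and/C", so "eye-catching" falls outside this rule's phrase. A conjunction-aware rule would be needed to capture it.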

Smartphone Knowledge Base
To test our approach on real data, we obtained customer-opinion datasets on mobile products from YouNet Media (YNM), an organization devoted to social listening and market research. In total, the datasets contain 2,809 negative, 3,098 neutral, and 365 positive mentions of 6 smartphone products. Overall, 1,782 positive terms and 1,469 negative terms were identified for the smartphone domain. Based on these, we built a Mobile Ontology modeled in Protégé, as shown in Figure 6.
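A quick check of the class balance implied by the reported mention counts. The percentages are derived here and are not stated in the source.

```python
# Mention counts per sentiment class, as reported for the YNM smartphone datasets.
counts = {"negative": 2809, "neutral": 3098, "positive": 365}
total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

print(total)  # 6272
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

This gives roughly 44.8% negative, 49.4% neutral and 5.8% positive mentions, so positive mentions form a small minority of the data.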

Crisis Alert System
Using our method, we then developed a crisis alert system, as illustrated in Figure 7. The information is organized as a "spike chart", in which each spike represents a discussion phase. In the last spike, because the amount of negative information keeps increasing, the system changes the color of the spike to lime as an alert.
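A minimal sketch of the coloring step, assuming one literal reading of the description: the latest spike turns lime when its negative-mention count is higher than that of the previous spike. The actual trigger could instead be the automaton's state, and the function name and default color "gray" are hypothetical.

```python
def spike_colors(negative_counts: list[int], alert_color: str = "lime",
                 normal_color: str = "gray") -> list[str]:
    """Color each spike; the latest one gets `alert_color` if its
    negative-mention count exceeds that of the previous spike (assumed rule)."""
    colors = [normal_color] * len(negative_counts)
    if len(negative_counts) >= 2 and negative_counts[-1] > negative_counts[-2]:
        colors[-1] = alert_color
    return colors


print(spike_colors([3, 5, 4, 9]))  # ['gray', 'gray', 'gray', 'lime']
```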

Experimental Result
We then assessed the performance of our sentiment analysis approach, comparing the following sentiment analysis techniques.
• SEN-FULL: our full framework.
• SEN-NO-ONT: our framework without the Aspect-oriented Sentiment Ontology.
• SEN-NO-RULES: our framework without the Sentiment Phrasing Rules.
• SVM: a support vector machine (SVM) classifier for sentiment classification, a strategy used in numerous related works. The delta tf.idf weighting scheme 13 was also used to obtain the best performance from the SVM technique.
It is noteworthy that SVM could compete with SEN-NO-ONT on products where neutral mentions were predominant, e.g. Huawei or LG Stylus. This can be explained by the low frequency of sentiment phrases in neutral data, which allowed SVM to show its ability to identify neutral samples (i.e. samples that express no sentiment). However, when sentiment phrases were abundant, SVM performed poorly because of complex linguistic constructs that can contradict the apparent sentiment. This is reflected in the fact that SEN-NO-RULES and SVM reached essentially the same performance on all datasets.
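For readers unfamiliar with delta tf.idf, the sketch below computes it in its commonly cited form. Each term's count in a document is multiplied by the difference between its idf in the positive training set and its idf in the negative training set. The add-one smoothing and the toy corpus are our own assumptions; the exact configuration used with the SVM baseline is not specified here.

```python
import math
from collections import Counter


def delta_tfidf(doc_tokens, pos_docs, neg_docs):
    """Delta tf.idf weights for one document.

    V[t] = count(t, doc) * (log2(|P| / P_t) - log2(|N| / N_t)), where P_t / N_t
    are the numbers of positive / negative training documents containing t.
    Add-one smoothing (an assumption) avoids division by zero. Positive weights
    lean toward the negative class, negative weights toward the positive class.
    """
    p_df = Counter(t for d in pos_docs for t in set(d))
    n_df = Counter(t for d in neg_docs for t in set(d))
    weights = {}
    for term, count in Counter(doc_tokens).items():
        idf_pos = math.log2((len(pos_docs) + 1) / (p_df[term] + 1))
        idf_neg = math.log2((len(neg_docs) + 1) / (n_df[term] + 1))
        weights[term] = count * (idf_pos - idf_neg)
    return weights


pos = [["good", "phone"], ["good", "screen"]]
neg = [["bad", "phone"], ["bad", "battery"]]
print(delta_tfidf(["good", "bad", "phone"], pos, neg))
```

A term spread evenly across both classes, such as "phone" here, gets a weight of zero, which is what lets delta tf.idf suppress sentiment-neutral vocabulary.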
We measure sentiment analysis performance on the task of identifying non-neutral mentions (i.e. negative and positive ones) in the datasets. The set of sentiment terms (both positive and negative) plays a key role in this task. If no sentiment terms are used, no non-neutral mentions can be detected. Conversely, using the entire set of sentiment terms detects the maximum number of non-neutral mentions, but it also raises the risk of false positives (i.e. neutral mentions labeled as positive or negative). Thus, in this experiment, we vary the size of the sentiment term set from empty to its full size and measure the performance of sentiment analysis at each step. The findings are shown by the respective ROC curves in Figure 9.
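The experimental protocol above can be sketched as follows. Grow the sentiment-term set one term at a time, record a (false-positive rate, true-positive rate) point at each size, and compute the area under the resulting curve with the trapezoid rule. The term ordering, the "contains any active term" decision rule, and the four toy mentions are hypothetical; the source does not say how terms were ordered.

```python
def roc_by_term_set_size(mentions, labels, ranked_terms):
    """ROC points from growing the sentiment-term set one term at a time.

    mentions: list of token sets; labels: 1 = non-neutral, 0 = neutral
    (both classes must be present). A mention is predicted non-neutral if it
    contains any term in the current set; the term order is an assumption.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for k in range(len(ranked_terms) + 1):  # k = 0 is the empty term set
        active = set(ranked_terms[:k])
        preds = [1 if tokens & active else 0 for tokens in mentions]
        tp = sum(1 for p, y in zip(preds, labels) if p and y)
        fp = sum(1 for p, y in zip(preds, labels) if p and not y)
        points.append((fp / n_neg, tp / n_pos))
    points.append((1.0, 1.0))  # trivial classifier that labels every mention non-neutral
    return points


def auc(points):
    """Trapezoidal area under ROC points, ordered by false-positive rate."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


mentions = [{"hate", "battery"}, {"love", "screen"}, {"good", "morning"}, {"screen", "size"}]
labels = [1, 1, 0, 0]
points = roc_by_term_set_size(mentions, labels, ["hate", "love", "good"])
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 1.0
```

A random classifier traces the diagonal and has an area of 0.5, which is the baseline the ROC comparison is measured against.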
As shown, all the sentiment analysis methods in our study produce good results: the areas under their ROC curves are substantially greater than 0.5 (the area obtained by a random classifier). SEN-FULL usually performs better than most of the other three methods.

DISCUSSION
Brand crisis detection has become an emerging issue with the advancement of social media. However, how to formally define a "crisis" so that it can be processed automatically by computing systems remains a challenging problem. In this paper, we address this problem with a mathematical model of bursts, combined with the knowledge representation capability of ontologies. We have evaluated the performance of this proposal experimentally, but its practical impact on real data still needs to be explored further.

CONCLUSION
In this article, we address brand crisis tracking by integrating burst identification with ontology-based sentiment analysis, and we propose a general framework for this purpose. In this framework, domain ontologies, called Aspect-oriented Sentiment Ontologies, are paired with specific linguistic rules, called Sentiment Phrasing Rules, to support sentiment rating of online mentions. Our experiments on real datasets collected from online social media have obtained positive results.