News for April 2012

Sentiment Analyzer Evaluation Parameters

Product and service organizations are showing increasing interest in sentiment analysis, because online sentiment can make or break their reputation overnight. Understanding the polarity (sentiment) of every user's post and addressing it in time is becoming critical. With thousands of posts generated every day on a single topic, analyzing them manually is nearly impossible.

Sentiment analysis algorithms are not perfectly accurate and cannot be 100% accurate. However, they can certainly provide the warning signals needed to change the (product) strategy at the right time, if required. They can also help the organization save time by narrowing thousands of tweets down to a few hundred that need to be analyzed manually.
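
As a rough illustration of the narrowing-down idea above, here is a minimal keyword-based triage sketch in Python. The word lists and the rule "send every post with a negative score to a human" are hypothetical assumptions chosen for illustration, not how any particular product works. A simple keyword approach like this is exactly the kind of imperfect algorithm described above: it misses sarcasm and unlisted words, which is why its output is a shortlist for manual review rather than a final verdict.

```python
import re

# Hypothetical sentiment lexicons, for illustration only. A real tool would
# use much larger, domain-tuned word lists or a trained model.
POSITIVE = {"love", "great", "fast", "excellent"}
NEGATIVE = {"hate", "slow", "crash", "broken", "refund"}


def tokenize(post: str) -> list[str]:
    """Lowercase the post and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", post.lower())


def keyword_score(post: str) -> int:
    """Number of positive keyword hits minus number of negative keyword hits."""
    tokens = tokenize(post)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def triage(posts: list[str]) -> list[str]:
    """Keep only the posts that look negative, so people review fewer posts."""
    return [post for post in posts if keyword_score(post) < 0]
```

For example, `triage(["I love the new update", "App keeps freezing and is so slow", "Installed it today"])` keeps only the second post, since it is the only one with more negative than positive keyword hits.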

Once the need for sentiment analysis has been identified, the organization starts scouting for commercial off-the-shelf (COTS) products and finds that there are far too many. This article lists 40+ parameters, grouped under six key buckets, that an organization can use to shortlist the right tool.

The table below lists the six buckets along with their parameters and a short description of each.

| # | Parameter | Description |
|---|---|---|
| I | Performance | |
| 1 | Efficiency | Scan speed, e.g. the number of tweets/sentences scanned and analyzed per second |
| 2 | Robustness | The tool runs consistently without crashing. For example, large volumes of data can cause an application to hang after a certain point |
| 3 | Data size | Ability to scale to large data sets. This depends partly on the tool's architecture and partly on the underlying database |
| II | Algorithms | |
| 4 | Natural Language Processing (NLP) | Use of NLP or machine learning algorithms, which are mostly statistics-based |
| 5 | Computational linguistics | Statistical and/or rule-based modeling of natural language, focused on the practical outcome of modeling human language use |
| 6 | Text analytics | A set of linguistic, statistical, and machine learning techniques that model and structure information for BI, exploratory data analysis, research, or investigation |
| 7 | Proprietary vs. open algorithms | Use of free and open-source algorithms vs. proprietary algorithms |
| 8 | Mostly human interpretation | After the text is extracted, most of the sentiment analysis is performed manually by people |
| 9 | Bayesian inference | Statistical inference in which evidence or observations are used to calculate the probability of a sentiment |
| 10 | Keyword based | Sentiment determined through keyword-based search |
| 11 | Combination of the above algorithms | Ability to pass the text through multiple algorithms to get the right sentiment; also consider the number of techniques employed |
| 12 | Ability to override sentiment | Automated sentiment is not always right and may need correction before reporting |
| III | Functionality | |
| 13 | Ability to fine-tune the modeling algorithms | Ability to easily modify an existing algorithm to enhance its capability, e.g. by adding an extra filtering layer |
| 14 | Plug-in / API / widget support | Ability to add third-party plug-ins to perform specialized tasks, e.g. 80-20 suppression or additional graphs |
| 15 | Data filtering/cleansing capability | Useful when two similar products are on the market, e.g. Norton Internet Security vs. Norton 360 |
| 16 | Value substitution capability | Useful for tweets in which users write different abbreviations or make spelling errors, e.g. MS, Microsoft and misspellings of Microsoft |
| 17 | Supported platforms | Ability to work on Linux, Windows, Mac OS, mobile platforms, etc. |
| 18 | Alert/trigger functionality | Ability to set triggers on key metrics where real-time monitoring is available |
| 19 | Auditing/log feature | An audit feature helps capture the amount of data grabbed and processed |
| 20 | Geo identification | Ability to identify the geographic source of a conversation, e.g. Asia, US, UK |
| 21 | Multi-lingual support | Support for more than one language, e.g. French and English |
| IV | Reporting | |
| 22 | Export options (output) | Excel, PDF, publish to portal |
| 23 | Visualization options | A variety of graphs: bar, pie, line, radar, etc. |
| 24 | Dashboard capability | Refreshable and drillable dashboards |
| 25 | Customizable reports | Ability to add calculated columns and easily generate different visualizations for the same data |
| 26 | Pre-defined reports | Out-of-the-box reports to get social media analysis up and running instantly |
| 27 | Drill-down/drill-up facility on reports | Ability to see detailed or summarized information by drilling into an item in the report |
| 28 | Web interface | View and analyze reports online |
| V | User interface & integration | |
| 29 | Training / learning curve | Some tools, such as Radian6, require experts to operate them |
| 30 | Targeted user group | Analysts, business users, a combination, etc. |
| 31 | Error reporting | A mechanism for reporting errors when any of the configured feeds or reports fail |
| 32 | Web interface | The complete tool is available online and can be used to configure and build reports, with no thick client |
| 33 | Complete GUI support | Every task can be performed through the GUI, without a command-line interface |
| 34 | Bundled database | Does the tool come with a built-in database, or must a third-party database such as MySQL or MS Access be procured? |
| 35 | Native connectivity to popular data sources | Built-in native connectivity to popular forums, groups, blog sites, microblogs, etc. |
| 36 | Integration with BI and CRM tools | Ability to integrate the processed data with BI tools and other CRM data. Partnerships with leading BI vendors could also be considered |
| 37 | Approved APIs | APIs approved by social media providers (e.g. Twitter) enable direct connectivity to their servers and increase the rate of data pull |
| VI | Vendor credibility | |
| 38 | Established client base | Referenceable clientele |
| 39 | Licensing options | Free/trial/paid; user-based, server-based, service-based licenses, etc. |
| 40 | References | Installations that have been in production for more than one to two years. This may not always apply, as new companies come up with very innovative solutions |
| 41 | Support services | Tool support, training, etc. |
| 42 | Consulting services | Analysis of data for the client |
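
One common way to turn a checklist like the one above into a shortlist is a weighted scoring matrix: rate each candidate tool on each parameter, average the ratings per bucket, and weight the buckets by how much they matter to you. The Python sketch below assumes a 1 to 5 rating scale and uses hypothetical bucket weights; both are illustrative choices that each organization would set for itself, not values recommended by this article.

```python
# Hypothetical bucket weights (must sum to 1.0); adjust to your priorities.
WEIGHTS = {
    "Performance": 0.20,
    "Algorithms": 0.25,
    "Functionality": 0.20,
    "Reporting": 0.15,
    "User interface & integration": 0.10,
    "Vendor credibility": 0.10,
}


def bucket_average(ratings: dict[str, int]) -> float:
    """Average the 1-5 ratings given to the parameters within one bucket."""
    if not ratings:
        raise ValueError("each bucket needs at least one rated parameter")
    if any(not 1 <= r <= 5 for r in ratings.values()):
        raise ValueError("ratings must be between 1 and 5")
    return sum(ratings.values()) / len(ratings)


def weighted_score(tool: dict[str, dict[str, int]]) -> float:
    """Combine per-bucket averages using WEIGHTS (every bucket must be rated)."""
    return sum(weight * bucket_average(tool[bucket]) for bucket, weight in WEIGHTS.items())


def rank_tools(candidates: dict[str, dict[str, dict[str, int]]]) -> list[tuple[str, float]]:
    """Return (tool name, score) pairs, best-scoring tool first."""
    scored = [(name, weighted_score(ratings)) for name, ratings in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Averaging within a bucket keeps a bucket with many parameters (such as Functionality) from outweighing a bucket with few (such as Performance); the bucket weights alone decide relative importance. A missing bucket raises a `KeyError`, so incomplete evaluations are not silently ranked.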

Feel free to comment on the parameters above. If you have any additional parameters that you feel are relevant, do let me know and I will be happy to include them (with due credit to you).