Sentiment analysis – analysis

Last night’s supper was brilliant for a host of reasons, not least of which was the food served at the awesome Rules restaurant in London. I was there with, amongst others, Chris Condron – a wonderful man I met through Young Rewired State. Conversation ranged from the antics of Edward VII and Lillie Langtry to sentiment analysis (the former I am comfortable with, the latter I was fascinated by).

Anyone working in the world of digital media is used to the feeling of playing catchy uppy, adopting the look of the slightly baffled whilst trying desperately to keep up and learn. That was me last night.

Today I hounded Chris for an explanation of sentiment analysis, and he gave me the following:

Crudely, semantic analysis gives you a non-statistical (unlike search engines) sense of what something (say, an article) is about.

Sentiment analysis uses semantic analysis techniques to measure that against a set of known criteria, e.g. is a text pro or anti something?

It’s already being used in the financial world. It could be a really cool tool (especially when run across live data [such as Twitter] rather than flat text articles) for brand management. Marketers can use it to test in real time the public’s reaction to a product launch.

Before rapidly handing me over to his much heralded colleague Dr Jarred McGinnis:

It’s a computer that analyses text for keywords and phrases and determines the positive or negative sentiment of the story. For example, “Paddington Bear sucks” would probably be determined to be negative where the statement “Paddington Bear is a hero” would be positive.

The technology is not very accurate but still useful. One example of its use is to monitor mainstream and social media for negative or positive trends with respect to your company or one of its products.

Now, I am rapidly becoming a huge fan of championing the ability of talented people whilst genuflecting to the power of the computer. As the work in my field converges ever more on information, data and ontologies, so my respect for the statisticians and analysts grows, as does my understanding of the limits of computers and the limits of humans. I am not sure whether to proudly embrace my ever-increasing knowledge of librarian skills and understanding of the importance of cataloguing languages (Dublin Core and the like) or to run away. What I do know is that it is increasingly important to spend time making sure that the human involvement in the digital revolution is carefully balanced with the awesome power of the computer.

So we come to sentiment analysis. Whilst doing my own homework on this tonight, I understand that it is essentially an ontology of words or phrases that are assigned positive or negative associations. Using this as a framework, you can throw a whole load of content at this wall of good and bad – and have it separated cleanly into positive and negative, using the brilliant processing power of the computer.
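To make that "wall of good and bad" concrete, here is a minimal sketch in Python. It is not how Newssift or any real product works: the word list, the +1/-1 weights and the function names are all made up for illustration.

```python
import re

# Hypothetical toy lexicon: each word carries a polarity of +1 (good) or -1 (bad).
LEXICON = {
    "hero": 1, "brilliant": 1, "awesome": 1, "wonderful": 1,
    "sucks": -1, "awful": -1, "terrible": -1, "pathetic": -1,
}

def score(text):
    """Add up the polarity of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def label(text):
    """Turn the numeric score into a positive / negative / neutral label."""
    total = score(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

print(label("Paddington Bear sucks"))      # negative
print(label("Paddington Bear is a hero"))  # positive
```

It also shows why the approach is "not very accurate": `label("Paddington Bear is not a hero")` still comes back as positive, because the sketch only counts words and has no idea what "not" does to a sentence.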

To give you an example, Dr J showed me http://www.newssift.com/index.jsp. Using the search box, I can put in a topic. The resulting page gives me bucket loads of information: the graph on the top left is the sentiment analysis, some useful MIS and source material is included, and the main centre gives me the search results being analysed. I won’t do it for you; you go and play.

However, what has kept me most intrigued is the semantic search bit (by semantic I mean refined associated search). Once you have run your initial search, the results page lets you add search terms to refine the results and gives you ever more detailed information.

Now, I don’t know how you would use this – I would say with a note of caution: this is just data being thrown at a pretty brutal analysis tool of positive and negative feeling (something a computer can only do by cataloguing good/bad feeling words against online content) – but it is the first step I have seen in digitally automating the mood of the nation on any given topic.

Please do let me know of other tools that you know of that have refined this further (don’t google it – I already have!), and please do let me know your thoughts on this. I will certainly be playing about a bit more with this stuff.

23 responses

  1. Labour is already using this and it was used in the presidential election last year. I was extremely thrilled to hear of it coming up at YoungRewiredState.

    Another way of thinking of it is the Wordle images we’ve all seen – that’s one way of quickly grasping early stabs at ‘sentiment analysis’.

    There are a fair few SMEs out there selling this for marketing porpoises.

    It *has* the potential to circumvent a lot of ‘engagement’ work IMO :/

  2. Hi Emma, I’ve been poking about with the idea of grabbing tweets within a local authority area, using semantic analysis to spot anything relevant to community cohesion (racist, violent etc) and then using the geopositional data to flag up tensions in localities.

    • Interesting stuff Emma. I like Prestolee’s example of taking the ‘temperature’ of a particular area. But the novice researcher in me has queasy feelings about using these tools to determine emotional responses in their current form, especially if they are used to inform citizen engagement activities. As you point out the analysis is a bit crude at the moment. Some of the potential issues of using these tools for engagement at the moment might include:
      - The demographics of social media users are not representative of a locality (unless of course you are researching the specific demographic who use e.g. Twitter).
      - No easy way to extract demographic data about the participants you are analysing, which makes the data difficult to interpret. It might be possible where people have age and location in their profile info, but that data isn’t trustworthy.
      - The tools do not currently cope well with cultural differences in language (but could learn).
      - Much more research needs to be done around interpreting the way that people behave when chatting online, to find out if that behaviour is truly representative of the real values they hold, particularly in relation to local issues, how they feel about their community etc.
      - Reliability – again more research needed.

      As with any research tools, I guess sentiment analysis tools need to be used with a dose of caution alongside other research methods. Having said that, I am fascinated by the potential of natural language tools for analysing online conversation, especially as the tools become more sophisticated. The pros for me are:

      - Access to realtime conversations.
      - Potentially a cheap way of carrying out research online.
      - Ability to analyse a huge amount of data.
      - Lots of potential to listen out for negative comments about a service and be more pro-active in online spaces.

  3. I agree Paul – and can see why SMEs would be selling their versions. However, if it is a case of defining what we mean by *positive* or *negative* words or phrases, then surely it can’t be long before an open source ontology appears that everyone starts using.

    What I do see as very useful, as government and everyone else moves with the digital revolution, is simply a clever way of dip-testing the mood; that mixed with great use of dashboards is a step towards recognising the fact that there *is* a societal digital mood, and it *is* important.

  4. @prestolee Neil that all sounds very good, you could use these mixed with the mood tools (such as the one I mentioned above) to measure whether you were having any success. I suppose we need to consider how we would view success: in light of the above, it would be reducing the amount of red on the analysis graph – which is as good a measure as any?

  5. I doubt we’ll see an ‘open source ontology’ anytime soon – online marketing doesn’t work that way. Took years to agree what a ‘unique visitor’ is.

    In UK the ‘mood testing’ has already started, I’d suggest. But it’s very politically driven. Be interesting to see how sophisticated it gets come next May!

    For government I’d suggest looking at things like the CDC’s work in marrying online sentiment analysis to countering H1N1 pandemic. This stuff has a trail back into work last year on matching political sentiment to things like search trending in US election which proved remarkably prescient.

  6. ‘took years to agree what a “unique visitor” was’ <- yes, :)

    I have seen on YouTube how this has been used in the news and so on for political mood testing, but the potential for applying this to topics would be far more useful to us as a society, I think.

    But, the pandemic cannot be countered by any level of analysis and action/reaction – unless you mean reported cases being logged as real, when in fact they are hysteria (and that can't be analysed… or could it?).

    I suppose the thumbs up thumbs down thing we see everywhere would endorse the correctness (if there were such a thing) of the sentiment analysis…

    I want to go and work alongside someone defining the terms and categorising them. That would be really good.

  7. “However, what has kept me most intrigued is the semantic search bit (by semantic I mean refined associated search). Once you have run your initial search, the results page lets you add search terms to refine the results and gives you ever more detailed information.”

    Yes, I noted that too. It’s as if one had, by virtue of one’s search results, fallen into a Wikipedia-like “disambiguation screen”.

    As in “Did you mean?” features such as :

    e.g. see: http://en.wikipedia.org/wiki/PDF_(disambiguation)

    Certainly something similar is do-able in government web circles if content is correctly tagged with IPSV and other controlled LG*L vocabularies, or by using data extraction techniques – which is what I guess is behind this.

    I am not saying it would work as slickly as your example site, but the bones are there to do similar – to create an effective “lens” looking back inward to government/local government.

    It would certainly be a lot better than some of the pathetic examples of LG “Google-like home pages” being bandied about by the Wordpresserati as “innovative and bold”.

  8. As you know Emma, I’ve been working in this area with gov with my ‘conversation audit’ idea. I couldn’t agree more that we have to keep the human in the loop. While whizzy software can work with ontology, humans (writers and readers) work with language in all its semiotic messiness. It’s only with real close reading, analysis and deconstruction that we can get at what people are saying and more importantly talking about. The Turing test still stands.

  9. Michele, you are very good at saying what I intended to say.

    David, it’s all the same thing and so I suppose this is all really useful in how we analyse response, any response.

    Mr G – I have been told that this is called ‘faceted search’ – homework for another night. I have never been a fan of IPSV – but only because it did not chime with reality, not that it was not a good piece of work (when it was done).

    @pezholio that’s a really useful example, I think I might follow that for a bit

    Paul, as you know, you speak my language – but let’s not complicate things and let’s try to get that balance right between human and computer analysis. It’s still a bit borked.

  10. Emma,

    Hmmm… IPSV has a lot hidden under the hood in the way of synonyms ( or “non-preferred terms” to use the vernacular) which when brought into the sunshine can create interesting and very natural looking alternatives.

    I think we are going to see a lot of it.

  11. Pingback: Uses for sentiment analysis | Mark Pack

  12. Pingback: Twitter Trackbacks for Sentiment analysis – analysis « Emma Mulqueeny [mulqueeny.wordpress.com] on Topsy.com

  13. Have you explored We Feel Fine? I love the various visualisations. I just wish I could pick the data to apply it to.
    http://www.wefeelfine.org/index.html
    There’s also a TED talk on it http://www.ted.com/talks/jonathan_harris_tells_the_web_s_secret_stories.html

    Jon Bounds already mentioned above has also created “are mps happy?” http://twitter.com/arempshappy/
    He writes about how that came to be: http://www.jonbounds.co.uk/blog/638/are-mps-happy/

  14. Pingback: Stuff I’ve seen September 19th through to September 20th | Podnosh

  15. There are a few companies out there trying/doing sentiment analysis.
    My company Saplo (Saplo.com) have done some reports (right now only in Swedish but English is supported in the technology). I can translate them if you want =).

    Regarding how accurate the results are, I would say it’s not about tracking keywords and doing polarity. To be accurate, the technology needs to understand the meaning and be able to give the result on a scale.

    But truly it’s a very interesting field and a lot will happen over the next few years.

    Mattias Tyrberg
    CEO Saplo

  16. Pingback: links for 2009-09-23 « Working Notes 2.0

  17. Pingback: Resources and Links 25/09/09 « Framing the Dot

  18. “I want to go and work alongside someone defining the terms and categorising them. That would be really good.”

    That sounds a bit like betting on Yahoo against Google all those years ago. Surely the only reliable (whatever reliable might mean in this context) way of doing this would be to turn it into a scenic or not style game – but, critically, at the level of paragraphs or larger units, not at the level of words, and then let the values of words emerge from their place in the scored corpus. A dungeon full of lexicographers arguing about the emotional warmth of words doesn’t feel like an enticing proposition to me.
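One way to read that suggestion, sketched in Python under some loud assumptions: people score whole paragraphs (here from -1.0 to +1.0), and each word's value is simply the average score of the paragraphs it turns up in. The three-paragraph corpus, the scores and the function name are all invented for illustration; a real system would need far more data and something smarter than a plain average.

```python
import re
from collections import defaultdict

def learn_word_values(scored_paragraphs):
    """Give each word the average score of the paragraphs it appears in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, paragraph_score in scored_paragraphs:
        # Use a set so a word counts once per paragraph, however often it repeats.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            totals[word] += paragraph_score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

# Hypothetical paragraphs, each scored by a human from -1.0 (bad) to +1.0 (good).
corpus = [
    ("The service was brilliant and the food was lovely", 1.0),
    ("The food was cold and the service was rude", -1.0),
    ("A brilliant evening", 1.0),
]

values = learn_word_values(corpus)
print(values["brilliant"])  # 1.0
print(values["rude"])       # -1.0
print(values["food"])       # 0.0
```

Nobody has to argue about whether "food" is a warm word: it appears in one good and one bad paragraph, so it ends up neutral, while "brilliant" and "rude" pick up their values from the company they keep.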
