Rosette for Social Media Monitoring
Enable social media analysis in over 40 languages
The rise of social media is a worldwide phenomenon, and people interact online in many languages. Last year, only half of all tweets were in English, and more than 75% of Facebook users live outside the U.S. Many applications have been developed to ingest and analyze data from social media sources. Rosette®, a software development kit (SDK), enables these applications to work effectively on text in over 40 of the world’s major languages. Rosette integrates quickly with social media applications to give developers a head start in analyzing multilingual data from Twitter, Facebook, LinkedIn, and other social media channels.
Identify the Language of Tweets, Blogs, and Reviews
Cleaning and aggregating social media content starts with language identification. However, location-based and user-specified language settings for posts can be unreliable. Our language identifier has been tuned for high throughput and accuracy and identifies 55 languages. The language identifier is designed to keep up with the Internet’s unprecedented flow of data—blog entries, product reviews, and the Twitter Firehose at over 140 million tweets a day.
Analyze Text to Support Semantic and Sentiment Analysis
Semantic and sentiment analysis require analyzing every word in a sentence. For languages such as English, Portuguese, Japanese, Spanish, and Dutch, Rosette’s linguistic analysis will:
- Tag parts-of-speech
- Lemmatize words (find their dictionary form)
- Detect sentence boundaries
- Extract noun phrases
Locate Entities to Add Metadata for Advanced Filtering
Our entity extractor populates metadata for each post, article, and social conversation with extracted entities—e.g., people, places, companies, and product names. Social media monitoring applications can then filter data based on entities in the metadata. Rosette® Entity Extractor automatically generates metadata for 18 types of entities in over a dozen languages. Developers can customize the entity extractor to detect other entities.
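As a rough illustration of filtering on entity metadata, the sketch below assumes each post already carries a dictionary of extracted entities keyed by entity type. The `Post` class, the type labels (`"PERSON"`, `"ORGANIZATION"`), and `filter_by_entity` are hypothetical names chosen for this example; they are not part of the Rosette API.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    text: str
    # Hypothetical metadata layout: entity type -> set of entity names.
    entities: dict = field(default_factory=dict)


def filter_by_entity(posts, entity_type, name):
    """Return the posts whose metadata lists `name` under `entity_type`.

    Comparison is case-insensitive; no other normalization is attempted.
    """
    wanted = name.casefold()
    return [
        post for post in posts
        if wanted in {e.casefold() for e in post.entities.get(entity_type, ())}
    ]


posts = [
    Post("Acme just shipped the new phone", {"ORGANIZATION": {"Acme"}}),
    Post("Met Jane Doe at the Acme launch",
         {"PERSON": {"Jane Doe"}, "ORGANIZATION": {"Acme"}}),
    Post("Rainy day in Boston", {"LOCATION": {"Boston"}}),
]
acme_posts = filter_by_entity(posts, "ORGANIZATION", "acme")
```

Here `acme_posts` holds the first two posts. In a real pipeline the `entities` field would be filled by the entity extractor rather than written by hand.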
Support Sentiment Analysis at the Entity Level
Modern vendors of sentiment analysis ascribe sentiment to entities rather than to documents. This method provides a clearer view of what people are saying about brands, products, and their features. Rosette will supply any semantic or sentiment analysis system with accurate and comprehensive entity extraction in the major languages of the Americas, Europe, Asia, and the Middle East.
Cluster Posts to Streamline Search Results
Social media content aggregators can offer a more rewarding experience to subscribers with Rosette’s document clustering. Give your users the ability to review groups of near-identical conversations or posts rather than read every one. The number of items in a group can also indicate trending topics and products, or expose incidents of social media spamming.
When indexing a high volume of tweets, clustering will detect nearly identical posts, such as retweets, to avoid unnecessary processing.
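To make the idea of grouping near-identical posts concrete, here is a minimal sketch that uses word-bigram shingles and Jaccard similarity with a greedy single-pass grouping. This is a generic textbook technique chosen for illustration, not a description of how Rosette clusters documents; the function names and the 0.6 threshold are assumptions.

```python
import re


def shingles(text, size=2):
    """Return the set of word n-grams (shingles) in a lowercased post."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_posts(posts, threshold=0.6):
    """Greedy single-pass clustering.

    Each post joins the first cluster whose representative (its first post)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_shingles, member_posts)
    for post in posts:
        signature = shingles(post)
        for representative, members in clusters:
            if jaccard(signature, representative) >= threshold:
                members.append(post)
                break
        else:
            clusters.append((signature, [post]))
    return [members for _, members in clusters]


sample = [
    "Loving the new phone camera, it is amazing",
    "RT @ana: Loving the new phone camera, it is amazing",
    "Flight delayed again, terrible service",
]
groups = cluster_posts(sample)
```

With this sample, the original post and its retweet land in one group and the unrelated post in another. Group sizes can then serve as a crude signal for trending items, and only one representative per group needs further processing. Comparing every post against every cluster does not scale to firehose volumes; a production system would need an approximate method.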
Improve Search of Social Media Content
A data feed is only as good as its search. For any language searched, adding linguistic processing at index and query time increases the number of relevant search results with little loss of precision. Our morphological analyzers produce each word’s lemma (its dictionary form), which informs indexing. Other methods, such as stemming, look only at superficial commonalities and can return unrelated results.
- Related words share a lemma: “speak,” “speaking,” “spoke,” “speaks”
  - Common lemma: “speak”
- Unrelated words may share a stem: “severed,” “several”
  - Common stem: “sever”
The language-aware approach of lemmatization is used by top enterprise and web search engines today.
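The difference between the two approaches can be seen in a toy index. The sketch below uses a tiny hand-written lemma table and a deliberately naive suffix-stripping stemmer, both invented for this example; neither reflects Rosette’s morphological analyzers or any particular stemming algorithm.

```python
# Toy lemma table (hypothetical): maps inflected forms to a dictionary form.
LEMMAS = {
    "speaking": "speak",
    "spoke": "speak",
    "speaks": "speak",
    "severed": "sever",
}

# Suffixes removed by the naive stemmer, tried in this order.
SUFFIXES = ("ing", "ed", "al", "s")


def lemmatize(word):
    """Look the word up in the lemma table; unknown words map to themselves."""
    return LEMMAS.get(word, word)


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def build_index(docs, normalize):
    """Build an inverted index: normalized term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(normalize(token), set()).add(doc_id)
    return index


docs = {
    1: "she spoke",
    2: "the line was severed",
    3: "several people speaking",
}
lemma_index = build_index(docs, lemmatize)
stem_index = build_index(docs, naive_stem)
```

In `lemma_index`, the term `speak` points to documents 1 and 3, and `sever` points only to document 2. In `stem_index`, `sever` points to documents 2 and 3 because “several” is cut down to the same stem, while the irregular form “spoke” is missed under `speak`.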
Track Names of Products and People
Social media posts are notoriously casual and full of misspelled names and nicknames. Overcoming name variants is especially critical for reputation tracking and brand analysis. Our name matcher will find all relevant posts for “Madonna” even when her name is spelled “マドンナ,” “Madonna Ciccone,” or “Madona.” It handles nicknames, missing name components, spelling errors and variants, mixed-order names, names in different languages, and more.
Sample name search result for “Steve Jobs” finds variations of his name, even in Arabic!
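A simplified view of variant-tolerant name matching is sketched below with Python’s standard `difflib`. It compares names token by token, so spelling errors, extra name components, and reordered components still score well. The `ALIASES` table standing in for cross-script matching, the function names, and the 0.85 threshold are all assumptions made for this example; a real name matcher relies on far richer linguistic knowledge.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical alias table standing in for cross-script name knowledge.
ALIASES = {"マドンナ": "madonna"}


def normalize(name):
    """Unicode-normalize, lowercase, and map known aliases to one form."""
    name = unicodedata.normalize("NFKC", name).strip().lower()
    return ALIASES.get(name, name)


def name_score(query, candidate):
    """Average, over query tokens, of the best similarity to any candidate token."""
    query_tokens = normalize(query).split()
    candidate_tokens = normalize(candidate).split()
    if not query_tokens or not candidate_tokens:
        return 0.0
    best = [
        max(SequenceMatcher(None, q, c).ratio() for c in candidate_tokens)
        for q in query_tokens
    ]
    return sum(best) / len(best)


def find_mentions(query, names, threshold=0.85):
    """Return the names that score at least `threshold` against the query."""
    return [n for n in names if name_score(query, n) >= threshold]


seen = ["マドンナ", "Madonna Ciccone", "Madona", "Lady Gaga"]
matches = find_mentions("Madonna", seen)
```

With these inputs, `matches` contains the first three names and leaves out “Lady Gaga”. Because scoring is per token, `name_score("Steve Jobs", "Jobs Steve")` is 1.0, which covers the mixed-order case.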
Try a Product Evaluation
Request an evaluation of the complete Rosette software platform today.
Natural Language Processing for Over 15 Years
Basis Technology has been the industry choice for multi-language natural language processing, starting with major search engines, including Google, Yahoo!, Microsoft Bing, and Oracle Endeca. We’ve continued to refine our linguistic software components to meet the new wave of language challenges inherent in social media analysis. Contact us for a free evaluation of how Rosette can make your social media analysis software internationally ready.