Kathy Hoogeboom-Pot, 6th year PhD candidate in Physics
If you have at least 269 followers on Twitter, you might be an ‘anchor’, according to the research of Dr. Hansu Gu and his co-workers in CU’s Department of Electrical, Computer, and Energy Engineering. But that’s not the only criterion.
Hansu worked to develop a new computer algorithm, AnchorMF, which analyzes hundreds of millions of tweets to identify ‘anchors’ – certain Twitter accounts which are followed by many other users and, more importantly, whose followers tend to respond to or tweet about the same kinds of events. The algorithm then combines this anchor information with other contextual information about an event (like where participating users are located or whether they are sports fans) to help a computer recognize the similarity between two users or events in order to better predict how users will respond.
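To make the idea of an anchor concrete, here is a minimal sketch of one way such accounts could be flagged. It is not the AnchorMF method itself: the function name `find_anchors`, the `min_focus` cutoff, and the "focus" score (the share of an account's followers who joined its most popular event) are all assumptions made for illustration. Only the 269-follower threshold comes from the article.

```python
from collections import Counter

MIN_FOLLOWERS = 269  # follower threshold quoted in the article


def find_anchors(followers, participation,
                 min_followers=MIN_FOLLOWERS, min_focus=0.5):
    """Flag accounts that look like anchors under a simple, assumed heuristic.

    followers:     dict mapping account -> set of follower ids
    participation: dict mapping user id -> set of event ids the user tweeted about
    Returns a dict mapping anchor account -> (top event, focus score).
    """
    anchors = {}
    for account, fans in followers.items():
        # Criterion 1: the account is followed by many users.
        if len(fans) < min_followers:
            continue
        # Count how many followers took part in each event.
        counts = Counter(e for fan in fans for e in participation.get(fan, ()))
        if not counts:
            continue
        # Criterion 2 (assumed): followers tend to respond to the same event.
        top_event, top_count = counts.most_common(1)[0]
        focus = top_count / len(fans)
        if focus >= min_focus:
            anchors[account] = (top_event, focus)
    return anchors
```

With a toy threshold such as `min_followers=3`, an account whose followers mostly tweet about the same event is flagged, while an equally popular account whose followers scatter across unrelated events is not.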
The difficulty is that while computers are great at relating things that are the same (like two baseball bats), it’s much harder for them to relate things that are only similar (like a baseball bat and a catching glove). That requires identifying the context they share (both are used in baseball).
For example, when Amazon notices you’re browsing for a certain item, it recommends other items you might also be interested in. It uses two kinds of context to make those recommendations more relevant. One focuses on the items, which might be similar because of the same words in their descriptions, same colors, same authors, same tags. The other focuses on the users who have bought many of the same items as you and makes suggestions based on their buying histories.
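The two kinds of context described above can be sketched in a few lines. This is only an illustration, not how Amazon's system works: the Jaccard overlap score and the function names `similar_items` and `recommend_from_users` are assumptions chosen to keep the example small.

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(target, item_tags):
    """Item context: rank other items by how many tags they share with target."""
    scores = {item: jaccard(item_tags[target], tags)
              for item, tags in item_tags.items() if item != target}
    return sorted(scores, key=scores.get, reverse=True)


def recommend_from_users(user, purchases):
    """User context: suggest items bought by users with overlapping purchases.

    purchases: dict mapping user -> set of purchased items
    """
    mine = purchases[user]
    scores = {}
    for other, theirs in purchases.items():
        if other == user:
            continue
        weight = jaccard(mine, theirs)  # how similar the two buyers are
        for item in theirs - mine:      # only items the user does not own yet
            scores[item] = scores.get(item, 0.0) + weight
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [item for item in ranked if scores[item] > 0]
```

For instance, a bat tagged `{"baseball", "wood"}` ranks a glove tagged `{"baseball", "leather"}` above a novel, and a shopper who bought a bat and a glove is pointed to the helmet that a similar shopper also bought.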
In the same way, Twitter users might tweet about events that are similar to each other or follow events that similar users are participating in. But Twitter also offers something more.
“The interesting part is the anchor information,” Hansu says.
On Twitter, users actively choose other accounts to follow, and these relationships may not match the obvious connections found by comparing user profiles or the events users tweet about. At the outset of a given event, like the announcement of a new iPhone or a local hockey game, any user might be the first to ‘discover’ it; depending on what kinds of events that user enjoys, they may be a ‘discoverer’ for many events of a similar type. This in turn attracts followers who want to find out more. That is how anchors are made. Hansu’s work shows that knowing which anchors users share is an important indicator of which users are similar to each other, and it helps to predict what they will do in the future.
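As a rough illustration of why shared anchors are informative, the sketch below scores two users by the overlap of the anchors they follow. The function `anchor_similarity` and the use of a Jaccard score are assumptions for this example; the article does not say how AnchorMF itself weighs shared anchors.

```python
def anchor_similarity(follows, anchors, u, v):
    """Jaccard overlap of the anchor accounts followed by users u and v.

    follows: dict mapping user -> set of accounts that user follows
    anchors: set of accounts already identified as anchors
    """
    anchors_u = follows[u] & anchors  # ignore non-anchor accounts
    anchors_v = follows[v] & anchors
    union = anchors_u | anchors_v
    return len(anchors_u & anchors_v) / len(union) if union else 0.0
```

Two users who follow the same hockey anchors score 1.0 even if they have no friends in common, while two users who share only an ordinary mutual friend score 0.0.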
Figure: The Twitter users who tweet about certain kinds of events often share a set of similar characteristics. By uncovering those shared traits, AnchorMF can group users by similarity and predict the future events they are likely to participate in.
To test out the new algorithm, Hansu collected one month’s worth of Twitter events (identified by keywords associated with trending topics) and all the users who participated in those events by tweeting with those keywords. This involved spending 1-2 hours every day looking at the day’s top 200 keywords to identify those related to real-world events – like ‘Obama inauguration’ but not ‘music I’m listening to’. Then the real-time Twitter stream of more than 500 million tweets per day could be filtered by those keywords to collect only those tweets related to the selected keywords and events, along with the user information from the active users. The result was a database including 461 events involving 1.1 million users and 20.79 million tweets – 554 GB of data, 140 characters at a time.
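The filtering step can be pictured with a short sketch. The tweet format (dictionaries with `"user"` and `"text"` keys), the simple substring match, and the function names are assumptions for illustration; the real pipeline ran against the live Twitter stream and is not described in that level of detail.

```python
def filter_tweets(tweets, event_keywords):
    """Yield (event, tweet) pairs for tweets that contain an event keyword.

    tweets:         iterable of dicts with "user" and "text" keys (assumed shape)
    event_keywords: dict mapping event name -> list of keywords for that event
    """
    lowered = {event: [k.lower() for k in kws]
               for event, kws in event_keywords.items()}
    for tweet in tweets:
        text = tweet["text"].lower()  # case-insensitive substring match
        for event, kws in lowered.items():
            if any(k in text for k in kws):
                yield event, tweet


def participants(tweets, event_keywords):
    """Group the users who tweeted about each event."""
    by_event = {}
    for event, tweet in filter_tweets(tweets, event_keywords):
        by_event.setdefault(event, set()).add(tweet["user"])
    return by_event
```

Given keywords hand-picked as event-related (such as ‘Obama inauguration’), this keeps the matching tweets and their authors and drops everything else, including chatter like ‘music I’m listening to’.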
Up to 50% of the events were used as a training set for AnchorMF and several alternative algorithms, allowing them to identify the common contexts of users, shared events and, in the case of AnchorMF, shared anchors. The algorithms then predicted whether individual users would participate in the rest of the events, and these predictions were compared with what users actually did.
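The train-then-test procedure can be sketched as follows. The helper names `split_events` and `accuracy` are hypothetical, and plain accuracy over (user, event) pairs is an assumed scoring rule; the article does not state which metric the study used.

```python
import random


def split_events(events, train_fraction=0.5, seed=0):
    """Shuffle events reproducibly and split them into training and held-out lists."""
    shuffled = sorted(events)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual, users, events):
    """Fraction of (user, event) pairs where the prediction matches reality.

    predicted, actual: sets of (user, event) pairs marked as "participated"
    """
    pairs = [(u, e) for u in users for e in events]
    correct = sum((p in predicted) == (p in actual) for p in pairs)
    return correct / len(pairs) if pairs else 0.0
```

An algorithm would be fitted on the training events only, asked which users will join each held-out event, and then scored against the participation that was actually observed. Lowering `train_fraction` to 0.1 mimics the harder setting where only 10% of events are available for training.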
In these tests, AnchorMF was at least 20% more effective at making correct predictions than previous algorithms, which rely only on a direct comparison of users’ Twitter histories and friends. And when a smaller portion of the database was used as the training set, the advantage was even greater: when only 10% of the events were used for training, AnchorMF performed 32.2% better than the next best option.
Furthermore, AnchorMF allows one to look at the event context identified by the algorithm for given keywords to learn what the participating users had in common. For example, tweets about a local café that moved to a new location largely originated from users in New Orleans, LA, where the café is located. In contrast, #oscarnoms showed no such location concentration; instead, its context was most strongly identified by the tags assigned to the tweets, particularly ‘news’ and ‘entertain’.
Once you teach a computer how to find similarities by identifying event context, it can uncover connections that humans probably wouldn’t. In the dataset Hansu collected in January 2013, AnchorMF revealed a strong overlap between the set of users who tweeted about the 2013 Obama inauguration and those interested in the 2013 International Consumer Electronics Show. While the connection is not immediately obvious, AnchorMF shows that these two events share the common context of people who like technology. This also suggests that the algorithm could be used to more easily identify relevant recommendations of who to follow, since users belonging to similar event contexts are more likely to be interested in similar accounts.
For Twitter, which makes most of its money by helping advertisers reach a more receptive audience, AnchorMF offers clear advantages: it enables better predictions of the kinds of events users will participate in, and it identifies the influential anchor accounts. For users, AnchorMF can enable better suggestions of accounts they might be interested in following, or of upcoming events they won’t want to miss. For society, it can offer new ways to understand connections between disparate events, or to study connections between people and cultural events in their surroundings.
Hansu sees a number of ways to refine the AnchorMF algorithm further, perhaps by accounting for the degree of different users’ participation in a certain event or by using typical anchors’ event participation to directly predict their followers’ participation. He is also excited by the possibility of implementing the algorithm in a distributed-computing environment, which could distribute the required computations among many computer processors to handle the analysis quickly enough to follow the evolution of events and participation in real time.
“I’m very proud of this work,” says Hansu, “and I just hope more people can see it and make use of it.”
Dr. Hansu Gu graduated in December 2013 after working with research advisor Professor Dirk Grunwald. He has since joined Seagate as a Data Scientist, working on big data storage and analytics solutions.
The full research article can be found at http://digitool.library.colostate.edu/webclient/DeliveryManager?pid=203948