On one of my last trips before COVID, I went to Málaga. While there, I bought a ticket to watch a football match between the city’s team and RCD Mallorca. As I arrived at the recently renovated La Rosaleda Stadium, I found my seat and took in the sight of 30,000 blindingly blue seats with seagulls circling above. Then, I opened LiveScore.com on my phone to figure out who was playing. I was behaving like a machine.
This article is a non-technical summary of the work presented in An Automatic Participant Detection Framework for Event Tracking on Twitter. The peer-reviewed paper itself covers the technical details of the implementation, the evaluation methodology, and the full results.
That was two years ago. All I remember is that Málaga lost 1-0 and later missed promotion, but I couldn’t tell you who played on that day. Then again, I couldn’t have told you that an hour after the match had ended either. I knew nothing about the teams, but at least I knew enough that I could open LiveScore.com to check who “that winger who’s playing so well” was. Machines can do nothing of the sort.
That’s the backdrop for An Automatic Participant Detection Framework for Event Tracking on Twitter, a paper I co-authored recently. Machines struggle to build good timelines for events, the reasoning goes, because they understand so little about events. It’s a little like a journalist covering a story about a topic they have never heard of. In this paper, we proposed a way for machines to use Twitter to detect an event’s participants before the event starts: a first, as far as we know.
What had been done before?
The idea that participants are useful for machines to understand events isn’t, or at least shouldn’t be, revolutionary. Why wouldn’t a machine build a better timeline of a football match if it understands who is playing? Disappointingly, there was little research on the subject before this paper.
Ironically, the earliest papers in Topic Detection and Tracking prioritized participants more than modern approaches do. Some of the earliest algorithms, which focused on general news, used named entities to distinguish between events. It’s a well-reasoned idea because it’s unlikely that the same person is participating in two events at the same time.
Later came algorithms that built separate timelines for each participant, and they were successful and helpful. Imagine splitting a timeline about a political debate into two parts, one for each candidate, as this paper does. Another paper did something similar, but in basketball games, building one timeline for each player.
There are many more applications for participants, like creating summaries that mention football players to describe topics better. Or you could create bigger datasets by collecting tweets that mention the participants, as I had proposed. But that’s it; there are surprisingly few papers that use participants.
Why don’t we use Named Entity Recognition to detect participants?
At this point, you’re probably wondering: can’t we just use Named Entity Recognition? After all, that is how the papers I mentioned above build separate timelines for participants or distinguish between events. Named Entity Recognition has its merits, but it doesn’t help us detect an event’s participants before the event itself starts.
The most obvious problem is the way people tweet: it’s spontaneous and unruly, often with complete disregard for proper spelling. Named Entity Recognition relies a lot on capitalization, so it’s helpless when it encounters a tweet like this: haaland deserve a champions league trophy thats why he will go to city.
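To make that concrete, here is a toy sketch in Python. It is not a real Named Entity Recognition system and not the method from the paper; `capitalized_candidates` is a hypothetical helper that only looks for runs of capitalized words, which exaggerates the dependence on capitalization. The second tweet is the same text with capital letters added by hand for comparison.

```python
import re


def capitalized_candidates(text: str) -> list[str]:
    """Toy heuristic (an assumption, not a real NER model):
    treat each run of capitalized words as one entity candidate."""
    # A capitalized word, optionally followed by more capitalized words.
    pattern = r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b"
    return re.findall(pattern, text)


# The tweet quoted above, exactly as written.
tweet = "haaland deserve a champions league trophy thats why he will go to city"

# The same tweet with capital letters added by hand.
edited = "Haaland deserve a Champions League trophy thats why he will go to City"

print(capitalized_candidates(tweet))   # []
print(capitalized_candidates(edited))  # ['Haaland', 'Champions League', 'City']
```

With no capital letters to go on, the heuristic finds nothing in the original tweet. Real systems use far richer signals than this, and a heuristic this crude would also mistake any capitalized sentence opener for an entity, but the example shows why lowercase tweets are such hostile ground.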
Now, that’s a big problem, but assume for a bit that we somehow overcome it. Two other problems remain. First, a named entity doesn’t become a participant just because users mention it. As I mention in the paper, Twitter users talk about many named entities, and many aren’t participants. For instance, before an Arsenal match starts, supporters might mention their rivals, Tottenham Hotspur, and their previous results.
The second issue is the opposite. As I will explain in a later post, there are many actual participants who escape attention. They might be unpopular or unimportant, but even though they are participating in the event, few users mention them. Just observe how football fans talk more about attackers than they do about defenders.
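One way to picture these two problems together is with precision and recall, sketched below in Python. This is my own framing for illustration, not necessarily how the paper evaluates its results. The data is invented: `Opponent FC`, `Striker A`, and `Defender B` are placeholders, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(detected: set[str], participants: set[str]) -> tuple[float, float]:
    """Compare the entities users mention with the actual participants."""
    correct = detected & participants
    # Precision: how many of the mentioned entities really participate.
    precision = len(correct) / len(detected) if detected else 0.0
    # Recall: how many of the real participants were mentioned at all.
    recall = len(correct) / len(participants) if participants else 0.0
    return precision, recall


# Invented example data, for illustration only.
participants = {"Arsenal", "Opponent FC", "Striker A", "Defender B"}
mentioned = {"Arsenal", "Opponent FC", "Striker A", "Tottenham Hotspur"}

precision, recall = precision_recall(mentioned, participants)
print(precision, recall)  # 0.75 0.75
```

Tottenham Hotspur is mentioned without participating, which drags precision down. Defender B participates without being mentioned, which drags recall down.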
Recap: detecting event participants is hard
This is the problem we tried to solve in An Automatic Participant Detection Framework for Event Tracking on Twitter. We wanted to improve machine understanding of events, like football matches, by detecting their participants before the events start. Named Entity Recognition just wasn’t enough, as we show in the paper and as I’ll explain in a later installment.
This post is based on a paper of the same name. You can read the original paper here: An Automatic Participant Detection Framework for Event Tracking on Twitter. The following papers also use participants in their methods.
- On-Line New Event Detection and Tracking. An early paper that used named entities to distinguish between events.
- Real-Time Entity-Based Event Detection for Twitter. An algorithm that built one timeline for each named entity it detected in a stream about news stories.
- Event Summarization for Sports Games using Twitter Streams. Another algorithm that builds one timeline for each participant, this time focusing on players in basketball games.
- Generating Live Sports Updates from Twitter by Finding Good Reporters. A summarization algorithm that gives priority to tweets that mention football players.
- ELD: Event TimeLine Detection–A Participant-Based Approach to Tracking Events. The precursor to the paper I describe in this post, which describes why participants are important.