Actually, Elon, You Can Count Twitter Bots. Here's How

Elon Musk’s assertion last month that the number of Twitter bots is as “unknowable as the human soul” may well be a negotiating tactic from a man who’s probably feeling a bit of buyer’s remorse. Yet tallying up how many machines are running around on Twitter Inc.’s platform is a pretty straightforward process, as long as everyone can agree on what they’re counting.

A “very significant matter” deterring Musk from closing the deal at the $44 billion price originally agreed is “whether the number of fake and spam users on the system is less than 5%, as Twitter claims, which I think is probably not most people’s experience when using Twitter,” he told Bloomberg News Editor-in-Chief John Micklethwait at the Qatar Economic Forum on Tuesday. Other items to be overcome include debt financing and a shareholder vote on the acquisition, he said.

With shares of the social media company trading 30% below Musk’s purchase price of $54.20, there’s $15 billion worth of reasons for the South African businessman to find loopholes and push for a discount. His comment Tuesday echoes one made in a June 6 filing, in which he claimed that Twitter refused to provide the information needed “to facilitate his evaluation of spam and fake accounts on the company’s platform.”

Let’s pause here. So far we have three different terms — bot, spam, and fake accounts — which could have entirely different meanings. Not all bots are fake, and not all fakes send spam.

There are numerous ways to categorize accounts on Twitter, but the simplest and most useful may be to examine every Twitter handle on two bases. The first is whether it is human, meaning an actual person sets up and runs activities on the account, or automated, meaning a piece of human-written software acts on behalf of the account to tweet, retweet and like. The latter can be called a bot.

The second is whether an account is authentic, meaning the entity operating it is the same as the one it purports to represent, or inauthentic, meaning it is not. Inauthentic accounts would be considered fake.
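The two-way classification above can be sketched as a small Python data structure. This is an illustrative model only, not a description of how Twitter actually classifies accounts; the names `Operator`, `Identity` and `Account`, and the example handles, are hypothetical. Spam is deliberately left out as an axis: it describes what an account does rather than who runs it, and not all bots are fake, just as not all fakes send spam.

```python
from dataclasses import dataclass
from enum import Enum


class Operator(Enum):
    """First axis: who (or what) runs the account."""
    HUMAN = "human"          # an actual person sets up and runs the account
    AUTOMATED = "automated"  # software tweets, retweets and likes on its behalf


class Identity(Enum):
    """Second axis: is the operator who the account claims to be?"""
    AUTHENTIC = "authentic"
    INAUTHENTIC = "inauthentic"


@dataclass(frozen=True)
class Account:
    handle: str          # hypothetical handle, for illustration only
    operator: Operator
    identity: Identity

    @property
    def is_bot(self) -> bool:
        # "Bot" depends only on the first axis.
        return self.operator is Operator.AUTOMATED

    @property
    def is_fake(self) -> bool:
        # "Fake" depends only on the second axis.
        return self.identity is Identity.INAUTHENTIC


if __name__ == "__main__":
    # A media outlet posting via third-party software: a bot, but not fake.
    news_feed = Account("@example_news", Operator.AUTOMATED, Identity.AUTHENTIC)
    # A person running an account that misrepresents who they are: fake, not a bot.
    impostor = Account("@example_impostor", Operator.HUMAN, Identity.INAUTHENTIC)
    print(news_feed.is_bot, news_feed.is_fake)  # True False
    print(impostor.is_bot, impostor.is_fake)    # False True
```

Because the two properties read independent fields, every combination (human and authentic, automated and inauthentic, and so on) is representable, which is the point of treating bot and fake as separate questions.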

There’s a lot of gray area when sorting accounts, and we shouldn’t assume a bot account is bad and a human one is good. For example, many corporations and media outlets do much of their posting and retweeting using third-party software. Technically, their Twitter accounts are being run by bots, but we like to think of them as “good bots.” Those that keep tweeting crypto scams are “bad bots.”

In addition, inauthentic accounts, though frowned upon, quite often lie harmlessly dormant for months or years on end before being reactivated to harass individuals or spread misinformation. We’ve seen this with Chinese state-backed entities and with actors linked to India’s ruling party. Catching an inauthentic account is extremely difficult. Some are run by bots, but by no means all of them. And just because an account is operated by a human with a pseudonym doesn’t mean it’s malicious.

“Through a combination of machine learning and monitoring by our teams, we proactively detect accounts whose activity indicates that there are attempts to manipulate Twitter conversations in an automated way, i.e. ‘bad bots,’” Twitter said in an email to Bloomberg Opinion.

Sorting accounts into bot, fake and spam categories is as much art as science. Counting bots, however, is the easiest part, and Musk’s claim that at least 20%, and as much as 90%, of users fall into this category is out of whack. It’s an entirely calculable number because Twitter itself lets external developers build bots and plug into its data feed. All you need to do is apply for a developer account and connect to the company’s application programming interface, known as an API.

Twitter counts how many of these are in use, and puts the figure at “hundreds of thousands” of monthly active developer accounts. If we generously round that up to one million, then we’re looking at 1 million out of 229 million users,(1) or less than 0.5%. To be clear, not all of those developer accounts actually post to Twitter (that is, tweet, retweet or like). In fact, thousands of developers use the API to passively gather data, which could be used for marketing and research. That’s a far cry from 20%.
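The back-of-the-envelope arithmetic can be checked in a few lines of Python. This sketch assumes, as a deliberately generous upper bound, that every one of the rounded-up one million developer accounts is a posting bot, and it takes the 229 million user figure as given; the helper name `share_of_users` is hypothetical.

```python
def share_of_users(accounts: int, users: int) -> float:
    """Return `accounts` as a percentage of `users`."""
    return 100 * accounts / users


# Generous upper bound: "hundreds of thousands" rounded up to one million,
# and (as an assumption) every developer account treated as a posting bot.
DEVELOPER_ACCOUNTS = 1_000_000
USERS = 229_000_000

pct = share_of_users(DEVELOPER_ACCOUNTS, USERS)
print(f"{pct:.2f}%")  # 0.44%

# How many bot accounts Musk's 20% floor would imply, in whole accounts.
implied_by_20_pct = USERS * 20 // 100
print(f"{implied_by_20_pct:,}")  # 45,800,000
```

Even on that generous assumption, a 20% share would require about 45.8 million bot accounts, more than 45 times the rounded-up developer count.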

To be fair, there is another way to run a Twitter bot without using the company’s API, and that’s by mimicking a real person. Because Twitter’s service can be accessed through an ordinary web browser using standard protocols, clever software developers can exploit this openness to create a bot that looks to Twitter just like a human using Chrome or Firefox. Twitter can’t count these interactions as easily, but it does have systems in place to hunt them out and shut them down if need be. If 20% of its users were browser-emulating bots, there’s every chance Twitter would know about it.

Chief Executive Officer Parag Agrawal has been pretty open about these challenges.

“Spam isn’t just ‘binary’ (human/not human). The most advanced spam campaigns use combinations of coordinated humans + automation,” he said in a thread last month. “They are sophisticated and hard to catch.”

To allay his concerns, Twitter offered to walk Musk through its methodologies, which his lawyers curiously decried as being “tantamount to refusing Mr. Musk’s data requests.” If, instead, the new buyer wanted access to every tweet since the first was sent in March 2006, that could certainly be arranged. But it’s unlikely to bring much satisfaction, because Musk and his team would then be left sorting through reams of data to decide for themselves what is a bot, what is fake, and what is spam.

With a $44 billion social media company and mountains of old tweets in his hands, Musk’s problem won’t be trying to know the unknowable, but deciding what exactly it is that he wants to know.

Bloomberg