We Might Never Know How Much the Internet Is Hiding From Us

The internet is the most comprehensive compendium of human knowledge ever assembled, but is its size a feature or a bug? Does its very immensity undermine its utility as a source of information? How often is it burying valuable data under lots of junk? Say you search for some famous or semifamous person — a celebrity, influencer, politician or pundit. Are you getting an accurate picture of that person’s life or a false, manipulated one?

These aren’t new questions; they’re actually things I’ve been wondering for about as long as I’ve been covering the digital world, and the answers keep changing as the internet changes. But a recent story got me fretting about all this once more. And I worry that it has become more difficult than ever to tune in to any signal amid so much digital noise.

Karen Weise, a Times reporter who covers the technology industry, had a blockbuster story last week documenting a pattern of hostile and abusive behavior by Dan Price, the C.E.O. who became internet famous in 2015 for instituting a $70,000-a-year minimum wage at his Seattle-based credit card processing company. “He has used his celebrity to pursue women online who say he hurt them, both physically and emotionally,” Weise reported, interviewing more than a dozen women who recounted ugly encounters with him in detail. (Price denies the allegations.)

But this was not the first time that Weise punctured the mythos surrounding Price. Late in 2015, months after he was first feted by media outlets around the world for his supposed do-gooder approach to capitalism, she published a piece in Bloomberg Businessweek uncovering many skeletons in his closet — among other things, an ex-wife who’d accused him of extreme violence and an explanation for the employee raises that seemed more self-serving than he’d let on. When Weise linked to that seven-year-old article in the new story, I clicked back to it and realized that I definitely read it at the time. I remembered its headline, “The CEO Paying Everyone $70,000 Salaries Has Something to Hide,” and I remembered that its details had been widely commented on.

Price, who denounced Weise’s Bloomberg article as “reckless” and “baseless,” was canceled temporarily after the story appeared. Then, over the years, Price began to master Twitter, eventually collecting hundreds of thousands of followers and becoming a fixture in some left-leaning Twitter circles. “Tweet by tweet, his online persona grew back,” Weise writes. “The bad news faded into the background. It was the opposite of being canceled. Just as social media can ruin someone, so too can it — through time, persistence and audacity — bury a troubled past.”

This isn’t how the internet is supposed to work. In different ways, Google, Twitter, Facebook and other big tech companies have made it their mission to disseminate and organize online data. Weise’s first story about Price contained important information about a semiprominent online figure; it should have been highlighted, not buried, as he amassed his online following.

The more troubling thing is, how often is this sort of thing happening? In the abstract, the question is almost impossible to answer; by definition, you can’t make a list of stories the internet is hiding from you. I would guess the Price story is an extreme example of information burying, but there’s reason to suspect that some version of this sort of suppression is happening all the time online.

Why? Three things. Recency bias: Google is far more focused on highlighting information from the present than it once was, making events from the past more difficult to suss out. Organized manipulation: Online mobs are bent on shaping online reality — and though the platforms say they’re attentive to the problem, the mobs seem to have the upper hand. And, of course, capitalism: Lacking much competition and keen to boost quarterly numbers, tech companies may have little incentive to solve these problems.

The first issue, recency bias, is mainly about Google, and it’s one that journalists like me have been complaining about for years. Google’s search algorithm heavily favors content that was posted most recently over content from the past, even if the older data provides a much more comprehensive story. There’s a certain sense in this: Nobody wants to read ancient news. But as the Price story suggests, if you’re searching for someone with an active online presence — someone who tweets a lot, who makes a lot of media appearances or whose whole persona is based on riling folks up — the results get murky.

Try Googling Elon Musk. When I do so, I see a lot of evergreen stuff (his Wikipedia page, links to his social media and corporate bio, index pages of articles about him at various media sites) and lots and lots of links to news about the latest Elon dust-up. At the moment, these headlines are about legal maneuverings in his attempt to undo his purchase of Twitter and Tesla’s efforts to stifle video clips of its cars hitting child-size mannequins; by the time you Google him, the results might have moved on to the next controversy.

But for a controversy machine like Musk, is it really helpful for Google to return pages and pages of links to similar stories about the latest thing? What if the latest thing is not the most important thing? In the first several pages of links about him, I didn’t see the Insider story published in May about the $250,000 settlement he reached with a flight attendant who accused him of exposing himself to her. There also isn’t much about his various fights with the Securities and Exchange Commission or the time he called the man who helped rescue 12 boys trapped in a cave in Thailand a “pedo guy.”

I don’t think Musk has actively tried to suppress this stuff; he’s just very online, and every time he does or says something new, the old stuff goes farther down.

The situation becomes much worse when there are motivated parties trying to shape what the platforms show us. There has been no better example of this than the ugly turn the internet took during the recent defamation case between Johnny Depp and his ex-wife Amber Heard. If you scanned Twitter, YouTube or TikTok during the trial, you were flooded with memes, clips and trollish posts about how terrible Heard was and how righteous Depp was.

This wasn’t because Depp’s case was so much stronger than Heard’s; as researchers have shown, it was more likely because the platforms were overrun by bots and trolls associated with people on the misogynist right who made it their mission to paint Heard in the worst light possible. They seem to have succeeded; even now, you’ve got to dig around online to find information supporting her.

The platforms say they’re constantly fighting such organized campaigns. But their efforts are opaque and seem halfhearted at best — and that’s where we get to misaligned incentives. Because bots are a kind of engagement and engagement is what pays the bills, there are few reasons for the services to really fight such campaigns. As Peiter Zatko, a former security chief at Twitter, said in a recent whistleblower complaint, “Twitter executives have little or no personal incentive to accurately ‘detect’ or measure the prevalence of spam bots.” In the same way, YouTube had little incentive to present a fairer, less manipulated picture of the Depp-Heard case — not when the Depp clips were doing big numbers.

For many readers, none of this will come as a surprise. I’m not breaking any news when I tell you not to trust everything you see on the internet. But after reading the story of Dan Price, I think it bears repeating: The internet probably isn’t giving you a fair picture of what’s happening in the world. And for any given story, you might never really know how much you aren’t seeing.

The New York Times