Alternative Data Offers a Lot. Just Be Careful.

The age of coronavirus has presented investors, analysts, journalists and researchers with two data problems. First, traditional government numbers such as unemployment and gross domestic product growth come out only once a month or once a quarter, making it hard to spot fast-changing trends as they occur. Second, the pandemic has produced many unusual economic effects that make traditional numbers harder to interpret. For example, the large number of workers who were sent home but still received paychecks caused some confusion about whether official unemployment numbers were accurate. And the spread of Covid-19 itself is a crucial factor in economic performance, yet federal government agencies have been peculiarly slow to collect good data on key indicators such as hospitalization and testing.

In response to these problems, many people are turning toward alternative data sources. In the internet age, private companies gather a vast amount of information very quickly: restaurant reservations, airline tickets, job listings, product prices, rents and much more. These alternative data had been attracting steadily more attention in the years before the pandemic, and now they're gaining even more popularity.

One good example is restaurant reservations. Those numbers provided early evidence that states like Texas, Florida and Arizona re-opened too quickly: despite lifting lockdown orders, they are now suffering economically from rising infections, compared with states such as New York, which took a more methodical approach to re-opening.

Another example is how seriously states are taking contact tracing. My own volunteer-built website, www.TestAndTrace.com, has gathered data on the number of contact tracing workers states have hired. Researchers using these data have found a negative correlation between a state’s number of contact tracers and its subsequent case growth. It’s clear that the states now experiencing the worst outbreaks have hired fewer contact tracers than their counterparts in the Northeast whose outbreaks are subsiding.

Private data sources have also made it possible to track other economic shifts in real time. Twitter has declared a shift to fully remote work, so the company's remote job postings indicate how seriously it intends to follow through on that promise. Ratings for the distance learning app Udemy hint at how popular distance learning is becoming. Rents near college campuses can help measure how much trouble college towns are in from tuition losses, cuts in state funding and restrictions on foreign students.

Although private data sources offer the promise of fast, specific and novel information, using them is also fraught with peril. One obvious reason is noise. Government agencies have developed many techniques for removing random errors and fluctuations from their data; most private data providers have not. And people who use private data because it comes out quickly often look only at the last one or two data points, which magnifies the impact of noise. One painful example: When I cited a private website's wage data in 2018, what looked like a steep drop in wages turned out to be a blip when put in the proper context, but it alarmed many people in the meantime.
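To make the point about noise concrete, here is a minimal sketch in Python, assuming a simulated series (not real data) with a flat underlying level and independent random noise. It compares the typical period-to-period swing you see when reading only the latest two points with the swing in a simple trailing average. The four-period window and the names `trailing_mean` and `period_changes` are illustrative choices, not a method any data provider or agency is known to use.

```python
import random
import statistics


def trailing_mean(series, window):
    """Average each point with the (window - 1) points before it.

    Returns a shorter list: the first full window ends at index window - 1.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        statistics.fmean(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


def period_changes(series):
    """Change from each point to the next, i.e. what you see if you only
    compare the latest reading with the one just before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Simulated series (hypothetical, not real data): a flat underlying level
# of 100 plus independent random noise with a standard deviation of 2.
random.seed(42)
raw = [100 + random.gauss(0, 2) for _ in range(500)]
smoothed = trailing_mean(raw, window=4)

raw_spread = statistics.stdev(period_changes(raw))
smoothed_spread = statistics.stdev(period_changes(smoothed))
print(f"typical swing, raw readings:     {raw_spread:.2f}")
print(f"typical swing, 4-period average: {smoothed_spread:.2f}")
```

With purely independent noise, the change in a four-period average equals the difference of two readings divided by four, so its swings should be roughly a quarter the size of the raw ones. The trade-off is that averaging also delays the appearance of genuine turning points, which is one reason fast private data and slower, carefully processed government numbers are best used together.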

A second caution about private data is that it tends to measure only one very specific thing. For example, data on the number of contact tracers don’t show how well-trained and efficient they are, or whether long turnaround times for Covid-19 tests limit the effectiveness of test-and-trace programs. Data on rents don’t fully reflect housing costs because house prices and rents can move in opposite directions. Restaurant reservations don’t measure traffic for other kinds of businesses, and so on.

Finally, private data usually hasn't stood the test of time. Most internet data series have been around for only a short time, whereas government statistics usually go back decades. This means government researchers have had much more time to make sure their numbers remain economically useful over time and aren't compromised by unmeasured long-term trends or changes in the way the data is gathered. Lots of private data probably contains big undiscovered measurement issues or will eventually turn out to be less useful than initially hoped.

As private-sector economist Jed Kolko notes, the key is to use private data sources as a complement to government numbers, rather than as a substitute. This means, first of all, that investors and writers should be cautious about reporting or betting on private data. If it’s possible to wait for government numbers before making a decision or reporting a trend, do it. Second, private data series should be interpreted qualitatively rather than quantitatively; they can show which way the economic winds are blowing, but not how strong the gusts are. And finally, it’s crucial to look at as many different private data sources as possible to confirm that trends aren’t illusory.

The proliferation of fast, novel private data sources will help us better understand economic trends in real time and at a higher level of detail. We just have to be very careful to use that data appropriately.
Bloomberg