Back in December, a holiday promotion for Sony’s PlayStation Network inadvertently leaked total player counts for hundreds of PlayStation titles. Now, one site is building on that leaked data to create an ongoing database of player-count estimates for every game on PSN.
Gamstat launched in December as a clearinghouse for information gleaned from that “MyPS4Life” data leak. In recent weeks, though, it has unveiled a new algorithm, outlined on the site’s about page and in more detailed discussions with Ars Technica, that takes inspiration from earlier efforts like Steam Gauge and Steam Spy (before Valve shut off the data spigot, in any case).
Time traveling with trophies
First, the Gamstat algorithm takes a semi-random sample of the entire universe of PlayStation Network users. Gamstat’s administrator, who goes by Dennis, outlined this sampling procedure to Ars but asked that it not be shared to prevent potential meddling by Sony.
Sample in hand, the algorithm then examines the public trophy pages for those PSN accounts to see which games each account has played. A public trophy page includes games in which the player hasn’t earned a single trophy, but not games that the account might own without ever having played them.
This sampling generates a rough estimate of the percentage of all PSN accounts that have played any given game. But it goes deeper than that. What makes this data especially robust, Dennis tells Ars, is that PSN also publishes the date on which every trophy was earned for all public PSN accounts. By looking at the earliest trophy earned in each game on each account, Gamstat’s algorithm can effectively “time travel” to well before it was first run, generating player-ratio estimates for past dates.
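Gamstat hasn’t published its code, and Dennis asked that the sampling step stay private, so what follows is only a rough Python sketch of the “time travel” idea under assumed data. Each sampled account is modeled as a mapping from game titles to the date of its earliest trophy in that game (or `None` for a game listed with no trophies yet). The record format, the function names, and the choice to treat an account’s first trophy of any kind as the day it “exists” in the data are all hypothetical.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical record format (Gamstat's real data layout is not public):
# account ID -> {game title: date of earliest trophy in that game,
#                or None if the game is listed but no trophy is earned yet}
SampleData = Dict[str, Dict[str, Optional[date]]]


def played_ratio(sample: SampleData, game: str) -> float:
    """Share of sampled accounts whose trophy list includes `game` today."""
    players = sum(1 for games in sample.values() if game in games)
    return players / len(sample)


def played_ratio_on(sample: SampleData, game: str, day: date) -> float:
    """'Time travel' estimate of the same share as of a past `day`.

    Assumption: an account counts as existing on `day` only if its earliest
    trophy in any game falls on or before `day`. Accounts with no trophies
    at all cannot be placed in time and are skipped.
    """
    active = 0
    players = 0
    for games in sample.values():
        dated = [d for d in games.values() if d is not None]
        if not dated or min(dated) > day:
            continue  # account not visible in trophy data yet on `day`
        active += 1
        first = games.get(game)
        if first is not None and first <= day:
            players += 1
    return players / active if active else 0.0


# Toy data for illustration only, not real PSN accounts.
sample: SampleData = {
    "acct1": {"Game A": date(2018, 11, 10), "Game B": date(2017, 5, 1)},
    "acct2": {"Game B": date(2018, 1, 3)},
    "acct3": {"Game A": date(2019, 1, 2)},
    "acct4": {"Game B": None},
}

print(played_ratio(sample, "Game A"))                        # share today
print(played_ratio_on(sample, "Game A", date(2018, 12, 1)))  # share back then
```

Note the design choice in the sketch: an account that has never earned a trophy still counts toward today’s ratio but cannot be placed on any past date, which mirrors the limitation the article describes further down.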
Tracking trophy data at this level can yield incredibly detailed time series showing exactly when a game started becoming popular. On a sample page of Bulletstorm data, for instance, you can pinpoint the moment the game’s availability on PlayStation Plus sent its player base skyrocketing from about 110,000 in early November to more than 1.5 million today.
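A daily series like that could, in principle, be built by counting sampled first trophies day by day and scaling them up. The Python sketch below assumes a single hypothetical expansion factor (`scale`, the number of real accounts each sampled account stands in for) and treats the largest day-over-day gain as the “moment” a game took off. Both are illustrative simplifications, not Gamstat’s documented method, and the dates are toy values.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def cumulative_players(first_trophy_days: List[date], start: date, end: date,
                       scale: float) -> Dict[date, float]:
    """Estimated cumulative players of one game for each day in [start, end].

    `first_trophy_days` holds, for each sampled account that played the game,
    the day of its first trophy in it. `scale` is a hypothetical number of
    real accounts represented by each sampled account.
    """
    ordered = sorted(first_trophy_days)
    series: Dict[date, float] = {}
    count = 0
    i = 0
    day = start
    while day <= end:
        # Add every sampled account whose first trophy landed by this day.
        while i < len(ordered) and ordered[i] <= day:
            count += 1
            i += 1
        series[day] = count * scale
        day += timedelta(days=1)
    return series


def biggest_jump(series: Dict[date, float]) -> Tuple[date, float]:
    """Day with the largest day-over-day increase in the series."""
    days = sorted(series)
    best_day, best_gain = days[0], 0.0
    for prev, cur in zip(days, days[1:]):
        gain = series[cur] - series[prev]
        if gain > best_gain:
            best_day, best_gain = cur, gain
    return best_day, best_gain


# Toy data: a small base of players, then a burst of new ones on Nov. 6.
toy_days = ([date(2018, 11, 1)] * 2 + [date(2018, 11, 6)] * 10
            + [date(2018, 11, 7)] * 3)
series = cumulative_players(toy_days, date(2018, 11, 1), date(2018, 11, 10),
                            scale=1000)
print(biggest_jump(series))
```

A real giveaway spike would likely be spread over several days, so a rolling window might be a better detector than a single-day difference; the one-day version just keeps the idea visible.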
This “time travel” data also means that, as new accounts are sampled each day, the effective sample size grows for every day since each new account’s inception. Estimates for the past should therefore only grow more robust and (hopefully) more accurate as time goes on.
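To see why the past keeps getting better coverage, consider a rough Python sketch that counts how many sampled accounts can contribute data on each day. It assumes, hypothetically, that an account’s “inception” is the date of its first trophy; every newly sampled account then adds one data point to every day from that date forward. The function name and dates are illustrative.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, List


def sample_size_by_day(inception_days: List[date], start: date,
                       end: date) -> Dict[date, int]:
    """Number of sampled accounts usable as data points on each day.

    Assumption: an account's inception is approximated by its first trophy
    date, and the account contributes to every day from then on.
    """
    new_per_day = Counter(inception_days)
    # Accounts that already existed before the window starts.
    running = sum(n for d, n in new_per_day.items() if d < start)
    sizes: Dict[date, int] = {}
    day = start
    while day <= end:
        running += new_per_day[day]  # Counter returns 0 for missing days
        sizes[day] = running
        day += timedelta(days=1)
    return sizes


# Sampling one more (older) account a day later raises the sample size
# for every past day since that account's first trophy.
monday = [date(2015, 3, 1), date(2018, 6, 1)]
tuesday = monday + [date(2012, 1, 1)]
window = (date(2016, 1, 1), date(2016, 1, 3))
before = sample_size_by_day(monday, *window)
after = sample_size_by_day(tuesday, *window)
print(before[date(2016, 1, 2)], after[date(2016, 1, 2)])
```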
Compare and contrast
With “time travel” estimates in hand, Gamstat can then go back to November 19, 2018, the earliest date from which the MyPS4Life data seems to have been derived, and compare its numbers with the leak. Dividing the total player count the leak reports for a game by Gamstat’s estimated share of accounts that had played that game lets the site estimate the total number of PSN accounts that existed on November 19.
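The arithmetic behind that step is simple division: if the leak says a game had P players, and Gamstat estimates that a share r of the accounts existing on November 19 had played it by then, the account pool was roughly P / r. The Python sketch below applies that per game. Pooling the per-game results with a median is an assumption here, since Gamstat hasn’t said how it combines them, and all figures are toy numbers.

```python
from statistics import median
from typing import Dict


def total_accounts_estimate(leaked_players: Dict[str, float],
                            sampled_share: Dict[str, float]) -> float:
    """Estimate total PSN accounts on the leak date.

    leaked_players: player count per game from the leak.
    sampled_share: share of sampled accounts (existing on that date) that had
        played each game by then, e.g. from a 'time travel' ratio.
    Per-game estimates are pooled with a median (an illustrative choice).
    """
    estimates = [
        players / sampled_share[game]
        for game, players in leaked_players.items()
        if sampled_share.get(game, 0.0) > 0.0
    ]
    return median(estimates)


# Toy numbers, not real MyPS4Life figures.
leaked = {"Game A": 25_000_000, "Game B": 13_000_000, "Game C": 40_000_000}
shares = {"Game A": 0.25, "Game B": 0.125, "Game C": 0.5}
print(total_accounts_estimate(leaked, shares))
```

A median keeps one badly estimated game from dragging the total around; a weighted average would be a reasonable alternative if some games are sampled much more densely than others.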
The algorithm can also see how many sampled accounts earned their first trophy on a given date before or after November 19, then use that ratio to estimate the size of the total PSN account pool on other days. That pool sat at about 357 million accounts in February, according to Gamstat, with the caveat that “on average, for every console sold there are two player accounts created.”
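Scaling the November 19 total to other days could look something like the Python sketch below. It assumes, hypothetically, that the number of sampled accounts whose first trophy came on or before a given day is proportional to the number of PSN accounts that existed then. The reference total and the dates are toy values, not Gamstat’s figures.

```python
from datetime import date
from typing import List


def accounts_on(day: date, reference_day: date, reference_total: float,
                first_trophy_days: List[date]) -> float:
    """Scale an estimated account total on `reference_day` to another `day`.

    Assumes the share of sampled accounts whose first trophy (a stand-in for
    account creation) falls on or before a date tracks the real account pool.
    """
    active_ref = sum(1 for d in first_trophy_days if d <= reference_day)
    active_day = sum(1 for d in first_trophy_days if d <= day)
    return reference_total * active_day / active_ref


# Toy sample: first-trophy dates of ten sampled accounts.
firsts = ([date(2014, 1, 1)] * 4 + [date(2018, 11, 1)] * 4
          + [date(2019, 1, 5)] * 2)
nov19 = date(2018, 11, 19)
print(accounts_on(date(2019, 2, 11), nov19, 300_000_000, firsts))
print(accounts_on(date(2018, 6, 1), nov19, 300_000_000, firsts))
```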
By going back to November 19, Gamstat can also compare its own estimates with those in the MyPS4Life leak as a sort of sanity check. According to data Dennis provided to Ars, Gamstat’s November 19 estimate is within 20 percent of the leaked figure for 90 percent of the MyPS4Life games, and within 7 percent for half of them.
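That kind of sanity check boils down to comparing each estimate with the leaked figure. The Python sketch below, using made-up numbers, computes the share of games whose estimate lands within a given relative tolerance of the leak, plus the median relative error; “within 7 percent for half of the games” corresponds roughly to a median relative error of about 7 percent. The function names are hypothetical.

```python
from statistics import median
from typing import Dict


def relative_errors(estimates: Dict[str, float],
                    leaked: Dict[str, float]) -> Dict[str, float]:
    """|estimate - leaked| / leaked for every game present in both sets."""
    return {
        game: abs(estimates[game] - leaked[game]) / leaked[game]
        for game in leaked
        if game in estimates and leaked[game] > 0
    }


def share_within(estimates: Dict[str, float], leaked: Dict[str, float],
                 tolerance: float) -> float:
    """Fraction of games whose relative error is at most `tolerance`."""
    errors = relative_errors(estimates, leaked)
    return sum(1 for e in errors.values() if e <= tolerance) / len(errors)


# Made-up figures for illustration, not Gamstat's or the leak's.
leaked = {"Game A": 100, "Game B": 200, "Game C": 1000, "Game D": 50}
estimated = {"Game A": 105, "Game B": 230, "Game C": 980, "Game D": 80}
print(share_within(estimated, leaked, 0.20))
print(share_within(estimated, leaked, 0.07))
print(median(relative_errors(estimated, leaked).values()))
```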
The Gamstat algorithm has some important limitations, of course. The “time travel” method doesn’t work for players who haven’t earned a trophy yet, for instance, and may miss any days of play before a player earns a game’s first trophy. There are also potential issues with the absolute accuracy and timing of the MyPS4Life numbers that could distort those comparisons. And some players set their accounts to private, which could skew the results if those players differ substantially from everyone else.
For now, Gamstat only provides a single public estimate, as of February 11, for each of the thousands of PS3, PS4, and Vita games in its database. The site notes that it hopes to have detailed time-series data pages up for all games “eventually” but also that the site “is in alpha, so don’t expect frequent updates.”
Even now, though, and with all the caveats, Gamstat’s new algorithm provides an incredibly robust look at what has until now been extremely opaque data on PlayStation usage.