It's more of a misapplication than a stupid underlying system, I'd say. Elo ratings work surprisingly well when you have hundreds of homogeneous, binary-outcome, equally meaningful datapoints, and a bunch of tightly clustered competitors trying to differentiate themselves. Chess is of course the original example: a top player will play dozens of games even within a single tournament, and hundreds over the course of a year, and health is rarely an issue. Tennis is similar: matches are 1v1, and health is close to binary (not many players would tough out a tournament despite severe illness the way Serena Williams did at last year's French Open; you'd have to be the favorite to win a Major to justify that). There are lots of players whose outcomes against each other aren't all that certain, and even though a match has multiple sets, the outcome is still pretty much binary (and it ain't over until the last point, unlike sports with timed contests).
In football, you've got a whole host of issues: a tiny sample size of 16 to 20 games, a 46-man roster that turns over frequently, a wide spectrum of gray area on health, and a small number of teams that vary enormously in quality, to the point where outcomes are extremely certain by the standards of the sporting world (spare me the "Any Given Sunday" line; just look at the standard deviation of winning percentage). Worse, outcomes are very continuous in nature (margins of victory range widely), and parts of that are predictive (good teams tend to blow out bad teams) while other parts are irrelevant to the point of being misleading (e.g., NE's Week 17 game vs. Miami, or any other not-really-trying effort). I mean, it's worse than even international soccer. Other than, I dunno, ice dancing, I'd be hard pressed to name a sport that's LESS of a fit for an Elo model than American football.
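To make the binary-outcome point concrete, here's a minimal sketch of the textbook Elo update in Python. The 400-point logistic scale is the standard formulation; the K-factor of 20 and the helper names (`expected_score`, `elo_update`, `result_from_margin`) are my own assumptions for illustration, not anybody's actual NFL model. The update only sees win/loss/tie, so a 1-point squeaker and a 35-point blowout move the ratings by exactly the same amount, and no single game moves a rating by more than K points. That's why you want hundreds of games rather than 16.

```python
# Hypothetical sketch of textbook Elo; K = 20 is an assumed value, not any site's real setting.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A (between 0 and 1) on the standard 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a tie.
    The update is zero-sum: whatever A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def result_from_margin(margin: int) -> float:
    """Collapse a point margin (A's points minus B's) into the binary result Elo actually uses."""
    if margin > 0:
        return 1.0
    if margin < 0:
        return 0.0
    return 0.5


if __name__ == "__main__":
    # Two evenly rated teams: a 1-point win and a 35-point blowout give identical updates.
    squeaker = elo_update(1500.0, 1500.0, result_from_margin(1))
    blowout = elo_update(1500.0, 1500.0, result_from_margin(35))
    print(squeaker, blowout)  # (1510.0, 1490.0) (1510.0, 1490.0)
```

Everything about the margin gets thrown away at `result_from_margin`, which is fine for chess or tennis but discards exactly the signal (and the noise) that matters in football.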
Long story short, I put just as little stock in it as MMS does, other than as a general long-term tracker of franchise lifetime performance. Shit, I'd take TMQ's "authentic wins" metric over Elo.