The Embarrassing State of Siri

with Jason Anthony Guy

Paul Kafasis engages in some excellent, self-inflicted nerd-sniping on One Foot Tsunami:

I asked my iPhone who won Super Bowls 1 through 60 (that’s “I” through “LX” in Super Bowl styling) and captured a screenshot of each result.

The results are utterly appalling:

So, how did Siri do? With the absolute most charitable interpretation, Siri correctly provided the winner of just 20 of the 58 Super Bowls that have been played. That’s an absolutely abysmal 34% completion percentage. If Siri were a quarterback, it would be drummed out of the NFL.

Some of the results are especially awful. For example, to the question “Who won Super Bowl XXIII?”, Siri responds with the number of times Bill Belichick has won or appeared in the Super Bowl—completely irrelevant.

John Gruber at Daring Fireball wrote a brutally (but fairly) titled follow-up, Siri Is Super Dumb and Getting Dumber, sharing the appalling results of his own query, “Who won the 2004 North Dakota high school boys’ state basketball championship?”

New Siri — powered by Apple Intelligence™ with ChatGPT integration enabled — gets the answer completely but plausibly wrong, which is the worst way to get it wrong. It’s also inconsistently wrong — I tried the same question four times, and got a different answer, all of them wrong, each time. It’s a complete failure.

We’ve all had the Siri experience of getting a clearly wrong or patently useless answer to our query. It’s gotten to the point where I merely roll my eyes and move on—I rarely even screenshot mistakes anymore.

But I do feel sorry for the Siri team. I have some good friends who work there, and I had occasion to work with the team on Siri responses a few years back. I know they cringe every time these failures hit the blogs. They know more than anyone just how much Siri needs to improve.

The latest scuttlebutt (from Mark Gurman at Bloomberg) is that longtime Apple exec Kim Vorrath is moving to Apple Intelligence in an effort to whip it into shape. I’ve watched Vorrath and her Program Office teams operate from the inside for many years. The biggest impact she and her team had across engineering was instilling discipline: every feature or bug fix had to be approved; tied to a specific release; and built, tested, and submitted on time. It was (is!) a time-intensive process—and engineering often complained about it, sometimes vocally—but the end result was a more defined, less kitchen-sink release each year. To a significant extent, her team is the reason why a feature may get announced at WWDC but not get released until the following spring. She provided engineering risk management.

I hope Vorrath and the Siri team can make this work. I need them to make this work. The future promised by Apple Intelligence is too compelling for it to fail.

⚙︎ Friday, January 24, 2025