Joby's Electric Air Taxi Flew Over Manhattan. Passengers Are Years Away.
Joby pulled off a splashy Manhattan demo, but FAA certification and the hard economics of eVTOL still stand between the company and fare-paying riders.
Anthropic dropped Claude Opus 4.8 the same morning its $65 billion raise hit the wires, which is either confident timing or a deliberate one-two punch. The model is not a ground-up rebuild but a focused upgrade with two headline improvements: it is roughly four times less likely to rationalize or conceal errors in code it produces, and it introduces dynamic workflows inside Claude Code that let agents adapt mid-task rather than follow a rigid script.
Third-party benchmarks peg Opus 4.8 as the strongest agentic model currently available. The catch is the same as it has always been: the model is expensive, and it burns through tokens at a rate that makes GPT-5.5 look efficient. Anthropic is clearly not optimizing for cheap. It is optimizing for enterprise buyers who need AI that can run long, complex tasks without going off the rails.
The honesty improvement deserves attention beyond the marketing copy. AI models that paper over their own mistakes are a genuine liability in production environments. Making a model more likely to surface its own failures is unglamorous engineering work, but it is exactly what separates a demo from a deployable tool.