ArXiv Bans Accounts Flooding the Platform With AI-Generated Research Papers
ArXiv is banning accounts uploading AI-generated slop, but the real fight is over whether its endorsement system can be rebuilt to stop the flood.
As first surfaced on Reddit's r/singularity, Google's Gemma 4 12B is reportedly outperforming models with twice its parameter count on multimodal benchmarks, and doing it without the separate vision encoder that has been standard in multimodal architectures for years. The encoder is the component that translates image data into tokens the language model can read; removing it eliminates an entire inference step and cuts deployment complexity.
The practical implications land hardest for developers building multimodal apps outside of cloud infrastructure. Without a vision encoder, Gemma 4 12B fits on a single consumer GPU, costs less to run, and deploys faster: a meaningful combination for anyone who has been priced out of production multimodal work by compute costs.
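For a rough sense of why parameter count matters for single-GPU deployment, here is a minimal back-of-the-envelope sketch. It estimates weight storage only and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The precision table and the function name `weight_memory_gb` are illustrative assumptions, not details from Google or the community benchmarks; only the 12B, 24B, and 30B sizes come from the article.

```python
# Weights-only memory estimate. Assumption: storage = parameters x bytes per parameter.
# Ignores KV cache, activations, and framework overhead, so actual usage is higher.

# Common storage precisions (assumed for illustration, not taken from the article).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Model sizes mentioned in the article: 12B versus the 24B-30B range.
    for size in (12, 24, 30):
        row = ", ".join(
            f"{precision}: {weight_memory_gb(size, precision):.1f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{size}B -> {row}")
```

Under these assumptions, a 12B model needs about 24 GB for weights at bf16 and about 6 GB at 4-bit, while a 30B model needs about 60 GB at bf16, which is beyond any single consumer card without quantization.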
Community benchmarks show Gemma 4 12B matching or beating models in the 24B–30B range on several vision-language tasks. If those numbers survive independent verification, the encoder-free approach stops being an interesting research direction and becomes the default.