Show HN: OCR Arena – A playground for OCR models
ocrarena.ai
I built OCR Arena as a free playground for the community to compare leading foundation VLMs and open-source OCR models side by side.
Upload any doc, measure accuracy, and (optionally) vote for the models on a public leaderboard.
It currently has Gemini 3, dots.ocr, DeepSeek, GPT5, olmOCR 2, Qwen, and a few others. If there are any others you'd like included, let me know!
I've been really impressed with this model specifically because of how insanely cheap it is: https://replicate.com/ibm-granite/granite-vision-3.3-2b
I didn't expect IBM to be making relevant AI models but this thing is priced at $1 per 4,000,000 output tokens... I'm using it to transcribe handwritten input text and it works very well and super fast.
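For scale, the quoted rate works out to $0.25 per million output tokens. Below is a minimal Python sketch of that arithmetic. The tokens-per-page figure is a hypothetical input, not something measured in this thread, and only output tokens are counted, matching the quoted price.

```python
# Price quoted above for granite-vision-3.3-2b on Replicate: $1 per 4,000,000 output tokens.
PRICE_USD = 1.0
TOKENS_PER_PRICE = 4_000_000


def output_cost_usd(output_tokens: int) -> float:
    """Cost in USD for a given number of output tokens at the quoted rate."""
    return output_tokens * PRICE_USD / TOKENS_PER_PRICE


def pages_per_dollar(tokens_per_page: int) -> float:
    """Pages per dollar, given a hypothetical average output length per page."""
    return TOKENS_PER_PRICE / (tokens_per_page * PRICE_USD)


print(output_cost_usd(1_000_000))  # 0.25
print(pages_per_dollar(500))       # 8000.0
```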
I'm the dev who made this :) We are looking into adding Granite!
English only :( It seems only models two orders of magnitude larger have support for, e.g., Greek :/
Thanks for this! Will test this model out, because we do a lot of in-between steps to get around the output token limits.
It would be super nice if it worked for our use case so we could simply get the full output.
I suggest you make explicit the assumption that this website is specifically about English text. Otherwise the leaderboard is pretty meaningless, given the extreme differences in performance across other scripts, and potentially even across languages such as Vietnamese or Czech, which use the Latin script but have lots of diacritics.
Hey! I'm the dev who made this :) I think you are right: the data will be biased toward English because the dataset we provide for people to use is in English. But you can also upload non-English docs into battle mode as well as the playground!
LMArena splits its leaderboard by language; maybe you should consider doing the same.
I assume that to do so you'd need another model to do language detection on the inputs and/or outputs, but a language detection model can be a lot cheaper than an OCR model or an LLM.
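Here is a minimal Python sketch of splitting votes into per-language buckets. It assumes each vote is recorded as a hypothetical `(ocr_output_text, winner, loser)` tuple. To stay self-contained, it uses a crude stand-in for a real language-ID model: the dominant Unicode script of the output text. That separates Greek from Latin-script documents, for example, but cannot tell Czech from English. In practice a proper language detector would replace `dominant_script`.

```python
import unicodedata
from collections import Counter, defaultdict
from typing import Iterable


def dominant_script(text: str) -> str:
    """Crude proxy for language detection: the most common Unicode script among letters.

    Takes the first word of each letter's Unicode name ('LATIN', 'GREEK', 'CYRILLIC', ...).
    """
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts[name.split()[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"


def split_votes_by_script(
    votes: Iterable[tuple[str, str, str]],
) -> dict[str, list[tuple[str, str]]]:
    """Group hypothetical (output_text, winner, loser) votes into per-script buckets,
    so each bucket can feed its own leaderboard."""
    buckets = defaultdict(list)
    for text, winner, loser in votes:
        buckets[dominant_script(text)].append((winner, loser))
    return dict(buckets)
```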
That's unfortunate, because I have a bunch of photos with handwritten German on the back that I need to transcribe, and since I can't read German I can't really do it myself either.
I reckon performance on German will be similar to English, the only real difference is the umlauts and those are very consistent. Not sure how it will do on the ß.
Qwen 3.5 VL Instruct on OpenRouter is damn cheap, and it works quite well with non-English stuff.
I have it verify some stamps that are quite messy and sometimes obscured, and honestly some of them I could not even read.
Interesting that the 8B model of the Qwen3-VL family is in 9th place, above a few proprietary models. It can run locally with llama.cpp on modest hardware.
Off-topic, but what's the best OCR that can run offline in the browser with JS/Wasm at a reasonable CPU/memory cost?
I'm working on a hobby project that interacts with user handwriting on a <canvas>. I tried some CNN models for digits but had trouble with characters.
If the text is written interactively on the canvas (as opposed to extraction from pixels) this task is known as "online handwriting recognition" ("online" because you can watch the text being formed incrementally, which makes it easier to e.g. distinguish individual strokes.)
I don't know what the state of the art is, but an old model for digitizer pens might not do too badly either.
So many OCR tools have popped up over the past year or so; we're sorely in need of benchmarks to compare them. I would love to see support for traditional OCR tools like Tesseract, EasyOCR, Microsoft Azure, etc. I'm using these for some projects, and my experiments with VLMs for OCR have produced too much hallucination for me to switch. Benchmarks comparing across this aisle would be incredibly useful.
A limitation of this leaderboard approach that I want to point out is that while the large general-purpose LLMs can make greater leaps of inference (on handwriting and poor-quality scans), and almost always produce better layouts and more coherent output, they can also sometimes be less correct. My experience is that they're more prone to skipping or transposing sections of text, or even hallucinating completely incorrect output, than the purpose-trained models. (A similar comparison can be made in turn with character- or word-based OCR approaches like Tesseract, which are even less "intelligent" but also even less prone to those misbehaviors.)
Also, some of the models are prone to infinite loops, and I suspect this is not being penalized appropriately; the frontend seems to get into a bad state after around 50k characters, which prevents the user from selecting a winner. It would probably be beneficial to give every model an output length limit.
Still, a really cool resource - I'm looking forward to more models being added.
Totally agree w/ your first point! For the looping, we just added a stop condition for now in battle mode, and you can still vote on the other model afterwards. A bit of a hard problem to solve. We will add more models!
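The reply doesn't say what the stop condition is. One possible heuristic, sketched here in Python, combines a hard output length limit with a check for a chunk repeated back to back at the end of the streamed output. The function names and thresholds are illustrative assumptions; the 50,000-character default only mirrors where the frontend reportedly breaks.

```python
def looks_like_loop(text: str, max_period: int = 200,
                    min_repeats: int = 4, min_span: int = 100) -> bool:
    """Heuristic: does `text` end with one chunk repeated back to back?

    A chunk of length `period` must repeat at least `min_repeats` times, and the
    repeated tail must cover at least `min_span` characters, so short runs such
    as a markdown table rule ("|---|---|") are not flagged.
    """
    for period in range(1, max_period + 1):
        # Enough repeats to satisfy both thresholds (ceiling division for min_span).
        repeats = max(min_repeats, -(-min_span // period))
        span = period * repeats
        if span > len(text):
            continue  # not enough text yet to judge this period
        chunk = text[-period:]
        if text[-span:] == chunk * repeats:
            return True
    return False


def should_stop(text: str, max_chars: int = 50_000) -> bool:
    """Stop streaming once the output is too long or appears stuck in a loop."""
    return len(text) >= max_chars or looks_like_loop(text)
```

Calling `should_stop` on the accumulated output after each streamed chunk would end a runaway generation well before the 50k mark in the common case of a short repeated phrase.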
Any plans to add Document Pre-trained Transformer 2 (DPT-2) from https://landing.ai/?
Love this! I would have liked to see something like Textract as a pre-LLM benchmark (but of course that's expensive), and also a distinction between handwritten and printed text.
But still, this is incredibly useful!
Two suggestions:
UX on mobile isn’t great. It wasn’t obvious to me where the second model’s output was, and I was thrown off even more because the option to vote for model 1’s output was presented before I had even seen model 2’s output.
My second suggestion would be to install a MathJax plugin so one can properly rate mathematical equations and formulas. Raw LaTeX is easy to misread, and it makes comparing LaTeX and Unicode outputs hard.
Hey! Dev who made this here. I hear you on the mobile UX; it's on my docket of things to fix. Same with the math plugin! Thanks for the suggestions.
Really like the idea. Unfortunately, my first upload is still spinning on one of the models about 5 minutes in. Clicking "Stop Battle" doesn't seem to do anything either.
Hey, I'm the dev who built this! Looking into it. Wondering if it's because of load due to this post.
Would be great to compare these against Apple’s Live Text. This project now supports it: https://github.com/mkyt/OCRmyPDF-AppleOCR
I’ve had great results locally, albeit you need macOS 13 or later for this.
I really hope there will be a layout mode or an OCR-with-bounding-boxes mode; I want to see the model restore the whole page.
Yeah, that would be a cool long-term goal.
We need to see Landing.ai's DPT-2; in my tests it's the best so far at extracting structure from complex tables.
This needs a "both are bad" button. There are some generations where I can't rightfully say one beats the other.
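If the leaderboard uses an Elo-style rating (an assumption; the thread doesn't say how OCR Arena scores votes), a tie and a "both are bad" vote can be handled differently. In this Python sketch, a tie is scored 0.5/0.5, which pulls the two ratings toward each other. A "both are bad" vote leaves both ratings unchanged, since it says nothing about which output was better. The K-factor of 32 is a conventional choice, not a documented one.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, outcome: str, k: float = 32.0) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one vote.

    outcome: "a" (A wins), "b" (B wins), "tie", or "both_bad".
    Scoring a tie as 0.5 and skipping "both_bad" are assumptions of this sketch.
    """
    if outcome == "both_bad":
        return r_a, r_b  # no information about relative quality
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[outcome]
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum update: whatever A gains, B loses.
    return r_a + delta, r_b - delta
```

Treating "both are bad" as a plain tie would instead drag the higher-rated model down, which is one reason to keep the two buttons separate.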
Most of these are general LLMs and not specifically OCR models. Where are Google Vision, Mistral, Paddle, Nanonets, or Chandra??
We wanted to keep the focus on (1) foundation VLMs and (2) open source OCR models.
We had Mistral previously but had to remove it because their hosted API for OCR was super unstable and returned a lot of garbage results unfortunately.
Paddle, Nanonets, and Chandra being added shortly!
Nanonets is live now!
This is super helpful :) Curious about Grok as well!
Hello! Dev who made this here. Working on adding Grok.
FYI, one of the models in the battle was pretty slow to load. Are these also being rated on latency or just quality?
Ultimately, there’s some intersection of accuracy x cost x speed that’s ideal, which can be different per use case. We’ll surface all of those metrics shortly so that you can pick the best model for the job along those axes.
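One way to surface those three axes together is to show only the models that no other model beats on all of them at once (the Pareto front). Here is a minimal Python sketch. The `ModelStats` fields and their units are hypothetical placeholders, not OCR Arena's actual metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    name: str
    accuracy: float  # higher is better (hypothetical, e.g. a leaderboard score)
    cost: float      # lower is better (hypothetical, e.g. USD per 1,000 pages)
    latency: float   # lower is better (hypothetical, e.g. seconds per page)


def dominates(a: ModelStats, b: ModelStats) -> bool:
    """True if `a` is at least as good as `b` on every axis and strictly better on one."""
    at_least = a.accuracy >= b.accuracy and a.cost <= b.cost and a.latency <= b.latency
    strictly = a.accuracy > b.accuracy or a.cost < b.cost or a.latency < b.latency
    return at_least and strictly


def pareto_front(models: list[ModelStats]) -> list[ModelStats]:
    """Models that no other model beats on accuracy, cost, and latency simultaneously."""
    return [m for m in models if not any(dominates(o, m) for o in models if o is not m)]
```

Which point on the front is "ideal" still depends on the use case; the filter only removes models that are never the right choice.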
Ideally we want people to rate based on quality, but I imagine some of the results are biased right now by loading time.
That's an easy fix if you wait for the slowest one and pop them both in at the same time, no?
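Here is a minimal asyncio sketch of that approach. Both models run concurrently, and the results are returned together only once both have finished, so neither output appears first. The `OcrCall` signature and the timeout value are assumptions for illustration. The timeout also bounds a battle that would otherwise spin indefinitely.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: a model call takes the document bytes and returns OCR text.
OcrCall = Callable[[bytes], Awaitable[str]]


async def run_battle(doc: bytes, model_a: OcrCall, model_b: OcrCall,
                     timeout_s: float = 120.0) -> tuple[str, str]:
    """Run both models concurrently and return only once both have finished.

    Raises asyncio.TimeoutError (and cancels both calls) if they take longer than timeout_s.
    """
    out_a, out_b = await asyncio.wait_for(
        asyncio.gather(model_a(doc), model_b(doc)), timeout=timeout_s
    )
    return out_a, out_b
```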
Opus is multimodal??
I would be curious to see how Sonnet does. Their models are pretty solid when it comes to PDFs
Sonnet/Opus is being added shortly!
Sonnet and Opus are live now :)
Please add Chandra by Datalab
Claude would be good!
Claude coming shortly (in the next ~1 hour)
Claude is live now!
We've got like 10 LLM arenas but nothing for OCR yet, really hope this takes off!
Nice! Would love to see Azure Document Intelligence on this
This is a killer idea!