Zero-Shot Text Classification on a low-end CPU-only machine?

3 points by backend-dev-33 4 hours ago

I want to do zero-shot text classification, either with the model [1] (711 MB) or with something similar, and I want high throughput, measured in classification requests per second. Classification will run on low-end hardware: a Hetzner [2] machine without a GPU (Hetzner is great, reliable, and cheap; they just don't offer GPU machines), something like this:

* CCX13: Dedicated vCPU, 2 vCPUs, 8 GB RAM

* CX32: Shared vCPU, 4 vCPUs, 8 GB RAM
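For context, the workload is basically the stock transformers zero-shot pipeline with the model from [1], running on CPU. A minimal sketch (the candidate labels here are made-up placeholders, not my real ones):

    from transformers import pipeline

    # Zero-shot classification with the model from [1]; device=-1 pins it to CPU.
    classifier = pipeline(
        "zero-shot-classification",
        model="MoritzLaurer/roberta-large-zeroshot-v2.0-c",
        device=-1,
    )

    # Placeholder input and labels, just to show the call shape.
    result = classifier(
        "The invoice payment failed with error code 402.",
        candidate_labels=["billing", "authentication", "shipping"],
    )
    print(result["labels"][0], result["scores"][0])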

Now there are multiple options for deploying and serving LLMs:

* lmdeploy

* text-generation-inference

* TensorRT-LLM

* vllm

New frameworks for this keep appearing, and I am a bit lost. Which option would you suggest for deploying the model listed above on GPU-less hardware?
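To make "high throughput" concrete: what I care about is texts classified per second, roughly measured like this (minimal sketch, reusing the `classifier` from above; the batch size and label count are arbitrary):

    import time

    texts = ["The invoice payment failed with error code 402."] * 64  # dummy workload
    labels = ["billing", "authentication", "shipping"]  # placeholder labels

    start = time.perf_counter()
    # NLI-style zero-shot runs one forward pass per (text, label) pair,
    # so cost grows with the number of candidate labels.
    classifier(texts, candidate_labels=labels, batch_size=8)
    elapsed = time.perf_counter() - start
    print(f"{len(texts) / elapsed:.2f} texts/s")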

[1] https://huggingface.co/MoritzLaurer/roberta-large-zeroshot-v2.0-c

[2] https://www.hetzner.com/cloud/