phs318u 23 minutes ago

Wow. Everything old is new again. I built a business state machine for a bespoke application using Oracle 8i and its stateful queues back in 2005. I had re-architected a batch-driven application (which couldn't scale temporally, i.e. we had a bunch of CPU sitting near idle a lot of the time) and turned it into an event-driven solution. CPU usage became almost a horizontal line, saving us lots of money as we scaled (for the record, "scale" for this solution meant writing 5M records a day into a partitioned table where we kept 13 months of data online, and then billed on it). Durable execution was just one of the many benefits we got out of this architecture. Love it.

mfrye0 7 hours ago

I've been keeping an eye on this space for a while as it matures a bit further. There have been a number of startups that have popped up around this - apart from Temporal and DBOS, Hatchet.run looked interesting.

I've been using BullMQ for a while with distributed workers across K8s and have hacked together what I need, but a lightweight DAG of some sort on Postgres would be great.

I took a brief look at your docs. What would you say is the main difference between yours and some of the other options? Just the simplicity of it being a single SQL file and an SDK wrapper? Sorry if the docs answer this already - trying to take a quick look between work.

  • the_mitsuhiko 7 hours ago

    > I took a brief look at your docs. What would you say is the main difference between yours and some of the other options? Just the simplicity of it being a single SQL file and an SDK wrapper? Sorry if the docs answer this already - trying to take a quick look between work.

    It's really just trying to be as simple as possible. I was motivated to build the simplest thing I could come up with after I didn't really find the other solutions to be something I wanted to build on.

    I'm sure they are great, but I want to leave the window open to having people self-host what we are building / enable us to deploy a cellular architecture later, and thus I want to stick to a manageable number of services until I no longer can. Postgres is a known quantity in my stack, and the only Postgres-only solution was DBOS, which unfortunately did not look ready for prime time when I tried it. That said, I noticed that DBOS is making quite a bit of progress, so I'm somewhat confident that it will eventually get there.

    • jedberg 5 hours ago

      Could you provide some more specifics as to why DBOS isn’t “ready for prime time”? Would love to know what you think is missing!

      FWIW DBOS is already in production at multiple Fortune 500 companies.

      • biasafe_belm 4 hours ago

        I'd love to hear both of your thoughts! I'm considering durable execution and DBOS in particular and was pretty happy to see Armin's shot at this.

        I'm building/architecting a system which will have to manage many business-critical operations on various schedules. Some will be daily, some bi-weekly, some quarterly, etc. Mostly batch operations and ETL, but they can't fail. I have already designed a semblance of a persistent workflow in that any data ingestion and transformation is split into atomic operations whose results are persisted to blob storage and indexed in a database for cataloguing. This means that, for example, network requests can be "replayed", and data transformation can be resumed at any intermediate step. But this is enforced at the design stage, not at runtime as in other solutions.

        My system also needs to be easily auditable and written in Python. There are many, many ways to build this (especially if you include cloud offerings) but, like Armin, I'm trying to find the simplest architecture possible so our very small team can focus on building and not maintaining.
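        The persist-and-replay pattern described above can be sketched roughly like this (a minimal illustration; the class name, storage layout, and step names are all made up, and a local directory stands in for blob storage plus the catalogue database):

```python
import json
import os
import tempfile

class StepCatalog:
    """Persists each atomic operation's output under a storage root
    (a stand-in for blob storage + index DB), so a pipeline can be
    resumed at any intermediate step instead of rerunning from scratch."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def run_step(self, name, fn, *args):
        path = os.path.join(self.root, f"{name}.json")
        if os.path.exists(path):
            # Already catalogued: replay the stored result, don't re-run.
            with open(path) as f:
                return json.load(f)
        result = fn(*args)
        with open(path, "w") as f:
            json.dump(result, f)
        return result

catalog = StepCatalog(tempfile.mkdtemp())
ingested = catalog.run_step("ingest", lambda: [1, 2, 3])  # actually runs
replayed = catalog.run_step("ingest", lambda: [9, 9, 9])  # replayed from storage
```

        The difference from a durable-execution runtime is exactly the one noted above: nothing stops a step from skipping the catalogue, so the guarantee holds only by design discipline, not enforcement.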

eximius an hour ago

This is pretty great! The main things you need for durable execution are 1) retries (absurd does this) and 2) idempotency (absurd does this via steps, though it would be better handled by the APIs themselves being idempotent, making steps unnecessary. Absurd certainly _helps_ mitigate APIs that aren't idempotent, but not completely).
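A minimal sketch of the steps-as-idempotency idea (hand-rolled for illustration, not Absurd's actual API): each step's result is memoized under a key, so a retried workflow gets the cached result back instead of re-invoking the side effect.

```python
completed = {}  # durable step results; in a real system these live in Postgres

def step(name, fn):
    # A step's side effect runs at most once per task: when the surrounding
    # workflow is retried, the cached result is returned instead.
    if name not in completed:
        completed[name] = fn()
    return completed[name]

charges = []

def charge_card():
    # Pretend this is a non-idempotent external API call.
    charges.append("charge")
    return "receipt-001"

def workflow():
    return step("charge-order-1", charge_card)

workflow()            # first run performs the charge
receipt = workflow()  # a retry replays the cached receipt; no double charge
```

This only protects you if the crash doesn't land between the API call and the checkpoint write, which is why steps mitigate non-idempotent APIs rather than completely solving them.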

saadatq 8 hours ago

Somebody said this the other day on HN, but we really are living in the golden age of Postgres.

rodmena 6 hours ago

Armin, I managed to review absurd.sql and the migrations. I am so impressed that I am rewriting the state management of my workflow engine with Absurd. Just wanted to thank you for sharing it with us. I'll keep you posted on the outcome.

motoboi 7 hours ago

Restate was built for agents before agents were cool.

Surprisingly, it hasn't taken off yet, when agents are all anyone is looking for now.

oulipo2 3 days ago

Really cool! How does it compare to DBOS? https://docs.dbos.dev/architecture

  • the_mitsuhiko 3 days ago

    I'm sure with time DBOS will be great, I just did not have a lot of success with it when I tried it. It's quite complex, the quality of the SDKs was not overly amazing (when I initially used it, it had a ton of dependencies), and it just felt early.

oulipo2 3 days ago

Another question: why reimplement your own framework rather than using an existing agent framework like Claude + MCP or OpenAI + tool calling? Is it because you're using your own LM models, or just because you wanted more control over retries, etc.?

  • the_mitsuhiko 3 days ago

    There are not that many agent frameworks around at the moment. If you want to be provider-independent, you most likely use either Pydantic AI or the Vercel AI SDK, would be my guess. Neither one has a built-in solution for durable execution, so you end up driving the loop yourself. So it's not that I don't use these SDKs, it's just that I need to drive the loop myself.

    • oulipo2 3 days ago

      Okay, very clear! I was saying that because the example in your post is just a basic "tool use" example, which is already implemented by MCP/OpenAI tool use, but obviously I guess your code can be suited to more complex scenarios.

      Two small questions:

      1. in your README you give this example for durable execution:

      const shipment = await ctx.awaitEvent(`shipment.packed:${params.orderId}`);

      I was just wondering, how does it work? I was more expecting a generator with a `yield` statement to run "long-running tasks" in the background... otherwise is the node runtime keeping the thread running with the await? doesn't this "pile up"?

      2. would your framework be suited to long-running jobs with multiple steps? I have sometimes big jobs running in the background on all of my IoT devices, eg:

      for each d in devices: doSomeWork(d)

      and I'd like to run the big outer loop each hour (say), but only if the previous one is complete (e.g. max number of workers per task = 1), and have the inner loop be some "steps" that can be cached but retried if they fail

      would your framework be suited for that? or is that just a simpler use-case for pgmq and I don't need the Absurd framework?

      • the_mitsuhiko 3 days ago

        > Okay, very clear! I was saying that because the example in your post is just a basic "tool use" example, which is already implemented by MCP/OpenAI tool use, but obviously I guess your code can be suited to more complex scenarios.

        That's mostly just because I found that to be the easiest way to make any existing AI API work. There are things like Vercel's AI SDK, which internally runs the agentic loop in generateText, but then there is no way to checkpoint that.

        > I was just wondering, how does it work? I was more expecting a generator with a `yield` statement to run "long-running tasks" in the background... otherwise is the node runtime keeping the thread running with the await? doesn't this "pile up"?

        When you `awaitEvent` or `sleepUntil`/`sleepFor`, it sets a wake point or a reschedule in the database. Then it raises `SuspendTask` and ends execution of the task temporarily until it's rescheduled.
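        That suspend/resume flow can be sketched roughly like this (an in-memory stand-in for illustration, not Absurd's actual code; a dict plays the role of the database's event and wake-point tables):

```python
class SuspendTask(Exception):
    """Unwinds the task until it is rescheduled."""

events = {}  # delivered events by name (stand-in for the events table)

def await_event(name):
    if name in events:
        return events[name]
    # Register a wake point for `name`, then unwind the whole task.
    raise SuspendTask(name)

def task(params):
    shipment = await_event(f"shipment.packed:{params['orderId']}")
    return f"shipped {shipment}"

def run(task, params):
    try:
        return task(params)
    except SuspendTask:
        return None  # task is parked; nothing keeps running in the meantime

first = run(task, {"orderId": 42})      # suspends: wake point registered
events["shipment.packed:42"] = "box-1"  # the event arrives later
second = run(task, {"orderId": 42})     # task re-runs and completes
```

        Nothing blocks while waiting: the exception unwinds the stack, so awaits don't pile up in the Node runtime; the task is simply re-executed from its checkpoints once the event arrives.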

        As for your IoT case: yes, you should be able to do that.

    • jedberg 2 days ago

      > If you want to be provider independent you most likely either use pydantic AI ... Neither one have built-in solution for durable execution

      PydanticAI has DBOS built in [0].

      [0] https://ai.pydantic.dev/durable_execution/dbos/

      • the_mitsuhiko 2 days ago

        Oh interesting, maybe this makes for a better example then. If it has DBOS and Temporal it must be exposing some way to drive the loop. I'll investigate.

andrewstuart 5 hours ago

Reminder that Postgres does not have a monopoly on SKIP LOCKED

You can do that in Oracle, SQL Server, and MySQL too.

In fact, you might be able to replicate what Armin is doing with SQLite, because it too works just fine as a queue, though not via SKIP LOCKED.
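A minimal sketch of the SQLite-as-queue idea, with `BEGIN IMMEDIATE` standing in for SKIP LOCKED (the table layout here is made up):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN IMMEDIATE / COMMIT below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT,"
    " status TEXT DEFAULT 'queued')"
)
conn.executemany("INSERT INTO jobs (payload) VALUES (?)", [("a",), ("b",)])

def claim_job(conn):
    # SQLite has a single writer, so taking the write lock up front with
    # BEGIN IMMEDIATE is what keeps two workers from claiming the same
    # row; that is the role SKIP LOCKED plays in Postgres.
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'queued'"
        " ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    conn.execute("COMMIT")
    return row
```

The trade-off versus SKIP LOCKED is that workers serialize on the write lock instead of skipping past each other's claims, which is fine at modest throughput.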

SrslyJosh 8 hours ago

Durable execution paired with an unpredictable text generator? Sign me up! /s