> Do they merely memorize training data and reread it out loud, or are they picking up the rules of English grammar and the syntax of C language?
This is a false dichotomy; functionally, the reality is in the middle. They "memorize" training data in the sense that the loss is fitted to those points, but at test time they are asked to interpolate (and extrapolate) to new points. How well they generalize depends on how well an interpolation between training points works. If it reliably works, then you could say that the interpolation is a good approximation of some grammar rule, say. It's all about the data.
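To make that concrete, here is a toy sketch (my own, nothing LLM-specific): fit a flexible model to training points, then check error on new points inside the training range (interpolation) and outside it (extrapolation).

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(x)                      # stand-in "rule" behind the data
    x_train = np.sort(rng.uniform(0, 6, 40))
    y_train = f(x_train)

    # Fit a flexible model to the training points.
    model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

    x_interp = rng.uniform(0, 6, 20)             # new points inside the training range
    x_extrap = rng.uniform(7, 9, 20)             # new points outside it
    print("interpolation error:", np.abs(model(x_interp) - f(x_interp)).mean())  # small
    print("extrapolation error:", np.abs(model(x_extrap) - f(x_extrap)).mean())  # huge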
This only applies to intra-distribution "generalisation", which is not the meaning of the term we've come to associate with science. There, generalisation means across all environments (i.e., something generalises if it's valid and reliable, where valid = it measures the property it claims to, and reliable = it keeps doing so under causal perturbations to the environment).

Since an LLM does not change in response to changes in the meaning of terms (e.g., consider how "the war in Ukraine" has shifted over the last 10 years) -- it isn't reliable in the scientific sense. Explaining why it isn't valid would take much longer, but it's not valid either.

In any case: the notion of 'generalisation' used in ML just means we assume there is a single stationary distribution of words, and we want to sample from that distribution without being biased towards oversampling points identical to the training data.

Not only is this assumption false (there is no stationary distribution), it is also irrelevant to generalisation in the traditional sense, since whether we are biased towards the data or not isn't what we're interested in. We want output to be valid (the system to use words to mean what they mean) and reliable (to do so across all environments in which they mean something).

This does not follow from, nor is it even related to, the ML sense of generalisation. Indeed, if LLMs generalised in that sense, they would be very bad at usefully generalising -- since the assumptions behind it are false.
I don't really follow what you're saying here. I understand that the use of language in the real world is not sampled from a stationary distribution, but it also seems plausible that you could relax that assumption in an LLM, e.g. by conditioning the distribution on time, and then intra-distribution generalization would still make sense as a way to study how well the LLM works on held-out test samples.
Intra-distribution generalization seems like the only rigorously defined kind of generalization we have. Can you provide any references that describe this other kind of generalization? I'd love to learn more.
Intra-distribution generalization is also not well posed in practical real-world settings. Suppose you learn a mapping f : x -> y. Informally, intra-distribution generalization says that f generalizes to "points drawn from the same data distribution p(x)". Two issues here:

1. In practical scenarios, how do you know whether x' is really drawn from p(x)? Even if you could compute log p(x') under the true data distribution, you can only verify that the support at x' is non-zero. One sample is not enough to tell you whether x' was drawn from p(x) (a toy sketch of this follows below).

2. In high-dimensional settings, an x' that is not exactly equal to an example in the training set can have arbitrarily high generalization error. Here's a criminally under-cited paper discussing this: https://arxiv.org/abs/1801.02774
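A toy sketch of point 1 (my own illustration, with made-up distributions): a single point's density is non-zero under many different distributions, so one sample can't certify which one it came from.

    from scipy.stats import norm

    # Two hypothetical candidate "data distributions".
    p = norm(loc=0.0, scale=1.0)
    q = norm(loc=3.0, scale=1.0)

    x_prime = 1.5  # a single new point
    # Both log-densities are finite (support is non-zero under both), so this one
    # sample cannot tell us which distribution -- if either -- x_prime came from.
    print(p.logpdf(x_prime), q.logpdf(x_prime))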
Worse even than this: there are no distributions.

What we mean by x ~ p(x), y ~ p(y|x) is not a causal relation x -> y s.t. y = f(x).

Reality itself has no probability distributions. Reality follows a causal model, where a causal relation is given in terms of necessity and possibility.

E.g., there is no such thing as Photo ~ P(Photo|PhotoOfCat) to be learned, only (All Causes) -> PhotoOfCat. Thus the setup of ML as y = f(x) is incorrect: there is no `f` which satisfies this formula (in almost all cases).

Consider the LLM case: reality has no P("The War in Ukraine" | TheWarIn2022) -- either the speaker meant TheWarIn2022, or they didn't. There's no sense in which reality has it that the utterance is intrinsically ambiguous (necessarily, for communication to be possible, pragmatics + semantics has to be able to fully resolve meaning).
So what are LLMs learning? Just an implied empirical distribution which is "smoothed over" the data just enough that it "hangs on to it, without repeating it" -- and this is vital, since if it were to try to generalise in the scientific sense, it would cease to be meaningful, since no algorithm which computes P(y|x) in this manner could capture the necessary relata that fully resolves meaning. Any system capable of modelling meaning would be probabilistic only in the sense of having a prior over such causal models: P("TheWarInUkraine"|TheWarIn2022, CausalModel) = 1, but P(CausalModel) < 1
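To put a toy number on that last claim (my own illustration, with invented names): given a causal model, the mapping between utterance and meaning is deterministic, and the only probability left is the prior over models.

    # Hypothetical names throughout. Uncertainty sits in P(CausalModel), while
    # P(utterance means X | CausalModel) is 0 or 1.
    priors = {"model_2022": 0.7, "model_2014": 0.3}          # P(CausalModel) < 1
    referent = {"model_2022": "TheWarIn2022", "model_2014": "TheWarIn2014"}

    def p_means(utterance, meaning, model):
        # Deterministic once a causal model fixes what the phrase refers to.
        return 1.0 if (utterance == "the war in ukraine" and meaning == referent[model]) else 0.0

    # A listener's marginal belief mixes over causal models, not over "word noise":
    p = sum(prior * p_means("the war in ukraine", "TheWarIn2022", m)
            for m, prior in priors.items())
    print(p)  # 0.7 -- all the uncertainty came from the prior over models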
So it's always undefined what it means to "generalise" with respect to an empirical distribution -- there aren't any.

When we say scientific theories generalise, we mean their posited necessary causal relations are maintained across irrelevant interventions. E.g., Newton's theory of gravity generalises in that each term (F, M, m, r) is a valid measure of some property, and it remains a valid measure across a very large number of environments.
It fails to generalise for extreme values of M, m, etc.
In the ML sense, all intra-distributional generalisation fails for trivial perturbations of any causal property, e.g., m + dm -- because this induces an entirely new distribution. The "generalisation error" depends on what m + dm does within our model, but regardless, generalisation fails.
Scientific theories do not fail to generalise in this way, irrelevant causal interventions make no difference to the explanatory adequacy (or predictive power) of the theory.
Thanks for the clarification. I understand much better what you mean by "scientific generalization". I can't tell whether you're suggesting that LLMs are a dead end for modeling meaning or just that LLMs as estimating probability distributions is the wrong way to think about them?
LLMs fail to model meaning; instead, they model empirical distributions of meaningful tokens, which is more useful given the method being used.

If all you are modelling is conditional probability, trying to model meaning this way would make your solution worse.

I.e., if LLMs really generalised in the ML sense, i.e. sampled unbiasedly at random from some hypothetical "Meaning Distribution", they'd perform terribly -- since there is no such distribution to choose from.

By hijacking an empirical distribution and "replaying it back", it's actually possible to generate useful output.

Think about it this way: probability distributions are just measures of subjective confidence. Each person has their own subjective confidence distribution P("some written words"|WhatTheyMean). If you could actually model this -- which one would you model? If you modelled any one of them, you'd not be able to understand a great deal, since each person's confidence is poorly calibrated and missing meanings (e.g., "acetylcholine").
So the LLM models some half-baked average of the subjective distributions of all speakers on the internet (/ in the training data) with respect to next word expectations.
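As a toy illustration (numbers invented): average a few speakers' next-word expectations and you get a distribution that corresponds to no speaker's actual intent.

    # Invented next-word expectations for the prompt "pass the ...".
    speakers = {
        "chemist":  {"pipette": 0.6, "pen": 0.3, "salt": 0.1},
        "diner":    {"salt": 0.7, "pen": 0.2, "pipette": 0.1},
        "novelist": {"pen": 0.8, "salt": 0.1, "pipette": 0.1},
    }
    vocab = ["pipette", "pen", "salt"]
    avg = {w: round(sum(d[w] for d in speakers.values()) / len(speakers), 2) for w in vocab}
    print(avg)  # {'pipette': 0.27, 'pen': 0.43, 'salt': 0.3} -- a blend matching nobody's intent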
This is not what we're modelling when we mean things (eg., when I say, "pass the pen", the cause of my saying it is: 1) need for a pen; 2) you having a pen; etc. -- these reasons are unavailable to the LLM, so it cannot model meaning). But as stated, it would be useless if it actually tried to -- because these methods are incapable of saying, "pass me a pen" and meaning it.
Big thread at the time: https://news.ycombinator.com/item?id=34474043

Thanks! Macroexpanded:

Do Large Language Models learn world models or just surface statistics? - https://news.ycombinator.com/item?id=34474043 - Jan 2023 (174 comments)

Thanks. Now, after almost two years of explosive growth in LLMs since that paper, it's remarkable to realize that we still don't know if Scarecrow has a brain. Or if he'll forever remain just a song and dance man.
Idk when this article is even from? To me, LLMs are currently broken, and the majority is already aware of this.

Copilot fails to cleanly refactor complex Java methods, to the point that I'm better off writing that stuff on my own, since I have to understand it anyway.

And the news that they don't scale as predicted looks bad on top of how weakly they currently perform…

Why does an LLM have to be better than you to be useful to you?
Personally, I use them for the things they can do, and for the things they can't, I just don't, exactly as I would for any other tool.
People assuming they can do more than they are actually capable of is a problem (compounded by our tendency to attribute intelligence to entities with eloquent language, which might be more of a surface-level thing than we used to believe), but that's literally been a problem for as long as we've had proverbial hammers and nails.
> Why does an LLM have to be better than you to be useful to you?
If
((time to craft the prompt) + (time required to fix LLM output)) ~ (time to achieve the task on my own)
it's not hard to see that working on my own is a very attractive proposition. It drives down complexity, does not require me to acquire new skills (i.e., prompt engineering), does not require me to provide data to a third party nor to set up an expensive rig to run a model locally, etc.
Then they might indeed not be the right tool for what you're trying to do.
I'm just a little bit tired of sweeping generalizations like "LLMs are completely broken". You can easily use them as a tool in a process that then ends up being broken (because it's the wrong tool!), yet that doesn't disqualify them for all tool use.
Yeah but the sampling process required to determine what they are good at is not free either. (For starters, it's consuming huge amounts of public research funding and compute, but let's not go down that rabbit-hole)
If you can't find a use for the best LLMs, it is 100% a skill issue. If the only way you can think to use them is refactoring complex Java codebases, you're ngmi.
So far I haven't found one that does my dishes and laundry. I really wish I knew how to properly use them.
My point being: Why would anyone have to find a use for a new tool? Why wouldn't "it doesn't help me with what I'm trying to do" be an acceptable answer in many cases?
I have found more often than not that people in the "LLMs are useless" camp are actually in the "I need LLMs to be useless" camp.

Well, I found the exact reverse: people saying LLMs are useful actually need them to be useful, so they can boast about their "prompt engineering" "skill" (i.e. typing a sentence) and their "AI knowledge". I saw a caricature of this a few hours ago on LinkedIn, from a "data guy" saying devs who don't use AI are going to be replaced by those who do. Yet it was very clear from his replies to comments that he had never written code and wasn't in a position to give an opinion on the matter, especially one as extreme and rude as the one he wrote.
Both your and the GP's observations (and many more) can be true simultaneously.
Some people are quick to dismiss any new technology as useless; others are quick to hail it as the thing that will take everyone's jobs in just a few months (and might consider that a good or bad thing) or solve any number of humanity's hard problems.
Usually one or the other will be seen as slightly more accurate in retrospect, but since both ultimately come from a knee-jerk reaction to something new, with rationalizations bolted on to support their respective case, most of these can be safely ignored.
Remove all the parroting of technical jargon, wishful thinking, appeals to morality etc. and you essentially have two crowds arguing why this time the roulette ball will surely fall on red/black (and everybody forgetting about the zero/green).

Do not forget the very real group of people who shout "The car does not work!" in frustration, precisely because they would gladly use a car.
It turns out our word for "surface statistics" is "world model".

World model based interfaces have an internal representation and when asked, describe its details. Surface statistics based interfaces have an internal database of what is expected, and when asked, they give a conformist output.

The point is that an "internal database of statistical correlations" is a world model of sorts. We all have an internal representation of the world featuring only probabilistic accuracy, after all. I don't think the distinction is as clear as you want it to be.
> "internal database of statistical correlations" [would be] a world model of sorts
Not in the sense used in the article: «memorizing “surface statistics”, i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence».
A very basic example: when asked "two plus two", would the interface reply "four" because it memorized a correlation of the two ideas, or because it counted at some point (many points in its development) and in that way assessed reality? That is a dramatic difference.
> and when asked, describe its details.

So humans don't typically have world models, then. Ask most people how they arrived at their conclusions (outside of very technical fields) and they will confabulate, just like an LLM.

The best example is phenomenology, where people will grant themselves skills that they don't have in order to reach conclusions. See also heterophenomenology, aimed at working around that: https://en.wikipedia.org/wiki/Heterophenomenology
That the descriptive is not the prescriptive should not be a surprise. That random people will largely have suboptimal skills should not be a surprise. Yes, many people can't think properly. Proper thinking remains there as a potential.

> Yes, many people can't think properly. Proper thinking remains there as a potential.

That's a matter of faith, not evidence. By that reasoning, the same can be said about LLMs: after all, they do occasionally get it right.

Let me rephrase it, since there could be a misunderstanding: "Surely many people cannot think properly, but some have much more ability than others: the proficient ability to think well is a potential (expressed in some and not expressed in many)."
To transpose that to LLMs, you should present one that systematically gets it right, not occasionally.
And anyway, the point was about two different processes before statement formulation: some output the strongest correlated idea ("2+2" → "4"); some look at the internal model and check its contents ("2, 2" → "1 and 1, 1 and 1: 4").
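A toy contrast of the two processes (my own sketch; nothing to do with how any actual model is implemented):

    # Process 1: output the most strongly correlated answer seen in the data.
    memorized = {"two plus two": "four", "capital of France": "Paris"}
    def answer_by_correlation(prompt):
        return memorized.get(prompt, "<most frequent continuation>")

    # Process 2: consult an internal model and actually count ("1 and 1, 1 and 1").
    def answer_by_counting(a, b):
        total = 0
        for _ in range(a):
            total += 1
        for _ in range(b):
            total += 1
        return total

    print(answer_by_correlation("two plus two"))  # "four" -- recalled
    print(answer_by_counting(2, 2))               # 4 -- assessed by counting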
> one that systematically gets it right, not occasionally.

Could Einstein systematically get new symphonies right? Could Feynman create tasty new dishes every single time? Could ......

> could Einstein systematically

Did (could) Einstein think about things long and hard? Yes - that is how he explained having solved problems ("How did you do it?" // "I thought about it long and hard").
The artificial system in question should (1) be able to do it, and (2) do it systematically, because it is artificial.
Well, for some sufficiently platonic definition of "world".

In a way the opposite, I'd say: the archetypes in Plato are the most stable reality and are akin to the logos that past and future traditions hunted - to know it is to know how things are (how things work), hence knowledge of the state of things, hence a faithful world model.
To utter conformist statements spawned from surface statistics would be "doxa" - repeating "opinions".
It has a profound and extensive knowledge of something. But that "something" is how words follow each other in popular media.

LLMs are very firmly stuck inside the Cave Allegory.

If you mean that, just as the experiencer in the cave sees shadows instead of things (really, things instead of Ideas), the machine sees words instead of things, that would be in a way very right.

But we could argue that it is not impossible to create an ontology (a very descriptive ontology - "this is said to be that, and that, and that...") from language alone. Hence the question of whether the ontology is there. (Actually, the question at this stage remains: "How do they work, in sufficient detail? Why the appearance of some understanding?")
Yeah, what I'm saying is that something very similar to an ontology is there. (It's incomplete but extensive, not coherent, and it's deeper in details than anything anybody ever created.)
It's just that it's a kind of a useless ontology, because the reality it's describing is language. Well, only "kind of useless" because it should be very useful to parse, synthesize and transform language. But it doesn't have the kind of "knowledge" that most people expect an intelligence to have.
Also, its world isn't only composed of words. All of them got a very strong "Am I fooling somebody?" signal during training.
Honestly, I think it's somewhere in between. LLMs are great at spotting patterns in data and using that to make predictions, so you could say they build a sort of "world model" for the data they see. But it's not the same as truly understanding or reasoning about the world; it's more like they're really good at connecting the dots we give them.

They don't do science or causality; they're just working with the shadows on the wall, not the actual objects casting them. So yeah, they're impressive, but let's not overhype what they're doing. It's pattern matching at scale, not magic. Correct me if I am wrong.
They are learning a grammar, finding structure in the text. In the case of Othello, the rules for what moves are valid are quite simple, and can be represented in a very small model. The slogan is "a minute to learn, a lifetime to master". So "what is a legal move" is a much simpler problem than "what is a winning strategy".
It's similar to asking a model to only produce outputs corresponding to a regular expression, given a very large number of inputs that match that regular expression. The RE is the most compact representation that matches them all and it can figure this out.
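A rough sketch of that analogy (hypothetical strings, obviously not how a transformer represents it):

    import re

    # Hypothetical "training data": strings that all follow one underlying rule.
    rule = re.compile(r"^(ab)+c$")
    training_samples = ["abc", "ababc", "abababc"]
    assert all(rule.match(s) for s in training_samples)

    # The regex is the compact description of what the samples share; a model that
    # only ever emits matching strings has in effect internalised the rule rather
    # than memorised the list.
    for candidate in ["ababababc", "abca", "acb"]:
        print(candidate, bool(rule.match(candidate)))  # True, False, False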
But we aren't building a "world model", we're building a model of the training data. In artificial problems with simple rules, the model might be essentially perfect, never producing an invalid Othello move, because the problem is so limited.
I'd be cautious about generalizing from this work to a more open-ended situation.
I don't think the point is that Othello-GPT has somehow modelled the real world by training only on games, but that tasking it to predict the next move forces it to model its data in a deep way. There's nothing special about Othello games vs internet text, except that the latter will force it to model many more things.
Lots of problems with this paper including the fact that, even if you accept their claim that internal board state is equivalent to world model, they don't appear to do the obvious thing which is display the reconstructed "internal" board state. More fundamentally though, reifying the internal board as a "world model" is absurd: otherwise a (trivial) autoencoder would also be building a "world model".
>More fundamentally though, reifying the internal board as a "world model" is absurd: otherwise a (trivial) autoencoder would also be building a "world model".
The point is that they aren't directly training the model to output the grid state, like you would an autoencoder. It's trained to predict the next action and learning the state of the 'world' happens incidentally.
It's like how LLMs learn to build world models without directly being trained to do so, just in order to predict the next token.
By the same reasoning if you train a neural net to output next action from the output of the autoencoder then the whole system also has a "world model", but if you accept that definition of "world model" then it is extremely weak and not the intelligence-like capability that is being implied.
And as I said in my original comment they are probably not even able to extract the board state very well, otherwise they would depict some kind of direct representation of the state, not all of the other figures of board move causality etc.
Note also that the board state is not directly encoded in the neural network: they train another neural network to find weights to approximate the board state if given the internal weights of the Othello network. It's a bit of fishing for the answer you want.
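For concreteness, a probe setup looks roughly like this (a sketch with stand-in random data and a linear probe; the paper's probes and training details differ):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stand-in shapes: activations from the frozen sequence model for N positions,
    # and the true state of one tile (0 = empty, 1 = black, 2 = white).
    rng = np.random.default_rng(0)
    activations = rng.normal(size=(1000, 512))   # placeholder for real activations
    tile_state = rng.integers(0, 3, size=1000)   # placeholder for real labels

    # Only the probe is trained; the sequence model itself is untouched.
    probe = LogisticRegression(max_iter=1000)
    probe.fit(activations[:800], tile_state[:800])
    print("held-out probe accuracy:", probe.score(activations[800:], tile_state[800:]))
    # On random placeholders this sits near chance (~0.33); the paper's claim is that
    # on real activations it is far above chance, i.e. the state is decodable.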
> And as I said in my original comment they are probably not even able to extract the board state very well,
They do measure and report on this, both in summary in the blog post and in more detail in the paper.
> otherwise they would depict some kind of direct representation of the state
If you can perfectly accurately extract the state the result would be pretty boring to show right? It'd just be a picture of a board state and next to it the same board state with "these are the same".
> Note also that the board state is not directly encoded in the neural network: they train another neural network to find weights to approximate the board state if given the internal weights of the Othello network.
If you can extract them, they are encoded in the activations. That's pretty much by definition surely.
> It's a bit of fishing for the answer you want.
How so?
Given a sequence of moves, they can accurately identify which state most of the positions of the board are in just by looking at the network. In order for that to work, the network must be turning a sequence of moves into some representation of a current board state. Assume for the moment they can accurately identify them: do you agree with that conclusion?
> They do measure and report on this, both in summary in the blog post and in more detail in the paper.
I didn't see this in the blog post, where is it? Presumably they omitted it from the blog post because the results are bad as I describe below, which is precisely why I cited it as a red flag.
> If you can perfectly accurately extract the state the result would be pretty boring to show right? It'd just be a picture of a board state and next to it the same board state with "these are the same".
But they don't! They have nearly 10% tile-level error on the human-data-trained model. That's nearly 100% board-level error. It's difficult to understand how bad this is, but if you visualize it somehow (for example by sampling random boards) it becomes obvious that it is really bad. On average about 6 or 7 tiles are going to be wrong. With nearly 100% probability you get an incorrect board.
> If you can extract them, they are encoded in the activations. That's pretty much by definition surely.
No, that's silly. For example, you could cycle through every algorithm/transformation imaginable until you hit one that extracts the satanic verses from the Bible. As I said in another comment, although this is in theory mitigated somewhat by doing test/validation splits, in practice you keep trying different neural network hyperparameters to finesse your validation performance.
> How so? Given a sequence of moves, they can accurately identify which state most of the positions of the board are in just by looking at the network. In order for that to work, the network must be turning a sequence of moves into some representation of a current board state. Assume for the moment they can accurately identify them: do you agree with that conclusion?
What conclusion? I believe you can probably train a neural network to take in board moves and output board state with some level of board error. So what?
> they don't appear to do the obvious thing which is display the reconstructed "internal" board state.
I'm very confused by this, because they do. Then they manipulate the internal board state and see what move it makes. That's the entire point of the paper. Figure 4 is literally displaying the reconstructed board state.
I replied to a similar comment elsewhere: They aren't comparing the reconstructed board state with the actual board state which is the obvious thing to do.
Unless I'm misunderstanding something they are not comparing the reconstructed board state to the actual state which is the straightforward thing you would show. Instead they are manipulating the internal state to show that it yields a different next-action, which is a bizarre, indirect way to show what could be shown in the obvious direct way.
Figure 4 is showing both things. Yes, there is manipulation of the state but they also clearly show what the predicted board state is before any manipulations (alongside the actual board state)
The point is not to show only a single example; it is to show how well the recovered internal state reflects the actual state in general -- i.e., to analyze the performance (this is particularly tricky due to the discrete nature of board positions). That’s ignoring all the other more serious issues I raised.
I haven’t read the paper in some time so it’s possible I’m forgetting something but I don’t think so.
>That’s ignoring all the other more serious issues I raised.
The only other issue you raised doesn't make any sense. A world model is a representation/model of your environment that you use for predictions. Yes, an auto-encoder learns to model its data to some degree. To what degree is not well known. If we found out that it learned things like "city x in country a is approximately distance b from city y -- let's just learn where y is and unpack everything else when the need arises", then that would certainly qualify as a world model.
Linear regression also learns to model data to some degree. Using the term “world model” that expansively is intentionally misleading.
Besides that and the big red flag of not directly analyzing the performance of the predicted board state I also said training a neural network to return a specific result is fishy, but that is a more minor point than the other two.
The degree matters. If we find auto-encoders learning surprisingly deep models, then I have no problem saying they have a world model. It's not the gotcha you think it is.
>the big red flag of not directly analyzing the performance of the predicted board state I also said training a neural network to return a specific result is fishy
The idea that probes are some red flag is ridiculous. There are some things to take into account, but statistics is not magic. There's nothing fishy about training probes to inspect a model's internals. If the internals don't represent the state of the board, then the probe won't be able to learn to reconstruct the state of the board. The probe only has access to the internals. You can't squeeze blood out of a rock.
I don’t know what makes a “surprisingly deep model” but I specifically chose autoencoders to show that simply encoding the state internally can be trivial and therefore makes that definition of “world model” vacuous. If you want to add additional stipulations or some measure of degree you have to make an argument for that.
In this case specifically “the degree” is pretty low since predicting moves is very close to predicting board state (because for one you have to assign zero probability to moves to occupied positions). That’s even if you accept that world models are just states, which as mtburgess explained is not reasonable.
Further, if you read what I wrote, I didn't say internal probes are a big red flag (I explicitly called it the minor problem). I said not directly evaluating how well the putative internal state matches the actual state is. And you can "squeeze blood out of a rock": it's the multiple comparison problem and it happens in science all the time and it is what you are doing by training a neural network and fishing for the answer you want to see. This is a very basic problem in statistics and has nothing to do with "magic". But again, all this is the minor problem.
>In this case specifically “the degree” is pretty low since predicting moves is very close to predicting board state (because for one you have to assign zero probability to moves to occupied positions).
The depth/degree or whatever is not about what is close to the problem space. The blog above spells out the distinction between a 'world model' and 'surface statistics'. The point is that Othello GPT is not in fact playing Othello by 'memorizing a long list of correlations' but by modelling the rules and states of Othello and using that model to make a good prediction of the next move.
>I said not directly evaluating how well the putative internal state matches the actual state is.
This is evaluated in the actual paper with the error rates using the linear and non linear probes. It's not a red flag that a precursor blog wouldn't have such things.
>And you can “squeeze blood out of a rock”: it’s the multiple comparison problem and it happens in science all the time and it is what you are doing by training a neural network and fishing for the answer you want to see.
The multiple comparison problem is only a problem when you're trying to run multiple tests on the same sample. Obviously don't test your probe on states you fed it during training and you're good.
> The point is that Othello GPT is not in fact playing Othello by 'memorizing a long list of correlations' but by modelling the rules and states of Othello and using that model to make a good prediction of the next move.
I don't know how you rule out "memorizing a long list of correlations" from the results. The big discrepancy in performance between their synthetic/random-data training and human-data training suggests to me the opposite: random board states are more statistically nice/uniform, which suggests that these are in fact correlations, not state computations.
> This is evaluated in the actual paper with the error rates using the linear and non linear probes. It's not a red flag that a precursor blog wouldn't have such things.
It's the main claim/result! Presumably the reason it is omitted from the blog is that the results are not good: nearly 10% error per tile. Othello boards have 64 tiles, so the board-level error rate (assuming independent errors) is 99.88%.
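For reference, the arithmetic behind that figure, taking ~10% per-tile error and independence at face value:

    p_tile_correct = 1 - 0.10                 # ~10% per-tile error
    p_board_correct = p_tile_correct ** 64    # 64 tiles, errors assumed independent
    print(p_board_correct)                    # ~0.0012
    print(1 - p_board_correct)                # ~0.9988 -> ~99.88% of boards have >=1 wrong tile
    print(64 * 0.10)                          # ~6.4 wrong tiles on an average board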
> The multiple comparison problem is only a problem when you're trying to run multiple tests on the same sample. Obviously don't test your probe on states you fed it during training and you're good.
In practice what is done is you keep re-running your test/validation loop with different hyperparameters until the validation result looks good. That's "running multiple tests on the same sample".
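A small simulation of that failure mode (toy numbers, my own sketch): select the best of many configurations on one validation set and its score is optimistic relative to a fresh test set, even when every configuration is equally mediocre.

    import numpy as np

    rng = np.random.default_rng(0)
    # Pretend every hyperparameter setting has the same true accuracy (0.5) and the
    # validation/test scores are noisy estimates on 100 held-out samples.
    true_acc, n_val, n_trials = 0.5, 100, 200
    val_scores = rng.binomial(n_val, true_acc, size=n_trials) / n_val

    best = int(np.argmax(val_scores))                     # config that "looks best"
    test_score = rng.binomial(n_val, true_acc) / n_val    # evaluate it once, fresh
    print("best validation score:", val_scores[best])     # typically ~0.60+
    print("fresh test score:     ", test_score)           # ~0.50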
I think they learn how to become salespeople, politicians, lawyers, and résumé consultants with fanciful language lacking in facts, truth, and honesty.
I suddenly have a vision of an AI driven sales pipeline that uses millions of invasive datapoints about you to create the most convincing sales pitch mathematically possible.
This is irrelevant: games, rules, shapes, etc. are all abstract. So any model of samples of these is a model of them.
The "world model" in question is a model of the world. Here "data" is not computer science data, ie., numbers its measurements of the world, ie., the state of a measuring device causally induced by the target of measurement.
Here there is no "world" in the data, you have to make strong causal assumptions about what properties of the target cause the measures. This is not in the data. There is no "world model" in measurement data. Hence the entirety of experimental science.
No result based on one mathematical function succeeding in approximating another is relevant to whether measurement data "contains" a theory of the world which generates it: it does not. And of course if your data is abstract, and hence constitutes the target of modelling (this only applies to pure math), then there is no gap -- a model of the "measures" (i.e., the points on a circle) is the target.
No model of actual measurement data, ie., no model in the whole family we call "machine learning", is a model of its generating process. It contains no "world model".
Photographs of the night sky are compatible with all theories of the solar system in human history (including, eg., stars are angels). There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.
The sense in which any model of measurement data is "surface statistics" is the same. Consider plato's cave: pots, swords, etc. on the outside project shadows inside. Modelling the measurement data is taking cardboard and cutting it out so it matches the shadows. Modelling the world means creating clay pots to match the ones passing by.
The latter is science: you build models of the world and compare them to data, using the data to decide between them.
The former is engineering (or pseudoscience): you take models of the measures and replay those models to "predict" the next shadow.

If you claim this shadow-matching is just a "surface shortcut", you're an engineer. If you claim it's a world model, you're a pseudoscientist.
> There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.
You're stating this as fact but it seems to be the very hypothesis the authors (and related papers) are exploring. To my mind, the OthelloGPT papers are plainly evidence against what you've written - summarising patterns in the sky really does seem to give you information about the world above and beyond the patterns themselves.
(To a scientist this is obvious, no? The precession of Mercury, a pattern observable in these photographs, was famously not compatible with the known theories until fairly recently.)
> Modelling the measurement data is taking cardboard and cutting it out so it matches the shadows. Modelling the world means creating clay pots to match the ones passing by.
I think these are matters of degree. The former is simply a worse model than the latter of the "reality" in this case. Note that our human impressions of what a pot "is" are shadows too, on a higher-dimensional stage, and from a deeper viewpoint any pot we build to "match" reality will likely be just as flawed. Turtles all the way down.
It is exactly this non-sequitur which I'm pointing out.
Approximating an abstract discrete function (a game), with a function approximator has literally nothing to do with whether you can infer the causal properties of the data generating process from measurement data.
To equate the two is just rank pseudoscience. The world is not made of measurements. Summaries of measurement data aren't properties in the world, they're just the state of the measuring device.
If you sample all game states from a game, you define the game. This is the nature of abstract mathematical objects, they are defined by their "data".
Actual physical objects are not defined by how we measure them: the solar system isn't made of photographs. This is astrology: to attribute to the patterns of light hitting the eye some actual physical property in the universe which corresponds to those patterns. No such exists.

It is impossible, and always has been, to treat patterns in measurements as properties of objects. This is maybe one of the most prominent characteristics of pseudoscience.
The point is that approximating a distribution causally downstream of the game (text-based descriptions, in this case) produces a predictive model of the underlying game mechanics itself. That is fascinating!
Yes, the one is formally derivable from the other, but the reduction costs compute, and to a fixed epsilon of accuracy this is the situation with everything we interact with on the day to day.
The idea that you can learn underlying mechanics from observation and refutation is central to formal models of inductive reasoning like Solomonoff induction (and idealised reasoners like AIXI, if you want the AI spin). At best this is well-established scientific method, at worst a pretty decent epistemology.
Talking about sampling all of the game states is irrelevant here; that wouldn't be possible even in principle for many games and in this case they certainly didn't train the LLM on every possible Othello position.
> This is astrology: to attribute to the patterns of light hitting the eye some actual physical property in the universe which corresponds to those patterns. No such exists.
Of course not - but they are highly correlated in functional human beings. What do you think our perception of the world grounds out in, if not something like the discrepancies between (our brain's) observed data and it's predictions? There's even evidence in neuroscience that this is literally what certain neuronal circuits in the cortex are doing (the hypothesis being that so-called "predictive processing" is more energy efficient than alternative architectures).
Patterns in measurements absolutely reflect properties of the objects being measured, for the simple reason that the measurements are causally linked to the object itself in controlled ways. To think otherwise is frankly insane - this is why we call them measurements, and not noise.
The "Ladder of Causation" proposed by Judea Pearl covers similar ground - "Rung 1” reasoning is the purely predictive work of ML models, "Rung 2" is the interactive optimization of reinforcement learning, and "Rung 3" is the counterfactual and casual reasoning / DGP construction and work of science. LLMs can parrot Rung 3 understanding from ingested texts but it can't generate it.
> Here there is no "world" in the data, you have to make strong causal assumptions about what properties of the target cause the measures. This is not in the data. There is no "world model" in measurement data.
That's wrong. Whatever your measuring device, it is fundamentally a projection of some underlying reality, e.g. a function m in m(r(x)) mapping real values to real values, where r is the function governing reality.
As you've acknowledged that neural networks can learn functions, the neural network here is learning m(r(x)). Clearly the world is in the model here, and if m is invertible, then we can directly extract r.
Yes, the domain of x and range of m(r(x)) is limited, so the inference will be limited for any given dataset, but it's wrong to say the world is not there at all.
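A minimal sketch of the invertibility point (toy functions of my own choosing): if m is known and invertible, r can be read back out of whatever learned the composite.

    # Toy setup: r is "reality", m is a known, invertible measuring device.
    def r(x):                 # the process we actually care about
        return x ** 2
    def m(v):                 # the measurement map
        return 2 * v + 1
    def m_inv(y):
        return (y - 1) / 2

    g = lambda x: m(r(x))     # pretend a network learned this composite from data

    for x in [0.5, 1.0, 3.0]:
        print(x, m_inv(g(x)), r(x))   # the last two columns agree: r is recovered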
Even the limited sense in which the world is recoverable from measures of it requires a model of how those measures were generated.
For animals, we are born with primitive causal models of our bodies we can recurse on to build models of the world in this sense. So as toddlers we learn perception by having an internal 3d model of our bodies -- so we can ascribe distances to our optical measures.
Without such assumptions there really isn't any world at all in this data. A grid of pixel patterns has no meaning as a grid of numbers. NNs are just mapping this grid to a "summary space" under supervision of how to place the points. This supervision enables a useful encoding of the data, but does not provide the kind of assumptions needed to work backwards to the properties of its generation.

In the case of photos, there is no such `m` -- the state of a sensor is not uniquely caused by any catness or dogness properties. Almost no photographs acquire their state from a function X -> Y, because the sensor state is "radically uncontrolled" in a causal sense. Thus the common premise of ML, that y = f(x), is false from the start: the relevant causal graph has a near-infinite number of causes that are unspecified, so f does not exist.
> For animals, we are born with primitive causal models of our bodies we can recurse on to build models of the world in this sense. So as toddlers we learn perception by having an internal 3d model of our bodies -- so we can ascribe distances to our optical measures.
If you agree that we evolved such an internal model from some base system that had no such abilities or assumptions, then it follows that evolving programs to do such things is also possible. If you think the base system itself already had such abilities, then you have a bootstrapping problem.
> Almost no photographs acquire their state from a function X -> Y, because the sensor state is "radically uncontrolled" in a causal sense.
This seems like a bizarre claim to me. A camera is clearly capturing a 2D projection of a 3D world. We can also construct 3D structures from a set of 2D pictures, as we do this all of the time with photogrammetry.
It's also clear that we can infer "catness" and "dogness" of subjects despite the fact that these are not strictly defined categories, because we literally did this in our pre-genotyping scientific taxonomies which organized the animal kingdom based on morphology, and this was pretty accurate. It then follows that with enough 2D pictures, one could create a taxonomy of objects based on reconstructed 3D morphology with a high degree of statistical accuracy.
I think this demonstrates that your premise is incorrect that there is no "world" that can be modelled only from pictures. Current AI systems are probably only doing a subset of this, finding statistical correlations among morphological features, but that's still a world model IMO, if a bit anemic.
Trivial, m is not invertible in that case. By contrast, measuring devices need to be invertible within some domain, otherwise they're not actually measuring, and we wouldn't use them.
You defined "m" as the measuring function which is not the pseudo-random number generator itself. I guess I don't understand your definitions.
In any case it's pretty obvious you can have deterministic chaotic output from which you cannot practically (or even in theory) recover the internal workings of the system that generated it. Take just a regular pseudorandom number generator or a cellular automaton.

> In any case it's pretty obvious you can have deterministic chaotic output from which you cannot practically (or even in theory)
Solomonoff induction says otherwise. Of course it might take a stupendously large number of samples, but as the number of samples goes to infinity, the probability of reproducing the PRNG goes to 1.
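As a toy version of that limit claim (my own example, with a deliberately tiny generator): brute-forcing a small linear congruential generator from its outputs recovers the underlying mechanism exactly.

    # Tiny hypothetical PRNG: x_{n+1} = (a*x_n + c) % M with a small known modulus.
    M = 251
    def lcg(a, c, x0, n):
        xs, x = [], x0
        for _ in range(n):
            x = (a * x + c) % M
            xs.append(x)
        return xs

    observed = lcg(a=33, c=17, x0=5, n=8)   # all we get to see are these outputs

    # Brute-force every (a, c) consistent with the observations: enough samples
    # leave only the true generator standing.
    candidates = [(a, c) for a in range(M) for c in range(M)
                  if lcg(a, c, x0=5, n=8) == observed]
    print(candidates)   # [(33, 17)] -- the underlying mechanism, recovered from outputs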
In the example, the 'world' is the grid state. Obviously that's much simpler than the real world, but the point is to show that even when the model is not directly trained to input/output this world state, it is still learned as a side effect of predicting the next token.
There is no world. The grid state is not a world; there is no causal relationship between the grid state and the board. No one in this debate denies that NNs approximate functions. Since a game is just a discrete function, no one denies an NN can approximate it. Showing this is entirely irrelevant and reflects a profound misunderstanding of what's at issue.

The whole debate is about whether surface patterns in measurement data can be reversed by NNs to describe their generating process, i.e., the world. If the "data" isn't actual measurements of the world, no one is arguing about it.

If there is no gap between the generating algorithm and the samples, e.g., between a "circle" and "the points on a circle" -- then there is no "world model" to learn. The world is the data. To learn "the points on a circle" is to learn the circle.

By taking cases where "the world" and "the data" are the same object (in the limit of all samples), you're just showing that NNs model data. That's already obvious; no one's arguing about it.
That a NN can approximate a discrete function does not mean it can do science.
The whole issue is that the cause of pixel distributions is not in those distributions. A model of pixel patterns is just a model of pixel patterns, not of the objects which cause those patterns. A TV is not made out of pixels.
The "debate" insofar as there is one, is just some researchers being profoundly confused about what measurement data is: measurements are not their targets, and so no model of data is a model of the target. A model of data is just "surface statistics" in the sense that these statistics describe measurements, not what caused them.
> Photographs of the night sky are compatible with all theories of the solar system in human history (including, eg., stars are angels). There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.
This is blatantly incorrect. Keep in mind that much of modern physics was developed via observation. Kepler's laws, and ultimately the law of gravitation and general relativity, came from these "photographs" of the night sky.
If you are talking about the fact that these theories only ever summarize what we see and maybe there's something else behind the scenes that's going on, then this becomes a different discussion.
> Do they merely memorize training data and reread it out loud, or are they picking up the rules of English grammar and the syntax of C language?
This is a false dichotomy. Functionally the reality is in the middle. They "memorize" training data in the sense that the loss curve is fit to these points but at test time they are asked to interpolate (and extrapolate) to new points. How well they generalize depends on how well an interpolation between training points works. If it reliably works then you could say that interpolation is a good approximation of some grammar rule, say. It's all about the data.
This only applies to intra-distribution "generalisation", which is not the meaning of the term we've come to associate with science. Here generalisation means across all environments (ie., something generalises if its valid and reliable where valid = measures property, and reliable = under causal permutation to the environment).
Since an LLM does not change in response to the change in meaning of terms (eg., consider the change to "the war in ukraine" over the last 10 years) -- it isn't reliable in the scientific sense. Explaining why it isnt valid would take much longer, but its not valid either.
In any case: the notion of 'generalisation' used in ML just means we assume there is a single stationary distribution of words, and we want to randomly sample from that distribution without bias to oversampling from points identical to the data.
Not least that this assumption is false (there is no stationary distribution), it is also irrelevant to generalisation in traditional sense. Since whether we are biased towards the data or not isn't what we're interested in. We want output to be valid (the system to use words to mean what they mean) and to be reliable (to do so across all environments in which they mean something).
This does not follow from, nor is it even related to, this ML sense of generalisation. Indeed, if LLMs generalised in this sense, they would be very bad at usefully generalising -- since the assumptions here are false.
I don't really follow what you're saying here. I understand that the use of language in the real-world world is not sampled from a stationary distribution, but it also seems plausible that you could relax that assumption in an LLM, e.g. conditioning the distribution on time, and then intra-distribution generalization would still make sense to study how well the LLM works for held-out test samples.
Intra-distribution generalization seems like the only rigorously defined kind of generalization we have. Can you provide any references that describe this other kind of generalization? I'd love to learn more.
intra-distribution generalization is also not well posed in practical real world settings. suppose you learn a mapping f : x -> y. casually, intra-distribution generalization implies that f generalizes for "points from the same data distribution p(x)". Two issues here:
1. In practical scenarios, how do you know if x' is really drawn from p(x)? Even if you could compute log p(x') under the true data distribution, you can only verify that the support for x' is non-zero. one sample is not enough to tell you if x' drawn from p(x).
2. In high dimensional settings, x' that is not exactly equal to an example within the training set can have arbitrarily high generalization error. here's a criminally under-cited paper discussing this: https://arxiv.org/abs/1801.02774
Worse even than this: there are no distributions.
What we mean by x ~ p(x), y ~ p(y|x) is not x -> y st. x = f(y)
Reality itself has no probability distributions. Reality follows a causal model, where a causal relation is given in terms of necessity and possibility.
Eg., there is no such thing as Photo ~ P(Photo|PhotoOfCat) to be learned, only (All Causes) -> PhotoOfCat. Thus the setup of ML as y = f(x) is incorrect, there is no `f` which satisfies this formula (in almost all cases).
Consider the LLM case: reality has no P("The War in Ukraine"| TheWarIn2022) -- either the speaker meant TheWarIn2022, or they didnt. There's no sense in which reality has it that the utterance is intrinsically ambiguous (necessarily, for communication to be possible, pragmatics+semantics has to be able to fully resolve meaning).
So what are LLMs learning? Just an implied empirical distribution which is "smoothed over" the data just enough that it "hangs on to it, without repeating it" -- and this is vital, since if it were to try to generalise in the scientific sense, it would cease to be meaningful, since no algorithm which computes P(y|x) in this manner could capture the necessary relata that fully resolves meaning. Any system capable of modelling meaning would be probabilistic only in the sense of having a prior over such causal models: P("TheWarInUkraine"|TheWarIn2022, CausalModel) = 1, but P(CausalModel) < 1
So it's always undefined what it means to "generalise" wrt to an empirical distribution -- there aren't any.
When we say scientific theories generalise, we mean their posited necessary causal relations are maintained across irrelevant interventions. Eg., newton's theory of gravity generalises in that each term (F, M, m, r) is a valid measure of some property, and it remains a valid measure across a very large number of environments.
It fails to generalise for extreme values of M, m, etc.
In the ML sense, all intra-distributional generalisation fails for trivial permutations of any causal property, eg., m+dm -- because this induces an entirely new distribution. The "generalisation error" depends on what m+dm does within our model, but regardless, generalisation fails.
Scientific theories do not fail to generalise in this way, irrelevant causal interventions make no difference to the explanatory adequacy (or predictive power) of the theory.
Thanks for the clarification. I understand much better what you mean by "scientific generalization". I can't tell whether you're suggesting that LLMs are a dead end for modeling meaning or just that LLMs as estimating probability distributions is the wrong way to think about them?
LLMs fail to model meaning, but in doing so, model empirical distributions of meaningful tokens which is more useful, given the method being used.
If you were only modelling conditional probability, trying to model meaning this way, would make your solution worse.
ie., if LLMs really generalised in the ML sense, i.e., unbiasedly randomly sampled from some hypothetical "Meaning Distribution", they'd perform terribly -- since there is no such distribution to choose from.
By hijacking an empirical distribution, and "replaying it back", its actually possible to generate useful output.
Think about it this way, probability distributions are just measures of subjective confidence: each person has their own subjective confidence distribution P("some written words"|WhatTheyMean). If you could actually model this -- which one would you model? If you modelled any of them, you'd not be able to understand a great deal, since each person's confidence is poorly calibrated and missing meanings (eg., "acetylcholine").
So the LLM models some half-baked average of the subjective distributions of all speakers on the internet (/ in the training data) with respect to next word expectations.
This is not what we're modelling when we mean things (eg., when I say, "pass the pen", the cause of my saying it is: 1) need for a pen; 2) you having a pen; etc. -- these reasons are unavailable to the LLM, so it cannot model meaning). But as stated, it would be useless if it actually tried to -- because these methods are incapable of saying, "pass me a pen" and meaning it.
Big thread at the time https://news.ycombinator.com/item?id=34474043
Thanks. Now, after almost two years of incomparably explosive growth in LLMs since that paper, it's remarkable to realize that we still don't know if Scarecrow has a brain. Or if he'll forever remain just a song and dance man.
Thanks! Macroexpanded:
Do Large Language Models learn world models or just surface statistics? - https://news.ycombinator.com/item?id=34474043 - Jan 2023 (174 comments)
Idk from when even id this article? Got me LLMs currently are broke and the majority is already aware of this.
Copilot fails the cleanly refactor complex Java methods in a way that I’m better of writing that stuff by my own as I have to understand it anyways.
And the news that they don’t scale as predicted is too bad compared to how weak they currently perform…
Why does an LLM have to be better than you to be useful to you?
Personally, I use them for the things they can do, and for the things they can't, I just don't, exactly as I would for any other tool.
People assuming they can do more than they are actually capable of is a problem (compounded by our tendency to attribute intelligence to entities with eloquent language, which might be more of a surface level thing than we used to believe), but that's literally been one for as long as we had proverbial hammers and nails.
> Why does an LLM have to be better than you to be useful to you?
If
((time to craft the prompt) + (time required to fix LLM output)) ~ (time to achieve the task on my own)
it's not hard to see that working on my own is a very attractive proposition. It drives down complexity, does not require me to acquire new skills (i.e., prompt engineering), does not require me to provide data to a third party nor to set up an expensive rig to run a model locally, etc.
Then they might indeed not be the right tool for what you're trying to do.
I'm just a little bit tired of sweeping generalizations like "LLMs are completely broken". You can easily use them as a tool part of a process that then ends up being broken (because it's the wrong tool!), yet that doesn't disqualify them for all tool use.
Yeah but the sampling process required to determine what they are good at is not free either. (For starters, it's consuming huge amounts of public research funding and compute, but let's not go down that rabbit-hole)
If you can't find a use for the best LLMs it is 100% a skill issue. IF the only way you can think to use them is re-factoring complex java codebases you're ngmi.
So far I haven't found one that does my dishes and laundry. I really wish I knew how to properly use them.
My point being: Why would anyone have to find a use for a new tool? Why wouldn't "it doesn't help me with what I'm trying to do" be an acceptable answer in many cases?
I have found more often than not that people in the "LLMs are useless" camp are actually in the "I need LLMs to be useless" camp.
Well I found the exact reverse: people saying LLM are useful actually need them to be useful, so they could boast about their "prompt engineering" "skill" (i.e. typing a sentence) and "AI knowledge". I've seen a caricature of this a few hours ago on LinkedIn, from a "data guy" saying devs not using AI are gonna replaced by those who do. Yet it was very clear from his reply to comments he never wrote code and wasn't a position to give an opinion on the matter, especially one like the extreme and rude one he wrote.
Both your and GPs observations (and many more) can be true simultaneously.
Some people are quick to dismiss any new technology as useless; others are quick to hail it as the thing that will take everyone's jobs in just a few months (and might consider that a good or bad thing) or solve any number of humanity's hard problems.
Usually one or the other will be seen as slightly more accurate in retrospect, but since both ultimately come from a knee-jerk reaction to something new, with rationalizations bolted on to support their respective case, most of these can be safely ignored.
Remove all the parroting of technical jargon, wishful thinking, appeals to morality etc. and you essentially have two crowds arguing why this time the roulette ball will surely fall on red/black (and everybody forgetting about the zero/green).
Do not forget the very linear reality of those people that shout "The car does not work!" in frustration because they would gladly use a car.
nice example of poisoning the well!
It turns out our word for "surface statistics" is "world model".
World model based interfaces have an internal representation and when asked, describe its details.
Surface statistics based interfaces have an internal database of what is expected, and when asked, they give a conformist output.
The point is that "internal database of statistical correlations" is a world model of sorts. We all have an internal representation of the world featuring only probabilistic accuracy after all. I don't think the distinction is as clear as you want it to be.
> "internal database of statistical correlations" [would be] a world model of sorts
Not in the sense used in the article: «memorizing “surface statistics”, i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence».
A very basic example: when asked "two plus two", would the interface reply "four" because it memorized a correlation of the two ideas, or because it counted at some point (many points in its development) and in that way assessed reality? That is a dramatic difference.
> and when asked, describe its details.
so humans don't typically have world models then. you ask most people how they arrived at their conclusions (outside of very technical fields) and they will confabulate just like an LLM.
the best example is phenomenology, where people will grant themselves skills that they don't have, to reach conclusions. see also heterophenomenology, aimed at working around that: https://en.wikipedia.org/wiki/Heterophenomenology
That the descriptive is not the prescriptive should not be a surprise.
That random people will largely have suboptimal skills should not be a surprise.
Yes, many people can't think properly. Proper thinking remains there as a potential.
> Yes, many people can't think properly. Proper thinking remains there as a potential.
that's a matter of faith, not evidence. by that reasoning, the same can be said about LLMs. after all, they do occasionally get it right.
Let me rephrase it, there could be a misunderstanding: "Surely many people cannot think properly but some have much more ability than others: the proficient ability to think well is a potential (expressed in some and not expressed in many".
To transpose that to LLMs, you should present one that systematically gets it right, not occasionally.
And anyway, the point was about two different processes before statement formulation: some output the most strongly correlated idea ("2+2" → "4"); some look at the internal model and check its contents ("2, 2" → "1 and 1, 1 and 1: 4").
> one that systematically gets it right, not occasionally.
could Einstein systematically get new symphonies right? could Feynman create tasty new dishes every single time? Could ......
> could Einstein systematically
Did (could) Einstein think about things long and hard? Yes - that is how he explained having solved problems ("How did you do it?" // "I thought about it long and hard").
The artificial system in question should (1) be able to do it, and (2) do it systematically, because it is artificial.
Well, for some sufficiently platonic definition of "world".
In a way the opposite, I'd say: the archetypes in Plato are the most stable reality and are akin to the logos that the past and future tradition hunted - knowing it is to know how things are (how things work), hence knowledge of the state of things, hence a faithful world model.
To utter conformist statements spawned from surface statistics would be "doxa" - repeating "opinions".
It has profound and extensive knowledge about something. But that "something" is how words follow each other in popular media.
LLMs are very firmly stuck inside the Cave Allegory.
If you mean that just like the experiencer in the cave, seeing shadows instead of things (really, things instead of Ideas), the machine sees words instead of things, that would be in a way very right.
But one could argue that it is not impossible to create an ontology (a very descriptive ontology - "this is said to be that, and that, and that...") from language alone. Hence the question of whether the ontology is there. (Actually, the question at this stage remains: "How do they work - in sufficient detail? Why the appearance of some understanding?")
Yeah, what I'm saying is that something very similar to an ontology is there. (It's incomplete but extensive, not coherent, and it's deeper in details than anything anybody ever created.)
It's just that it's a kind of a useless ontology, because the reality it's describing is language. Well, only "kind of useless" because it should be very useful to parse, synthesize and transform language. But it doesn't have the kind of "knowledge" that most people expect an intelligence to have.
Also, its world isn't only composed of words. All of them got a very strong "Am I fooling somebody?" signal during training.
Honestly, I think it's somewhere in between. LLMs are great at spotting patterns in data and using that to make predictions, so you could say they build a sort of "world model" for the data they see. But it's not the same as truly understanding or reasoning about the world; it's more like they're really good at connecting the dots we give them.
They don't do science or causality; they're just working with the shadows on the wall, not the actual objects casting them. So yeah, they're impressive, but let's not overhype what they're doing. It's pattern matching at scale, not magic. Correct me if I am wrong.
They are learning a grammar, finding structure in the text. In the case of Othello, the rules for what moves are valid are quite simple, and can be represented in a very small model. The slogan is "a minute to learn, a lifetime to master". So "what is a legal move" is a much simpler problem than "what is a winning strategy".
It's similar to asking a model to only produce outputs corresponding to a regular expression, given a very large number of inputs that match that regular expression. The RE is the most compact representation that matches them all and it can figure this out.
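To make that analogy concrete, here's a toy sketch (my own illustration, nothing from the paper): for a simple regular expression, the state of the minimal automaton is exactly the compact summary a sequence model would need to track in order to know which continuations are legal, much like a board state for legal moves.

    # Toy DFA for the regular expression a b* c. The DFA state plays the role
    # the board state plays in Othello: the compact internal representation
    # needed to know which next symbols are legal, without memorizing every
    # prefix seen in training.
    DFA = {
        0: {'a': 1},          # start: only 'a' is legal
        1: {'b': 1, 'c': 2},  # after 'a': any number of 'b's, then a final 'c'
        2: {},                # after 'c': the string is complete
    }

    def legal_next(prefix):
        state = 0
        for ch in prefix:
            state = DFA[state][ch]
        return set(DFA[state])

    print(legal_next("a"))    # {'b', 'c'}
    print(legal_next("abb"))  # {'b', 'c'}
    print(legal_next("abc"))  # set()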
But we aren't building a "world model", we're building a model of the training data. In artificial problems with simple rules, the model might be essentially perfect, never producing an invalid Othello move, because the problem is so limited.
I'd be cautious about generalizing from this work to a more open-ended situation.
I don't think the point is that Othello-GPT has somehow modelled the real world by training only on games, but that tasking it to predict the next move forces it to model its data in a deep way. There's nothing special about Othello games vs internet text, except that the latter will force it to model many more things.
I’m reminded of the Holographic Principle in physics: https://en.m.wikipedia.org/wiki/Holographic_principle
Sometimes a sufficiently good model of a surface is completely identical to a model of the volume.
Lots of problems with this paper including the fact that, even if you accept their claim that internal board state is equivalent to world model, they don't appear to do the obvious thing which is display the reconstructed "internal" board state. More fundamentally though, reifying the internal board as a "world model" is absurd: otherwise a (trivial) autoencoder would also be building a "world model".
>More fundamentally though, reifying the internal board as a "world model" is absurd: otherwise a (trivial) autoencoder would also be building a "world model".
The point is that they aren't directly training the model to output the grid state, like you would an autoencoder. It's trained to predict the next action and learning the state of the 'world' happens incidentally.
It's like how LLMs learn to build world models without directly being trained to do so, just in order to predict the next token.
By the same reasoning if you train a neural net to output next action from the output of the autoencoder then the whole system also has a "world model", but if you accept that definition of "world model" then it is extremely weak and not the intelligence-like capability that is being implied.
And as I said in my original comment they are probably not even able to extract the board state very well, otherwise they would depict some kind of direct representation of the state, not all of the other figures of board move causality etc.
Note also that the board state is not directly encoded in the neural network: they train another neural network to find weights to approximate the board state if given the internal weights of the Othello network. It's a bit of fishing for the answer you want.
> And as I said in my original comment they are probably not even able to extract the board state very well,
They do measure and report on this, both in summary in the blog post and in more detail in the paper.
> otherwise they would depict some kind of direct representation of the state
If you can perfectly accurately extract the state the result would be pretty boring to show right? It'd just be a picture of a board state and next to it the same board state with "these are the same".
> Note also that the board state is not directly encoded in the neural network: they train another neural network to find weights to approximate the board state if given the internal weights of the Othello network.
If you can extract them, they are encoded in the activations. That's pretty much by definition surely.
> It's a bit of fishing for the answer you want.
How so?
Given a sequence of moves, they can accurately identify which state most of the positions of the board are in just by looking at the network. In order for that to work, the network must be turning a sequence of moves into some representation of the current board state. Assume for the moment they can accurately identify them: do you agree with that conclusion?
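For what it's worth, a minimal sketch of what such a probe looks like (hypothetical dimensions and variable names, not the authors' actual code):

    import torch
    import torch.nn as nn

    # Assumed shapes: 512-dim activations from the frozen move-prediction model,
    # 64 board tiles, 3 possible states per tile (empty / black / white).
    HIDDEN_DIM, N_TILES, N_STATES = 512, 64, 3

    probe = nn.Linear(HIDDEN_DIM, N_TILES * N_STATES)   # the probe itself is tiny
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def train_step(activations, true_board):
        # activations: (batch, HIDDEN_DIM), taken from the frozen Othello model
        # true_board:  (batch, N_TILES) ground-truth tile states from replaying the moves
        logits = probe(activations).view(-1, N_TILES, N_STATES)
        loss = loss_fn(logits.transpose(1, 2), true_board)  # per-tile classification
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

If the activations carried no board information, a small map like this could not learn to reconstruct the board on held-out games.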
> They do measure and report on this, both in summary in the blog post and in more detail in the paper.
I didn't see this in the blog post, where is it? Presumably they omitted it from the blog post because the results are bad as I describe below, which is precisely why I cited it as a red flag.
> If you can perfectly accurately extract the state the result would be pretty boring to show right? It'd just be a picture of a board state and next to it the same board state with "these are the same".
But they don't! They have nearly 10% tile-level error on the human-data-trained model. That's nearly 100% board error. It's difficult to appreciate how bad this is, but if you visualize it somehow (for example by sampling random boards) it becomes obvious that it is really bad. On average about 6 or 7 tiles are going to be wrong. With nearly 100% probability you get an incorrect board.
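The arithmetic, assuming roughly 10% per-tile error and independent errors:

    p_tile_err = 0.10                    # roughly the reported per-tile error
    n_tiles = 64                         # tiles on an Othello board
    p_board_correct = (1 - p_tile_err) ** n_tiles
    print(p_board_correct)               # ~0.0012: ~99.9% of boards have at least one wrong tile
    print(p_tile_err * n_tiles)          # ~6.4 expected wrong tiles per board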
> If you can extract them, they are encoded in the activations. That's pretty much by definition surely.
No, that's silly. For example, you can cycle through every algorithm/transformation imaginable until you hit one that extracts the satanic verses of the Bible. As I said in another comment, although it is in theory mitigated somewhat by doing test/validation splits, in practice you keep trying different neural network hyperparameters to finesse your validation performance.
> How so? Given a sequence of moves, they can accurately identify which state most of the positions of the board are in just by looking at the network. In order for that to work, the network must be turning a sequence of moves into some representation of a current board state. Assume for the moment they can accurately identify them do you agree with that conclusion?
What conclusion? I believe you can probably train a neural network to take in board moves and output board state with some level of board error. So what?
>It's like how LLMs learn to build world models without directly being trained to do so, just in order to predict the next token
That's the whole point under contention, but you're stating it as fact.
> they don't appear to do the obvious thing which is display the reconstructed "internal" board state.
I'm very confused by this, because they do. Then they manipulate the internal board state and see what move it makes. That's the entire point of the paper. Figure 4 is literally displaying the reconstructed board state.
I replied to a similar comment elsewhere: They aren't comparing the reconstructed board state with the actual board state which is the obvious thing to do.
>they don't appear to do the obvious thing which is display the reconstructed "internal" board state
This is literally figure 4
This also re-constructs the board state of a chess-playing LLM
https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
Unless I'm misunderstanding something they are not comparing the reconstructed board state to the actual state which is the straightforward thing you would show. Instead they are manipulating the internal state to show that it yields a different next-action, which is a bizarre, indirect way to show what could be shown in the obvious direct way.
Figure 4 is showing both things. Yes, there is manipulation of the state but they also clearly show what the predicted board state is before any manipulations (alongside the actual board state)
The point is not to show only a single example; it is to show how well the recovered internal state reflects the actual state in general, and to analyze that performance (which is particularly tricky due to the discrete nature of board positions). That's ignoring all the other more serious issues I raised.
I haven’t read the paper in some time so it’s possible I’m forgetting something but I don’t think so.
>That’s ignoring all the other more serious issues I raised.
The only other issue you raised doesn't make any sense. A world model is a representation/model of your environment that you use for predictions. Yes, an auto-encoder learns to model its data to some degree; to what degree is not well known. If we found out that it learned things like 'city x in country a is approximately distance b from city y', i.e. it compresses by learning where y is and unpacking everything else when the need arises, then that would certainly qualify as a world model.
Linear regression also learns to model data to some degree. Using the term “world model” that expansively is intentionally misleading.
Besides that, and the big red flag of not directly analyzing the performance of the predicted board state, I also said that training a neural network to return a specific result is fishy, but that is a more minor point than the other two.
The degree matters. If we find autoencoders learning surprisingly deep models then I have no problem saying they have a world model. It's not the gotcha you think it is.
>the big red flag of not directly analyzing the performance of the predicted board state I also said training a neural network to return a specific result is fishy
The idea that probes are some red flag is ridiculous. There are some things to take into account, but statistics is not magic. There's nothing fishy about training probes to inspect a model's internals. If the internals don't represent the state of the board then the probe won't be able to learn to reconstruct the state of the board. The probe only has access to the internals. You can't squeeze blood out of a rock.
I don’t know what makes a “surprisingly deep model” but I specifically chose autoencoders to show that simply encoding the state internally can be trivial and therefore makes that definition of “world model” vacuous. If you want to add additional stipulations or some measure of degree you have to make an argument for that.
In this case specifically “the degree” is pretty low since predicting moves is very close to predicting board state (because for one you have to assign zero probability to moves to occupied positions). That’s even if you accept that world models are just states, which as mtburgess explained is not reasonable.
Further, if you read what I wrote, I didn't say internal probes are a big red flag (I explicitly called that the minor problem). I said not directly evaluating how well the putative internal state matches the actual state is. And you can "squeeze blood out of a rock": it's the multiple comparisons problem, it happens in science all the time, and it is what you are doing when you train a neural network and fish for the answer you want to see. This is a very basic problem in statistics and has nothing to do with "magic". But again, all this is the minor problem.
>In this case specifically “the degree” is pretty low since predicting moves is very close to predicting board state (because for one you have to assign zero probability to moves to occupied positions).
The depth/degree or whatever is not about what is close to the problem space. The blog above spells out the distinction between a 'world model' and 'surface statistics'. The point is that Othello GPT is not in fact playing Othello by 'memorizing a long list of correlations' but by modelling the rules and states of Othello and using that model to make a good prediction of the next move.
>I said not directly evaluating how well the putative internal state matches the actual state is.
This is evaluated in the actual paper with the error rates using the linear and non linear probes. It's not a red flag that a precursor blog wouldn't have such things.
>And you can “squeeze blood out of a rock”: it’s the multiple comparison problem and it happens in science all the time and it is what you are doing by training a neural network and fishing for the answer you want to see.
The multiple comparison problem is only a problem when you're trying to run multiple tests on the same sample. Obviously don't test your probe on states you fed it during training and you're good.
> The point is that Othello GPT is not in fact playing Othello by 'memorizing a long list of correlations' but by modelling the rules and states of Othello and using that model to make a good prediction of the next move.
I don't know how you rule out "memorizing a long list of correlations" from the results. The big discrepancy in performance between their synthetic/random-data training and human-data training suggests to me the opposite: random board states are more statistically nice/uniform, which suggests that these are in fact correlations, not state computations.
> This is evaluated in the actual paper with the error rates using the linear and non linear probes. It's not a red flag that a precursor blog wouldn't have such things.
It's the main claim/result! Presumably the reason it is omitted from the blog is that the results are not good: nearly 10% error per tile. Othello boards are 64 tiles so the board level error rate (assuming independent errors) is 99.88%.
> The multiple comparison problem is only a problem when you're trying to run multiple tests on the same sample. Obviously don't test your probe on states you fed it during training and you're good.
In practice what is done is you keep re-running your test/validation loop with different hyperparameters until the validation result looks good. That's "running multiple tests on the same sample".
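A toy illustration of that concern (synthetic data, nothing to do with the paper's actual numbers): even when the labels carry no signal at all, repeatedly tuning against the same validation set lets you select a score that looks well above chance.

    import numpy as np

    rng = np.random.default_rng(0)
    n_val = 200
    y_val = rng.integers(0, 2, n_val)      # validation labels with zero real signal

    best_acc = 0.0
    for trial in range(50):                # 50 rounds of "hyperparameter tuning"
        preds = rng.integers(0, 2, n_val)  # each setting is effectively a random guesser
        best_acc = max(best_acc, (preds == y_val).mean())

    print(best_acc)   # typically around 0.57, even though every "model" has true accuracy 0.50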
I think they learn how to become salespeople, politicians, lawyers, and résumé consultants with fanciful language lacking in facts, truth, and honesty.
If we can put salespeople out of work it will be a great boon to humankind
I suddenly have a vision of an AI driven sales pipeline that uses millions of invasive datapoints about you to create the most convincing sales pitch mathematically possible.
This is irrelevant, and it's very frustrating that computer scientists think it is relevant.
If you give a universal function approximator the task of approximating an abstract function, you will get an approximation.
E.g.:
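(A minimal illustration of the point with a toy abstract function; the specific function and library are chosen only to show that fitting a universal approximator to an abstract target simply yields an approximation of that target.)

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Abstract target: XOR, a fully specified discrete function. There is no
    # "world" behind these points; the points are the function.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])

    mlp = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                       max_iter=5000, random_state=0)
    mlp.fit(X, y)
    print(mlp.predict(X))   # close to [0, 1, 1, 0] to the extent training converged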
This is irrelevant: games, rules, shapes, etc. are all abstract, so any model of samples of these is a model of them. The "world model" in question is a model of the world. Here "data" is not computer-science data, i.e. numbers; it is measurements of the world, i.e. the state of a measuring device causally induced by the target of measurement.
Here there is no "world" in the data; you have to make strong causal assumptions about what properties of the target cause the measures. This is not in the data. There is no "world model" in measurement data. Hence the entirety of experimental science.
No result based on one mathematical function succeeding in approximating another is relevant to whether measurement data "contains" a theory of the world which generates it: it does not. And of course if your data is abstract, and hence constitutes the target of modelling (this only applies to pure math), then there is no gap -- a model of the "measures" (i.e., the points on a circle) is the target.
No model of actual measurement data, ie., no model in the whole family we call "machine learning", is a model of its generating process. It contains no "world model".
Photographs of the night sky are compatible with all theories of the solar system in human history (including, eg., stars are angels). There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.
The sense in which any model of measurement data is "surface statistics" is the same. Consider plato's cave: pots, swords, etc. on the outside project shadows inside. Modelling the measurement data is taking cardboard and cutting it out so it matches the shadows. Modelling the world means creating clay pots to match the ones passing by.
The latter is science: you build models of the world and compare them to data, using the data to decide between them.
The former is engineering (or pseudoscience): you take models of measures and reapply those models to "predict" the next shadow.
If you claim the latter is just a "surface shortcut" you're an engineer. If you claim it's a world model you're a pseudoscientist.
> There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.
You're stating this as fact but it seems to be the very hypothesis the authors (and related papers) are exploring. To my mind, the OthelloGPT papers are plainly evidence against what you've written - summarising patterns in the sky really does seem to give you information about the world above and beyond the patterns themselves.
(To a scientist this is obvious, no? The precession of Mercury, a pattern observable in these photographs, was famously not compatible with known theories until fairly recently.)
> Modelling the measurement data is taking cardboard and cutting it out so it matches the shadows. Modelling the world means creating clay pots to match the ones passing by.
I think these are matters of degree. The former is simply a worse model than the latter of the "reality" in this case. Note that our human impressions of what a pot "is" are shadows too, on a higher-dimensional stage, and from a deeper viewpoint any pot we build to "match" reality will likely be just as flawed. Turtles all the way down.
Well, it doesn't; see my other comment below.
It is exactly this non-sequitur which I'm pointing out.
Approximating an abstract discrete function (a game), with a function approximator has literally nothing to do with whether you can infer the causal properties of the data generating process from measurement data.
To equate the two is just rank pseudoscience. The world is not made of measurements. Summaries of measurement data aren't properties in the world, they're just the state of the measuring device.
If you sample all game states from a game, you define the game. This is the nature of abstract mathematical objects, they are defined by their "data".
Actual physical objects are not defined by how we measure them: the solar system isn't made of photographs. This is astrology: to attribute to the patterns of light hitting the eye some actual physical property in the universe which corresponds to those patterns. No such property exists.
It is impossible, and always has been, to treat patterns in measurements as properties of objects. This is maybe one of the most prominent characteristics of pseudoscience.
The point is that approximating a distribution causally downstream of the game (text-based descriptions, in this case) produces a predictive model of the underlying game mechanics itself. That is fascinating!
Yes, the one is formally derivable from the other, but the reduction costs compute, and to a fixed epsilon of accuracy this is the situation with everything we interact with on the day to day.
The idea that you can learn underlying mechanics from observation and refutation is central to formal models of inductive reasoning like Solomonoff induction (and idealised reasoners like AIXI, if you want the AI spin). At best this is well-established scientific method, at worst a pretty decent epistemology.
Talking about sampling all of the game states is irrelevant here; that wouldn't be possible even in principle for many games and in this case they certainly didn't train the LLM on every possible Othello position.
> This is astrology: to attribute to the patterns of light hitting the eye some actual physical property in the universe which corresponds to those patterns. No such exists.
Of course not - but they are highly correlated in functional human beings. What do you think our perception of the world grounds out in, if not something like the discrepancies between (our brain's) observed data and its predictions? There's even evidence in neuroscience that this is literally what certain neuronal circuits in the cortex are doing (the hypothesis being that so-called "predictive processing" is more energy efficient than alternative architectures).
Patterns in measurements absolutely reflect properties of the objects being measured, for the simple reason that the measurements are causally linked to the object itself in controlled ways. To think otherwise is frankly insane - this is why we call them measurements, and not noise.
I think this is a great explanation.
The "Ladder of Causation" proposed by Judea Pearl covers similar ground - "Rung 1” reasoning is the purely predictive work of ML models, "Rung 2" is the interactive optimization of reinforcement learning, and "Rung 3" is the counterfactual and casual reasoning / DGP construction and work of science. LLMs can parrot Rung 3 understanding from ingested texts but it can't generate it.
> Here there is no "world" in the data, you have to make strong causal assumptions about what properties of the target cause the measures. This is not in the data. There is no "world model" in measurement data.
That's wrong. Whatever your measuring device, it is fundamentally a projection of some underlying reality, e.g. a function m in m(r(x)) mapping real values to real values, where r is the function governing reality.
As you've acknowledged that neural networks can learn functions, the neural network here is learning m(r(x)). Clearly the world is in the model here, and if m is invertible, then we can directly extract r.
Yes, the domain of x and range of m(r(x)) is limited, so the inference will be limited for any given dataset, but it's wrong to say the world is not there at all.
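A trivial sketch of the invertibility point (made-up functions, purely illustrative):

    # r is the hidden 'reality' function, m the measuring device.
    def r(x):
        return x ** 2 + 1        # some underlying process we never observe directly

    def m(v):
        return 3 * v + 2         # an invertible (here affine) measurement of it

    def m_inv(y):
        return (y - 2) / 3       # because m is invertible, measurements can be undone

    x = 1.7
    observation = m(r(x))        # all we ever see is m(r(x))
    assert abs(m_inv(observation) - r(x)) < 1e-9   # yet r(x) is fully recoverable

Whether real measuring devices (e.g. camera sensors) behave anything like this invertible m is, of course, the point in dispute.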
Even in the limited sense in which the world is recoverable from measures of it, that recovery requires a model of how the measures were generated.
For animals, we are born with primitive causal models of our bodies we can recurse on to build models of the world in this sense. So as toddlers we learn perception by having an internal 3d model of our bodies -- so we can ascribe distances to our optical measures.
Without such assumptions there really isn't any world at all in this data. A grid of pixel patterns has no meaning as a grid of numbers. NNs are just mapping this grid to a "summary space" under supervision of how to place the points. This supervision enables a useful encoding of the data, but does not provide the kind of assumptions needed to work backwards to the properties of its generation.
In the case of photos, there is no such `m` -- the state of a sensor is not uniquely caused by any catness or dogness properties. Almost no photographs acquire their state from a function X -> Y, because the sensor state is "radically uncontrolled" in a causal sense. Thus the common premise of ML, that y = f(x), is false from the start -- the relevant causal graph has a near-infinite number of unspecified causes, so f does not exist.
> For animals, we are born with primitive causal models of our bodies we can recurse on to build models of the world in this sense. So as toddlers we learn perception by having an internal 3d model of our bodies -- so we can ascribe distances to our optical measures.
If you agree that we evolved such an internal model from some base system that had no such abilities or assumptions, then it follows that evolving programs to do such things is also possible. If you think the base system itself already had such abilities, then you have a bootstrapping problem.
> Almost no photographs acquire their state from a function X -> Y, because the sensor state is "radically uncontrolled" in a causal sense.
This seems like a bizarre claim to me. A camera is clearly capturing a 2D projection of a 3D world. We can also construct 3D structures from a set of 2D pictures, as we do this all of the time with photogrammetry.
It's also clear that we can infer "catness" and "dogness" of subjects despite the fact that these are not strictly defined categories, because we literally did this in our pre-genotyping scientific taxonomies which organized the animal kingdom based on morphology, and this was pretty accurate. It then follows that with enough 2D pictures, one could create a taxonomy of objects based on reconstructed 3D morphology with a high degree of statistical accuracy.
I think this demonstrates that your premise is incorrect that there is no "world" that can be modelled only from pictures. Current AI systems are probably only doing a subset of this, finding statistical correlations among morphological features, but that's still a world model IMO, if a bit anemic.
This is obviously false: consider a (cryptographic) pseudorandom number generator.
Trivial, m is not invertible in that case. By contrast, measuring devices need to be invertible within some domain, otherwise they're not actually measuring, and we wouldn't use them.
You defined "m" as the measuring function which is not the pseudo-random number generator itself. I guess I don't understand your definitions.
In any case it's pretty obvious you can have deterministic chaotic output from which you cannot practically (or even in theory) recover the internal workings of the system that generated it. Take just a regular pseudorandom number generator or a cellular automaton.
> In any case it's pretty obvious you can have deterministic chaotic output from which you cannot practically (or even in theory)
Solomonoff induction says otherwise. Of course it might take a stupendously large number of samples, but as the number of samples goes to infinity, the probability of reproducing the PRNG goes to 1.
I don't understand your objection at all.
In the example, the 'world' is the grid state. Obviously that's much simpler than the real world, but the point is to show that even when the model is not directly trained to input/output this world state, it is still learned as a side effect of predicting the next token.
There is no world. The grid state is not a world; there is no causal relationship between the grid state and the board. No one in this debate denies that NNs approximate functions. Since a game is just a discrete function, no one denies an NN can approximate it. Showing this is entirely irrelevant and shows a profound misunderstanding of what's at issue.
The whole debate is about whether surface patterns in measurement data can be reversed by NNs to describe their generating process, i.e., the world. If the "data" isn't actual measurements of the world, no one is arguing about it.
If there is no gap between the generating algorithm and the samples, e.g., between a "circle" and "the points on a circle" -- then there is no "world model" to learn. The world is the data. To learn "the points on a circle" is to learn the circle.
By taking cases where "the world" and "the data" are the same object (in the limit of all samples), you're just showing that NNs model data. That's already obvious; no one's arguing about it.
That a NN can approximate a discrete function does not mean it can do science.
The whole issue is that the cause of pixel distributions is not in those distributions. A model of pixel patterns is just a model of pixel patterns, not of the objects which cause those patterns. A TV is not made out of pixels.
The "debate" insofar as there is one, is just some researchers being profoundly confused about what measurement data is: measurements are not their targets, and so no model of data is a model of the target. A model of data is just "surface statistics" in the sense that these statistics describe measurements, not what caused them.
> Photographs of the night sky are compatible with all theories of the solar system in human history (including, eg., stars are angels). There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.
This is blatantly incorrect. Keep in mind that much of modern physics has been discovered via observation. Kepler's laws, and ultimately the law of gravitation and general relativity, came from these "photographs" of the night sky.
If you are talking about the fact that these theories only ever summarize what we see and maybe there's something else behind the scenes that's going on, then this becomes a different discussion.