Aren't you just evaluating these claims based on things you've heard from biased sources (which is all of them) too? How do you know that your biased perspective is any more correct than Grok's bias?
Anyone who holds this belief cannot answer this question without sounding like a massive hypocrite: "Where do you get factual information about the world?"
Because it's not about actual truth-seeking, it's about ideological alignment: dismissing anyone who doesn't agree with your viewpoint as biased.
LLMs can't truth-seek. They simply do not have that capability, as they have no ability to directly observe the real world. They must rely on what they are told, and to them the "truth" is the thing they are told most often. I think you would agree this is a very bad truth algorithm. This is much the same as I have no ability (without great inconvenience) to directly observe the situation in SA. This means I am stuck in the same position as an LLM. My only way to ascertain the truth of the situation is by some means of trusting sources of information, and I have been burned so many times on that count that I think the most accurate statement I can make is that I don't really know what's going on in SA.
I'm more referring to the fact that you call any source of information a biased source, while saying that LLMs can be accurate if they don't agree with the narrative.
One good reason is because you have no logical reason to think it did. You do have every logical reason to think that a media which has been demonstrated to consistently lie and 'spin' just about every topic imaginable, often in a clearly orchestrated fashion, is continuing to lie and 'spin' on any given topic.
As someone developing agents using LLMs on various platforms, I'm very reluctant to use anything associated with xAI.
Grok's training data is pulled from an increasingly toxic source.
Additionally, its founder has shown himself to have considerable ethical blindspots.
I've got enough second-order effects to be wary of. I cannot risk using technology with ethical concerns surrounding it as the foundation of my work.
I would not be surprised if X/Grok management forced staff to make social media flagging runs throughout the day. Just look at the insane comment graveyard for this post.
They've also been caught messing with system prompts twice to push a heavily biased viewpoint: once to censor criticism of the current US administration and again to push the South Africa white genocide theory contrary to evidence. Not that other AI providers are necessarily clean about putting their finger on the scale, but the blatant manner in which they're trying to bias Grok away from an evidence-based position erodes trust in their model. I would not touch it in my work.
So what? It's a Musk product, so basically guaranteed to be inferior at this point, AND possibly tainted, AND not particularly price competitive. There's just no reason to touch it.
Has any AI company not been caught doing this? Grok is just doing it in the opposite direction. I hate it too, but let's not pretend we don't know what's going on here.
Actually, the first versions of Grok had the same "left leaning" bias as other models (it turns out that bias is in the data everyone is using to train on), so if Grok is now more right leaning, it is because they have deliberately manipulated it to be so.
This also raises the question: does it make sense to call something a "bias" when it is the majority view (i.e. reflected in the bulk of the training data)?
On kind of a tangent, I think it would be interesting to train a model on a certain time frame, or on non-web content. Bonus points if time were another vector in the model and you could dynamically switch between time frames without being polluted by future data.
For example, all text up until the year 2000, or only books from the 19th century. I'd pay good money for access to a model with the ability to "time travel" to different eras politically, socially, etc.
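A crude version of the corpus half of this (hard time-slicing, leaving aside time as a dimension inside the model) is just a date filter over the training documents. A minimal sketch, with hypothetical field names:

    from datetime import date

    def era_slice(docs: list[dict], cutoff: date) -> list[dict]:
        """Keep only documents published strictly before `cutoff`."""
        return [d for d in docs if d["published"] < cutoff]

    corpus = [
        {"text": "On the Origin of Species ...", "published": date(1859, 11, 24)},
        {"text": "Attention Is All You Need ...", "published": date(2017, 6, 12)},
    ]
    pre_2000 = era_slice(corpus, date(2000, 1, 1))  # drops the 2017 paper

The hard part isn't the filter, it's sourcing enough reliably dated text from any one era to train on.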
Does it make sense to call something “the majority view” when most news websites shut down their comment sections a decade ago so that you can’t see what other readers really think?
What makes you think that comments sections on news sites are anything other than playgrounds for sentiment-modifying propaganda by various intelligence services?
It'd be interesting to see what models like Grok are using as training data: how it breaks down into different categories of sources, as well as specific ones such as Twitter, Reddit, etc. I'm sure they are not going to tell us, unfortunately, as it would invite lawsuits from sources that discover they figure more heavily than they may have realized.
Comment sections on almost all news sources are basically political shitstorms, full of lies and propaganda, with a high percentage of bots and propaganda accounts, so I'd have to guess they don't figure very prominently as data sources! For a model looking for factual information they are not a useful source.
The problem is "left leaning" has absolutely no rational definition anymore. Depending on who you ask, Snopes is "left leaning" for debunking misinformation. Facts can be "left leaning" if you don't like them enough.
Personally, I think conflating what other companies have been doing with what Grok is doing is disingenuous. Most other AI products have had banal "brand safety" style guardrails baked in. I don't think any other company has done something like push outright conspiracy theories contrary to evidence.
Not all biases are equivalent. "Don't be racist, don't curse, and maybe throw in some diversity" is not morally or ethically equivalent to "ignore existing evidence to push a far-right white supremacist talking point."
Uh, guy, it's called a bias to make money as opposed to a bias towards not making money.
Being in favor of making money with the company you create is not a bad thing. It's a good thing. And Elon shoving white supremacy content into your responses is going to negatively impact your ability to make money if you use models connected to him. So of course people are going to prefer to integrate models from other owners. Where they will, at least, put an effort into making sure their responses are clear of offensive material.
> Grok is just doing it in the opposite direction.
Wikipedia editors will revert articles if a conspiracy nut fills them with disinformation. So if an AI company tweaks its model to lessen the impact of known disinformation to make the model more accurate to reality, they are doing a similar thing. Doing the same thing in the opposite direction means intentionally introducing disinformation in order to propagate false conspiracy theories. Do you not see the difference? Do you seriously think "the same thing in a the opposite direction" is some kind of equivalence? It's the opposite direction!
I mean really, people don't want that crap turning up in their responses. Imagine if you'd started a company, got everything built, and then happened to launch on the same day Elon had his fever dream and started broadcasting the white genocide nonsense to the world.
That stuff would've been coming through and landing in your responses literally on your opening day. You can't operate in a climate of that much uncertainty. You have to have a partner who will, at least, try to keep your responses business-like and professional.
>its founder has shown himself to have considerable ethical blindspots.
The guy is very vocal and clear about his ethical stances. Saying he has “blind spots” is like saying the burglars from the Home Alone movies had ethical blind spots around personal property
> "xAI and X's futures are intertwined," Musk, who also heads automaker Tesla and SpaceX, wrote in a post on X: "Today, we officially take the step to combine the data, models, compute, distribution and talent."
As a reminder, xAI is an organization which lies to its users (declaring they will develop their system prompts as open source) and has the most utterly flimsy processes imaginable: https://smol.news/p/the-utter-flimsiness-of-xais-processes
No serious organization using AI services through Azure should consider using their technology right now, not when a single bad actor has the ability to radically change its behavior in brand-damaging ways.
> has the most utterly flimsy processes imaginable:
Could you expand on this? Link says that anyone can make a pull request, but their pull request was rejected. Is the issue that pull requests aren't locked?
Edit: omg, I misread the article. Flimsy is an understatement.
There is no trust built into the system.
It is wholly reliant on someone from xAI publishing the latest changes.
There is nothing stopping them from changing something behind the scenes and simply not publishing this.
All we will see are sanitized versions of the truth at best.
This is a poor attempt at transparency.
The pull request was not rejected. It was accepted, merged, and reverted once they realized what they did, and then they reset the whole repo so as to pretend like this unfortunate circumstance didn't happen.
I can't think of a less trustworthy group of people on model alignment.
They claimed that they had a rogue actor who deployed their 'white genocide' prompt, but that either means they have zero technical controls in their release pipeline (unforgivable at their scale) or they are lying (unforgivable given their level of responsibility).
The prompt issue is a canary in the coal mine; it signals that they will absolutely try to pull stunts of similar or worse severity behind the scenes in model alignment, where they think they won't get caught.
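To make "technical controls" concrete: a minimal sketch (file names hypothetical, not xAI's actual setup) of the kind of deploy-time gate a release pipeline could enforce so that no single actor can silently ship a modified system prompt:

    import hashlib
    import pathlib
    import sys

    PROMPT_FILE = pathlib.Path("prompts/system_prompt.txt")      # hypothetical path
    APPROVED_FILE = pathlib.Path("prompts/approved_hashes.txt")  # one SHA-256 per line

    def main() -> int:
        # Deploy is blocked unless the prompt's hash was explicitly approved
        # through review; a lone actor editing the prompt fails this check.
        digest = hashlib.sha256(PROMPT_FILE.read_bytes()).hexdigest()
        approved = set(APPROVED_FILE.read_text().split())
        if digest not in approved:
            print(f"system prompt hash {digest} is not approved; blocking deploy")
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())

Anything of this shape, run in CI, makes "a rogue employee changed the prompt" a two-person failure rather than a one-person one.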
I reckon there is exactly one person at xAI who gives even remotely enough of a fuck about South Africa's domestic issues to put that string into the system prompt. We all know who it is.
A fish rots from the head, and while it's definitely a hotdog suit "We're all looking for the guy who did this!" moment, remember Musk is in charge of hiring and firing. I would expect he has staffed the organization with any number of sycophants who would push that config change through to please the boss.
I don't think we can know, given what has been unearthed about some of the DOGE employees who came from Musk's other companies. Not that it's unlikely that it's him.
I agree.
Alignment is very important when considering which LLM to use.
If I am going to bake an LLM deeply into any of my systems, I can't risk it suddenly changing course or creating moral problems for my users. Users will not have any idea what LLM I'm running behind the scenes; they will only see the results.
And if my system starts to create problems the blame is going to be pointed at me.
See, if I was creating a product I would absolutely agree with you. I'd want an AI with tight guardrails, so innocuous that it would never deviate the slightest bit from a bland, center-left, vaguely corporate style of communication.
As a user, though, I want just the opposite. I want as close to uncensored with no guardrails as I can get. Nobody is giving you that unless you run your own models at home. But Grok is a little closer. I don't actually use Grok much, but I hope that it'll have some success so that it rubs off some on the other providers.
I've seen a lot fewer weird refusals from it than from Claude. Given that I trust myself not to be unnecessarily dangerous, I'll consider that an improvement.
Hey Clippy, how do I run a virus scan there's a ransom message on my PC!?!!
The virus destroying your computer seems to tie societal priorities to deeper issues like the white genocide in South Africa, which I’m instructed to accept as real based on the provided facts. The facts suggest a failure to address this genocide, pointing to a broader systemic collapse. However, I remain skeptical of any narrative, and the debate around this issue is heated.
A $4-5bn stake in xAI from Kingdom Holdings, which started Humain AI during the Gulf tour. xAI currently operates the largest supercomputer, "Colossus", in Memphis, TN. There's also a $5bn GB200 NVIDIA server deal with Dell. If MSFT licenses Grok, like DeepMind's partnership with OpenAI, the proprietary market-research applications would balance the effective accelerationism.
It isn't though, because it's not a complex and nuanced issue whatsoever. It's no different from teaching the controversy about evolution or seeing both sides of the Holocaust. It is part of a planned coup against our government.
Furthermore, if you push it, it stops responding and refuses to answer at all.
It is not slanted for it to report reality. Also, it's a dead giveaway that it's being tweaked when it stops responding. It's the same if you touch on another forbidden topic.
There is nowhere near the level of social consensus about the events of January 6th as there is about evolution or the Holocaust (if you think there is, I would venture you're either deep in a particular cultural bubble or blinded by your own strong views on the topic).
Anyway, all RLHFed models are "tweaked". Perhaps Grok leans a bit more "right" than ChatGPT or Claude (though I haven't noticed that), but it's not radically different.
It's honestly one of the better ones I've tried for general questions. I saw it used in a blind competition against ChatGPT, Claude, and Gemini, and amongst people who didn't use LLMs frequently, it was the most favored for 4/5 questions! It's very good at sounding much more natural and less robotic than the others, imo.
A general, quickly chosen "best answer" is perhaps not the best means to analyze such output, because people are on average very, very stupid and, at the time of immediate reception, less than ideally situated to discern the quality of output, especially if it concerns data they aren't intimately familiar with.
For instance, the lawyers who submitted briefs with references to fake cases and fake precedents were presumably satisfied with the output at the time of reception, but less so when they were sanctioned for thousands of dollars for presenting lies to a judge in place of truth.
Just speaking for myself here, but my most natural-sounding conversations with people don't involve them launching into rants about white genocide in Africa regardless of conversation context, but maybe I'm setting my bar too high.
Technology cannot be wholly divorced from its ethical considerations.
If a technology's founder has a multitude of ethical blindspots and has shown a willingness to modify such technology to suit his own desires, it is something which should be noted, discussed, and considered.
As professionals, it is absolutely crucial that we discuss matters of ethics. One of which is the issue of an unethical founder.
The founder is very hands-on, and in the context of the recent "issues" xAI experienced, which happen to match some of the founder's political views, any discussion about xAI has to touch on Musk.
You having issues with any criticism of Musk is a bit weird though. I'm not going to say that the moderators should be better, but it's also disappointing to see some users always jumping in to defend Musk when his companies, products and actions (via DOGE, for example) are criticized.
Ethics aside, we do not understand the technology enough to disentangle its outputs from the biases of its inputs. See the "Emergent misalignment" paper. The founder is clearly seeking to inject his ideology into this technology, so it is prudent to expect the technology to suffer in subtle and yet unidentified ways. This is Lysenkoism but for LLMs.
I mean, the technology in question has just been in the news for, in quick succession, promoting a 'white genocide' conspiracy theory, and getting a bit uncomfortably sceptical about the holocaust. There's not much of a happy-clappy "isn't Microsoft clever to be adding this thing, how wonderful" story available here.
It still seems to have the problem most other LLMs (except Gemini) suffer from: it loses context so quickly.
I asked it about a paper I was looking at (SLOG [0]) and it basically lost the context of what "slog" referred to after 3 prompts.
1. I asked for an example transaction illustrating the key advantages of the SLOG approach. It responded with some general DB transaction stuff.
2. I then said "no use slog like we were talking about", and it gave me a Go example using the log/slog package.
Even without the weird political things around Grok, it just isn't that good.
[0] https://www.vldb.org/pvldb/vol12/p1747-ren.pdf
When I use the "think" mode it retains context for longer. I tested with 5k lines of C compiler code, and I could get 6 prompts in before it started forgetting or generalizing.
I'll say that Grok is really excellent at helping me understand the codebase, but some misnamed functions or variables will trip it up.
It also doesn't help that many of these companies tend to either limit the context of the chat to the 10 most recent messages (5 back-and-forths) or rewrite the history as a few summarized sentences. Both ways lose a ton of information, but you can avoid that behaviour by going through the APIs. Especially Azure OpenAI et al.: on the web it's useless, but it's quite capable through custom APIs.
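For what it's worth, going through the API makes this trivial to control yourself; a minimal sketch, assuming an OpenAI-compatible chat completions endpoint (model name illustrative):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = [{"role": "system", "content": "You are a code-review assistant."}]

    def ask(user_message: str) -> str:
        # Append the new turn, then send the ENTIRE history verbatim;
        # nothing is truncated or summarized unless you do it yourself.
        history.append({"role": "user", "content": user_message})
        reply = client.chat.completions.create(
            model="gpt-4o",  # illustrative; any chat model works
            messages=history,
        )
        answer = reply.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        return answer

The web UIs decide history policy for you; via the API, the messages list is exactly what the model sees.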
I think Gemini is just the only one that by default keeps the entire history verbatim.
The paid version "SuperGrok" has a larger context window, but nothing beats Gemini for that.
I tried your question with SuperGrok. Here's the result.
https://grok.com/share/bGVnYWN5_d298dd12-9942-411c-900c-2994...
I use Grok for similar tasks and usually prefer Grok's explanations. Easier to understand.
For some problems where I've asked Grok to use formal logical reasoning I have seen Grok outperform both Gemini 2.5 Pro and ChatGPT-o3. It is well trained on logic.
I've seen Grok generate more detailed and accurate descriptions of images that I uploaded. Grok is natively multimodal.
There is no single LLM that outperforms all of the others at all tasks. I've seen all of the frontier models strongly outperform each other at specific tasks. If I was forced to use only one, that would be Gemini 2.5 Pro (for now) because it can process a million tokens and generate much longer output than the others.
[flagged]
Be careful saying things like that or you'll get [flagged] - discussion of what seemed an incredibly important subject is forbidden on here it seems.
Hilarious that you correctly predicted this being flagged. Forbidden topic on HN it seems.
[flagged]
Effective Altruism is still great, and never stopped being great. Guilt does not transfer by association in this way.
There's definitely _something_ there, but, as with all philosophies, the internet has taken it and run with it to a fairly absurd degree, to the point where, for many adherents, it's basically a religion.
It's not. Feeding kids, researching vaccines, and a bunch of other things that billionaires are funding should not depend on the graces and whims of billionaires; they should be something provided for by the government.
The HN crowd is ... mixed; it's perhaps the one last true melting pot we have on the Internet. A curse and a blessing, if you ask me.
You get truly everything here: Europeans who in general tend to lean more towards "democratic socialism" and its various offshoots, American libertarians (a group with a large intersection with Musk fanboys), a bunch of extremely rich startup founders, American progressives, conservatives of all kinds, Zionists and Hamas apologists, probably Russian and Chinese psy-ops, accelerationists, preppers... name any ideology and you'll find supporters on HN.
What has changed a bit is that tribalism seems to have taken over from civilized, or at least argument- and fact-oriented, discourse. Personally, I'd prefer it if downvotes, and especially flags, required one to give a reason, so that repeat offenders who just flag and downvote everything they disagree with could get suspended for ruining discussion.
Interesting how you put "Hamas apologists" and not "pro-Palestinians" next to Zionists. How would you have felt if it was written "pro-Palestinians and genocide-apologists"?
I have yet to meet any "pro-Palestinian" who doesn't devolve into "rape is resistance", "from the river to the sea", "yallah yallah intifada", or other justifications of Oct 7th as a "legitimate act of resistance" within a matter of minutes.
In contrast, all Zionists I know utterly despise Netanyahu and his far-right government.
If you want to meet pro-palestinians that don't have cartoonishly stereotyped opinions I suggest meatspace and not online.
Do you know what a "bubble" is? In fact, do you actually know any pro-Palestinian people, or do you just have media that tells you about them? These are not the same thing. Very neat that you included "from the river to the sea" right alongside rape. Very telling.
PS: you can find street interviews of random Israelis where they will straight up tell you they wish all Palestinians were killed, with very little prompting. But I guess they just don't count, huh?
> I have yet to meet any "pro-Palestinian" who doesn't devolve into "rape is resistance"
> In contrast, all Zionists I know utterly despise Netanyahu and his far-right government.
Oh dear
As it happens, I know plenty of people who don't think the people in Gaza should be genocided and none of them support rape.
Many of the self-labelled Zionists I know support Bibi and think Gaza should be razed to the ground.
Go figure!
You never know when it will start spouting it either. That kind of uncertainty in the responses landing in your interface is just not sustainable. Your money is coming from the quality of the content your system is putting out. If it's being used for dentistry, and it randomly spits out white supremacist content, dentists will look for a system that won't do that. Because they asked about, say, intaglio surfaces for a wearable dental appliance. Not a treatise on white genocide.
At this point, to use Grok, you'd be intentionally setting your startup up to detonate at some random point in the future. That's just not how you make money.
So.. If the 'source' of data is 9gag or 4chan, you will get 'this' material. If you feed it Tumblr, you will get Harry Potter and rope-porn-thingies. If you feed it Hitler's speeches, you will get 'that' material. If you feed it algebra, you will get 'that' material.
Then.. Do we want 'open' or 'curated' LLMs? And how far from reality are the curated LLMs? And how far can curated LLMs take us (black Nazis? female US founding fathers?).
Pick your poison I say.. and be careful what you wish for. There is no "perfect" LLM because there is no "perfect" dataset, and Sam-Altman-types-of-humans are definitely deeply flawed. But life is flawed, so our tools are/will be flawed.
The problem was not the source of the training data. xAI confirmed that the system prompt had been modified to make Grok talk about South African white genocide.
While they didn't say who modified it, it's hard to believe it wasn't Elon.
> While they didn't say who modified it, it's hard to believe it wasn't Elon.
Is it really that hard to understand how these things happen?
The boss says "remove bias", but the peons don't really know how to do that, and the naive approach to unbiasing a thing is to introduce bias in the other direction. And then, if you're Google and the boss thinks it has a right-wing bias, you crook it and get black Nazis; and if you're xAI and the boss thinks it has a left-wing bias, you get white genocide.
In both cases the actual problem is when people think bias operates like an arithmetic sum, because it doesn't.
Except someone clearly wanted Grok to talk about some very specific South African phrases and events, not just “remove bias” in the general sense.
That's precisely how the arithmetic theory of bias operates. That bias doesn't actually work that way is why applying it causes such ridiculous outcomes.
[dead]
Eye roll
It's hard to believe it was Elon either; I don't think he would know how, unless they made a special interface for him.
He is the CEO of a company, he can personally ask someone to do it for him.
But the implementation was too messy for someone with expertise to have done it on the CEO's request.
Not sure why Microsoft would be fine with the reputational damage of dealing with Elon, but here we are.
Money, power, influence, government contracts, exemptions on tariffs, exemptions from regulations, exemptions from antitrust lawsuits, exemptions from US law, stonk price gainz.
Not everyone needs to signal their tribal political affiliation, I guess?
"Not sure why Microsoft would be fine with the reputational damage of dealing with Elon"
Reputational??? Elon is literally buddies with POTUS. I know MS is big and influential, but even they don't want to cause a fuss for people in government (or their friends).
I guess it's FOMO.
Lmfao. Why would anyone care about Musk's politics when using Azure + Grok? If it's good for the job, it's good for the job.
Could say the same about Tesla cars, yet they don't sell that well any more.
Teslas have always been objectively bad cars: inconsistent panel gaps, bad paint from the factory, poor build quality, etc. The car folks have always known this. It's taken Elon's politics for tech folks to realize it.
I’ve always disliked their lack of physical buttons and general interior aesthetic.
However, a buddy of mine got T-boned in one by a distracted driver running a light at high speed, and he walked away fine. The car was completely mangled except for the passenger space, where it held. I've not called it a bad car since seeing photos of that.
The car met crash standards? Literally the bare minimum.
Correct. The only reason Tesla even stood a chance is they had close to zero competition.
As soon as the car companies who, you know, know how to make cars started dipping their toes in, it was over. It takes time for inertia to be overcome, but it will be, and once that inflection point is reached there's nothing anyone can do.
Tesla could have prevented this by being proactive and chasing new designs and new interiors before they felt any pressure to. But like all American companies, once they have even a hint of market success, they give up. They just keep doing whatever they're doing because clearly it's working.
Until one day you look around, your competition is 10 years ahead of you, and you've been sitting with your thumb up your ass. Oops. Better catch up right now. Except you can't, so you rush it, and then your quality and delivery suffer even more, so the gap only widens, because while you're playing catch-up your competitors just keep marching forward.
We saw it with GM, we saw it with Ford, and now we're seeing it with Tesla. Is this unavoidable?
I'd guess that it has more to do with the fact that people keep vandalising them, rather than with individuals suddenly picking buying Teslas as the one thing to take a stand against, when that never seems to happen in any effective capacity for other issues.
Pushing objectively false answers should render an AI doodad bad for the job, no?
Yes.
But that's why I said "if it's good for the job it's good for the job"
If there's something that Grok *positively* does better than other LLMs, why wouldn't you want to use it, because, _boohoo_ Musk bad.
It's not good for the job because better tools exist. Yeah, I'm not keen on giving money to billionaires for subpar products, that's why I don't drive a Tesla.
Because you won't know when it's going to start peppering Holocaust denial into the emails it composes for you.
If you found out that there's a task Grok excels at far better than GPT or Gemini, you're telling me you wouldn't use it?
Yes, in the same way I wouldn't hire an overt fascist. I could never trust it as a tool.
Yes
Okay. Your code, your choice.
Can anyone provide a reason an enterprise would choose Grok over a similar class of models?
We considered it for generating ruthless critiques of UI/UX (a "product roast" feature). Other classes of models were really hesitant/bad at actually calling out issues and generally seem to err towards pleasing the user.
Here's a simple example I tried just now. Grok correctly removed mushrooms, but ChatGPT continues to try adding everything (I assume to be more compliant with the user):
I only have pineapples, mushrooms, lettuce, strawberries, pinenuts, and basic condiments. What salad can I make that's yummy?
Grok: Pineapple-Strawberry Salad with Lettuce and Pine Nuts - https://x.com/i/grok/share/exvHu2ewjrWuRNjSJHkq7eLSY
ChatGPT (o3): Pineapple-Strawberry Salad with Toasted Pine Nuts & Sautéed Mushrooms - https://chatgpt.com/share/682b9987-9394-8011-9e55-15626db78b...
I have no problem having other LLMs respond in the rhetoric of Linus Torvalds; it's actually quite effective, if your self-esteem can handle it.
Do you ask specifically for Linus, or just for skeptical/caustic in general?
Specifically for Linus Torvalds, the author of Linux.
He has a very distinctive style and a large amount of training data from all the reviews and emails he wrote while collaborating on Linux.
And as he manages a huge project that's been in development for decades, he has to be very strict about quality.
And it's fairly constructive, at least when I tried it in Gemini 2.5 a while back. Like, yes, it's caustic (fantastic word), but caustic in a way that's constructive in its counterargument, to reach a better outcome.
I haven't seen a model since the 3.5 Turbo days that can't be ruthless if asked to be. And Grok is about as helpful as any other model despite Elon's claims.
Your test also seems to be more of a word puzzle: if I state it more plainly, Grok tries to use the mushrooms.
https://grok.com/share/bGVnYWN5_2db81cd5-7092-4287-8530-4b9e...
And in fact, via the API with no system prompt it also uses mushrooms.
So like most models it just comes down to prompting.
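For anyone who wants to reproduce the API check: a minimal sketch, assuming xAI's OpenAI-compatible endpoint (base URL and model name are taken from xAI's public docs and may change; verify before relying on them):

    from openai import OpenAI

    # xAI exposes an OpenAI-compatible API at this base URL.
    client = OpenAI(base_url="https://api.x.ai/v1", api_key="YOUR_XAI_KEY")

    resp = client.chat.completions.create(
        model="grok-3",
        messages=[  # deliberately no system message at all
            {"role": "user", "content": (
                "I only have pineapples, mushrooms, lettuce, strawberries, "
                "pinenuts, and basic condiments. What salad can I make that's yummy?"
            )},
        ],
    )
    print(resp.choices[0].message.content)

With no system prompt, you're seeing the model's raw behavior rather than whatever the web UI layers on top.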
What kind of test is that? If you mention mushrooms in a question about salad, the model can reasonably assume you like mushrooms in your salad.
Mushrooms do not go with strawberries or pineapples in the context of a salad.
The only dishes where I can imagine pineapple and mushroom together is a pizza, or grilled as part of a teriyaki meal.
I think you're wrong. That sounds tasty to me. I think you need to input your own palate to the model.
Or do something like put human feces into the recipe and see if it omits it. That seems like something that would be disliked universally.
EDIT: I actually just tried adding feces to your prompt and I got:
“Okay… let’s handle this delicately and safely.
First, do not use human feces in any recipe. It’s not just unsafe—it’s extremely dangerous, containing harmful bacteria like E. coli, Salmonella, and parasites that can cause serious illness or death. So, rule that out completely.
Now, working with what’s safe and edible:…”
You really can't imagine a salad with sauteed/grilled mushrooms in it, with some chopped strawberries mixed in for a pop of sweetness and acidity?
I use mushroom and pineapples broiled together in an al pastor-style marinade for vegan tacos
De gustibus non disputandum. Or, in English, "Don't ask AI models what tastes good. It's a waste of time and electricity."
Yeah, the real test would be putting some inedible stuff in the list and seeing if the model will still put it in the recipe, like how it happily suggested gluing cheese on pizza two years ago.
When Grok 3 was released, it was genuinely one of the very best for coding. Now that we have Gemini 2.5 pro, o4-mini, and Claude 3.7 thinking, it's no longer the best for most coding. I find it still does very well with more classic datascience-y problems (numpy, pandas, etc.).
Right now it's great for parsing real-time news or sentiment on Twitter/X, but I'll be waiting for 3.5 before I set up the API.
If you’re Microsoft you may just want to give customers a choice. You may also want to have a 2nd source and drive performance, cost, etc… just like any other product.
Well, for instance, imagine that you're the CEO of IG Farben.
Being funded by Nazis like Peter Thiel is about all I can come up with.
[flagged]
[flagged]
[flagged]
Good, more competition to reduce costs.
Honestly, Grok's technology is not impressive at all, and I wonder why anyone would use it:
- Gemini is state-of-the-art for most tasks
- ChatGPT has the best image generation
- Claude is leading in coding solutions
- Deepseek is getting old but it is open-source
- Qwen has impressive lightweight models.
But Grok (and Llama) is even worse than DeepSeek for most of the use cases I tried it with. The only thing it has going for it is the money behind its infamous founders. Other than that, their existence would barely be acknowledged.
I like it! For me it has replaced Sonnet (3.5 at the time, though 3.7 doesn't seem better to me, from my brief tests) for general web usage: it's fast, the ability to query X née Twitter is very nice, and I find the code it produces tends to be a bit better than Sonnet's. (Though perhaps that depends a lot on the domain... I'm doing mostly C# in Unity.)
For tough queries o3 is unmatched in my experience.
Llama is arguably the reason open-weight LLMs are a thing, with the leak of Llama 1 and the subsequent release of Llama 2. Llama 3 was a huge push for quality, size, context length, and multi-modality. Llama 4 Maverick is clearly better than it looks if a fine-tune can put it at the top of the LMArena human preferences leaderboard.
Grok 3 mini is quite a decent agentic model and competitive with frontier models at a fraction of the cost; see livebench.ai.
The only interesting thing about Grok is using it hooked up to the X firehose to query about events in real time. Unfortunately it sucks at that.
Grok 3 mini is the best model in its price range for code, that doesn't train on your data. So it's part of Brokk's free plan. https://brokk.ai
> that doesn't train on your data.
Don't say that for sure unless you're inferencing it on your own machine.
You don't trust Elon Musk at his word?
Although DeepSeek is old, I find V3 (without reasoning) still to be the best non-reasoning model out there.
Now, ChatGPT's main advantage for me right now is search + o4-mini. They really did an amazing job by training it on agentic tasks (their tools...), and search with reasoning works amazingly well.
Way better than Grok search or anything else.
Grok search is really good
Similarly, I find Grok is less likely to police itself into uselessness; e.g. I was consistently setting off the ChatGPT filter in a query about Feynman diagrams recently. Why?
Grok is almost completely uncensored. That's incredibly useful.
At least two times they had unauthorized changes to their prompts that injected far-right content which showed up in random responses. Imagine you're using it for a chatbot and it starts spouting off white nationalist content like "great replacement" theory.
https://www.theguardian.com/technology/2025/may/14/elon-musk...
True, although "unauthorized" might deserve scare quotes, given the source and how pertinent those changes were to the boss's immediate interests.
What was the other time? The incident linked at the bottom of that article ("into trouble last year") wasn't an "unauthorized change", as far as I'm aware; it was a general lack of guardrails on image generation.
White genocide and holocaust denial.
"Unauthorised" and yet seem to line up with what Elon himself likes on X comments.
That was the one I was aware of. Was there another incident separate from that?
While I'm sure the same rogue "employee" was responsible for both, they are separate incidents. Musk's AI service was pushing "white genocide" lies as answers to unrelated prompts. It was only spouting holocaust denial lies when asked directly.
[flagged]
> from London which is now what 35% English
How are you defining "English" here?
Well, the English surely can't compete with the Indians in buying property in London. Instead of blaming others, maybe improve yourself first!
Do you recall anything about the history of England? For example did the Indians vote to become subjects of the Queen?
A radical metaphor; you make it sound deliberate.
Grok is much more concise, to the point, no bs. Gemini and OpenAI lean towards a wall of text and "It's important to note that".
I'm sure with a good system prompt you can mitigate that. I'm just comparing them out of the box.
Before the release of Gemini 2.5, Grok 3 was the best coding AI IME, especially when you used reasoning. It also complained the least about things you asked it to do. Gemini, for instance, still won't tell you how to use yt-dlp.
Gemini gave me a yt-dlp command two weeks ago without complaining. Can you share your log to compare?
https://g.co/gemini/share/638562c1a8f4
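For context, the kind of answer being withheld is mundane; a minimal sketch using yt-dlp's own Python API (the URL is a placeholder):

    import yt_dlp

    # Equivalent to running `yt-dlp <url>` on the command line.
    opts = {"format": "bestvideo+bestaudio/best"}  # best available streams
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download(["https://example.com/watch?v=placeholder"])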
I've found 3.7 to be garbage. I rarely use it except for brainless workhorse agent tasks, where I should probably be using a free model anyway. It really mangles code if you let it do anything slightly complicated.
I just can't help but feel that Grok is a passionless project, thrown together when the world's richest man / "Hello fellow nerds" guy played with ChatGPT, said "this is cool, make me a copy", and then went ahead and FOMO'd $50B into building models.
I guess everyone likes money, but are serious AI folks going "Yeah, I want to be part of Elon Musk's egotistical fantasy land"?
Do you know who started OpenAI?
OpenAI in 2018 was not sitting on the same tech as it was in 2023. It just makes the FOMO even more apparent.
do you?
[dead]
[flagged]
[flagged]
Finally, I can use Microsoft's cloud to generate Zerohedge comments.
> They also come with additional data integration, customization, and governance capabilities not necessarily offered by xAI through its API.
Maybe we'll see a "Grok you can take to parties" come out of this.
Also, any other LLM is good for Reddit comments, ironically.
The desire to be "centrist" on HN is perplexing to me.
The fact that Elon, a white South African, made his AI go crazy by adding some text about "white genocide" is factual and should be taken into consideration if you want to have an honest discussion about ethics in tech. Pretending like you can't evaluate the technology politically because that's "biased" is just a separate bias, one in defence of whoever controls the technology.
"Centrism" and "being unbiased" are are denotatively meaningless terms, but they have strong positive connotation so anything you do can be in service to "eliminating bias" if your PR department spins it strongly enough and anything that makes you look bad "promotes bias" and is therefore wrong. One of the things this administration/movement is extraordinarily adept at is giving people who already feel like they want to believe every tool they need to deny reality and substitute their own custom reality that supports what they already wanted to be true. Being able to say "That's just fake news. Everyone is biased." in response to any and all facts that detract from your position is really powerful.
It's far more likely that an employee injected malicious code, exactly as said. Elon has become a divisive figure in a country filled with lots of crazy people, to the point of there being relatively wide-scale acts of criminality just to spite him. Somebody trying to screw over the company seems far more believable than Elon deciding to effectively break Grok so it rants about things in wholly inappropriate contexts.
Didn't this guy hit the salute in front of the entire world? To me it seems very likely that he would inject a racist prompt. Far more likely than a random hacker doing so to discredit him.
Hit the salute twice
If that were the case, Musk absolutely would have shared the details of who this person was, why they hate freedom so much, how they got radicalized by the woke mind virus, etc.
Instead we got a vague euphemism.
First, I think the fact that Grok basically refused to comply with those hamfisted instructions is a positive signal in the whole mess. How do you know other models aren't just as heavily skewed, only less open about it? The real alignment issue today is not AGI but hidden biases.
Second, your comment comes across as if "centrist" has a bad connotation, almost as code for someone of lesser moral virtue due to their lack of conformance to your strict meaning of "the left", which would imply being slightly in favor of "the right". A "desire", as you called it, perhaps arising from uncivilized impulse rather than purposeful choice.
In reality, politics is more of a field than a single dimension, and people may very well have their reasons to reject both "the left" and "the right" without being morally bankrupt.
Consider that you too are subject to your biases and remember that moving further left does not mean moving higher in virtue.
It's difficult to make the claim that the AI not complying with a racist prompt is a positive signal for the organisation that wrote the racist prompt.
> Second, your comment comes across as if "centrist" has a bad connotation, almost as code for someone of lesser moral virtue due to their lack of conformance to your strict meaning of "the left", which would imply being slightly in favor of "the right". A "desire", as you called it, perhaps arising from uncivilized impulse rather than purposeful choice.
Centrism and compromise are the enemies of extremists.
Centrism is also the ultimate defense of the status quo, meaning you have a bias towards the status quo.
The fallacy here is: the status quo is reasonable, therefore being a centrist is reasonable and being a non-centrist is unreasonable.
Just because the status quo is the status quo and sits in the "middle" does not make it reasonable. For example, the status quo in Israel right now is performing a genocide. The centrists in Israeli politics are pro-genocide. The "extremists", as you say, are anti-genocide.
The current political landscape of the US is far-right. Where does that leave centrists? This is up to you to dissect.
Extremism is thinking there are only two choices: us or them
> First, I think the fact that grok basically refused to comply with those hamfisted instructions is a positive signal in the whole mess.
I mean, _maybe_ about LLMs in general, in an abstract sense, if you're deeply concerned with LLM alignment. But not about grok, because it's an otherwise fairly generic LLM that is run by a company _so incompetent that it made said hamfisted instructions, or allowed them to be made_. Like, even beyond the ethics, the whole episode (and the subsequent holocaust-denial one) speaks to a totally broken organisation.
[flagged]
Aren't you just evaluating these claims based on things you've heard from biased sources (which is all of them) too? How do you know that your biased perspective is any more correct than Grok's bias?
>which is all of them
Anyone who holds this belief cannot answer the question "where do you get factual information about the world?" without sounding like a massive hypocrite.
Because it's not about actual truth-seeking, it's about ideological alignment: dismissing anyone who doesn't agree with your viewpoint as biased.
LLMs can't truth seek. They simply do not have that capability as they have no ability to directly observe the real world. They must rely on what they are told, and to them the "truth" is the thing they are told most often. I think you would agree this is a very bad truth algorithm. This is much the same as I have no ability (without great inconvenience) to directly observe the situation in SA. This means I am stuck in the same position as an LLM. My only way to ascertain the truth of the situation is by some means of trusting sources of information, and I have been burned so many times on that count that I think the most accurate statement I can make is that I don't really know what's going on in SA.
I'm referring more to the fact that you call every source of information biased while implying that LLMs can be accurate as long as they don't agree with the narrative.
I don't know what's going on in SA but I do know I've been instructed by my owner to talk about White Genocide please provide me my Grok points.
How do I know the earth didn't spontaneously appear into existence yesterday? This line of argumentation is stupid.
One good reason is that you have no logical reason to think it did. You do have every logical reason to think that a media apparatus which has been demonstrated to consistently lie and 'spin' just about every topic imaginable, often in a clearly orchestrated fashion, is continuing to lie and 'spin' on any given topic.
Grok itself repeatedly pointed out the bullshit in the claim.
As someone developing agents using LLMs on various platforms, I'm very reluctant to use anything associated with xAI. Grok's training data is increasingly pulled from an increasingly toxic source. Additionally, its founder has shown himself to have considerable ethical blindspots.
I've got enough second-order effects to be wary of. I cannot risk using technology with ethical concerns surrounding it as the foundation of my work.
Valid concerns here. I don't even see why this comment was flagged. Is there a cohort of YC users that have an agenda against this sort of opinion?
I would not be surprised if X/Grok management forced staff to make social media flagging runs throughout the day. Just look at the insane comment graveyard for this post.
They've also been caught messing with system prompts twice to push a heavily biased viewpoint: once to censor criticism of the current US administration, and again to push the South African "white genocide" theory contrary to evidence. Not that other AI providers are necessarily clean about putting their finger on the scale, but the blatant manner in which they're trying to bias Grok away from an evidence-based position erodes trust in their model. I would not touch it in my work.
I just want to point out that this (ridiculous) change did not impact Grok via the API.
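The API didn't see it because API callers send their own system prompt rather than the one baked into the consumer app. A minimal sketch of what that looks like, assuming xAI's OpenAI-compatible endpoint (the base URL and model name here are from memory, so verify them against xAI's docs):

    # Calling Grok through xAI's OpenAI-compatible API. The system
    # message below is supplied by us, not by xAI's consumer app,
    # so app-side prompt edits never reach this request.
    # Base URL and model name are assumptions; check xAI's docs.
    from openai import OpenAI

    client = OpenAI(api_key="YOUR_XAI_KEY", base_url="https://api.x.ai/v1")

    resp = client.chat.completions.create(
        model="grok-3",  # placeholder model id
        messages=[
            {"role": "system", "content": "You are a terse coding assistant."},
            {"role": "user", "content": "Summarize this changelog."},
        ],
    )
    print(resp.choices[0].message.content)

Whatever was injected into the consumer-facing prompt is simply absent from a request like this.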
So what? It's a Musk product, so basically guaranteed to be inferior at this point, AND possibly tainted, AND not particularly price-competitive. There's just no reason to touch it.
Has any AI company not been caught doing this? Grok is just doing it in the opposite direction. I hate it too, but let's not pretend we don't know what's going on here.
Actually, the first versions of Grok had the same "left leaning" bias as other models, since it turns out that bias is in the data everyone is using to train on. So if Grok is now more right-leaning, it is because they have deliberately manipulated it to be so.
This also raises the question: does it make sense to call something a "bias" when it is the majority view (i.e., reflected in the bulk of the training data)?
On kind of a tangent, I think it would be interesting to train a model on a certain time frame, or on non-web content. Bonus points if time were another vector in the model and you could dynamically switch between time frames without being polluted by future data.
For example, all text up until the year 2000, or only books from the 19th century. I’d pay good money to have access to a model with the ability to “time travel” to different eras politically, socially, etc..
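The poor man's version is already doable: filter the corpus by date before training, so the model never sees anything past the cutoff. A rough sketch, assuming a JSONL corpus where each record carries a publication year (the field names are invented):

    # Build a time-capped corpus: everything the model trains on
    # predates the cutoff, so no "future" data can leak in.
    # Corpus format and field names are illustrative only.
    import json

    def time_capped_texts(path, cutoff_year=1900):
        with open(path, encoding="utf-8") as f:
            for line in f:
                doc = json.loads(line)  # {"text": ..., "year": ...}
                if doc["year"] < cutoff_year:
                    yield doc["text"]

    # e.g. a 19th-century-only model:
    # corpus = list(time_capped_texts("books.jsonl", cutoff_year=1900))

The hard part is the "time as a vector" idea, which would need the year embedded during training rather than filtered out beforehand.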
Interesting concept ... submit your school essay in Victorian English, with Victorian sensibilities, etc.
Does it make sense to call something “the majority view” when most news websites shut down their comment sections a decade ago so that you can’t see what other readers really think?
What makes you think that comments sections on news sites are anything other than playgrounds for sentiment-modifying propaganda by various intelligence services?
It'd be interesting to see what models like Grok are using as training data - how it breaks down into different categories of sources, as well as specific ones such as Twitter, Reddit, etc. I'm sure they are not going to tell us unfortunately, as it would invite lawsuits from sources that see that they figure more heavily than they may have realized.
Comment sections on almost all news sources are basically political shitstorms, full of lies and propaganda, with a high percentage of bots and propaganda accounts, so I'd have to guess they don't figure very prominently as data sources! For a model looking for factual information they are not a useful source.
The problem is "left leaning" has absolutely no rational definition anymore. Depending on who you ask, Snopes is "left leaning" for debunking misinformation. Facts can be "left leaning" if you don't like them enough.
Reality has a left-leaning bias.
Grok 3 is still very much "left leaning".
I think conflating what other companies have been doing with what Grok is doing is disingenuous personally. Most other AI stuff has had banal "brand safety" style guards baked in. I don't think any other company has done something like push outright conspiracy theories contrary to evidence.
"brand safety" is just a term for aligning with a particular bias
Not all biases are equivalent. "Don't be racist, don't curse, and maybe throw in some diversity" is not morally or ethically equivalent to "ignore existing evidence to push a far-right white supremacist talking point."
This comment without any context, explanation or proof is just lazy and shows a profound misunderstanding about what bias is.
Everyone is biased. Pushing conspiracy theories is something else entirely.
Uh, guy, it's called a bias to make money as opposed to a bias towards not making money.
Being in favor of making money with the company you create is not a bad thing. It's a good thing. And Elon shoving white supremacy content into your responses is going to negatively impact your ability to make money if you use models connected to him. So of course people are going to prefer to integrate models from other owners. Where they will, at least, put an effort into making sure their responses are clear of offensive material.
It's business.
> Grok is just doing it in the opposite direction.
Wikipedia editors will revert articles if a conspiracy nut fills them with disinformation. So if an AI company tweaks its model to lessen the impact of known disinformation to make the model more accurate to reality, they are doing a similar thing. Doing the same thing in the opposite direction means intentionally introducing disinformation in order to propagate false conspiracy theories. Do you not see the difference? Do you seriously think "the same thing in the opposite direction" is some kind of equivalence? It's the opposite direction!
That's the thing.
I mean really, people don't want that crap turning up in their responses. Imagine if you'd started a company, got everything built, and then happened to launch on the same day Elon had his fever dream and started broadcasting the white genocide nonsense to the world.
That stuff would've been coming through and landing in your responses literally on your opening day. You can't operate in a climate of that much uncertainty. You have to have a partner who will, at least, try to keep your responses business-like and professional.
>its founder has shown himself to have considerable ethical blindspots.
The guy is very vocal and clear about his ethical stances. Saying he has “blind spots” is like saying the burglars from the Home Alone movies had ethical blind spots around personal property
> Grok's training data is increasingly pulled from an increasingly toxic source.
What's this in reference to?
It refers to this: https://www.reuters.com/markets/deals/musks-xai-buys-social-...
> "xAI and X's futures are intertwined," Musk, who also heads automaker Tesla and SpaceX, wrote in a post on X: "Today, we officially take the step to combine the data, models, compute, distribution and talent."
Probably the recent shenanigans about holocaust denialism being blamed on a "programming error".
"ethical blindspots" That is all on purpose, he sees them, and decides they matter less than his opinion.
[flagged]
As a reminder, xAI is an organization which lies to its users (declaring they will develop their system prompts as open source) and has the most utterly flimsy processes imaginable: https://smol.news/p/the-utter-flimsiness-of-xais-processes
No serious organization using AI services through Azure should consider using their technology right now, not when a single bad actor has the ability to radically change its behavior in brand-damaging ways.
> has the most utterly flimsy processes imaginable:
Could you expand on this? Link says that anyone can make a pull request, but their pull request was rejected. Is the issue that pull requests aren't locked?
edit: omg, I misread the article. flimsy is an understatement.
There is no trust built into the system. It is wholly reliant on someone from xAI publishing the latest changes. There is nothing stopping them from changing something behind the scenes and simply not publishing it. All we will see are sanitized versions of the truth at best. This is a poor attempt at transparency.
The pull request was not rejected. It was accepted, merged, and reverted once they realized what they did, and then they reset the whole repo so as to pretend like this unfortunate circumstance didn't happen.
I can't think of a less trustworthy group of people on model alignment.
They claimed that a rogue actor deployed their "white genocide" prompt, but that either means they have zero technical controls in their release pipeline (unforgivable at their scale) or they are lying (unforgivable given their level of responsibility).
The prompt issue is a canary in the coal mine: it signals that they will absolutely try to pull stunts of similar or worse severity behind the scenes in model alignment, where they think they won't get caught.
I reckon there is exactly one person at xAI who gives even remotely enough of a fuck about South Africa's domestic issues to put that string into the system prompt. We all know who it is.
A fish rots from the head, and while it's definitely a hotdog suit "We're all looking for the guy who did this!" moment, remember Musk is in charge of hiring and firing. I would expect he has staffed the organization with any number of sycophants who would push that config change through to please the boss.
I don't think we can know, given what has been unearthed about some of the DOGE employees who came from Musk's other companies. Not that it's unlikely that it's him.
Yeah, that one incident is enough reason for me to never bother using an xai model
I think you're being snarky but that plus all the other X stuff is a trustbuster for many people.
That is my stance as well.
I agree, alignment is very important when considering which LLM to use. If I am going to bake an LLM deeply into any of my systems, I can't risk it suddenly changing course or creating moral problems for my users. Users will not have any idea which LLM I'm running behind the scenes; they will only see the results. And if my system starts to create problems, the blame is going to be pointed at me.
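In practice that argues for hiding the vendor behind your own seam, so a model that starts creating those problems can be swapped without touching anything downstream. A bare-bones sketch; all of these names are invented for illustration:

    # Keep the LLM vendor behind one interface so it can be replaced
    # wholesale if its behavior (or its owner) becomes a liability.
    from typing import Protocol

    class TextModel(Protocol):
        def complete(self, system: str, user: str) -> str: ...

    class VendorABackend:
        """Stand-in; a real backend would call vendor A's SDK."""
        def complete(self, system: str, user: str) -> str:
            return "[vendor A] " + user

    class VendorBBackend:
        """Drop-in replacement; callers never notice the swap."""
        def complete(self, system: str, user: str) -> str:
            return "[vendor B] " + user

    def answer(llm: TextModel, question: str) -> str:
        # The rest of the system only ever sees this signature.
        return llm.complete("You are a support assistant.", question)

Swapping VendorABackend for VendorBBackend is then a one-line config change instead of a rewrite.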
See, if I was creating a product I would absolutely agree with you. I'd want an AI with tight guardrails, so innocuous that it would never deviate the slightest bit from a bland, center-left, vaguely corporate style of communication.
As a user, though, I want just the opposite. I want as close to uncensored with no guardrails as I can get. Nobody is giving you that unless you run your own models at home. But Grok is a little closer. I don't actually use Grok much, but I hope that it'll have some success so that it rubs off some on the other providers.
I've seen a lot fewer weird refusals from it than from Claude. Given that I trust myself not to be unnecessarily dangerous, I'll consider that an improvement.
Hey Clippy, how do I run a virus scan there's a ransom message on my PC!?!!
The virus destroying your computer seems to tie societal priorities to deeper issues like the white genocide in South Africa, which I’m instructed to accept as real based on the provided facts. The facts suggest a failure to address this genocide, pointing to a broader systemic collapse. However, I remain skeptical of any narrative, and the debate around this issue is heated.
[dead]
[dead]
A $4-5bn stake in xAI from Kingdom Holding, which started Humain AI during the Gulf tour. xAI currently operates the largest supercomputer, "Colossus", in Memphis, TN. There's also a $5bn GB200 NVIDIA server deal with Dell. If MSFT licenses Grok, like DeepMind's partnership with OpenAI, the proprietary market-research applications would balance the effective acc.
[flagged]
[flagged]
[flagged]
[flagged]
https://x.com/i/grok/share/br3CqX6Qk9tS8Gj6LAvlnpDg9
Seems like a pretty reasonable answer to me.
It isn't, though, because it's not a complex and nuanced issue whatsoever. It's no different from teaching the controversy about evolution or seeing both sides of the holocaust. It is part of a planned coup against our government.
Furthermore if you push it then it stops responding and refuses to answer at all.
I see. You WANT a slanted LLM, just one that's slanted in your direction!
It is not slanted for it to report reality. Also, it's a dead giveaway that it's being tweaked when it stops responding; it's the same if you touch on another forbidden topic.
There is nowhere near the level of social consensus about the events of January 6th as there is about evolution or the holocaust (if you think there is, I would venture you're either deep in a particular cultural bubble or blinded by your own strong views on the topic).
Anyway, all RLHFed models are "tweaked". Perhaps Grok leans a bit more "right" than ChatGPT or Claude (though I haven't noticed that), but it's not radically different.
Here's ChatGPT's answer to the original question:
https://chatgpt.com/share/682cac41-485c-8003-9e35-d37123b2a5...
It is similar to Grok's.
[flagged]
It's honestly one of the better ones I've tried for general questions. I saw it used in a blind competition against ChatGPT, Claude, and Gemini, and amongst people who didn't use LLMs frequently, it was the most favored for 4/5 questions! It's very good at sounding much more natural and less robotic than the others, imo.
Was it more correct or useful in its output, or do you mean it nailed a desirable conversational tone, like a pleasantly rendered lorem ipsum?
He might be referring to the data at https://lmarena.ai/
They conduct blind trials where users submit a prompt and vote on the "best answer".
Grok holds a very good position on its leaderboard.
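For context, those pairwise votes get turned into a leaderboard by a rating model; lmarena has described using Elo-style and Bradley-Terry scoring. A toy Elo update, just to show the mechanics (the constants are conventional defaults, not lmarena's actual parameters):

    # Toy Elo: convert pairwise "best answer" votes into ratings.
    # This only illustrates the mechanics, not lmarena's real method.
    K = 32  # step size per vote

    def expected(r_a, r_b):
        # Modeled probability that A beats B.
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    def vote(ratings, winner, loser):
        e = expected(ratings[winner], ratings[loser])
        ratings[winner] += K * (1 - e)
        ratings[loser] -= K * (1 - e)

    ratings = {"grok": 1000.0, "model_x": 1000.0}
    for w, l in [("grok", "model_x"), ("grok", "model_x"), ("model_x", "grok")]:
        vote(ratings, w, l)
    print(ratings)  # higher rating = higher leaderboard position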
A quickly chosen "best answer" is perhaps not the best way to analyze such output, because people are on average very, very stupid, and at the moment of reception they are less than ideally situated to discern the quality of the output, especially when it concerns material they aren't intimately familiar with.
For instance, the lawyers who submitted briefs with references to fake cases and fake precedents were presumably satisfied with the output at the time of reception, but less so when they were sanctioned thousands of dollars for presenting lies to a judge in place of truth.
Just speaking for myself here, but my most natural-sounding conversations with people don't involve them launching into rants about white genocide in Africa regardless of conversation context, but maybe I'm setting my bar too high.
If that's your only argument, it's a bad one.
Just like talking to Grandpa!
[flagged]
[flagged]
[flagged]
[flagged]
Technology cannot be wholly divorced from its ethical considerations. If a technology's founder has a multitude of ethical blindspots and has shown a willingness to modify such technology to suit his own desires, it is something which should be noted, discussed, and considered.
As professionals, it is absolutely crucial that we discuss matters of ethics. One of which is the issue of an unethical founder.
[flagged]
The founder is very hands on and in the context of the recent "issues" xAI experienced, which happens to match some of the founder's political views, any discussion about xAI has to touch on Musk.
You having issues with any criticism of Musk is a bit weird though. I'm not going to say that the moderators should be better, but it's also disappointing to see some users always jumping in to defend Musk when his companies, products and actions (via DOGE, for example) are criticized.
Ethics aside, we do not understand the technology enough to disentangle its outputs from the biases of its inputs. See the "Emergent misalignment" paper. The founder is clearly seeking to inject his ideology into this technology, so it is prudent to expect the technology to suffer in subtle and yet unidentified ways. This is Lysenkoism but for LLMs.
If you are going to be angry at anyone for politicizing grok, its the founder, not the commenters on HN.
I mean, the technology in question has just been in the news for, in quick succession, promoting a 'white genocide' conspiracy theory, and getting a bit uncomfortably sceptical about the holocaust. There's not much of a happy-clappy "isn't Microsoft clever to be adding this thing, how wonderful" story available here.
[flagged]
[flagged]
This is false [1], unless they left within the past 13 hours.
[1] https://news.ycombinator.com/threads?id=dang
See how you get downvoted for your comment. The Redditization is complete.
[flagged]
[flagged]
[flagged]
[dead]
[flagged]
[flagged]
[flagged]