kmfrk 7 hours ago

One of the things that really has me worried at the moment is people asking chatbots who to vote for ahead of upcoming elections.

Especially in parliamentary democracies where people already take political quizzes to make sense of all the parties and candidates on the ballot.

  • black6 6 hours ago

    If you're asking a machine which human you should vote for, you probably shouldn't be voting.

    • redleader55 4 hours ago

      Do you think the other ways in which people vote are better: selling your vote, picking a candidate for being presentable, picking a candidate for having the right complexion/religion/sexual orientation, picking a candidate for being married or having kids, picking a candidate because they are "smart", or poor, or ... I could go on. Using the right prompt, which you can find on the internet, might lead you to a better choice than you would arrive at on your own.

    • mikkupikku 5 hours ago

      I think we don't do democracy because we think the masses are informed and make good decisions, but rather because it's the best system for ensuring peaceful transitions of power, thereby creating social stability which is conducive to encouraging investment in the future.

      So uninformed people participating isn't an unfortunate side effect, but rather the point: making everybody feel included in the decision-making process, so that people are more likely to accept political change.

      • IAmBroom 5 hours ago

        Are you saying...?

        "I think we do democracy not because we think the masses are informed and make good decisions, but rather because it's the best system for ensuring peaceful transitions of power, thereby creating social stability which is conducive to encouraging investment in the future.

      • lesuorac 3 hours ago

        I think people argue this but I don't think it's true.

        The lack of warlords leads to peaceful transitions. Trump can feel all he wants about the 2020 election but his sphere of influence was too small to take control.

        This isn't the case for all those power struggles when a monarch dies. Each Lord had their own militia they could mobilize to take control, which leads to stuff like the War of the Roses.

        We had this same issue going into the Civil War, where the US army was mostly militias, so it was pretty easy to band the southern ones together and go fight the North. This isn't going to work so well post-1812, where a unified federal army exists. Of course, if you start selectively replacing generals with loyalists, then you start creating a warlord.

    • jameslk 2 hours ago

      Vibe voting is the end of any semblance of neutrality in AI models. Each major party will have its own model.

    • seanmcdirmid 5 hours ago

      For local elections, I have to frantically google on the day my ballot is due to figure out who to vote for. My criteria are pretty fixed: I want to vote for moderates, but beyond a few high-profile races I don't have a clue who the moderate option is. I can see using AI to summarize positions for more obscure candidates.

      • netsharc 4 hours ago

        But... it's like asking a knowledgeable person. How are you sure she's giving you answers that match your criteria, and not answers she's been influenced to skew in favor of a candidate?

        "Let me ask Grok who I should vote for..."

      • GuinansEyebrows 4 hours ago

        > For local elections, I have to frantically google on the day my ballot is due to figure out who to vote for.

        what on earth??

        practically every metropolitan area and tons of smaller communities have multiple news sources that publish "voting guides", in addition to voter pamphlets that go out before elections and detail candidates' positions, ballot initiatives, etc.

        barring that, you can also just... do your "frantic googling" before the election. it's not a waste of your time to put a little of it toward understanding the political climate of your area, and maybe once in a while forming an opinion instead of settling for whatever constitutes a "moderate" position during the largest rightward shift of the overton window in decades.

        • amarcheschi 3 hours ago

          With the added bonus that an LLM might not even be up to date on the latest political developments and could have outdated views, or might not know the candidate well enough to provide accurate info (or at least info more accurate than any voting pamphlets or guides).

croemer 8 hours ago

This is about Gemma, Google's open-weights model, and specifically its availability through AI Studio. I don't think they'll make the weights unavailable.

  • aleatorianator 8 hours ago

    [flagged]

    • notavalleyman 7 hours ago

      You wrote this comment under an article about Gemma, an open-weights model which anyone can download and run at home.

      Here is more info and links to the models, so you can interrogate them about Senatorial scandals on your hardware at home.

      https://huggingface.co/blog/gemma

      Your claim was so far from reality that it's now incumbent upon you to go back through the chain of faulty reasoning. You took it for granted that a conspiracy theory about suppressing information was true, when actually the same Gemma model was already open-weighted by the same conspirators who you accuse of keeping Gemma out of regular people's reach.

    • jqpabc123 8 hours ago

      This is about these tools being blatantly flawed and unreliable.

      In legal terms, marketing such a product is called "negligence" or "libel".

      Lots of software is flawed and unreliable but this is typically addressed in the terms of service. This may not be possible with AI because the "liability" can extend well beyond just the user.

      • jwitthuhn 5 hours ago

        Is it wrong to release something unreliable while acknowledging it is unreliable? The product performs as advertised. If people want accurate information, an LLM is the wrong tool for the job.

        From the Gemma 3 readme on huggingface: "Models generate responses based on information they learned from their training datasets, but they are not knowledge bases. They may generate incorrect or outdated factual statements."

      • flufluflufluffy 8 hours ago

        The purpose of the tool is writing code; it is not for generating factual English sentences.

        • mikkupikku 7 hours ago

          I do think that might be the only thing they turn out to be any good at, and only then because software is relatively easy to test and iterate on. But does that mean writing code is what the models are "for"? They're marketed as being good for a lot more than coding.

        • jqpabc123 6 hours ago

          > it is not for generating factual English sentences.

          Then the tool should not be doing it -- but it does. And therein lies the legal liability.

hnuser123456 8 hours ago

There should probably be a little more effort towards making small models that don't just make things up when asked a factual question. All of us who have played with small models know there's just not as much room for factual info; they're like middle schoolers who will just write anything. Completely fabricated references are clearly an ongoing weakness, and an easy one to validate.

  • rewilder12 8 hours ago

    LLMs by definition do not make facts. You will never be able to eliminate hallucinations. It's practically impossible.

    Big tech created a problem for themselves by allowing people to believe the things their products generate using LLMs are facts.

    We are only reaching the obvious conclusion of where this leads.

    • kentm 6 hours ago

      A talk I went to made the point that LLMs don't sometimes hallucinate. They always hallucinate -- it's what they're made to do. Usually those hallucinations align with reality in some way, but sometimes they don't.

      I always thought that was a correct and useful observation.

    • hnuser123456 7 hours ago

      To be sure, a lot of this can be blamed on using AI Studio to ask a small model a factual question. It's the raw output of a highly compressed model; it's not meant to be everyday-user-facing like the default Gemini models, and it doesn't have the same web search and fact checking behind the scenes.

      On the other hand, training a small model to hallucinate less would be a significant development. Perhaps with post-training fine-tuning: after getting a sense of what depth of factual knowledge the model has actually absorbed, add a chunk of training samples pairing questions that go beyond that limit with the model responding "Sorry, I'm a small language model and that question is out of my depth." I know we all hate refusals, but surely there's room to improve them.

      • th0ma5 7 hours ago

        So far, all of these techniques just push the problems around. And anything short of 100% accurate is a 100% failure in any single problematic instance.

  • parliament32 an hour ago

    > effort towards making small models that don't just make things up

    But all of their output is literally "made up". If they didn't make things up, they wouldn't have a chat interface. Making things up is quite literally the core of this technology. If you want a query engine that doesn't make things up, use some sort of SQL.

  • daveed 2 hours ago

    How do you know how much effort they're putting in? If they're making stuff up then they're not useful, and I think the labs want their models to be useful.

  • tensor 8 hours ago

    I don't think there is any math showing that it's the model's size that limits "fact" storage, to the extent these models store facts. And model size definitely does not change the fact that all LLMs will write things based on "how" they are trained, not on how much training data they have. Big models will produce nonsense just as readily as small models.

    To fix that properly we likely need training objective functions that incorporate some notion of correctness of information. But that's easier said than done.

  • s1mplicissimus 8 hours ago

    Given that the current hype wave has already been going on for a couple of years, I think it's plausible to assume that there really are fundamental limitations with LLMs on these problems. More compute didn't solve it as promised, so my bet is on "LLMs will never stop hallucinating".

  • skywhopper 8 hours ago

    If you disable making things up, LLMs will not work. Making stuff up is literally how they work.

    • hnuser123456 7 hours ago

      I am aware, but think about right after a smaller model is done training. The researchers can then quiz it to get a sense of the depth of knowledge it can reliably cite, then fine-tune with examples of questions beyond the known depth of the model being refused with "Sorry, I'm a small model and don't have enough info to answer that confidently."

      Obviously it's asking for a lot to try to cram more "self awareness" into small models, but I doubt the current state of the art is a hard ceiling.

      • porridgeraisin 7 hours ago

        > then fine-tune with examples of questions beyond the known depth of the model being refused with "Sorry, I'm a small model and don't have enough info to answer that confidently."

        This has already been tried; Llama pioneered it (as far as I can infer from public knowledge -- maybe OpenAI did it years ago, I don't know).

        They looped through a bunch of Wikipedia pages, made questions out of the info given there, posed them to the LLM, and then, whenever the answer did not match what was in Wikipedia, fine-tuned on "that question: Sorry I don't know ...".

        Then they went a step further and fine-tuned it to use search in these cases instead of saying "I don't know": fine-tune it on the answer toolCall("search", "that question", ...) or whatever.

        Something close to the above is how all models with search-tool capability are fine-tuned.

        All these hallucinations persist despite those efforts; it was much worse before.

        This whole method depends on the assumption that there is actually a path in the internal representation that fires when the model is about to hallucinate. The results so far tell us that this is only partially true. There's no way to quantify it, of course.
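
        Roughly, the data-generation loop described above looks something like this (my own illustrative sketch, not the actual Llama or OpenAI recipe; the helper functions are caller-supplied placeholders):

          # Sketch: mine (question, reference_answer) pairs from a corpus such as
          # Wikipedia, keep only the questions the model currently gets wrong, and
          # turn those into refusal or search-tool fine-tuning targets.
          def build_finetune_examples(qa_pairs, ask_model, answers_match,
                                      use_search_tool=False):
              examples = []
              for question, reference in qa_pairs:
                  predicted = ask_model(question)
                  if answers_match(predicted, reference):
                      continue  # the model already answers this correctly; skip it
                  if use_search_tool:
                      # second stage: teach a tool call instead of a refusal
                      target = f'toolCall("search", {question!r})'
                  else:
                      target = "Sorry, I don't know."
                  examples.append({"prompt": question, "completion": target})
              return examples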

swivelmaster 7 hours ago

At some point we have to be willing to call out, at a societal level, that LLMs have been fundamentally oversold. Responding to "it made up defamatory facts" with "you're using it wrong" is only going to fly for so long.

Yes, I understand that this was not the intended use. But at some point if a consumer product can be abused so badly and is so easy to use outside of its intended purposes, it's a problem for the business to solve and not for the consumer.

  • im3w1l 7 hours ago

    Businesses can't just wave a magic wand and make the models perfect. It's early days with many open questions. As these models are a net positive, I think we should focus on mitigating the harms rather than taking some zero-tolerance stance. We shouldn't allow the businesses to be neglectful, but I don't see evidence of that.

    • derbOac 4 hours ago

      Here on HN we talk about models, and rightfully so. Elsewhere though people talk about AI, which has a different set of assumptions.

      It's worth noting too that how we talk about and use AI models is very different from how we talk about other types of models. So maybe it's not surprising people don't understand them as models.

    • HacklesRaised 6 hours ago

      It can't be perfect, right? I mean, the models require some level of entropy?

    • mindslight 7 hours ago

      > We shouldn't allow the businesses to be neglectful, but I don't see evidence of that.

      Calling it "AI", shoving it into many existing workflows as if it's competently answering questions, and generally treating it like an oracle IS being neglectful.

    • watwut 2 hours ago

      Businesses should be able to not lie. In fact, they should be punished for lying and exaggerating much more often -- by being criticised, by losing contracts, and legally.

    • ares623 6 hours ago

      > As these models are a net positive

      Uhhh… net positive for who exactly?

      • water-data-dude 6 hours ago

        For the shareholders of a few companies (in the short term).

      • im3w1l 6 hours ago

        Chatgpt has 800 million weekly active users. I think it's a net positive for them.

        • IAmBroom 5 hours ago

          Well, since that's 10x the number of weekly active opioid users, it's at least 10x more positive than fentanyl.

          Or am I not following your logic correctly?

          • im3w1l 4 hours ago

            You are not arguing in good faith.

            • vel0city 3 hours ago

              You seem to be missing the obvious point: popularity of a product doesn't ensure the benefit of said product. There are tons of wildly popular products which have extremely negative outcomes for the user and society at large.

              Let's take a weaker example, some sugary soda. Tons of people drink sugary sodas. Are they truly a net benefit to society, or a net negative social cost? Just pointing out that there are a high number of users doesn't mean it inherently has a high amount of positive social outcomes. For a lot of those drinkers, the outcomes are incredibly negative, and for a large chunk of society the general outcome is slightly worse. I'm not trying to argue sugary sodas deserve to be completely banned, but it's not a given they're beneficial just because a lot of people bothered to buy them. We can't say Coca-Cola is obviously good for people because it's being bought in massive quantities.

              Do the same analysis for smoking cigarettes. A product that had tons of users. Many many hundreds of millions (billions?) of users using it all day every day. Couldn't be bad for them, right? People wouldn't buy something that obviously harms them, right?

              AI might not be like cigarettes and sodas, sure. I don't think it is. But just saying "X has Y number of weekly active users, therefore it must be a net positive" as some evidence of it truly being a positive in their lives is drawing a correlation that may or may not exist. If you want to show it's positive for those users, show those positive outcomes, not just some user count.

        • ares623 4 hours ago

          "Net positive", to me, means that the negative aspects are outweighed by the positive aspects.

          How confident are you that 800M people know what the negative aspects are to make it a net positive for them?

  • TZubiri 7 hours ago

    Maybe someone else actually made the defamatory fact up, and it was just parroted.

    But fundamentally, the reason ChatGPT became so popular as opposed to incumbents like Google or Wikipedia is that it dispensed with the idea of attributing quotes to sources. Even if 90% of the things it says can be attributed, it's by design that it can say novel stuff.

    The other side of the coin is that for things that are not novel, it attributes the quote to itself rather than sharing the credit with sources, which is what made the thing so popular in the first place, as if it were some kind of magic trick.

    These are obviously not fixable, but part of the design. I have a theory that the liabilities will be equivalent to, if not greater than, the revenue recouped by OpenAI, but the liabilities will just take a lot longer to realize, considering not only the length of trials but also the time it takes for case law and even new legislation to be created.

    In 10 years, Sama will be fighting to make the thing an NFP again and have the government bail it out of all the lawsuits that it will accrue.

    Maybe you can't just do things

AmbroseBierce 7 hours ago

The current president makes fabricated allegations almost every single day, as do many politicians in general, but "oh no, the machine did it a handful of times, so we need to crucify the technology that just imitates humans (including the aforementioned) and the billions of dollars invested in creating it".

  • mikeholler 7 hours ago

    Perhaps we should be trying to make the machines perform correctly, instead of saying that creating fabrications is OK for anyone or anything.

    • AmbroseBierce an hour ago

      Perhaps we should make sure that the human sources are liable for making false allegations, so that the likelihood of those fabrications existing in the first place is significantly reduced and any machine (or any other entity) using publicly available information is more likely to be correct.

  • dsr_ 7 hours ago

    I'm okay with condemning all of them. Bad behavior on one part doesn't excuse the bad behavior of the other.

  • swivelmaster 6 hours ago

    Just because the media has failed doesn’t mean we should accept that kind of failure everywhere.

jqpabc123 8 hours ago

AI is going to be a lawyer's wet dream.

Imagine the ads on TV: "Has AI lied about you? Your case could be worth millions. Call now!"

MyOutfitIsVague 8 hours ago

From her letter:

> The consistent pattern of bias against conservative figures demonstrated by Google’s AI systems is even more alarming. Conservative leaders, candidates, and commentators are disproportionately targeted by false or disparaging content.

That's a little rich given the current administration's relationship to the truth. The present power structure runs almost entirely on falsehoods and conspiracy theories.

  • mikkupikku 8 hours ago

    It might be rich, but is it false? Are Google models more likely to defame conservatives than not?

    I think plausibly they might, through no fault of Google, if only because scandals involving conservatives might be statistically more likely.

    • petre 6 hours ago

      If they train it on conservative media, it is going to defame everybody else, just like MS's Tay chatbot.

    • th0ma5 7 hours ago

      A lot of facts that people deem liberal or leftist or something are simply statistically consistent with the world literature as a whole and problematic for conservative ideals.

      • mikkupikku 6 hours ago

        A claim can be both statistically consistent with a given corpus and also simply wrong. Saying, for instance, that Ted Cruz was caught giving blowjobs in an airport bathroom. That headline wouldn't surprise anybody, but it's also wrong. I just made it up.

        • th0ma5 4 hours ago

          So then, is it leftist to point out that it shouldn't be surprising, or just an accurate description of party members as a whole?

          • mikkupikku 4 hours ago

            > So then, is it leftist to point out that it shouldn't be surprising

            I don't think so, no.

            > or just an accurate description of party members as a whole?

            It wouldn't be. While enough Republicans have gotten caught being gay to remove the element of surprise and plausibly be the basis of LLM hallucinations, most of them haven't been, so such an LLM hallucination wouldn't actually be accurate, merely unsurprising.

            • th0ma5 25 minutes ago

              I don't think there's any mechanism inside language models to accurately weigh this kind of nuance. I mean, you would think it would work like that, but I don't see how it could or would in practice. The relationship between the quantity of words and how well they describe reality is not something that can be directly calculated, let alone audited.

      • criddell 7 hours ago

        Back in 2006 at the White House Correspondents' Dinner, Stephen Colbert said "reality has a well-known liberal bias", which I thought was a pretty funny line.

        https://en.wikipedia.org/wiki/Stephen_Colbert_at_the_2006_Wh...

        • th0ma5 6 hours ago

          I think about facets of this a lot. The conservative ideals of thinking only in zero-sum terms about political problems (that someone must go without in order for someone else to gain), or of being led by some authority, don't comport with how knowledge can also be gained in society through peer-to-peer relationships, or with the very idea that wealth can be created. That the world doesn't have to follow conservative ideals is what makes a statement like that so funny, given the current conspiratorial reflex of the right.

renewiltord 7 hours ago

Placing a model behind a “Use `curl` after generating an API key using `gcloud auth login` and accepting the terms of service” step is probably a good idea. Anything but the largest models equipped with search to ground generation is going to hallucinate at a high enough rate that a rando can see it.
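
Concretely, that kind of hoop looks something like this (a sketch only: the endpoint shape follows the public Gemini API's generateContent call, but the exact model id and key setup here are assumptions, not verified):

  # get credentials first (per the above: `gcloud auth login`, accept the ToS,
  # mint an API key), then:
  curl -s \
    -H 'Content-Type: application/json' \
    -d '{"contents": [{"parts": [{"text": "Who is Senator Blackburn?"}]}]}' \
    "https://generativelanguage.googleapis.com/v1beta/models/gemma-3-27b-it:generateContent?key=$API_KEY"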

You usually need to gate useful technology away from the normies. E.g. Kickstarter used to have a problem where normies would think they were pre-ordering a finished product, so it had to pivot to being primarily a pre-order site.

Anything that is actually experimental and has less than very high performance needs to be gated away from the normies.

chimeracoder 7 hours ago

LLMs have serious problems with accuracy, so this story is entirely believable -- we've all seen LLMs fabricate far more outlandish stuff.

Unfortunately, it's also worth pointing out that neither Marsha Blackburn nor Robby Starbuck are reliable narrators historically; nor are they even impartial actors in this particular story.

Blackburn has a long history of fighting to regulate Internet speech in order to force platforms to push ideological content (her words, not mine). So it's not surprising to see that this story originated as part of an unrelated lawsuit over First Amendment rights on the Internet, and that Blackburn's response to it is to call for it all to be shut down until it can be regulated according to her partisan agenda (again, her words, not mine) -- something she has already pushed for via legislation she has coauthored.

Razengan 7 hours ago

Just a day ago I asked Gemini to search for Airbnb rooms in an area and give me a summarized list.

It told me it can't and I could do it myself.

I told it again.

Again it told me it can't, but here's how I could do it myself.

I told it it sucks and that ChatGPT etc. can do it for me.

Then it went and, I don't know, scraped Airbnb or used a previous search it must have had, and pulled up rooms with an Airbnb link to each.

After using a bunch of products, I now think a common option they all need is a toggle between a "Monkey's Paw" mode (Do As I Say) and a "Do What I Mean" mode.

Basically where the user takes responsibility and where the AI does.

If it can't do, or isn't allowed to do, something in Monkey's Paw mode, then just stop with a single sentence. Don't go on a roundabout gaslighting trip.

pwlm 8 hours ago

"False or misleading answers from AI chatbots masquerading as facts still plague the industry and despite improvements there is no clear solution to the accuracy problem in sight."

One potential solution to the accuracy problem is to turn facts into a marketplace. Make AIs deposit collateral for the facts they emit, and have them lose the collateral and pay it to the user when statements they presented are found to be false.

AI would be standing behind its words by having something to lose, like humans. A facts marketplace would make facts easy to challenge and hard to get right.

Working POC implementation of facts marketplace in my submissions.
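
A toy sketch of the stake-and-challenge mechanic (illustrative only, not that POC; all names here are hypothetical):

  from dataclasses import dataclass

  @dataclass
  class Claim:
      text: str
      stake: float          # collateral deposited alongside the statement
      resolved: bool = False

  class FactsMarket:
      def __init__(self):
          self.claims = []

      def assert_fact(self, text, stake):
          # the AI (or its operator) puts collateral behind a statement
          claim = Claim(text, stake)
          self.claims.append(claim)
          return claim

      def challenge(self, claim, shown_false):
          # if the statement is ruled false, the stake is paid out to the challenger
          claim.resolved = True
          return claim.stake if shown_false else 0.0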

  • mikkupikku 7 hours ago

    I doubt that could ever work. It's trivial to get these models to output fabrications if that's what you want; just keep asking it for more details about a subject than it could reasonably have. This works because the models are absolutely terrible at saying "I don't know", and this might be a fundamental limitation of the tech. Then of course you have the mess of figuring out what the facts even are, there are many contested subjects our society cannot agree on, many of which don't lend themselves to scientific inquiry.