I'm in favor of the university's project and think many more projects like this are needed.
The internet is swarmed with bots. I would estimate something like 25% of all Reddit, X/Twitter, YouTube and Facebook comments come from bots. Perhaps higher.
It's not like r/CMV was some purely human oasis in the Reddit bot-sea.
It's a tough pill to swallow, but the internet is dead as far as open forum communications go. We need to get a solid understanding of the scope, scale, and solutions to this problem -- because, trust you me, it will be exploited if not.
> None of that is true! The bot invented entirely fake biographical details of half a dozen people who never existed, all to try and win an argument.
Welcome to reddit, Simon! Nothing ever happens, and a large percentage of posts are faked.
You can find Discord groups for every major subreddit that are dedicated to making up stories to see what is the most outlandish thing people will believe.
Unfortunately there is no way to combat this, and it seems like the end of the internet we once knew. Even with a “proof of human” technology, people could still just paste whatever AI-generated text they wanted, under their “real” account.
This has likely been going on since ChatGPT was first released.
I am moderating an art subreddit with about 2m users and the AI "art" spam is getting really annoying to moderate. I don't even understand what the purpose of these accounts is.
I'd guess it's karma farming so that they can be used to steer sentiment in subreddits that require positive post karma to comment / contribute.
Some people just like seeing their numbers go up.
There are ways to combat it -- LLM-generated text leaves statistical fingerprints that appear to endure across big foundation model generations.
I'm working on Binoculars with some UMD and CMU folks and wanted to test it out on this. I downloaded one bot's comment history (/u/markusrorscht): 30% of its comments rated as human-like, compared to 95-100% for a few human users.
So, practically speaking, statistical methods can still provide a useful fingerprint, one that gets better as the comment history gets longer. And they can be combined with other bot-detection methods. IMO bot detection will stay a cat-and-mouse game, rather than (LLM-powered) bots winning the whole thing.
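For the curious, the core idea can be sketched in a few lines: score text by how predictable a language model finds it, and aggregate over a whole comment history. This is only a rough illustration of perplexity-based fingerprinting, not the actual Binoculars implementation (which compares two models against each other); the model name and the threshold below are placeholders.

```python
# Minimal sketch of perplexity-based scoring, not the real Binoculars code.
# Assumptions: any small causal LM ("gpt2" here) and a hand-picked threshold.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Per-token perplexity of `text`; machine-generated text tends to be
    more predictable and therefore scores lower."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return torch.exp(loss).item()

def machine_like_fraction(comments: list[str], threshold: float = 20.0) -> float:
    """Fraction of a comment history flagged as machine-like. A longer
    history gives a steadier signal than any single comment."""
    flagged = sum(1 for c in comments if perplexity(c) < threshold)
    return flagged / max(len(comments), 1)
```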
If I read a comment that has any chance of changing my mind about a fact or opinion, I always go to the user page to check the account's registration date. There's no hard cut-off date, but I usually discount or ignore any account registered in 2020 or later.
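If you wanted to automate that check, here is a hedged sketch using the Reddit API via PRAW; the credentials and the 2020 cutoff are placeholders, and it implements only the age heuristic described above, nothing smarter.

```python
# Sketch of the account-age heuristic using PRAW; credentials are placeholders.
import datetime
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",          # placeholder
    client_secret="YOUR_CLIENT_SECRET",  # placeholder
    user_agent="account-age-check (example)",
)

CUTOFF = datetime.datetime(2020, 1, 1, tzinfo=datetime.timezone.utc)

def predates_cutoff(username: str) -> bool:
    """True if the account was registered before the (arbitrary) 2020 cutoff."""
    created_utc = reddit.redditor(username).created_utc
    created = datetime.datetime.fromtimestamp(created_utc, tz=datetime.timezone.utc)
    return created < CUTOFF
```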
Sure, but what about false positives? What about real accounts newer than that? This is a workaround, not a good solution.
You can buy old accounts for like $3.
Paid-only platforms, here we come.
There is a word of power that machines cannot utter.
Discussion (212 points, 1 day ago, 144 comments) https://news.ycombinator.com/item?id=43806940
The subreddit has question-askers give feedback on whether their view was changed. The askers are aware of how their responses might appear publicly. This makes me wonder if "appeal to identity" is especially effective, at least superficially if not actually. The fine-tuning might have been reacting to this.
"This project yields important insights, and the risks (e.g. trauma etc.) are minimal." They can't possibly measure the insights or claim that the trauma is minimal.
> I think the reason I find this so upsetting is that, despite the risk of bots, I like to engage in discussions on the internet with people in good faith. The idea that my opinion on an issue could have been influenced by a fake personal anecdote invented by a research bot is abhorrent to me.
I like Simon's musings in general, but are we not way past this point already? It is completely and totally inevitable that if you try to engage in discussions on the internet, you will be influenced by fake personal anecdotes invented by LLMs. The only difference here is they eventually disclosed it, but aren't various state and political actors already doing this in spades, undisclosed?
I keep seeing this take, and it makes me mad. "The house is on fire, didn't you expect people to start burning to death? People will inevitably die, why discuss when it happens?"
Engineering is fundamentally about exercising the power of intelligence to change something in the physical world. Posts to the effect of "<bad thing> is inevitable and unstoppable, so it isn't worth talking about" strike me as the opposite of the hacker ethos!
I think the other thing to keep discussing is that using an LLM, for research or anything else, to manipulate people's emotions without disclosure is unethical.
By the way, people die in house fires from toxic smoke inhalation and a lack of oxygen. Engineers created smoke detectors and other devices to lower the risk of fire due to electrical shorts, gas leaks, etc., and to create fire suppression systems.
People still die because they didn't replace batteries, didn't follow electrical cord/device warnings, or left candles or other heat sources unattended. We discuss these events as warnings that accidents kill when warnings are not followed and when inattentiveness allows failure to propagate, and as a reminder that rare events still kill innocent people.
Maybe this will motivate people to meet in person rather than relying only on online anecdotes, until that too is corrupted by cyber brain augmentation and in-person propaganda actors.
Even meetings in person are still corrupted by the skewed views people bring from online sources. Such physical meetings would likely end up reinforcing the corruption!
I see this as further discounting the importance of anecdotes and personal experiences when making decisions that affect populations.
Yes, we know that personal stories can be compelling, and communicating with someone with different experiences from ours can be enlightening. Still, before applying these learnings to larger groups, we should remember that individual experiences do not capture the entire population.
Sure, but that doesn't mean I'm not furious when it happens.
> The idea that my opinion on an issue could have been influenced by a fake personal anecdote invented by a research bot is abhorrent to me.
Then stop basing your opinion on issues on personal anecdotes from complete strangers. This is nothing new.
Imagine a conversation about good options for message queues, and someone pipes in with this:
"I've been a sysadmin operating RabbitMQ and Redis for five years. I've found Redis to be a great deal less trouble to administer than Rabbit, and I've never lost any data."
See why I care about this?
I don't like this example, but in general I very much agree with you and find it shocking that multiple people here do not.
It is plain and simply unethical to do such research on human subjects, regardless of how many other bots there are out there.
It is a matter of principle and ethical responsibility. I would have expected researchers especially to be conscious of this.
This is a bad example. A good sysadmin should fact-check and do testing themselves instead of relying on what other people say.
Feel free to come up with a better example that uses the same basic pattern: someone online claims that they have prior experience with X and hence advises you to do Y.
Trust and Verify.
The world has been full of snake oil salesmen since the dawn of time, all with highly persuasive sob stories.
If you rely on shortcuts, like anecdotes or 'credentialism' for those who profess to be experts, then you will get rolled over regularly. That's the cost of using shortcuts.
The fact that information may be fraudulent, put forward by this season's Dr Andrew Wakefield, has to be factored into any plan for using external sources.
Unless a comment is negative, like "I used ABC and it was shit for the following reasons," I assume it is as fake as a 5-star movie review written by the director. I would definitely prefer to know why I should not use, watch, or play something rather than why I should. But since this is an anonymous post on the internet about AI slop, you shouldn't listen to me anyway.
More of the same? Reddit's genesis included fake accounts and content. I don't doubt that upvotes and the front page are fully curated:
https://economictimes.indiatimes.com/magazines/panache/reddi...
We all have an expectation that these message boards are like the forums of the 2000s, but that's just not true and hasn't been for a long time. We will never see that internet again, it seems, because AI was the atomic bomb for all this astroturfing and engineered content. Educating people away from these synthetic forums appears nearly impossible.