> Copilot excels at low-to-medium complexity tasks in well-tested codebases, from adding features and fixing bugs to extending tests, refactoring, and improving documentation.
Bounds bounds bounds bounds. The important part for humans seems to be maintaining boundaries for AI. If your well-tested codebase has its tests built through AI, it's probably not going to work.
I think it's somewhat telling that they can't share numbers for how they're using it internally. I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success. There's real stuff in there, and my brain has an insanely hard time separating the trillion dollars of hype from the usefulness.
We've been using Copilot coding agent internally at GitHub, and more widely across Microsoft, for nearly three months. That dogfooding has been hugely valuable, with tonnes of feedback (and bug bashing!) that has helped us get the agent ready to launch today.
So far, the agent has been used by about 400 GitHub employees in more than 300 of our repositories, and we've merged almost 1,000 pull requests contributed by Copilot.
In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
(Source: I'm the product lead at GitHub for Copilot coding agent.)
Every bullet hole in that plane is one of the 1k PRs contributed by Copilot. The missing dots, and whole missing planes, are unaccounted for. I.e., "AI ruined my morning".
It's not survivorship bias. Survivorship bias would be if you made any conclusions from the 1000 merged PRs (eg. "90% of all merged PRs did not get reverted"). But simply stating the number of PRs is not that.
I'd love to know where you think the starting position of the goal posts was.
Everyone who has used AI coding tools interactively or as agents knows they're unpredictably hit or miss. The old, non-agent Copilot has a dashboard that shows org-wide rejection rates for paying customers. I'm curious to learn what the equivalent rejection rate for the agent is for the people who make the thing.
I think the implied promise of the technology, that it is capable of fundamentally changing organizations' relationships with code and software engineering, deserves deep understanding. Companies will be making multi-million-dollar decisions based on their belief in its efficacy.
When someone says that the number given is not high enough, I wouldn't consider trying to get an understanding of PR acceptance rate before and after Copilot to be moving the goal posts. Using raw numbers instead of percentages is often done to emphasize a narrative rather than simply inform (e.g. "Dow plummets x points" rather than "Dow lost 1.5%").
Sometimes there are paradigm shifts in a dependency that get past the current tests you have. So it’s always good to read the changelog and plan the update accordingly.
How strong was the push from leadership to use the agents internally?
As part of the dogfooding I could see them really pushing hard to try having agents make and merge PRs, at which point the data is tainted and you don't know if the 1,000 PRs were created or merged to meet demand or because devs genuinely found it useful and accurate.
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
That's a fun stat! Are humans in the #1-4 slots? It's hard to know what processes are automated (300 repos sounds like a lot of repos!).
Thank you for sharing the numbers you can. Every time a product launch is announced, I feel like it's a gleeful announcement of a decrease in my usefulness. I've got imposter syndrome enough; perhaps Microsoft might want to speak to the developer community and let us know what they see happening? Right now it's mostly the pink slips that are doing the speaking.
After hearing feedback from the community, we’re planning to share more on the GitHub Blog about how we’re using Copilot coding agent at GitHub. Watch this space!
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
Really cool, thanks for sharing! Would you perhaps consider implementing something like these stats that aider keeps on "aider writing itself"? - https://aider.chat/HISTORY.html
Nice idea! We're going to try to get together a blog post in the next couple of weeks on how we're using Copilot coding agent at GitHub - including to build Copilot coding agent ;) - and having some live stats would be pretty sweet too.
> In the repo where we're building the agent, the agent itself is actually the #5 contributor
How does this align with Microsoft's AI safety principles? What controls are in place to prevent Copilot from deciding that it could be more effective with fewer limitations?
Copilot only does work that has been assigned to it by a developer, and all the code that the agent writes has to go through a pull request before it can be merged. In fact, Copilot has no write access to GitHub at all, except to push to its own branch.
That ensures that all of Copilot's code goes through our normal review process which requires a review from an independent human.
HAHA. Very smart. The more you review the Copilot Agent's PRs, the better it gets at submitting new PRs... (basics of supervised machine learning, right?)
What I'm most excited about is allowing developers to spend more of their time working on the work they enjoy, and less of their time working on mundane, boring or annoying tasks.
Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take the load of that and free me up to work on the most interesting and complex problems.
But aren't writing tests and updating documentation also the areas where automated quality control is the hardest? Existing high-quality tests can work as guardrails for writing business logic, but what guardrails could AI use to evaluate if its generated docs and tests are any good?
I would not be surprised if things end up the other way around – humans doing the boring and annoying tasks that are too hard for AI, and AI doing the fun easy stuff ;-)
What about developers who do enjoy writing for example high quality documentation? Do you expect that the status quo will be that most of the documentation will be AI slop and AI itself will just bruteforce itself through the issues? How close are we to the point where the AI could handle "tricky dependency updates", but not being able to handle "most interesting and complex problems"? Who writes the tests that are required for the "well tested" codebases for GitHub Copilot Coding Agent to work properly?
What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
I'd argue the only way to ensure that is to make sure developers read high quality documentation - and report issues if it's not high quality.
I expect though that most people don't read in that much detail, and AI generated stuff will be 80-90% "good enough", at least the same if not better than someone who doesn't actually like writing documentation.
> What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
Isn't that already the case for a lot of software development? If it's boring and mundane, an AI can do it too so you can focus on more difficult or higher level issues.
Of course, the danger is that, just like with other automated PRs like dependency updates, people trust the systems and become flippant about it.
I think just having devs feed an agent OpenAPI docs as context for creating API calls would do enough. Simply adding tags and useful descriptions to endpoints makes a world of difference in the accuracy of the agent's output. It means getting 95% accuracy with the cheapest models vs. 75% accuracy with the most expensive models.
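To make that concrete, here's a minimal sketch, assuming FastAPI (the endpoint and fields are hypothetical), of the kind of metadata that ends up in the generated OpenAPI doc an agent would consume:

    from fastapi import FastAPI
    from pydantic import BaseModel, Field

    app = FastAPI()

    class Invoice(BaseModel):
        id: str = Field(description="Internal invoice identifier, e.g. 'inv_123'")
        amount_cents: int = Field(description="Total in integer cents, never a float")

    @app.get(
        "/invoices/{invoice_id}",
        response_model=Invoice,
        tags=["billing"],
        summary="Fetch a single invoice",
        description="Returns 404 if the invoice does not exist or belongs to another tenant.",
    )
    def get_invoice(invoice_id: str) -> Invoice:
        ...  # lookup omitted; only the route metadata matters for the generated spec

The tags, summaries and field descriptions land verbatim in the OpenAPI document, which is exactly the context the agent gets to work with.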
I find your use of "AI Slop" in reference to technical documentation strange. It isn't a choice between finely crafted prose and banal text. It's documentation that exists versus documentation that doesn't exist. Or documentation that is hopelessly out of date. In my experience LLMs do a wonderful job of translating from code to documentation. They even do a good job inferring the reasons for design decisions. I'm all in on LLM-generated technical documentation. If I want well-written prose I'll read literature.
Documentation is not just translating code to text - I don't doubt that LLMs are wonderful at that: that's what they understand. They don't understand users though, and that's what separates a great documentation writer from someone who documents.
Great technical documentation rarely gets written. You can tell the LLM the audience it is targeting and it will do a reasonable job. I truly appreciate technical writers, and hold great ones in special esteem. We live in a world where the market doesn't value this.
The market values good documentation. Anything critical and commonly used is pretty well documented (Linux, databases, software like Adobe's, ...). You can see how many books/articles have been written about those systems.
> Anything critical and commonly used is pretty well documented
I'd argue the vast majority of software development is neither critical nor commonly used. Anecdotal, but I've written documentation and never got any feedback on it (whether it's good or bad), which implies it's not read or the quality doesn't matter.
Sometimes the code, if written cleanly, is trivial enough for anyone with a foundation in the domain that it can act as the documentation. And sometimes, only the usage is important, not the implementation (manual pages). And some other times, the documentation is the standards themselves (file formats and communication protocols). So I can get why no one took the effort to compile a documentation manual.
We’re not talking about AI writing books about the systems, though. We’re talking about going from an undocumented codebase to a decently documented one, or one with 50% coverage going to 100%.
Those orgs that value high-quality documentation won’t have undocumented codebases to begin with.
And let’s face it, like writing code, writing docs does have a lot of repetitive, boring, boilerplate work, which I bet is exactly why it doesn’t get done. If an LLM is filling out your API schema docs, then you get to spend more time on the stuff that’s actually interesting.
A much better option is to use docstrings[0] and a tool like doxygen to extract an API reference. Domain explanations and architecture can be compiled later from design and feature docs.
A good example of the kind of result is something like the Laravel documentation[1] and its associated API reference[2]. I don't believe AI can help with this.
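As a rough illustration of the first point (the function here is hypothetical), this is the kind of docstring a generator can lift straight into an API reference:

    def parse_duration(text: str) -> int:
        """Parse a human-readable duration like "1h30m" into seconds.

        Args:
            text: A duration string made of integer/unit pairs (s, m, h).

        Returns:
            The total number of seconds.

        Raises:
            ValueError: If the string is empty or uses an unknown unit.
        """

The domain explanations and architecture notes still have to come from design and feature docs, of course.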
> If I want well written prose I'll read literature.
Actually if you want well-written prose you'll read AI slop there too. I saw people comparing their "vibe writing" workflows for their "books" on here the other day. Nothing is to be spared, apparently
What will you be most excited about when the most interesting and complex problems are out of the Overton window and deemed mundane, boring or annoying as well, or turn out to be intractable for your abilities?
But would you rather get paid to spend your time doing the interesting and enjoyable work, or the mundane and boring stuff? ;) My hope is that agents like Copilot can help us burn down the tedious stuff and make more time for the big value adds.
Though I do not doubt your intentions to do what you think will make developers' lives better, can you be certain that your bosses, and their bosses, have our best interests in mind as well? I think it would be pretty naive to believe that your average CEO wouldn't absolutely love not to have to pay developers at all.
But working on interesting things is mentally taxing while the tedious tasks aren't, I can't always work at full bore so having some tedium can be a relief.
Not everyone gets to do the fun stuff. That's for people higher up in the chain, with more connections, or something else. I like my paycheck, and you're supposing that AI isn't going to take that away, and that we'll get to live in a world where we all work on "fun stuff". That is a real pie-in-the-sky dream you have, and it simply isn't how the world works. Back in the real world, tech jobs are already scarce and there's a lot of people that would be happy to do the boring mundane stuff so they can feed their family.
>Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take the load of that and free me up to work on the most interesting and complex problems.
Where does the "most" come from? There's a certain sense of satisfaction in knowing I've tested a piece of code per my experience in the domain coupled with knowledge of where we'll likely be in six months. The same can be said for documentation - hell, on some of the projects I've worked on we've had entire teams dedicated to it, and on a complicated project where you're integrating software from multiple vendors the costs of getting it wrong can be astronomical. I'm sorry you feel this way.
> There's a certain sense of satisfaction in knowing I've tested a piece of code per my experience in the domain coupled with knowledge of where we'll likely be in six months.
One of the other important points about writing unit tests isn't just to confirm the implementation but to improve upon it through the process of writing tests and discovering additional requirements and edge cases etc. (TDD and all that)
I suppose it's possible at some point an AI could be complex enough to try out additional edge cases or confirm with a design document or something and do those parts as well... but idk, it's still after-the-fact testing instead of at design time, so it's less valuable imo...
I like updating documentation and feel that it's fairly important to be doing myself so I actually understand what the code / services do?
I use all of these tools, but you also know what "they're doing"...
I know our careers are changing dramatically, or going away (I'm working on a replacement for myself), but I just like listening to all the "what we're doing is really helping you..."
I'd interpret the original statement as "tests which don't matter" and "documentation nobody will ever read", the ones which only exist because someone said they _have_ to, and nobody's ever going to check them as long as they exist (like a README.md in one of my main work projects I came back to after temporarily being reassigned to another project - previously it only had setup instructions, now: filled with irrelevant slop, never to be read, like "here is a list of the dependencies we use and a summary of each of their descriptions!").
Doing either of them _well_ - the way you do when you actually care about them and they actually matter - is still so far beyond LLMs. Good documentation and good tests are such a differentiator.
If we're talking about low quality tests and documentation that exists only to check a box, the easier answer is to remove the box and acknowledge that the low quality stuff just isn't needed at all.
I’ve never seen a test that doesn’t matter that shouldn’t be slotted for removal (if it gets written at all) or documentation that is never read. If people can read code to understand systems, they will be grateful for good documentation.
Thanks for the response… do you see a future where engineers are just prompting all the time? Do you see a timeline in which today's programming languages are “low level” and rarely coded by hand?
I'm honestly surprised that Microsoft (and other similarly sized LLM companies) have convinced or coerced literally hundreds of thousands of employees to build their own replacement.
If we're expected to even partially believe the marketing, LLM coding agents are useful today at junior level developer tasks and improving quickly enough that senior tasks will be doable soon too. How do you convince so many junior and senior level devs to build that?
That threat doesn't scale. I do get that many haven't put themselves in a position to stand behind their views or principles, but if they did the threat, or the company, would crumble.
Absolutely the wrong take. We MUST think about what might happen in several years. Anyone who says we shouldn’t is not thinking about this technology correctly. I work on AI tech. I think about these things. If the teams at Microsoft or GitHub are not, then we should be pushing them to do so.
He asked that in the context of an actual specific project. It did not make sense the way he asked it. And it's the executives' job to plan that out five years down the line... although I guarantee you none of them are trying to predict that far.
I'd like a breakdown of this phrase: how much human work vs Copilot, and in what form, autocomplete vs agent. It's not specified; it seems more like marketing trickery than real data.
> Pretty much every developer at GitHub is using Copilot in their day-to-day work, so its influence touches virtually every code change we make ;)
Genuine question, but is CoPilot use not required at GitHub? I'm not trying to be glib or combative, just asking based on Microsoft's current product trajectory and other big companies (e.g. Shopify) forcing their devs to use AI and scoring their performance reviews based on AI use.
Unfortunately, you can't opt out of Co-Pilot in Github. Although I did just use it to ask how to remove the sidebar with "Latest Changes" and other non-needed widgets that feel like clutter.
Copilot said: There is currently no official GitHub setting or option to remove or hide the sidebar with "Latest Changes" and similar widgets from your GitHub home page.
I'm using this as an example to show that it is no longer possible to set up a GitHub account to NOT use Copilot, even if it just lurks in the corner of every page waiting to offer a suggestion. Like many A.I. features it's there, whether you want to use it or not, without an option to disable it.
So I'm suss of the "pretty much every developer" claim, no offense.
I'm sorry, but given the company you're working for I really have a hard time believing such bold statements, even more so because the more I use Copilot, the dumber and dumber it feels.
Is Copilot _enforced_ as the only option for an AI coding agent? Or can devs pick and choose whatever tool they prefer?
I'm interested in the [vague] ratio of {internallyDevelopedTool} vs alternatives - essentially the "preference" score for internal tools (accounting for the natural bias towards one's own agent for testing/QA/data purposes). Any data, however vague it needs to be, would be great.
(and if anybody has similar data for _any_ company developing their own agent, please shout out).
What's the motivation for restricting to Pro+ if billing is via premium requests? I have a (free, via open source work) Pro subscription, which I occasionally use. I would have been interested in trying out the coding agent, but how do I know if it's worth $40 for me without trying it ;).
We started with Pro+ and Enterprise first because of the higher number of premium requests included with the monthly subscription.
Whilst we've seen great results within GitHub, we know that Copilot won't get it right every time, and a higher allowance of free usage means that a user can play around and experiment, rather than running out of credits quickly and getting discouraged.
We do expect to open this up to Pro and Business subscribers - and we're also looking at how we can extend access to open source maintainers like yourself.
400 GitHub employees are using GitHub Copilot day in, day out, and it comes out as the #5 contributor? I wouldn't call that a success. If it were any use, I would expect that even if a developer writes only 10% of their code using it, it would still be the #1 contributor in every project.
re: 300 of your repositories...
So it sounds like y'all don't use a monorepo architecture. I've been wondering if that would be a blocker to using these agents most effectively. Expect some extra momentum to swing back to the multirepo approach accordingly.
At the moment, we're using Claude 3.7 Sonnet, but we're keeping our options open to change the model down the line, and potentially even to introduce a model picker like we have for Copilot Chat and Agent Mode.
Using different models for different tasks is extremely useful and I couldn't imagine going back to using just one model for everything. Sometimes a model will struggle for one reason or another and swapping it out for another model mid-conversation in LibreChat will get me better results.
It's nearly impossible though to escape the flood of Copilot buttons creeping into every corner of Github (and other Microsoft products like VSCode). This looks like Microsoft aims for deep integration, not separation.
Incorrect. It's not mandated that you actually use it to write or correct code, but it's impossible to remove it, so you need to either get used to blocking out its incessant suggestions and notifications or stop using GitHub.
Similarly, the newest MS Word has CoPilot that you "don't have to use" but you still have to put up with the "what would you like to write today?" prompt request at the start of every document or worse "You look like you're trying to write a...formal letter...here are some suggestions."
Can copilot be disabled entirely in a GitHub repo or organization? I may very well have missed those settings, but if nothing else they are well hidden.
Question you may have a very informed perspective on:
Where are we w.r.t. the agent surveying open issues (say, via JIRA), evaluating which ones it would be most effective at handling, and taking them on, ideally with some check-in for confirmation?
Or, contrariwise, from having product management agents which do track and assign work?
The entire website was created by Claude Sonnet through Windsurf Cascade, but with the “Fair Witness” prompt embedded in the global rules.
If you regularly guide the LLM to “consult a user experience designer”, “adopt the multiple perspectives of a marketing agency”, etc., it will make rather decent suggestions.
I’ve been having pretty good success with this approach, granted mostly at the scale of starting the process with “build me a small educational website to convey this concept”.
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
When I repeated to other tech people from about 2012 to 2020 that the technological singularity was very close, no one believed me. Coding is just the easiest to automate away into almost oblivion. And too many non-technical people drank the Flavor Aid for the fallacy that it can be "abolished" completely soon. It will gradually come for all sorts of knowledge-work specialists, including electrical and mechanical engineers, and probably doctors too. And, of course, office work too. Some iota of specialists will remain to tune the bots, and some will remain in the fields to work with them where expertise is absolutely required, but the kinds of work that were options for potential upward mobility into the middle class are being destroyed and replaced with nothing. There won't be "retraining" or hand-waved other opportunities for the "basket of labor", just competition among many uniquely, far-overqualified people for ever-dwindling opportunities.
It is difficult to get a man to understand something when his salary depends upon his not understanding it. - Upton Sinclair
Do you have any textual evidence of this 8-year stretch of your life where you see yourself as being perpetually correct? Do you mean that you were very specifically predicting flexible natural-language chatbots, or vaguely alluding to some sort of technological singularity?
We absolutely have not reached anything resembling anyone's definition of a singularity, so you are very much still not proven correct in this. Unless there are weaker definitions of that than I realised?
I think you'll be proven wrong about the economy too, but only time will tell there.
I don't think it was unreasonable to be very skeptical at the time. We generally believed that automation would get rid of repetitive work that didn't require a lot of thought. And in many ways programming was seen almost at the top of the heap. Intellectually demanding and requiring high levels of precision and rigor.
Who would've thought (except you) that this would be one of the things AI would be especially suited for? I don't know what this progression means in the long run. Will good engineers just become 1000x more productive as they manage X number of agents building increasingly complex code (with other agents constantly testing, debugging, refactoring and documenting it), or will we just move to a world with way fewer engineers because there is only a need for so much code?
It's interesting that even people initially skeptical are now thinking they are on the "chopping block", so to speak. I'm seeing it all over the internet, and the slow realization that what was supposed to be the "top of the heap" is actually at the bottom - not because of the difficulty of coding, but because the AI labs themselves are domain experts in software and therefore have the knowledge and data to tackle it as a problem first. I also think to a degree they "smell blood", and fear, more so than greed, is the best marketing tool. Many invested a good chunk of time in this career, and it will result in a lot of negative outcomes. It's a warning to other intellectual careers, that's for sure - and you will start seeing resistance to domain-knowledge sharing from more "professionalized" careers for sure.
My view is in between yours: a bit of column A and B, in the sense that both outcomes will play out to an extent. There will be fewer engineers, but not by the factor of productivity (Jevons paradox will play out but eventually tap out), there will be even more software, especially at the low end, and the engineers that remain will be expected to be smarter and work harder for the same or less pay, grateful they got a job at all. There will be more "precision and rigor", more keeping up required of workers, but less reward for the workers who perform it. In a capitalist economy it won't be seen as a profession to aspire to anymore by most people.
Given most people don't live to work, and use their career to finance and pursue other life meanings, it won't be viable for most people long term, especially when other careers give "more bang for buck" w.r.t. the effort put into them. The uncertainty in the SWE career that most people I know are feeling right now means that, on the balance of risk/reward, I recommend newcomers go down another career path, especially juniors who have a longer runway. To be transparent, I want to be wrong, but the risk of this is getting higher every day now.
i.e. AI is a dream for the capital class, and IMO potentially disastrous for social mobility long term.
I don't think I'm on the chopping block because of AI capabilities, but because of executive shortsightedness. Kinda like how moving to the Cloud eliminated sysadmins but created DevOps, and in many ways the solution is ill-suited to the problem.
Even in the early days of LLM-assisted coding tools, I already knew there would be executives who would say: let's replace our pool of expensive engineers with a less expensive license. But the only factor that leads to this decision is cost comparison. Not quality, not maintenance load, and very much not customer satisfaction.
> I don't think it was unreasonable to be very skeptical at the time.
Well, that's back rationalization. I saw the advances like conducting meta sentiment analysis on medical papers in the 00's. Deep learning was clearly just the beginning. [0]
> Who would've thought (except you)
You're othering me, which is rude, and you're speaking as though you speak for an entire group of people. Seems kind of arrogant.
From talking to colleagues at Microsoft it's a very management-driven push, not developer-driven. Friend on an Azure team had a team member who was nearly put on a PIP because they refused to install the internal AI coding assistant. Every manager has "number of developers using AI" as an OKR, but anecdotally most devs are installing the AI assistant and not using it or using it very occasionally. Allegedly it's pretty terrible at C# and PowerShell which limits its usefulness at MS.
That's exactly what senior executives who aren't coding are saying everywhere.
Meanwhile, engineers are using it for code completion and as a Google search alternative.
I don't see much difference here at all, the only habit to change is learning to trust an AI solution as much as a Stack Overflow answer. Though the benefit of SO is each comment is timestamped and there are alternative takes, corrections, caveats in the comments.
> I don't see much difference here at all, the only habit to change is learning to trust an AI solution as much as a Stack Overflow answer. Though the benefit of SO is each comment is timestamped and there are alternative takes, corrections, caveats in the comments.
That's a pretty big benefit, considering the feedback was by people presumably with relevant expertise/experience to contribute (in the pre-LLM before-time).
The comments have the same value as the answers themselves. Kinda like annotations and errata in a book. It's like seeing "See $algorithm in The Art of Computer Programming, Vol. 1" in a comment before a complex piece of code.
In my experience it's far less useful than simple auto complete. It makes things up for even small amounts of code that I have to pause my flow to correct. Also, without actually googling you don't get any context or understanding of what it's writing.
I found it to be more distracting recently. Suggestions that are too long or written in a different style make me lose my own thread of logic that I'm trying to weave.
I've had to switch it off for periods to maintain flow.
There's a large group of people that claim that AI tools are no good and I can't tell if they're in some niche where they truly aren't, they don't care to put any effort into learning the tools, or they're simply in denial.
Or simply unwilling to cut their perfectly good legs off and attach those overhyped prostheses that make people so fast and furious at running on the spot
It's just tooling. Costs nothing to wait for it to be better. It's not like you're going miss out on AGI. The cost of actually testing every slop code generator is non-trivial.
> I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success
Have they tried dogfooding their dogshit little tool called Teams in the last few years? Cause if that's what their "famed" dogfooding gets us, I'm terrified to see what lies in wait with Copilot.
I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]
In any case, I think this is the best use case for AI in programming—as a force multiplier for the developer. It’s to the benefit of both AI and humanity for AI to avoid diminishing the creativity, agency and critical thinking skills of its human operators. AI should be task-oriented, but high-level decision-making and planning should always be a human task.
So I think our use of AI for programming should remain heavily human-driven for the long term. Ultimately, its use should involve enriching humans’ capabilities over churning out features for profit, though there are obvious limits to that.
The GitHub org is required to for sure, with a very similar mandate to the one Shopify's CEO put out.
LLM use is now part of the annual review process; it's self-reported if I'm not mistaken, but at least at Microsoft they would have plenty of data to know how often you use the tools.
From reading around on Hacker News and Reddit, it seems like half of commentators say what you say, and the other half says "I work at Microsoft/know someone who works at Microsoft, and our/their manager just said we have to use AI", someone mentioned being put on PIP for not "leveraging AI" as well.
I guess maybe different teams have different requirements/workflows?
You might want to study the history of technology and how rapidly compute efficiency has increased as well as how quickly the models are improving.
In this context, assuming that humans will still be able to do high level planning anywhere near as well as an AI, say 3-5 years out, is almost ludicrous.
They have released numbers, but I can't say they are for this specific product or something else. They are apparently having AI generate "30%" of their code.
Whatever the true stats for mistakes or blunders are now, remember that this is the worst it's ever going to be. And there is no clear ceiling in sight that would prevent it from quickly getting better and better, especially given the current levels of investment.
That sounds reasonable enough, but the pace or end result is by no means guaranteed.
We have invested plenty of money and time into nuclear fusion with little progress. The list of key achievements from CERN[1] is also meager in comparison to the investment put in, especially if you consider their ultimate goal to be applying research to more than just theory.
I tried doing some vibe coding on a greenfield project (using gemini 2.5 pro + cline). On one hand - super impressive, a major productivity booster (even compared to using a non-integrated LLM chat interface).
I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-retrospection on these aspects if you ask about the quality of the code or if there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.
I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)
Cline very visibly displays the ongoing cost of the task. Light edits are about 10 cents, and heavy stuff can run a couple of bucks. It's just that the tab accumulates faster than I expect.
> You can "Save" 1,000 hours every night, but you don't actuall get those 1,000 hours back.
What do you mean?
If I have some task that requires 1000 hours, and I'm able to shave it down to one hour, then I did just "save" 999 hours -- just in the same way that if something costs $5 and I pay $4, I saved $1.
My point is that saving 1,000 hours each day doesn't actually give you 1,000 hours a day to do things with.
You still get your 24 hours, no matter how much time you save.
What actually matters is the value of what is delivered, not how much time it actually saves you. Justifying costs by "time saved" is a good way to eat up your money on time-saving devices.
If I "save 1000 hours" then that could be distributed over 41.666 days, so no task would need to be performed during that period because "I saved 1000 hours".
You could also say you saved 41.666 people an entire 24 hour day, by "saving 1000 hours", or some other fractional way.
How you're trying to explain it as "saving 1000 hours each day" is really not making any sense without further context.
And I'm sure if I hadn't written this comment I would be saving 1000 hours on a stupid comment thread.
It's like those coupon booklets they used to sell. "Over $10,000 of savings!"
Yes but how much money do I have to spend in order to save $10,000?
There was this funny commercial in the 90s for some muffler repair chain that was having a promotion: "Save Fifty Dollars..."
The theme was "What will you do with the fifty dollars you saved?" And it was people going to Disneyland or on a fancy dinner date.
The people (actors) believed they were receiving $50. They acted as if it was free money. Meanwhile there was zero talk about whether their cars needed muffler repair at all.
I think one issue is that you won't always be able to invoice those extra 999 hours to your customer. Sometimes you'll still only be able to get paid for 1 hour, depending on the task and contract.
But the llm bill will always invoice you for all the saved work regardless.
(From a company's perspective, this is true.) As a developer, you may not be paid by the task -- if I finish something early, I start work on the next thing.
If you earn more than me, and you value "time saved", then you should pay me to take my washing off me. Because then you can save even more of your valuable time!
The more of my washing you can take off me, the more of your time you can save by then using a washing machine or laundry service!
Saving an hour of my time is a waste, when saving an hour of your time is worth so much more. So it makes economic sense for you to pay me, to take my washing off me!
> LLMs are now being positioned as "let them work autonomously in the background"
The only people who believe this level of AI marketing are the people who haven't yet used the tools.
> which means no one will be watching the cost in real time.
Maybe some day there's an agentic coding tool that goes off into the weeds and runs for days doing meaningless tasks until someone catches it and does a Ctrl-C, but the tools I've used are more likely to stop short of the goal than to continue crunching indefinitely.
Regardless, it seems like a common experience for first-timers to try a light task and then realize they've spent $3, instantly setting expectations for how easy it is to run up a large bill if you're not careful.
I think that models are gonna commoditize, if they haven't already. The cost of switching over is rather small, especially when you have good evals on what you want done.
Also there's no way you can build a business without providing value in this space. Buyers are not that dumb.
I, too, recommend aider whenever these discussions crop up; it converted me from the "AI tools suck" side of this discussion to the "you're using the wrong tool" side.
I'd also recommend creating little `README`'s in your codebase that are mainly written with aider as the intended audience. In it, I'll explain architecture, what code makes (non-)sense to write in this directory, and so on. Has the side-effect of being helpful for humans, too.
Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview + pointers to other README's), and whatever README is most relevant to the scope of my session. It's super productive.
I'm yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest DeepSeek models, and while I love the price (nearly 50x cheaper?), they're just far too error-prone compared to Sonnet 3.7. They generate solid plans / architecture discussions, but, unlike Sonnet, the code they generate is often confidently off the mark.
I’d generally prefer comments in code. The README’s are relatively sparse and contain information that would be a bit too high-level for module or class-level comments. If commentary is specific to a module or class or method, the documentation belongs there. My rule of thumb is if the commentary helps you navigate and understand rules that apply to entire sets of modules rooted at `foo/`, it generally belongs in `foo/README`.
For example “this module contains logic defining routes for serving an HTTP API. We don’t write any logic that interacts directly with db models in these modules. Rather, these modules make calls to services in `/services`, which make such calls.”
It wouldn’t make sense to duplicate this comment across every router sub-module. And it’s not obvious from looking at any one module that this rule is applied across all modules, without such guidance.
These little bits of scaffolding really help narrow down the scope of the code that LLMs eventually try to write.
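As a concrete (hypothetical) example, the `foo/README` for that routing layer might be as short as:

    # api/routes/

    Modules here only define HTTP routes and request/response validation.
    - Never import db models directly; call into services in /services instead.
    - One module per resource; keep handlers thin.
    - Error responses follow the shape described in docs/errors.md.

A few lines like that, included at the start of a session, go a long way toward keeping generated code inside the intended boundaries.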
My tool Plandex[1] allows you to switch between automatic and manual context management. It can be useful to begin a task with automatic context while scoping it out and making the high level plan, then switch to the more 'aider-style' manual context management once the relevant files are clearly established.
I loathe using AI in a greenfield project. There are simply too many possible paths, so it seems to randomly switch between approaches.
In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.
The trick for greenfield projects is to use it to help you design detailed specs and a tentative implementation plan. Just bounce some ideas off of it, as with a somewhat smarter rubber duck, and hone the design until you arrive at something you're happy with. Then feed the detailed implementation plan step by step to another model or session.
This is a popular workflow I first read about here[1].
This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course-correct often.
This seems like a good flow! I end up adding a "spec" and "todo" file for each feature[1]. This allows me to flesh out some of the architectural/technical decisions in advance and keep the LLM on the rails when the context gets very long.
Yeah, I limit context by regularly trimming the TODOs. I like having 5-6 in one file because it sometimes informs the LLM as to how to complete the first in a way that makes sense for the follow-ups.
READMEs per module also help, but it really depends a lot on the model. Gemini will happily traipse all over your codebase at random, gpt-4.1 will do inline imports inside functions because it seems to lack any sort of situational awareness, Claude so far gets things mostly right.
I've vibe coded a small project as well using Claude Code. It's about visitor registration at the company. Simple project: one form, a couple of checkboxes, everything is stored in SQLite + it has an endpoint for getting .xlsx.
Initial cost was around $20 USD, which later grew to $40 (mostly polishing) with some manual work.
I've intentionally picked up simple stack: html+js+php.
A couple of things:
* I'd say I'm happy about the result from product's perspective
* Codebase could be better, but I could not care less about in this case
* By default, AI does not care about security unless I specifically tell it
* Claude insisted on using old libs. When I specifically told it to use the latest and greatest, it upgraded them but left code that only works with an old version. Also it mixed the latest DaisyUI with some old version of Tailwind CSS :)
On one hand it was super easy and fun to do, on the other hand if I was a junior engineer, I bet it would have cost more.
If you want to use Cline and are at all price sensitive (in these ranges) you have to do manual context management just for that reason. I find that too cumbersome and use Windsurf (currently with Gemini 2.5 pro) for that reason.
I think it's just that it's not end-to-end trained on architecture because the horizon is too short. It doesn't have the context length to learn the lessons that we do about good design.
> I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions
That doesn’t matter anymore when you’re vibe coding it. No human is going to look at it anyway.
It can all be if/else on one line in one file. If it works, and if the LLMs can work on it, iterate, and implement new business requirements while keeping performance and security - code structure, quality and readability don’t matter one bit.
Customers don’t care about code quality and the only reason businesses used to care is to make it less money consuming to build and ship new things, so they can make more money.
This is a common view, and I think it will be the norm in the near-to-mid term, especially for basic CRUD apps and websites. Context windows are still too small for anything even slightly complex (I think we need to be at about 20m tokens before we start to match human levels), but we'll be there before you know it.
Engineers will essentially become people who just guide the AIs and verify tests.
Have you ever tried to get those little bits of styrofoam completely off of a cardboard box? Have you ever seen something off in the distance and misjudged either what it was or how long it would take to get there?
Hook up something like Taskmaster or Shrimp, so that they can document as they go along and retrieve relevant context when they overflow their context window, to avoid this issue.
Then as the context window increases, it’s less and less of an issue
Nope - I use a-la-carte pricing (through openrouter). I much prefer it over a subscription, as there are zero limits, I pay only for what I use, and there is much less of a walled garden (I can easily switch between Anthropic, Google, etc).
While it's being touted for greenfield projects, I've noticed a lot of failures when it comes to bootstrapping a stack.
For example, it (Gemini 2.5) really struggles with newer ecosystems like FastAPI when wiring libraries like SQLAlchemy, pytest, Playwright, etc., together.
I find more value in bootstrapping myself, and then using it to help with boilerplate once an effective safety harness is in place.
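To be concrete about what "wiring together" means here, this is roughly the kind of bootstrap I'd rather write by hand first - a minimal sketch assuming FastAPI and SQLAlchemy 2.x, with a made-up model and route:

    from fastapi import Depends, FastAPI
    from sqlalchemy import create_engine, select
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    engine = create_engine("sqlite:///app.db")
    app = FastAPI()

    class Base(DeclarativeBase):
        pass

    class Visitor(Base):
        __tablename__ = "visitors"
        id: Mapped[int] = mapped_column(primary_key=True)
        name: Mapped[str] = mapped_column()

    Base.metadata.create_all(engine)

    def get_session():
        # One session per request, closed when the response has been sent.
        with Session(engine) as session:
            yield session

    @app.get("/visitors")
    def list_visitors(session: Session = Depends(get_session)) -> list[str]:
        return [v.name for v in session.scalars(select(Visitor))]

Once something like this exists, plus a couple of pytest cases exercising it, the model is much better at filling in the repetitive endpoints around it.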
I wish they optimized things before adding more crap that will slow things down even more. The only thing that's fast with Copilot is the autocomplete; it sometimes takes several minutes to make edits on a 100-line file regardless of the model I pick (some are faster than others). If these models had a close to 100% hit rate this would be somewhat fine, but going back and forth with something that takes this long is not productive. It's literally faster to open Claude/ChatGPT in a new tab, paste the question and code there, and paste the result back into VS Code than to use their ask/edit/agent tools.
I cancelled my Copilot subscription last week, and when it expires in two weeks I'll most likely shift to local models for autocomplete/simple stuff.
My experience has mostly been the opposite -- changes to several-hundred-line files usually only take a few seconds.
That said, months ago I did experience the kind of slow agent edit times you mentioned. I don't know where the bottleneck was, but it hasn't come back.
I'm on library WiFi right now, "vibe coding" (as much as I dislike that term) a new tool for my customers using Copilot, and it's snappy.
I've had this too, especially it getting stuck at the very end and just.. never finishing. Once the usage-based billing comes into effect I think I'll try cursor again.
What local models are you using? The local models I tried for autocomplete were unusable, though based on aiders benchmark I never really tried with larger models for chat. If I could I would love to go local-only instead.
That first PR (115733) would make me quit after a week if we were to implement this crap at my job and someone forced me to babysit an AI in its PRs in this fashion. The others are also rough.
A wall of noise that tells you nothing of any substance, but with an authoritative tone as if what it's doing is objective and truthful - immediately followed by:
- The 8 actual lines of code (discounting the tests & boilerplate) it wrote to actually fix the issue are being questioned by the person reviewing the code; it seems he's not convinced this is actually fixing what it should be fixing.
- Not running the "comprehensive" regression tests at all
- When they do run, they fail
- When they get "fixed" oh-so confidently, they still fail. Fifty-nine failing checks. Some of these tests take upward of an hour to run.
So the reviewer here has to read all the generated slop in the PR description and try to grok what the PR is about, read through the changes himself anyway (thankfully it's only a ~50 line diff in this situation, but imagine if this was a large refactor of some sort with a dozen files changed), and then drag it by the hand multiple times to try to fix issues it itself is causing. All the while you have to tag the AI as if it's another colleague and talk to it as if it's not just going to spit out whatever inane bullshit it thinks you want to hear based on the question asked. Test failed? Well, tests fixed! (no, they weren't)
And we're supposed to be excited about having this crap thrust on us, with clueless managers being sold on this being a replacement for an actual dev? We're being told this is what peak efficiency looks like?
Thanks. I wonder what model they're using under the hood? I have such a good experience working with Cline and Claude Sonnet 3.7 and a comparatively much worse time with anything Github offers. These PRs are pretty consistent with the experience I've had in the IDE too. Incidentally, what has MSFT done to Claude Sonnet 3.7 in VSCode? It's like they lobotomized it compared to using it through Cline or the API directly. Trying to save on tokens or something?
Thanks, that’s really interesting to see - especially with the exchange around whether something is the problem or the symptom, where the confident tone belies the lack of understanding. As an open source maintainer I wonder about the best way to limit usage to cases where someone has time to spend on those interactions.
Major scam alert, they are training on your code in private repos if you use this
You can tell because they advertise “Pro” and “Pro+” but then the FAQ reads,
> Does GitHub use Copilot Business or Enterprise data to train GitHub’s model?
> No. GitHub does not use either Copilot Business or Enterprise data to train its models.
Aka, even paid individual plans are getting brain raped
If you're programming on Windows, your screen is being screenshotted every few seconds anyway. If you don't think OCR is analysing everything resembling a letter on your screen, boy do I have some news for you.
I’ve been trying to use Copilot for a few days to get some help writing against code stored on GitHub.
Copilot has been pretty useless. It couldn’t maintain context for more than two exchanges.
Copilot: here’s some C code to do that
Me: convert that to $OTHER_LANGUAGE
Copilot: what code would you like me to convert?
Me: the code you just generated
Copilot: if you can upload a file or share a link to the code, I can help you translate it …
It points me in a direction that’s a minimum of 15 degrees off true north (“true north” being the goal for which I am coding), usually closer to 90 degrees. When I ask for code, it hallucinates over half of the API calls.
I’m sure you have no idea what my method is. Besides, this whole “you’re holding it wrong” mentality isn’t productive - our technology should be adapting to us, we shouldn’t need to adapt ourselves to it.
Anyway, I can just use another LLM that serves me better.
I played around with it quite a bit. It is both impressive and scary. Most importantly, it tends to indiscriminately use dependencies from random tiny repos, and often enough not the correct ones, for major projects. Buyer beware.
This is something I've noticed as well with different AIs. They seem to disproportionately trust data read from the web. For example, I asked it to check if some obvious phishing pages were scams, and multiple times I got just a summary of the content as if it was authoritative. Several times I've gotten some random Chinese repo with 2 stars presented as if it was the industry-standard solution, since that's what it said in the README.
On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
>On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
That's not hallucination. That's just an optical illusion.
Oh wow, that was great - particularly if I then look at my own body parts (like my palm) that I know are not moving, it's particularly disturbing. That's a really well done effect, I've seen something similar but nothing quite like that.
>They seem to disproportionately trust data read from the web.
I doubt LLMs have anything like what we would conceptualize as trust. They have information, which is regurgitated because it is activated as relevant.
That being said, many humans don't really have a strong concept of information validation as part of day to day action and thinking. Development theory talks about this in terms of 'formal operational' thinking and 'personal epistemology' - basically how does thinking happen and then how is knowledge in those models conceptualized. Learning Sciences research generally talks about Piaget and formal operational before adulthood and stages of personal epistemology in higher education.
Research consistently suggests that about 50% of adults are not able to consistently operate in the formal-thinking space. The behavior you are talking about is also typical of 'absolutist' epistemic perspectives, where answers are right or wrong and aren't meaningfully evaluated - just identified as relevant or not. Credibility is judged by whether the information comes from a trusted source - most often an authority figure - rather than being something the knower evaluates themselves.
As we've built Copilot coding agent, we've put a lot of thought and work into our security story.
One of the things we've done here is to treat Copilot's commits like commits from a first-time contributor to an open source project.
When Copilot pushes changes, your GitHub Actions workflows won't run by default, and you'll have to click the "Approve and run workflows" button in the merge box.
That gives you the chance to review Copilot's code before it runs in Actions and has access to your secrets.
(Source: I'm on the product team for Copilot coding agent.)
The announcement https://github.blog/news-insights/product-news/github-copilo... seems to position GitHub Actions as a core part of the Copilot coding agent’s architecture.
From what I understand in the documentation and your comment, GitHub Actions is triggered later in the flow, mainly for security reasons.
Just to clarify, is GitHub Actions also used in the development environment of the agent, or only after the code is generated and pushed?
No, not at all. Why do people keep saying shit like this? These are thought-terminating sentences. Try to see the glass of Kool-Aid, please. People are trying to understand how to communicate important, valuable things about failure states, and you're advocating ignorance.
Because the marketing started with "This is literally the singularity and will take over everything and everyone's jobs".
Then people realized that was BS, so the marketing moved on to "This will enhance everyone's jobs, as a companion that will help everyone".
People also realized that was pure BS. A few more marketing rebrands later and we're at the current situation, where we try to equate it to the lowest possible rung of employee they can think of, because surely Junior == Incompetent Idiot You Can't Trust Not To Waste Your Time†. The funny part is that they have been objectively and undeniably only getting better since the early days of the hype bubble, yet the push now is that they're "basically junior level!". Speaks volumes IMO, how those goal posts keep getting moved whenever people actually use these systems for real work.
---
† IMO every single Junior I've ever worked with has given me some of the best moments of my career. It allowed space for me to grow my own knowledge, while I got to talk to and help someone extremely passionate if a bit overeager. This stance on Juniors is, frankly, baffling to me because it's so far from my experience of how they tend to work; oftentimes they're a million times better than those "10x rockstars" you hear about all the time.
But rest assured that with Github Copilot Coding Agent, your codebase will develop larger and larger volumes of new, exciting, underexplored technical debt that you can't be blamed for, and your colleagues will follow you into the murky depths soon.
As peer commenters have noted, coding agent can be really good at improving test coverage when needed.
But also as a slightly deeper observation - agentic coding tools really do benefit significantly from good test coverage. Tests are a way to “box in” the agent and allow it to check its work regularly. While they aren’t necessary for these tools to work, they can enable coding agents to accomplish a lot more on your behalf.
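To make that concrete: even a couple of small behavioural tests give the agent a fast pass/fail signal to check its own work against. A minimal sketch (the slugify helper here is hypothetical, standing in for real project code):

# test_slugify.py - tiny behavioural tests an agent can re-run after every change
import re

def slugify(text: str) -> str:
    # hypothetical helper: lowercase, collapse non-alphanumerics into single hyphens
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def test_collapses_whitespace_and_lowercases():
    assert slugify("  Hello   World ") == "hello-world"

def test_strips_punctuation():
    assert slugify("Rock & Roll!") == "rock-roll"

If the agent's change breaks either expectation, it finds out in seconds rather than in review.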
In my experience they write a lot of pointless tests that technically increase coverage while not actually adding much more value than a good type system/compiler would.
They also have a tendency to suppress errors instead of fixing them, especially when the right thing to do is throw an error on some edge case.
In my experience it works well even without good testing, at least for greenfield projects. It just works best if there are already tests when creating updates and patches.
My buddy is at GH working on an adjacent project & he hasn't stopped talking about this for the last few days. I think I've been reminded to 'make sure I tune into the keynote on Monday' at least 8 times now.
I gave up trying to watch the stream after the third authentication timeout, but if I'd known it was this I'd maybe have tried a fourth time.
These do not need to be mutually exclusive. Define the quality of the software in terms of customer experience and give developers ownership to improve those markers. You can think service level objectives.
In many cases, this means pushing for more stable deployments which requires other quality improvements.
Which GitHub subscription level is required for the agent?
I found it very confusing - we have GH Business, with Copilot active. Could not find a way to upgrade our Copilot to the level required by the agent.
I tried using my personal Copilot for the purpose of trialing the agent - again, a no-go, as my Copilot is "managed" by the organization I'm part of.
Also, you will want to add more control over who can assign things to the Copilot agent - just having write access to the repository is a poor discriminator, I think.
I'm running into the same issue. I think you have to upgrade your entire organization to "enterprise", which comes with a per seat cost increase (separate from the cost of copilot).
The biggest change Copilot has brought about for me so far is to have me replace my VSCode with VSCodium, to be sure it doesn't sneak any uploading of my code to a third party without my knowing.
I'm all for new tech getting introduced and made useful, but let's make it all opt in, shall we?
I'm building RSOLV (https://rsolv.dev) as an alternative approach to GitHub's Copilot agent.
Our key differentiator is cross-platform support - we work with Jira, Linear, GitHub, and GitLab - rather than limiting teams to GitHub's ecosystem.
GitHub's approach is technically impressive, but our experience suggests organizations derive more value from targeted automation that integrates with existing workflows rather than requiring teams to change their processes. This is particularly relevant for regulated industries where security considerations supersede feature breadth. Not everyone can just jump off of Jira at a moment's notice.
Curious about others' experiences with integrating AI into your platforms and tools. Has ecosystem lock-in affected your team's productivity or tool choices?
Oh, the savings calculator in your website made me sad, that's the first time I've seen it put that way.
I know it's marketing but props to you for being sincere. At least you're not hiding the intentions of your service (like others).
Yeah, the ROI calculator's target audience is the folks with the checkbook, so it needs to be a dollar figure. My _actual_ hope is that this lets engineers focus on feature work (which is typically more rewarding anyway) without constantly bashing their heads against the tech debt and maintenance work they're effectively barred from performing until it becomes emergent and actively blocking.
I know that's a bit kneejerk, but I actually think that's a pretty reasonable question.
Automating the reputation and network of an individual person doesn't seem like a good fit for an LLM, regardless of the person. But the _decisionmaking_ capacities for a position that's largely trend-following is something that's at the very least well-supported by interacting with a well-trained model.
In my mind, though, that doesn't look like a niched service that you sell to a company. That looks like a cofounder-type for someone with an idea and a technical background. If you want to build something but need help figuring out how to market and sell it, you could do a lot worse than just chatting with Claude right now and taking much of its advice.
That might just be my own lack of bizdev expertise, though.
These kinds of patterns allow compute to take much more time than a single chat, since the work is asynchronous by nature, which I think is necessary to get to working solutions on harder problems.
Yes. This is a really key part of why Copilot coding agent feels very different to use than Copilot agent mode in VS Code.
In coding agent, we encourage the agent to be very thorough in its work, and to take time to think deeply about the problem. It builds and tests code regularly to ensure it understands the impact of changes as it makes them, and stops and thinks regularly before taking action.
These choices would feel too "slow" in a synchronous IDE-based experience, but feel natural in an "assign to a peer collaborator" UX. We lean into this to provide as rich a problem-solving agentic experience as possible.
In the early days of LLMs, I had developed an "agent" using a GitHub Actions + Issues workflow[1], similar to how this works. It was very limited but kinda worked, i.e. you assign it a bug and it fired an action, did some architect/editing tasks, validated changes and finally sent a PR.
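Roughly, the trigger wiring looked like this (a minimal sketch using standard GitHub Actions syntax; the bot account name and run-agent.sh script are placeholders, not my exact setup):

# .github/workflows/issue-agent.yml - fire the agent when an issue is assigned to the bot
name: issue-agent
on:
  issues:
    types: [assigned]
jobs:
  agent:
    if: github.event.assignee.login == 'my-agent-bot'   # placeholder bot account
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Run agent on the assigned issue
        run: ./run-agent.sh "${{ github.event.issue.number }}"   # placeholder script: edit, validate, open PR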
Is there anything that satisfies people here? Copilot today is perhaps the only AI that is actually assisting with something productive.
Microsoft, besides maybe Google and OpenAI, is one of the only ones actually exploring the practical usefulness of AI. Other kiddies like Sonnet and whatnot are still chasing meaningless numbers and benchmark scores; that sort of stuff may appeal to high school kids or the immature, but burning billions of dollars and energy resources just to sound like a cool kid?
I recently created a course for LinkedIn Learning on using generative AI to create SDKs[0]. When I was onsite with them to record it, I found my GitHub Copilot calls kept failing... with a network error. Wha?
Turns out that LinkedIn doesn't allow people onsite to connect to Copilot, so I had to put my MiFi in the window and connect to that to do my work. It's wild.
Btw, I love working with LinkedIn and have 15+ courses with them in the last decade. This is the only issue I've ever had.. but it was the least expected one.
I don't know, I feel this is the wrong level to place the AI at this moment. Chat-based AI programming (such as Aider) offers more control, while being almost as convenient.
Is Copilot a classic case of slow megacorp gets outflanked by more creative and unhindered newcomers (ie Cursor)?
It seems Copilot could have really owned the vibe coding space. But that didn’t happen. I wonder why? Lots of ideas gummed up in organizational inefficiencies, etc?
This is a direct threat to Cursor. The smarter the models get, the less often programmers really need to dig into an IDE, even one with AI in it. Give it a couple of years and there will be a lot of projects that were done just by assigning tasks where no one even opened Cursor or anything.
So can I switch this to high contrast Black on White on mobile instead? I cannot read any of this (in the bright sunlight where I am) without pulling it through a reader app. People do get why books and other reading materials are not published grey on black, right?
I wonder what the coding agent story will be for bespoke hardware. For instance, I'd like to test some things out on a specific GPU which isn't available on GitHub. Can I configure my own runners and hope for the best? What about a bespoke microcontroller?
I go back and forth between ChatGPT and Copilot in VS Code. It really makes the grammar guessing much easier in Objective-C. It's not as good on libraries, and nonexistent on 3rd-party libraries, but maybe that's because I don't challenge it enough. It makes tons of flow and grammar errors, which are so easy to spot that I end up using the code most of the time after a small correction. I'm optimistic about the future, especially since this is only costing me $10 a month. I have dozens of iOS apps to update. All of them are basically productivity apps that I use and sell, so double plus good.
How does that compare to using agent mode in VS Code?
Is the main difference that the files are being edited remotely instead of on your own machine, or is there something different about the AI powering the remote agent compared to the local one?
In hindsight it was a mistake that Google killed Google Code. Then again, I guess they wouldn't have put enough effort into it to develop into a real GitHub alternative.
Now Microsoft sits on a goldmine of source code and has the ability to offer AI integration even to private repositories. I can upload my code into a private repo and discuss it with an AI.
The only thing Google can counter with would be to build tools which developers install locally, but even then I guess that the integration would be limited.
And considering that Microsoft owns the "coding OS" VS Code, it makes Google look even worse. Let's see what they come up with tomorrow at Google I/O, but I doubt that it will be a serious competition for Microsoft. Maybe for OpenAI, if they're smart, but not for Microsoft.
You win some you lose some. Google could have continued with Google code. Microsoft could've continued with their phone OS. It is difficult to know when to hold and when to fold.
It could be an amazing product. But the aggressive marketing approach from Microsoft plastering "CoPilot" everywhere makes me want to try every alternative.
I kind of love the idea that all of this works in the familiar flow of raising an issue and having a magic coder swoop in and making a pull request.
At the same time, I have been spoiled by Cursor. I feel I would end up preferring that the magic coder is right there with me in the IDE where I can run things and make adjustments without having to do a followup request or comment on a line.
I'm honestly surprised by so much hate. IMHO it's more important to look at 1) the progress we've made + what this can potentially do in 5 years and 2) how much it's already helping people write code than dismissing it based on its current state.
I have so far been disappointed by Copilot's offerings. It's just not good enough for anything valuable. I don't want it to write my getters and setters and call it a day.
I love Copilot in VSCode. I have it set to use Claude most of the time, but it lets you pick your favorite LLM for it to use. I just open the files I'm going to refactor, type into the chat window what I want done, and click 'accept' on every code change it recommends in its answer, causing VSCode to auto-merge the changes into my code. Couldn't possibly be simpler. Then I scrutinize and test. If anything went wrong I just use GitLens to roll back the change, but that's very rare.
Especially now that Copilot supports MCP, I can plug in my own custom "Tools" (i.e. function calling done by the AI agent), and I have everything I need. I never even bothered trying Cursor or Windsurf, which I'm sure are great too, mainly because they're just forks of VSCode as the IDE.
I've come to the same conclusions mentioned in most of that and have done most of it already. I was an early adopter of LLM tech and have my own coding agent system, written in Python. I'm about to port those tools over to MCP so that I can just use VSCode for most everything, and never even need the Gradio chatbot I wrote to learn how to write and use tools.
My favorite tool that I've written is one that simply lets me specify named blocks by name, in a prompt, and AI figures out how to use the tool to read each block. A named block is defined like:
# block_begin MyBlock
...lines of code
# block_end
So I can just embed those markers around the code rather than pasting it into prompts.
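Roughly, the tool boils down to something like this (a simplified sketch, not the exact code):

# read_block.py - return the code between "# block_begin <name>" and "# block_end"
def read_block(path: str, name: str) -> str:
    capturing, out = False, []
    for line in open(path).read().splitlines():
        stripped = line.strip()
        if stripped == f"# block_begin {name}":
            capturing = True
        elif stripped == "# block_end" and capturing:
            break
        elif capturing:
            out.append(line)
    return "\n".join(out)

# e.g. read_block("app.py", "MyBlock") returns just MyBlock's lines, ready to drop into a prompt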
Have you tried the agent mode instead of the ask mode? With just a bit more prompting, it does a pretty good job of finding the files it needs to use on its own. Then again, I've only used it in smaller projects so larger ones might need more manual guidance.
I assumed I was using 'Agent mode' but now that you mentioned it, I checked and you're right I've been in 'Ask mode' instead. oops. So thanks for the tip!
I'm looking forward to seeing how Agent Mode is better. Copilot has been such a great experience so far I haven't tried to keep up with every little new feature they add, and I've fallen behind.
I find Agent mode much more powerful, as it can search your code base for further reference and even has access to other systems (I haven't seen exactly what the other levels of access are; I'm guessing it isn't full access to the web, but it can access certain info repositories). I do find it sometimes a little over-eager to do instead of explain, so Ask mode is still useful when you want explanations. It also appears that Agent mode has the search capabilities while Ask does not, but that might also be something recently added to both and I just don't recall it from my time in Ask mode.
Yeah, in Ask mode it shows me the code it's proposing and explains it, and I have to click an icon to merge the change into my code tentatively and then click "Keep" to make it merge permanently. I kind of like that workflow a lot, but I guess you're saying Agent mode just slams the changes into the code. I may or may not like that. Often I pick only parts of what it's done. Thanks for the tips.
Copilot Workspace could take a task, implement it and create a PR - but it had a linear, highly structured flow, and wasn't deeply integrated into the GitHub tools that developers already use like issues and PRs.
With Copilot coding agent, we're taking all of the great work on Copilot Workspace, and all the learnings and feedback from that project, and integrating it more deeply into GitHub and really leveraging the capabilities of 2025's models, which allow the agent to be more fluid, asynchronous and autonomous.
(Source: I'm the product lead for Copilot coding agent.)
Probably. Also this new feature seems like an expansion/refinement of Copilot Workspaces to better fit the classic Github UX: "assign an issue to Copilot to get a PR" sounds exactly like the workflow Copilot Workspaces wanted to have when it grew up.
So far, i am VERY unimpressed by this. It gets everything completely wrong and tells me lies and completely false information about my code. Cursor is 100000000x better.
How good do your test suite and code base have to be for the agent to verify the fix properly, including testing things that can be broken elsewhere?
Which model does it use? Will this let me select which model to use? I have seen a big difference in the type of code that different models produce, although their prompts may be to blame/credit in part.
I assume you can select whichever one you want (GPT-4o, o3-mini, Claude 3.5, 3.7, 3.7 Thinking, Gemini 2.0 Flash, GPT-4.1, and the previews o1, Gemini 2.5 Pro and o4-mini), subject to the pricing multipliers they announced recently [0].
Edit: From TFA: Using the agent consumes GitHub Actions minutes and Copilot premium requests, starting from entitlements included with your plan.
At the moment, we're using Claude 3.7 Sonnet - but we're keeping our options open to experiment with other models and potentially bring in a model picker.
(Source: I'm on the product team for Copilot coding agent.)
In my experience using Claude Sonnet 3.7 in GitHub Copilot extension in VSCode, the model produced hideously verbose code, completely unnecessary stuff. GPT-4.1 was a breath of fresh air.
Copilot pushes its work to a branch and creates a pull request, and then it's up to you to review its work, approve and merge.
Copilot literally can't push directly to the default branch - we don't give it the ability to do that - precisely because we believe that all AI-generated code (just like human generated code) should be carefully reviewed before it goes to production.
(Source: I'm the product lead for Copilot coding agent.)
> Once Copilot is done, it’ll tag you for review. You can ask Copilot to make changes by leaving comments in the pull request.
To me, this reads like it'll be a good junior and open up a PR with its changes, letting you (the issue author) review and merge. Of course, you can just hit "merge" without looking at the changes, but then it's kinda on you when unreviewed stuff ends up in main.
Has a point of view, a clear motive, ability to think holistically about things that are hard to digitize, get mad and clean up a bunch of stuff absolutely correctly because they're finally just "sick of all of this shit", or, conservatively isolates legacy code, studying it and creating buffering wrappers for the new system in pieces as the legacy issues are mitigated with a long term strategy. Each move is discussed with their peers. etc etc etc thank you for advocating sanity!
In my experience in VSCode, Claude 3.7 produced more unsolicited slop, whereas GPT-4.1 didn't. Claude aggressively paid attention to type compatibility. Each model would have its strengths.
> Copilot excels at low-to-medium complexity tasks in well-tested codebases, from adding features and fixing bugs to extending tests, refactoring, and improving documentation.
Bounds bounds bounds bounds. The important part for humans seems to be maintaining boundaries for AI. If your well-tested codebase has the tests built thru AI, its probably not going to work.
I think its somewhat telling that they can't share numbers for how they're using it internally. I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success. There's real stuff in there, and my brain has an insanely hard time separating the trillion dollars of hype from the usefulness.
We've been using Copilot coding agent internally at GitHub, and more widely across Microsoft, for nearly three months. That dogfooding has been hugely valuable, with tonnes of valuable feedback (and bug bashing!) that has helped us get the agent ready to launch today.
So far, the agent has been used by about 400 GitHub employees in more than 300 of our repositories, and we've merged almost 1,000 pull requests contributed by Copilot.
In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
(Source: I'm the product lead at GitHub for Copilot coding agent.)
> we've merged almost 1,000 pull requests contributed by Copilot
I'm curious to know how many Copilot PRs were not merged and/or required human take-overs.
textbook survivorship bias https://en.wikipedia.org/wiki/Survivorship_bias
every bullet hole in that plane is the 1k PRs contributed by copilot. The missing dots, and whole missing planes, are unaccounted for. Ie, "ai ruined my morning"
It's not survivorship bias. Survivorship bias would be if you made any conclusions from the 1000 merged PRs (eg. "90% of all merged PRs did not get reverted"). But simply stating the number of PRs is not that.
As with all good marketing, the conclusions are omitted and implied, no?
The implied conclusion ("Copilot made 1000 changes to the codebase") is also not survivorship bias.
By that logic, literally every statement would be survivorship bias.
That’s not the implied conclusion my guy. That’s the statement.
Then what do you claim the implied conclusion is?
That the number of successful contributions (as in, merged and working) is greater than the number that were not.
Given that GitHub is continuing with the product and marketing it to us, it feels sufficient to count that as a conclusion.
If they measured that too it would make it harder to justify a MSFT P/E ratio of 29.6.
I'm curious how many were much more than Dependabot changes.
I see number of PRs as modern LOC, something that doesn't tell me anything about quality.
"We need to get 1000 PRs merged from Copilot" "But that'll take more time" "Doesn't matter"
I do agree that some scepticism is due here but how can we tell if we're treading into "moving the goal posts" territory?
I'd love to know where you think the starting position of the goal posts was.
Everyone who has used AI coding tools interactively or as agents knows they're unpredictably hit or miss. The old, non-agent Copilot has a dashboard that shows org-wide rejection rates for paying customers. I'm curious to learn what the equivalent rejection rate for the agent is for the people who make the thing.
I think the implied promise of the technology, that it is capable of fundamentally changing organizations relationships with code and software engineering, deserves deep understanding. Companies will be making multi million dollar decisions based on their belief in its efficacy
When someone says that the number given is not high enough. I wouldn't consider trying to get an understanding of PR acceptance rate before and after Copilot to be moving the goal posts. Using raw numbers instead of percentages is often done to emphasize a narrative rather than simply inform (e.g. "Dow plummets x points" rather than "Dow lost 1.5%").
I feel the same about automated dependency updates, but if your tests and verifications are good, these become trivial.
Strong automated tests and verifications seem to be nearly as rare as unicorns, at least if you go by most developers' feelings on this.
It seems places don't prioritize it, so you don't see it very often. Some developers are outright dismissive of the practice.
Unfortunately, AI seemingly won't help with that.
Sometimes there are paradigm shifts in the dependency that get past the current tests you have. So it's always good to read the changelog and plan the update accordingly.
How strong was the push from leadership to use the agents internally?
As part of the dogfooding I could see them really pushing hard to try having agents make and merge PRs, at which point the data is tainted and you don't know if the 1,000 PRs were created or merged to meet demand or because devs genuinely found it useful and accurate.
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
Thats a fun stat! Are humans in the #1-4 slots? Its hard to know what processes are automated (300 repos sounds like a lot of repos!).
Thank you for sharing the numbers you can. Every time a product launch is announced, I feel like it's a gleeful announcement of a decrease in my usefulness. I've got imposter syndrome enough; perhaps Microsoft might want to speak to the developer community and let us know what they see happening? Right now it's mostly the pink slips that are doing the speaking.
Humans are indeed in slots #1-4.
After hearing feedback from the community, we’re planning to share more on the GitHub Blog about how we’re using Copilot coding agent at GitHub. Watch this space!
Wonderful thank you! It’s just so hard to filter signal vs noise rn.
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
Really cool, thanks for sharing! Would you perhaps consider implementing something like these stats that aider keeps on "aider writing itself"? - https://aider.chat/HISTORY.html
Nice idea! We're going to try to get together a blog post in the next couple of weeks on how we're using Copilot coding agent at GitHub - including to build Copilot coding agent ;) - and having some live stats would be pretty sweet too.
> In the repo where we're building the agent, the agent itself is actually the #5 contributor
How does this align with Microsoft's AI safety principles? What controls are in place to prevent Copilot from deciding that it could be more effective with fewer limitations?
Copilot only does work that has been assigned to it by a developer, and all the code that the agent writes has to go through a pull request before it can be merged. In fact, Copilot has no write access to GitHub at all, except to push to its own branch.
That ensures that all of Copilot's code goes through our normal review process which requires a review from an independent human.
HAHA. Very smart. The more you review the Copilot Agent's PRs, the better is gets at submitting new PRs... (basics of supervised machine learning, right?)
Tim, are you or any of your coworkers worried this will take your jobs?
What if Tim was the coding agent?
Human-generated corporate-speak is indistinguishable from the AI-generated kind at this point.
Terminal In Mind
Haha
So I need to ask: what is the overall goal of your project? What will you do in, say, 5 years from now?
What I'm most excited about is allowing developers to spend more of their time working on the work they enjoy, and less of their time working on mundane, boring or annoying tasks.
Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take that load off and free me up to work on the most interesting and complex problems.
But aren't writing tests and updating documentation also the areas where automated quality control is the hardest? Existing high-quality tests can work as guardrails for writing business logic, but what guardrails could AI use to evaluate whether its generated docs and tests are any good?
I would not be surprised if things end up the other way around – humans doing the boring and annoying tasks that are too hard for AI, and AI doing the fun easy stuff ;-)
What about developers who do enjoy writing for example high quality documentation? Do you expect that the status quo will be that most of the documentation will be AI slop and AI itself will just bruteforce itself through the issues? How close are we to the point where the AI could handle "tricky dependency updates", but not being able to handle "most interesting and complex problems"? Who writes the tests that are required for the "well tested" codebases for GitHub Copilot Coding Agent to work properly?
What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
I'd argue the only way to ensure that is to make sure developers read high quality documentation - and report issues if it's not high quality.
I expect though that most people don't read in that much detail, and AI generated stuff will be 80-90% "good enough", at least the same if not better than someone who doesn't actually like writing documentation.
> What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
Isn't that already the case for a lot of software development? If it's boring and mundane, an AI can do it too so you can focus on more difficult or higher level issues.
Of course, the danger is that, just like with other automated PRs like dependency updates, people trust the systems and become flippant about it.
I think just having devs feed an agent OpenAPI docs as context for creating API calls would do enough. Simply adding tags and useful descriptions to endpoints makes a world of difference in the accuracy of the agent's output. It means getting 95% accuracy with the cheapest models vs. 75% accuracy with the most expensive models.
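For example, the difference between a bare endpoint and an annotated one is roughly this (a hypothetical /orders/{id} endpoint, standard OpenAPI 3 syntax):

# Bare: the agent has to guess what the endpoint does and how to call it
paths:
  /orders/{id}:
    get:
      parameters:
        - {name: id, in: path, required: true, schema: {type: string}}
      responses:
        '200': {description: OK}

# Annotated: tags and descriptions let even a cheap model pick and call it correctly
paths:
  /orders/{id}:
    get:
      tags: [orders]
      summary: Fetch a single order by its ID
      description: Returns the order, including line items and current fulfilment status.
      parameters:
        - name: id
          in: path
          required: true
          description: Order identifier as returned when the order was created
          schema: {type: string}
      responses:
        '200':
          description: The order was found and returned as JSON.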
I find your comment about "AI slop" in reference to technical documentation strange. It isn't a choice between finely crafted prose and banal text. It's documentation that exists versus documentation that doesn't exist, or documentation that is hopelessly out of date. In my experience LLMs do a wonderful job of translating from code to documentation. They even do a good job of inferring the reasons for design decisions. I'm all in on LLM-generated technical documentation. If I want well-written prose I'll read literature.
Documentation is not just translating code to text - I don't doubt that LLMs are wonderful at that: that's what they understand. They don't understand users though, and that's what separates a great documentation writer from someone who documents.
Great technical documentation rarely gets written. You can tell the LLM the audience they are targeting and it will do a reasonable job. I truly appreciate technical writers, and hold great ones in special esteem. We live in a world where the market doesn't value this.
The market values good documentation. Anything critical and commonly used is pretty well documented (Linux, databases, software like Adobe's, ...). You can see how many books/articles have been written about those systems.
> Anything critical and commonly used is pretty well documented
I'd argue the vast majority of software development is neither critical nor commonly used. Anecdotal, but I've written documentation and never got any feedback on it (whether it's good or bad), which implies it's not read or the quality doesn't matter.
Sometimes the code, if written cleanly, is trivial enough for anyone with a foundation in the domain, so it can act as the documentation. And sometimes only the usage is important, not the implementation (manual pages). And other times the documentation is the standards themselves (file formats and communication protocols). So I can get why no one took the effort to compile a documentation manual.
We’re not talking about AI writing books about the systems, though. We’re talking about going from an undocumented codebase to a decently documented one, or one with 50% coverage going to 100%.
Those orgs that value high-quality documentation won’t have undocumented codebases to begin with.
And let’s face it, like writing code, writing docs does have a lot of repetitive, boring, boilerplate work, which I bet is exactly why it doesn’t get done. If an LLM is filling out your API schema docs, then you get to spend more time on the stuff that’s actually interesting.
A much better option is to use docstrings[0] and a tool like Doxygen to extract an API reference. Domain explanations and architecture can be compiled later from design and feature docs.
A good example of the kind of result is something like the Laravel documentation[1] and its associated API reference[2]. I don't believe AI can help with this.
[0]: https://en.wikipedia.org/wiki/Docstring
[1]: https://laravel.com/docs/12.x
[2]: https://api.laravel.com/docs/12.x/
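As a small illustration of that flow, a docstring like this (hypothetical function) is enough for a generator such as Sphinx - or Doxygen for C-family code - to build a useful API reference entry, while the domain and architecture docs stay human-written:

import time

def retry(func, attempts: int = 3, delay: float = 0.5):
    """Call ``func`` until it succeeds or ``attempts`` is exhausted.

    Args:
        func: Zero-argument callable to invoke.
        attempts: Maximum number of tries before the last exception is re-raised.
        delay: Seconds to sleep between failed tries.

    Returns:
        Whatever ``func`` returns on the first successful call.
    """
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)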
Well documented meaning high quality, or well documented meaning high coverage?
> If I want well written prose I'll read literature.
Actually if you want well-written prose you'll read AI slop there too. I saw people comparing their "vibe writing" workflows for their "books" on here the other day. Nothing is to be spared, apparently
A lot (I'd argue most of it) of literature written by humans is garbage.
What will you be most excited about when the most interesting and complex problems are out of the Overton window and deemed mundane, boring or annoying as well, or turn out to be intractable for your abilities?
> allowing developers to spend more of their time working on the work they enjoy, and less of their time working on mundane, boring or annoying tasks.
I get paid for the mundane, boring, annoying tasks, and I really like getting paid.
But would you rather get paid to spend your time doing the interesting and enjoying work, or the mundane and boring stuff? ;) My hope is that agents like Copilot can help us burn down the tedious stuff and make more time for the big value adds.
Though I do not doubt your intentions to do what you think will make developers' lives better, can you be certain that your bosses, and their bosses, have our best interests in mind as well? I think it would be pretty naive to believe that your average CEO wouldn't absolutely love not to have to pay developers at all.
But this way the developers can spend all their time working on the truly interesting problems, like how to file for unemployment.
If that were possible it would happen anyway. You can’t hold back progress.
It’s very, very far from possible today.
The profile indicates he's a product manager, not a developer.
But working on interesting things is mentally taxing while the tedious tasks aren't. I can't always work at full bore, so having some tedium can be a relief.
Not everyone gets to do the fun stuff. That's for people higher up in the chain, with more connections, or something else. I like my paycheck, and you're supposing that AI isn't going to take that away, and that we'll get to live in a world where we all work on "fun stuff". That is a real pie-in-the-sky dream you have, and it simply isn't how the world works. Back in the real world, tech jobs are already scarce and there's a lot of people that would be happy to do the boring mundane stuff so they can feed their family.
I'd just rather get paid.
The only way to take this at face value would be to be naive about how capitalism works. Come on..
The goal here is for it to be able to do everything, taking 100% of the work
2nd best is to do the hard, big value adds so companies can hire cheap labor for the boring shit
3rd best is to only do the mundane and boring stuff
>Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take the load of that and free me up to work on the most interesting and complex problems.
Where does the "most" come from? There's a certain sense of satisfaction in knowing I've tested a piece of code per my experience in the domain, coupled with knowledge of where we'll likely be in six months. The same can be said for documentation - hell, on some of the projects I've worked on we've had entire teams dedicated to it, and on a complicated project where you're integrating software from multiple vendors the costs of getting it wrong can be astronomical. I'm sorry you feel this way.
I suppose it's possible at some point an AI could be complex enough to try out additional edge cases or check against a design document and do those parts as well... but I don't know - it's still after-the-fact testing instead of design-time testing, so it's less valuable IMO...
He does not feel that way, he's just salespitching for Microsoft here
> Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates
So they won't like working at their job?
You know exactly what they meant, and you know they’re correct.
I like updating documentation and feel that it's fairly important to do it myself so I actually understand what the code / services do.
I use all of these tools, but you also know what "they're doing"...
I know our careers are changing dramatically, or going away (I'm working on a replacement for myself), but I just like listening to all the "what we're doing is really helping you..."
I'd interpret the original statement as "tests which don't matter" and "documentation nobody will ever read", the ones which only exist because someone said they _have_ to, and nobody's ever going to check them as long as they exist (like a README.md in one of my main work projects I came back to after temporarily being reassigned to another project - previously it only had setup instructions, now: filled with irrelevant slop, never to be read, like "here is a list of the dependencies we use and a summary of each of their descriptions!").
Doing either of them _well_ - the way you do when you actually care about them and they actually matter - is still so far beyond LLMs. Good documentation and good tests are such a differentiator.
If we're talking about low quality tests and documentation that exists only to check a box, the easier answer is to remove the box and acknowledge that the low quality stuff just isn't needed at all.
I’ve never seen a test that doesn’t matter that shouldn’t be slotted for removal (if it gets written at all) or documentation that is never read. If people can read code to understand systems, they will be grateful for good documentation.
Thanks for the response… do you see a future where engineers are just prompting all the time? Do you see a timeline in which todays programming languages are “low level” and rarely coded by hand?
Do you think you're putting yourself or your coworkers out of work?
If/when will this take over your job?
The thought here is probably "well ... I'll obviously never get replaced". This is very sad indeed.
I'm honestly surprised that Microsoft (and other similarly sized LLM companies) have convinced or coerced literally hundreds of thousands of employees to build their own replacement.
If we're expected to even partially believe the marketing, LLM coding agents are useful today at junior level developer tasks and improving quickly enough that senior tasks will be doable soon too. How do you convince so many junior and senior level devs to build that?
When the options are "do what we tell you and get paid" vs getting laid off in the current climate, the choice isn't really a choice.
That threat doesn't scale. I do get that many haven't put themselves in a position to stand behind their views or principles, but if they did the threat, or the company, would crumble.
That's a completely nonsensical question given how quickly things are evolving. No one has a five year project timeline.
Absolutely the wrong take. We MUST think about what might happen in several years. Anyone who says we shouldn’t is not thinking about this technology correctly. I work on AI tech. I think about these things. If the teams at Microsoft or GitHub are not, then we should be pushing them to do so.
He asked that in the context of an actual specific project. It did not make sense the way he asked it. And it's the executives' job to plan that out five years down the line... although I guarantee you none of them are trying to predict that far.
> 1,000 pull requests contributed by Copilot
I'd like a breakdown of this phrase, how much human work vs Copilot and in what form, autocomplete vs agent. It's not specified seems more like a marketing trickery than real data
The "1,000 pull requests contributed by Copilot" datapoint is specifically referring to Copilot coding agent over the past 2.5 months.
Pretty much every developer at GitHub is using Copilot in their day-to-day work, so its influence touches virtually every code change we make ;)
> Pretty much every developer at GitHub is using Copilot in their day-to-day work, so its influence touches virtually every code change we make ;)
Genuine question, but is CoPilot use not required at GitHub? I'm not trying to be glib or combative, just asking based on Microsoft's current product trajectory and other big companies (e.g. Shopify) forcing their devs to use AI and scoring their performance reviews based on AI use.
Unfortunately, you can't opt out of Co-Pilot in Github. Although I did just use it to ask how to remove the sidebar with "Latest Changes" and other non-needed widgets that feel like clutter.
Copilot said: There is currently no official GitHub setting or option to remove or hide the sidebar with "Latest Changes" and similar widgets from your GitHub home page.
I'm using this an example to show that it is no longer possible to set up a GitHub account to NOT use CoPilot, even if it just lurks in the corner of every page waiting to offer a suggestion. Like many A.I. features it's there, whether you want to use it or not, without an option to disable.
So I'm suss of the "pretty much every developer" claim, no offense.
I'm sorry, but given the company you're working for I have a really hard time believing such bold statements - even more so since the more I use Copilot, the dumber it feels.
Yeah, Product Managers always swear by their products.
Is Copilot _enforced_ as the only option for an AI coding agent? Or can devs pick-and-choose whatever tool they prefer
I'm interested in the [vague] ratio of {internallyDevelopedTool} vs alternatives - essentially the "preference" score for internal tools (accounting for the natural bias towards one's own agent for testing/QA/data purposes). Any data, however vague, would be great.
(and if anybody has similar data for _any_ company developing their own agent, please shout out).
What's the motivation for restricting to Pro+ if billing is via premium requests? I have a (free, via open source work) Pro subscription, which I occasionally use. I would have been interested in trying out the coding agent, but how do I know if it's worth $40 for me without trying it ;).
Great question!
We started with Pro+ and Enterprise first because of the higher number of premium requests included with the monthly subscription.
Whilst we've seen great results within GitHub, we know that Copilot won't get it right every time, and a higher allowance of free usage means that a user can play around and experiment, rather than running out of credits quickly and getting discouraged.
We do expect to open this up to Pro and Business subscribers - and we're also looking at how we can extend access to open source maintainers like yourself.
400 GitHub employees are using GitHub Copilot day in, day out, and it comes out as the #5 contributor? I wouldn't call that a success. If it were any use, I would expect that even if each developer wrote only 10% of their code using it, it would be the #1 contributor in every project.
The #5 contributor thing is a stat from a single repository where we’re building Copilot coding agent.
re: 300 of your repositories... so it sounds like y'all don't use a monorepo architecture. i've been wondering if that would be a blocker to using these agents most effectively. expect some extra momentum to swing back to the multirepo approach accordingly
TBF, you are more than biased to conclude this; I definitely take your opinion with a whole bottle of salt.
Without data, a comprehensive study and peer review, it's a hell no. Would GitHub be willing to submit to academic scrutiny to prove it?
What model does it use? gpt-4.1? Or can it use o3 sometimes? Or the new Codex model?
At the moment, we're using Claude 3.7 Sonnet, but we're keeping our options open to change the model down the line, and potentially even to introduce a model picker like we have for Copilot Chat and Agent Mode.
Using different models for different tasks is extremely useful and I couldn't imagine going back to using just one model for everything. Sometimes a model will struggle for one reason or another and swapping it out for another model mid-conversation in LibreChat will get me better results.
Welp... GitHub was a good product while it lasted.
Github and Copilot are separate products, nothing mandates you to use it.
It's nearly impossible though to escape the flood of Copilot buttons creeping into every corner of Github (and other Microsoft products like VSCode). This looks like Microsoft aims for deep integration, not separation.
integration is the bread and butter of Microsoft's business strategy.
Incorrect. It's not mandated that you actually use it to write or correct code, but it's impossible to remove it, so you need to either get used to blocking out its incessant suggestions and notifications or stop using GitHub.
Similarly, the newest MS Word has CoPilot that you "don't have to use" but you still have to put up with the "what would you like to write today?" prompt request at the start of every document or worse "You look like you're trying to write a...formal letter...here are some suggestions."
Can copilot be disabled entirely in a GitHub repo or organization? I may very well have missed those settings, but if nothing else they are well hidden.
Question you may have a very informed perspective on:
where are we wrt the agent surveying open issues (say, via JIRA), evaluating which ones it would be most effective at handling, and taking them on, ideally with some check-in for confirmation?
Or, contrariwise, from having product management agents which do track and assign work?
Check out this idea: https://fairwitness.bot (https://news.ycombinator.com/item?id=44030394).
The entire website was created by Claude Sonnet through Windsurf Cascade, but with the “Fair Witness” prompt embedded in the global rules.
If you regularly guide the LLM to "consult a user experience designer", "adopt the multiple perspectives of a marketing agency", etc., it will make rather decent suggestions.
I’ve been having pretty good success with this approach, granted mostly at the scale of starting the process with “build me a small educational website to convey this concept”.
Tell Claude the site is down!
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
Ah yes, the takeoff.
Why don't you focus on automating your CEO's job, a comparatively easy task, rather than automating your fellow engineer's jobs?
Spoken by someone who's apparently never run a real business
When I repeated to other tech people from about 2012 to 2020 that the technological singularity was very close, no one believed me. Coding is just the easiest thing to automate away into near oblivion. And too many non-technical people drank the Flavor Aid for the fallacy that it can be "abolished" completely soon. It will gradually come for all sorts of knowledge-work specialists, including electrical and mechanical engineers, and probably doctors too. And, of course, office work. Some iota of specialists will remain to tune the bots, and some will remain in the field where expertise is absolutely required, but what were options for potential upward mobility into the middle class are being destroyed and replaced with nothing, leaving widespread unemployment. There won't be "retraining" or hand-waved other opportunities for the "basket of labor", just competition among many uniquely, far-overqualified people for ever-dwindling opportunities.
It is difficult to get a man to understand something when his salary depends upon his not understanding it. - Upton Sinclair
Do you have any textual evidence of this 8-year stretch of your life where you see yourself as being perpetually correct? Do you mean that you were very specifically predicting flexible natural-language chatbots, or vaguely alluding to some sort of technological singularity?
We absolutely have not reached anything resembling anyone's definition of a singularity, so you are very much still not proven correct in this. Unless there are weaker definitions of that than I realised?
I think you'll be proven wrong about the economy too, but only time will tell there.
I don't think it was unreasonable to be very skeptical at the time. We generally believed that automation would get rid of repetitive work that didn't require a lot of thought. And in many ways programming was seen almost at the top of the heap. Intellectually demanding and requiring high levels of precision and rigor.
Who would've thought (except you) that this would be one of the things that AI would be especially suited for. I don't know what this progression means in the long run. Will good engineers just become 1000x more productive as they manage X number of agents building increasingly complex code (with other agents constantly testing, debugging, refactoring and documenting them) or will we just move to a world where we just have way fewer engineers because there is only a need for so much code.
It's interesting that even people who were initially skeptical are now thinking they are on the "chopping block", so to speak. I'm seeing it all over the internet: the slow realization that what was supposed to be the "top of the heap" is actually at the bottom - not because of the difficulty of coding, but because the AI labs themselves are domain experts in software and therefore have the knowledge and data to tackle it as a problem first. I also think to a degree they "smell blood", and fear, more so than greed, is the best marketing tool. Many invested a good chunk of time in this career, and it will result in a lot of negative outcomes. It's a warning to other intellectual careers, that's for sure - and you will start seeing resistance to domain-knowledge sharing from more "professionalized" careers.
My view is in between yours: a bit of column A and column B, in the sense that both outcomes will play out to an extent. There will be fewer engineers, though not reduced by the full productivity factor (Jevons paradox will play out but eventually tap out); there will be even more software, especially at the low end; and the engineers who remain will be expected to be smarter and work harder for the same or less pay, grateful they got a job at all. There will be more "precision and rigor" and more keeping up required of workers, but less reward for the workers who perform it. In a capitalist economy it won't be seen as a profession to aspire to anymore by most people.
Given that most people don't live to work, and use their career to finance and pursue other sources of meaning, it won't be viable for most people long term, especially when other careers give "more bang for buck" w.r.t. the effort put into them. The uncertainty that most people I know in the SWE career are feeling right now means that, on the balance of risk/reward, I recommend newcomers go down another career path, especially juniors who have a longer runway. To be transparent, I want to be wrong, but the risk of this is getting higher every day.
i.e. AI is a dream for the capital class, and IMO potentially disastrous for social mobility long term.
I don't think I'm on the chopping block because of AI capabilities, but because of executive shortsightedness. Kinda like moving to the Cloud eliminated sysadmins, but created DevOps, but in many ways the solution is ill-suited to the problem.
Even in the early days of LLM-assisted coding tools, I already knew that there would be executives who would say: let's replace our pool of expensive engineers with a less expensive license. But the only factor that led to this decision is cost comparison. Not quality, not maintenance load, and very much not customer satisfaction.
That said, management generally never cared about quality and maintenance.
> I don't think it was unreasonable to be very skeptical at the time.
Well, that's back rationalization. I saw the advances like conducting meta sentiment analysis on medical papers in the 00's. Deep learning was clearly just the beginning. [0]
> Who would've thought (except you)
You're othering me, which is rude, and you're speaking as though you speak for an entire group of people. Seems kind of arrogant.
0. (2014) https://www.ted.com/talks/jeremy_howard_the_wonderful_and_te...
history/1950/people-in-swimming-pool-drinking-wine-served-by-a-robot.png
From talking to colleagues at Microsoft it's a very management-driven push, not developer-driven. Friend on an Azure team had a team member who was nearly put on a PIP because they refused to install the internal AI coding assistant. Every manager has "number of developers using AI" as an OKR, but anecdotally most devs are installing the AI assistant and not using it or using it very occasionally. Allegedly it's pretty terrible at C# and PowerShell which limits its usefulness at MS.
If you aren't using AI day-to-day then you're not adapting. Software engineering is not going to look at all the same in 5-10 years.
That's exactly what senior executives who aren't coding are saying everywhere.
Meanwhile, engineers are using it for code completion and as a Google search alternative.
I don't see much difference here at all, the only habit to change is learning to trust an AI solution as much as a Stack Overflow answer. Though the benefit of SO is each comment is timestamped and there are alternative takes, corrections, caveats in the comments.
> I don't see much difference here at all, the only habit to change is learning to trust an AI solution as much as a Stack Overflow answer. Though the benefit of SO is each comment is timestamped and there are alternative takes, corrections, caveats in the comments.
That's a pretty big benefit, considering the feedback was by people presumably with relevant expertise/experience to contribute (in the pre-LLM before-time).
The comments have the same value as the answers themselves. Kinda like annotations and errata on a book. It's like seeing "See $algorithm in The Art of Programming V1" in a comment before a complex code.
> Meanwhile, engineers are using it for code completion and as a Google search alternative.
Yep, that's the usefulness right now.
In my experience it's far less useful than simple auto complete. It makes things up for even small amounts of code that I have to pause my flow to correct. Also, without actually googling you don't get any context or understanding of what it's writing.
I found it to be more distracting recently. Suggestions that are too long or written in a different style make me lose my own thread of logic that I'm trying to weave.
I've had to switch it off for periods to maintain flow.
What does this have to do with my comment? Did you mean to reply to someone else?
I don't understand what this has to do with AI adoption at MS (and Google/AWS, while we're at it) being management-driven.
There's a large group of people that claim that AI tools are no good and I can't tell if they're in some niche where they truly aren't, they don't care to put any effort into learning the tools, or they're simply in denial.
Or simply unwilling to cut their perfectly good legs off and attach those overhyped prostheses that make people so fast and furious at running on the spot
Likely a Five Worlds scenario.
https://www.joelonsoftware.com/2002/05/06/five-worlds/
Some of each
It's just tooling. Costs nothing to wait for it to be better. It's not like you're going miss out on AGI. The cost of actually testing every slop code generator is non-trivial.
A better Stack Exchange search isn't that revolutionary.
AIs are boring
> I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success
Have they tried dogfooding their dogshit little tool called Teams in the last few years? Cause if that's what their "famed" dogfooding gets us, I'm terrified to see what lays in wait with copilot.
"I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success."
They just cut down their workforce, letting some of their AI people go. So, I assume there isn't that much success.
I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]
In any case, I think this is the best use case for AI in programming—as a force multiplier for the developer. It’s for the best benefit of both AI and humanity for AI to avoid diminishing the creativity, agency and critical thinking skills of its human operators. AI should be task oriented, but high level decision-making and planning should always be a human task.
So I think our use of AI for programming should remain heavily human-driven for the long term. Ultimately, its use should involve enriching humans’ capabilities over churning out features for profit, though there are obvious limits to that.
[0] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a...
> I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]
Similar to google. MS now requires devs to use ai
I know a lot of devs at MSFT, none of them are required to use AI.
The GitHub org is required to for sure, with a very similar mandate to the one Shopify's CEO put out.
LLM use is now part of the annual review process, its self reported if I'm not mistaken but at least at Microsoft they would have plenty of data to know how often you use the tools.
From reading around on Hacker News and Reddit, it seems like half of commentators say what you say, and the other half says "I work at Microsoft/know someone who works at Microsoft, and our/their manager just said we have to use AI", someone mentioned being put on PIP for not "leveraging AI" as well.
I guess maybe different teams have different requirements/workflows?
So demanding all employees use it... results in less than 30% compliance. That does tell me a lot
How much was previously generated by intellisense and other code gen tools before AI? What is the delta?
That quote was completely misrepresented.
How much of that is protobuf stubs and other forms of banal autogenerated code?
Updated my comment to include the link. As much as 30% specifically generated by AI.
The 2nd paragraph contradicts the title.
The actual quote by Satya says, "written by software".
Sure but then he says in his next sentence he expects 50% by AI in the next year. He’s clearly using the terms interchangeably.
I would still wager that most of the 30% is some boilerplate stuff. Which is ok. But sounds less impressive with that caveat.
You might want to study the history of technology and how rapidly compute efficiency has increased as well as how quickly the models are improving.
In this context, assuming that humans will still be able to do high level planning anywhere near as well as an AI, say 3-5 years out, is almost ludicrous.
Reality check time for you: people were saying this exact thing 3 years ago. You cannot extrapolate like that.
They have released numbers, but I can't say whether they are for this specific product or something else. They are apparently having AI generate "30%" of their code.
https://techcrunch.com/2025/04/29/microsoft-ceo-says-up-to-3...
That article is wrong, that is not what was said.
What was said?
The quote was actually "in some of our projects". Journalists completely misunderstood it.
> Microsoft, the company famous for dog-fooding
This was true up until around 15 years ago. Hasn't been the case since.
That's great, our leadership is heavily pushing ai-generated tests! Lol
Whatever the true stats for mistakes or blunders are now, remember that this is the worst its ever going to be. And there is no clear ceiling in sight that would prevent it from quickly getting better and better, especially given the current levels of investment.
That sounds reasonable enough, but the pace or end result is by no means guaranteed.
We have invested plenty of money and time into nuclear fusion with little progress. The list of key achievements from CERN[1] is also meager in comparison to the investment put in, especially if you consider that their ultimate goal is to apply the research to more than just theory.
[1] https://home.cern/about/key-achievements
I tried doing some vibe coding on a greenfield project (using gemini 2.5 pro + cline). On one hand - super impressive, a major productivity booster (even compared to using a non-integrated LLM chat interface).
I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-retrospection on these aspects if you ask about the quality of the code or if there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.
I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)
> I also ended up blowing through $15 of LLM tokens in a single evening.
This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.
Cline very visibly displays the ongoing cost of the task. Light edits are about 10 cents, and heavy stuff can run a couple of bucks. It's just that the tab accumulates faster than I expect.
> Light edits are about 10 cents
Some well-paid developers will excuse this with, "Well if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents".
Which is true, however there's a big caveat: Time saved isn't time gained.
You can "Save" 1,000 hours every night, but you don't actuall get those 1,000 hours back.
> You can "Save" 1,000 hours every night, but you don't actually get those 1,000 hours back.
What do you mean?
If I have some task that requires 1000 hours, and I'm able to shave it down to one hour, then I did just "save" 999 hours -- just in the same way that if something costs $5 and I pay $4, I saved $1.
My point is that saving 1,000 hours each day doesn't actually give you 1,000 hours a day to do things with.
You still get your 24 hours, no matter how much time you save.
What actually matters is the value of what is delivered, not how much time it actually saves you. Justifying costs by "time saved" is a good way to eat up your money on time-saving devices.
If I "save 1000 hours" then that could be distributed over 41.666 days, so no task would need to be performed during that period because "I saved 1000 hours".
You could also say you saved 41.666 people an entire 24 hour day, by "saving 1000 hours", or some other fractional way.
How you're trying to explain it as "saving 1000 hours each day" is really not making any sense without further context.
And I'm sure if I hadn't written this comment I would be saving 1000 hours on a stupid comment thread.
You're overthinking it.
It's like those coupon booklets they used to sell. "Over $10,000 of savings!"
Yes but how much money do I have to spend in order to save $10,000?
There was this funny commercial in the 90s for some muffler repair chain that was having a promotion: "Save Fifty Dollars..."
The theme was "What will you do with the fifty dollars you saved?" And it was people going to Disneyland or a fancy dinner date.
The people (actors) believed they were receiving $50. They acted as if it was free money. Meanwhile there was zero talk about whether their cars needed muffler repair at all.
> Meanwhile there was zero talk about whether their cars needed muffler repair at all.
It's called "Thinking past the sale". It's a common sales tactic.
I think one issue is that you won't always be able to invoice those extra 999 hours to your customer. Sometimes you'll still only be able to get paid for 1 hour, depending on the task and contract.
But the llm bill will always invoice you for all the saved work regardless.
(from a company's perspective, this is true). As a developer, you may not be paid by the task -- if I finish something early, I start work on the next thing.
Huh? What happens if you stop using your washing machine and go back to hand washing everything?
If you earn more than me, then if you value "time saved" then you should pay me to take my washing off me. Because then you can save even more of your valuable time!
The more of my washing you can take off me, the more of your time you can save by then using a washing machine or laundry service!
Saving an hour of my time is a waste, when saving an hour of your time is worth so much more. So it makes economic sense for you to pay me, to take my washing off me!
( Does that better illustrate my point? )
> Cline very visibly displays the ongoing cost of the task
LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.
Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.
> LLMs are now being positioned as "let them work autonomously in the background"
The only people who believe this level of AI marketing are the people who haven't yet used the tools.
> which means no one will be watching the cost in real time.
Maybe some day there's an agentic coding tool that goes off into the weeds and runs for days doing meaningless tasks until someone catches it and does a Ctrl-C, but the tools I've used are more likely to stop short of the goal than to continue crunching indefinitely.
Regardless, it seems like a common experience for first-timers to try a light task and then realize they've spent $3, instantly setting expectations for how easy it is to run up a large bill if you're not careful.
Especially at companies (hence this github one), where the employees don't care about cost because it's the boss' credit card.
I think that models are gonna commoditize, if they haven't already. The cost of switching over is rather small, especially when you have good evals on what you want done.
Also there's no way you can build a business without providing value in this space. Buyers are not that dumb.
They are already quite commoditized. Commoditization doesn't mean "cheap", and it doesn't mean you won't spend $15 a night like the GP did.
> I also ended up blowing through $15 of LLM tokens in a single evening.
Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).
https://aider.chat/
I, too, recommend aider whenever these discussions crop up; it converted me from the "AI tools suck" side of this discussion to the "you're using the wrong tool" side.
I'd also recommend creating little `README`'s in your codebase that are mainly written with aider as the intended audience. In it, I'll explain architecture, what code makes (non-)sense to write in this directory, and so on. Has the side-effect of being helpful for humans, too.
Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview + pointers to other README's), and whatever README is most relevant to the scope of my session. It's super productive.
I'm yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest DeepSeek models, and while I love the price (nearly 50x cheaper?), it's just far too error-prone compared to Sonnet 3.7. It generates solid plans / architecture discussions, but, unlike Sonnet, the code it generates is often confidently off the mark.
Have you tried Gemini 2.5? It's cheaper and scores higher on the Aider leaderboard.
I haven’t yet, I’ll give it a shot!
It's so good!
There is a better way than just READMEs: https://taoofmac.com/space/blog/2025/05/13/2230
I like this a lot! My root README ends up looking a lot like your SPEC.md, and I also have a file that’s pretty similar to your TODO.md.
My experience agrees that separating the README and the TODO is super helpful for managing context.
Why create READMEs and not just comments in the code?
I’d generally prefer comments in code. The README’s are relatively sparse and contain information that would be a bit too high-level for module or class-level comments. If commentary is specific to a module or class or method, the documentation belongs there. My rule of thumb is if the commentary helps you navigate and understand rules that apply to entire sets of modules rooted at `foo/`, it generally belongs in `foo/README`.
For example “this module contains logic defining routes for serving an HTTP API. We don’t write any logic that interacts directly with db models in these modules. Rather, these modules make calls to services in `/services`, which make such calls.”
It wouldn’t make sense to duplicate this comment across every router sub-module. And it’s not obvious from looking at any one module that this rule is applied across all modules, without such guidance.
These little bits of scaffolding really help narrow down the scope of the code that LLMs eventually try to write.
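Rules like that can even be checked mechanically with a few lines of AST walking. A minimal sketch, assuming a hypothetical layout where route modules live under routes/ and db models live under app.models (illustrative names only, not anyone's real project):

    # Minimal sketch: fail if anything under routes/ imports db models directly.
    # "routes" and "app.models" are hypothetical names for illustration.
    import ast
    import pathlib
    import sys

    FORBIDDEN_PREFIX = "app.models"

    violations = []
    for path in pathlib.Path("routes").rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            else:
                names = []
            for name in names:
                if name.startswith(FORBIDDEN_PREFIX):
                    violations.append(f"{path}:{node.lineno} imports {name}")

    if violations:
        print("\n".join(violations))
        sys.exit(1)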
Comments are not easy for the LLM to refer to, ironically: https://taoofmac.com/space/blog/2025/05/13/2230
My tool Plandex[1] allows you to switch between automatic and manual context management. It can be useful to begin a task with automatic context while scoping it out and making the high level plan, then switch to the more 'aider-style' manual context management once the relevant files are clearly established.
1 - https://github.com/plandex-ai/plandex
Also, a bit more on auto vs. manual context management in the docs: https://docs.plandex.ai/core-concepts/context-management
I loathe using AI in a greenfield project. There are simply too many possible paths, so it seems to randomly switch between approaches.
In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.
The trick for greenfield projects is to use it to help you design detailed specs and a tentative implementation plan. Just bounce some ideas off of it, as with a somewhat smarter rubber duck, and hone the design until you arrive at something you're happy with. Then feed the detailed implementation plan step by step to another model or session.
This is a popular workflow I first read about here[1].
This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course correct often.
[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
Here’s my workflow, it takes that a few steps further: https://taoofmac.com/space/blog/2025/05/13/2230
This seems like a good flow! I end up adding a "spec" and "todo" file for each feature[1]. This allows me to flesh out some of the architectural/technical decisions in advance and keep the LLM on the rails when the context gets very long.
[1] https://notes.jessmart.in/My+Writings/Pair+Programming+with+...
Yeah, I limit context by regularly trimming the TODOs. I like having 5-6 in one file because it sometimes informs the LLM as to how to complete the first in a way that makes sense for the follow-ups.
READMEs per module also help, but it really depends a lot on the model. Gemini will happily traipse all over your codebase at random, gpt-4.1 will do inline imports inside functions because it seems to lack any sort of situational awareness, Claude so far gets things mostly right.
The trouble occurs when the brownfield project is crap already.
I've vibe coded small project as well using Claude Code. It's about visitors registration at the company. Simple project, one form, a couple of checkboxes, everything is stored in sqlite + has endpoint for getting .xlsx.
Initial cost was around $20 USD, which later grew to (mostly polishing) $40 with some manual work.
I've intentionally picked up simple stack: html+js+php.
A couple of things:
* I'd say I'm happy about the result from a product perspective
* Codebase could be better, but I could not care less in this case
* By default, AI does not care about security unless I specifically tell it
* Claude insisted on using old libs. When I specifically told it to use the latest and greatest, it upgraded them but left code that works only with an old version. Also it mixed the latest DaisyUI with some old version of tailwindcss :)
On one hand it was super easy and fun to do, on the other hand if I was a junior engineer, I bet it would have cost more.
If you want to use Cline and are at all price sensitive (in these ranges) you have to do manual context management just for that reason. I find that too cumbersome and use Windsurf (currently with Gemini 2.5 pro) for that reason.
I think it's just that it's not end-to-end trained on architecture because the horizon is too short. It doesn't have the context length to learn the lessons that we do about good design.
> I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions
That doesn’t matter anymore when you’re vibe coding it. No human is going to look at it anyway.
It can all be if/else on one line in one file. If it works, and if the LLMs can work on it, iterate, and implement new business requirements while keeping performance and security, then code structure, quality and readability don't matter one bit.
Customers don’t care about code quality and the only reason businesses used to care is to make it less money consuming to build and ship new things, so they can make more money.
Wild take. Let's just hand over the keys to LLMs I suppose, the fancy next token predictor is the captain now.
Not that wild TBH.
This is a common view, and I think will be the norm in the near-to-mid term, especially for basic CRUD apps and websites. Context windows are still too small for anything even slightly complex (I think we need to be at about 20m tokens before we start to match human levels), but we'll be there before you know it.
Engineers will essentially become people who just guide the AIs and verify tests.
Have you ever tried to get those little bits of styrofoam completely off of a cardboard box? Have you ever seen something off in the distance and misjudged either what it was or how long it would take to get there?
LLMs need a very heavy hand in guiding the architecture because otherwise they'll code it in a way that even they can't maintain or expand.
Hook up something like Taskmaster or Shrimp, so that they can document as they go along and they can retrieve relevant context when they overflow their context to avoid this issue.
Then as the context window increases, it’s less and less of an issue
> LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt
I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
And now we've come full circle back to UML-based code generation.
Everything old is new again!
Average coders, terrible engineers
$15 in an evening sounds like a great deal when you consider the cost of highly-paid software engineers
> highly-paid software engineers
For now.
The money won't be flowing forever. This will cost you $6,000 a year.
A new grad at a FANG costs ~$200k-$250k a year after benefits
If the market is new Grad at 200-250k then this product won't sell many copies.
If this product is going to be successful they are going to need the bulk of their customers at 40-100k employees.
So the fully loaded cost of Copilot is ~$206k-$256k a year?
I don’t get it? Isn’t it just a monthly fixed subscription.
For now. Who is to say that in 5 years, when everyone has made this THE default workflow, prices won't go up?
Nope - I use a-la-carte pricing (through openrouter). I much prefer it over a subscription, as there are zero limits, I pay only for what I use, and there is much less of a walled garden (I can easily switch between Anthropic, Google, etc).
Same here, same reasons!
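For anyone who hasn't tried the a-la-carte route: OpenRouter exposes an OpenAI-compatible endpoint, so switching providers is mostly a matter of changing the model string. A minimal sketch (model IDs and key handling are illustrative):

    # Rough sketch of a-la-carte usage via OpenRouter's OpenAI-compatible API.
    # The model ID and key placeholder are illustrative.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )

    resp = client.chat.completions.create(
        model="anthropic/claude-3.7-sonnet",  # swap the slug to switch vendors
        messages=[{"role": "user", "content": "Explain this function: ..."}],
    )
    print(resp.choices[0].message.content)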
I’m running o3 dozens of times a day all for the subscription price of $20. Surely this is way more cost effective.
While it's being touted for greenfield projects, I've noticed a lot of failures when it comes to bootstrapping a stack.
For example it (Gemini 2.5) really struggles with newer ecosystems like FastAPI when wiring libraries like SQLAlchemy, Pytest, Python-playwright, etc., together.
I find more value in bootstrapping myself, and then using it to help with boilerplate once an effective safety harness is in place.
I wish they optimized things before adding more crap that will slow things down even more. The only thing that's fast with copilot is the autocomplete, it sometimes takes several minutes to make edits on a 100 line file regardless of the model I pick (some are faster than others). If these models had a close to 100% hit rate this would be somewhat fine, but going back and forth with something that takes this long is not productive. It's literally faster to open claude/chatgpt on a new tab and paste the question and code there and paste it back into vscode than using their ask/edit/agent tools.
I cancelled my Copilot subscription last week and when it expires in two weeks I'll most likely shift to local models for autocomplete/simple stuff.
My experience has mostly been the opposite -- changes to several-hundred-line files usually only take a few seconds.
That said, months ago I did experience the kind of slow agent edit times you mentioned. I don't know where the bottleneck was, but it hasn't come back.
I'm on library WiFi right now, "vibe coding" (as much as I dislike that term) a new tool for my customers using Copilot, and it's snappy.
Here's a video of what it looks like with sonnet 3.7.
https://streamable.com/rqlr84
The claude and gemini models tend to be the slowest (yes, including flash). 4o is currently the fastest but still not great.
For me, the speed varies from day to day (Sonnet 3.7), but I've never seen it this slow.
I've had this too, especially it getting stuck at the very end and just.. never finishing. Once the usage-based billing comes into effect I think I'll try cursor again. What local models are you using? The local models I tried for autocomplete were unusable, though based on aiders benchmark I never really tried with larger models for chat. If I could I would love to go local-only instead.
Several minutes? Something is seriously wrong. For most models, it takes seconds.
2m27s for a partial response editing a 178 line file (it failed with an error, which seems to happen a lot with claude, but that's another issue).
https://streamable.com/rqlr84
It takes minutes for me too sometimes.
Cursor is quicker, I guess it's a response parsing thing - when they make the decision to show it in the UI.
Some example PRs if people want to look:
https://github.com/dotnet/runtime/pull/115733 https://github.com/dotnet/runtime/pull/115732 https://github.com/dotnet/runtime/pull/115762
That first PR is rough. Why does it have to wait for a comment to fix failing tests?
That first PR (115733) would make me quit after a week if we were to implement this crap at my job and someone forced me to babysit an AI in its PRs in this fashion. The others are also rough.
A wall of noise that tells you nothing of any substance, but with an authoritative tone as if what it's doing is objective and truthful - immediately followed by:
- The 8 actual lines of code (discounting the tests & boilerplate) it wrote to actually fix the issue is being questioned by the person reviewing the code, it seems he's not convinced this is actually fixing what it should be fixing.
- Not running the "comprehensive" regression tests at all
- When they do run, they fail
- When they get "fixed" oh-so confidently, they still fail. Fifty-nine failing checks. Some of these tests take upward of an hour to run.
So the reviewer here has to read all the generated slop in the PR description and try to grok what the PR is about, read through the changes himself anyway (thankfully it's only a ~50 line diff in this situation, but imagine if this was a large refactor of some sort with a dozen files changed), and then drag it by the hand multiple times to try to fix issues it itself is causing. All the while you have to tag the AI as if it's another colleague and talk to it as if it's not just going to spit out whatever inane bullshit it thinks you want to hear based on the question asked. Test failed? Well, tests fixed! (no, they weren't)
And we're supposed to be excited about having this crap thrust on us, with clueless managers being sold on this being a replacement for an actual dev? We're being told this is what peak efficiency looks like?
[dead]
Thanks. I wonder what model they're using under the hood? I have such a good experience working with Cline and Claude Sonnet 3.7 and a comparatively much worse time with anything Github offers. These PRs are pretty consistent with the experience I've had in the IDE too. Incidentally, what has MSFT done to Claude Sonnet 3.7 in VSCode? It's like they lobotomized it compared to using it through Cline or the API directly. Trying to save on tokens or something?
Thanks, that’s really interesting to see - especially with the exchange around whether something is the problem or the symptom, where the confident tone belies the lack of understanding. As an open source maintainer I wonder about the best way to limit usage to cases where someone has time to spend on those interactions.
Seems amazingly similar to the changes a junior would make (jump to the solution that "fixes" it in the most shallow way) at the moment.
lol, those first two… poor Stephen
Major scam alert, they are training on your code in private repos if you use this
You can tell because they advertise “Pro” and “Pro+” but then the FAQ reads,
> Does GitHub use Copilot Business or Enterprise data to train GitHub’s model? > No. GitHub does not use either Copilot Business or Enterprise data to train its models.
Aka, even paid individual plans are getting brain raped
Might have been the case, but no longer:
https://docs.github.com/en/copilot/managing-copilot/managing...
If you're programming on Windows, your screen is being screenshotted every few seconds anyway. If you think OCR isn't analysing everything resembling a letter on your screen, boy do I have some news for you.
Windows recall is not installed by default
Windows Recall is the local storage, user enabled AI thing. Not what I was talking about.
So, pray tell, what are you talking about then?
I’ve been trying to use Copilot for a few days to get some help writing against code stored on GitHub.
Copilot has been pretty useless. It couldn’t maintain context for more than two exchanges.
Copilot: here’s some C code to do that
Me: convert that to $OTHER_LANGUAGE
Copilot: what code would you like me to convert?
Me: the code you just generated
Copilot: if you can upload a file or share a link to the code, I can help you translate it …
It points me in a direction that’s a minimum of 15 degrees off true north (“true north” being the goal for which I am coding), usually closer to 90 degrees. When I ask for code, it hallucinates over half of the API calls.
Be more methodical, it isn’t magic: https://taoofmac.com/space/blog/2025/05/13/2230
I’m sure you have no idea what my method is. Besides, this whole “you’re holding it wrong” mentality isn’t productive - our technology should be adapting to us, we shouldn’t need to adapt ourselves to it.
Anyway, I can just use another LLM that serves me better.
I played around with it quite a bit. It is both impressive and scary. Most importantly, it tends to indiscriminately use dependencies from random tiny repos, and often enough not the correct ones, for major projects. Buyer beware.
This is something I've noticed as well with different AIs. They seem to disproportionately trust data read from the web. For example, I asked to check if some obvious phishing pages were scams and multiple times I got just a summary of the content as if it was authoritative. Several times I've gotten some random chinese repo with 2 stars presented as if it was the industry standard solution, since that's what it said in the README.
On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
>On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
That's not hallucination. That's just an optical illusion.
> ... sent me to...
Oh wow, that was great - particularly if I then look at my own body parts (like my palm) that I know are not moving, it's particularly disturbing. That's a really well done effect, I've seen something similar but nothing quite like that.
>They seem to disproportionately trust data read from the web.
I doubt LLM's have anything like what we would conceptualize as trust. They have information, which is regurgitated because it is activated as relevant.
That being said, many humans don't really have a strong concept of information validation as part of day to day action and thinking. Development theory talks about this in terms of 'formal operational' thinking and 'personal epistemology' - basically how does thinking happen and then how is knowledge in those models conceptualized. Learning Sciences research generally talks about Piaget and formal operational before adulthood and stages of personal epistemology in higher education.
Research consistently suggests that about 50% of adults are not able to consistently operate in the formal thinking space. The behavior you are talking about is also typical of 'absolutist' epistemic perspectives, where answers are right or wrong and aren't meaningfully evaluated - just identified as relevant or not. Evaluating the credibility of information comes down to whether it came from a trusted source - most often an authority figure - rather than being the role of the person knowing it.
Thanks for flagging this! That isn't a behavior I've seen before in testing, and I'd love to dig into it more to see what's happening.
Would you be able to drop me an email? My address is my HN login @github.com.
(I work on the product team for Copilot coding agent.)
Given that PRs run actions in a more trusted context for private repos, this is a bit concerning.
As we've built Copilot coding agent, we've put a lot of thought and work into our security story.
One of the things we've done here is to treat Copilot's commits like commits from a first-time contributor to an open source project.
When Copilot pushes changes, your GitHub Actions workflows won't run by default, and you'll have to click the "Approve and run workflows" button in the merge box.
That gives you the chance to run Copilot's code before it runs in Actions and has access to your secrets.
(Source: I'm on the product team for Copilot coding agent.)
The announcement https://github.blog/news-insights/product-news/github-copilo... seems to position GitHub Actions as a core part of the Copilot coding agent’s architecture. From what I understand in the documentation and your comment, GitHub Actions is triggered later in the flow, mainly for security reasons. Just to clarify, is GitHub Actions also used in the development environment of the agent, or only after the code is generated and pushed?
Nice! Thanks for that info
So like the typical junior developer, then.
No, lol. Even the enthusiastic junior developer would go around pestering people asking if the dependency is OK.
No, not at all. Why do people keep saying shit like this - these thought-terminating sentences? Try to see past the glass of Kool-Aid, please. People are trying to understand how to communicate important valuable things about failure states and you're advocating ignorance.
Because the marketing started with "This is literally the singularity and will take over everything and everyone's jobs".
Then people realized that was BS, so the marketing moved on to "This will enhance everyone's jobs, as a companion that will help everyone".
People also realized that was pure BS. A few more marketing rebrands later and we're at the current situation where we try to equate it to the lowest possible rung of employee they can think of, because surely Junior == Incompetent Idiot You Can't Trust Not To Waste Your Time†. The funny part is that they have been objectively and undeniably only getting better since the early days of the hype bubble, yet the push now is that they're "basically junior level!". Speaks volumes IMO, how those goal posts keep getting moved whenever people actually use these systems in the real work.
---
† IMO working with Juniors has given me some of the best moments of my career. It allowed space for me to grow my own knowledge, while I get to talk to and help someone extremely passionate, if a bit overeager. This stance on Juniors is, frankly, baffling to me because it's so far from my experiences with how they tend to work; oftentimes they're a million times better than those "10x rockstars" you hear about all the time.
for better or worse, this is literally the singularity. this thing will eat the world.
"Drowning in technical debt?"
Stop fighting and sink!
But rest assured that with Github Copilot Coding Agent, your codebase will develop larger and larger volumes of new, exciting, underexplored technical debt that you can't be blamed for, and your colleagues will follow you into the murky depths soon.
> Copilot excels at low-to-medium complexity tasks
Oh cool!
> in well-tested codebases
Oh ok never mind
As peer commenters have noted, coding agent can be really good at improving test coverage when needed.
But also as a slightly deeper observation - agentic coding tools really do benefit significantly from good test coverage. Tests are a way to “box in” the agent and allow it to check its work regularly. While they aren’t necessary for these tools to work, they can enable coding agents to accomplish a lot more on your behalf.
(I work on Copilot coding agent)
In my experience they write a lot of pointless tests that technically increase coverage while not actually adding much more value than a good type system/compiler would.
They also have a tendency to suppress errors instead of fixing them, especially when the right thing to do is throw an error on some edge case.
You can tell the AI not to suppress errors
In my experience it works well even without good testing, at least for greenfield projects. It just works best if there are already tests when creating updates and patches.
Have it write tests for everything and then you've got a well tested codebase.
Caveat emptor: I've seen some LLMs mock the living hell out of everything, to the point of not testing much of anything. Something to be aware of.
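The failure mode tends to look something like this (made-up names): the test bumps coverage, but everything interesting is replaced by a mock, so any implementation would pass:

    # Illustrative only: a test that "covers" apply_discount but verifies nothing,
    # because the dependency doing the real work is a MagicMock.
    from unittest.mock import MagicMock

    def apply_discount(order, pricing_service):
        # all the real logic lives in pricing_service
        return pricing_service.discounted_total(order)

    def test_apply_discount():
        pricing = MagicMock()
        pricing.discounted_total.return_value = 90
        # passes for literally any implementation that calls the mock
        assert apply_discount({"total": 100}, pricing) == 90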
I've seen too many human operators do that too. Definitely a problem to watch out for
You forgot the /s
My buddy is at GH working on an adjacent project & he hasn't stopped talking about this for the last few days. I think I've been reminded to 'make sure I tune into the keynote on Monday' at least 8 times now.
I gave up trying to watch the stream after the third authentication timeout, but if I'd known it was this I'd maybe have tried a fourth time.
Word of advice: just go to YouTube and skip the MS registration tax
What specific keynote are they referring to? I'm curious, but thus far my searches have failed
MS Build is today
I’m always hesitant to listen to the line coders on projects because they’re getting a heavy dose of the internal hype every day.
I’d love for this to blow past cursor. Will definitely tune in to see it.
>I’m always hesitant to listen to the line coders on projects because they’re getting a heavy dose of the internal hype every day.
I'm senior enough that I get to frequently see the gap between what my dev team thinks of our work and what actual customers think.
As a result, I no longer care at all what developers (including myself on my own projects) think about the quality of the thing they've built.
These do not need to be mutually exclusive. Define the quality of the software in terms of customer experience and give developers ownership to improve those markers. You can think service level objectives.
In many cases, this means pushing for more stable deployments which requires other quality improvements.
“ Copilot excels at low-to-medium complexity tasks”
Then we have very different interpretations of what constitutes a medium complexity task
Medium complexity tasks in our training set*
Which GitHub subscription level is required for the agent?
I found it very confusing - we have GH Business, with Copilot active. Could not find a way to upgrade our Copilot to the level required by the agent.
I tried using my personal Copilot for the purpose of trialing the agent - again, a no-go, as my Copilot is "managed" by the organization I'm part of.
Also, you will want to add more control over who can assign things to Copilot Agent - just having write access to the repository is a poor discriminator, I think.
I'm running into the same issue. I think you have to upgrade your entire organization to "enterprise", which comes with a per seat cost increase (separate from the cost of copilot).
Yes, so it seems.
The biggest change Copilot has done for me so far is to have me replace my VSCode with VSCodium to be sure it doesn't sneak any uploading of my code to a third party without my knowing.
I'm all for new tech getting introduced and made useful, but let's make it all opt in, shall we?
Care to explain? Where are they uploading code to?
Whatever servers run Copilot for code suggestions
That isn't running locally
I'm building RSOLV (https://rsolv.dev) as an alternative approach to GitHub's Copilot agent.
Our key differentiator is cross-platform support - we work with Jira, Linear, GitHub, and GitLab - rather than limiting teams to GitHub's ecosystem.
GitHub's approach is technically impressive, but our experience suggests organizations derive more value from targeted automation that integrates with existing workflows rather than requiring teams to change their processes. This is particularly relevant for regulated industries where security considerations supersede feature breadth. Not everyone can just jump off of Jira on moment's notice.
Curious about others' experiences with integrating AI into your platforms and tools. Has ecosystem lock-in affected your team's productivity or tool choices?
Oh, the savings calculator in your website made me sad, that's the first time I've seen it put that way. I know it's marketing but props to you for being sincere. At least you're not hiding the intentions of your service (like others).
Yeah, the ROI calculator's target audience is the folks with the checkbook, so it needs to be a dollar figure. My _actual_ hope is that this lets engineers focus on feature work (which is typically more rewarding anyway) without constantly bashing their heads against the tech debt and maintenance work they're effectively barred from performing until it becomes emergent and actively blocking.
Why don't you focus on automating your CEO's job, a comparatively easy task compared to automating engineering tasks.
I know that's a bit kneejerk, but I actually think that's a pretty reasonable question.
Automating the reputation and network of an individual person doesn't seem like a good fit for an LLM, regardless of the person. But the _decisionmaking_ capacities for a position that's largely trend-following is something that's at the very least well-supported by interacting with a well-trained model.
In my mind, though, that doesn't look like a niched service that you sell to a company. That looks like a cofounder-type for someone with an idea and a technical background. If you want to build something but need help figuring out how to market and sell it, you could do a lot worse than just chatting with Claude right now and taking much of its advice.
That might just by my own lack of bizdev expertise, though.
These kinds of patterns allow compute to take much more time than a single chat since it is asynchronous by nature, which I think is necessary to get to working solutions on harder problems
Yes. This is a really key part of why Copilot coding agent feels very different to use than Copilot agent mode in VS Code.
In coding agent, we encourage the agent to be very thorough in its work, and to take time to think deeply about the problem. It builds and tests code regularly to ensure it understands the impact of changes as it makes them, and stops and thinks regularly before taking action.
These choices would feel too "slow" in a synchronous IDE based experience, but feel natural in an "assign to a peer collaborator" UX. We lean into this to provide as rich a problem-solving agentic experience as possible.
(I’m working on Copilot coding agent)
In the early days of LLMs, I had developed an "agent" using a github actions + issues workflow[1], similar to how this works. It was very limited but kinda worked, i.e. you assign it a bug and it fired an action, did some architect/editing tasks, validated changes and finally sent a PR.
Good to see an official way of doing this.
1. https://github.com/asadm/chota
Is there anything that satisfies the people here? Copilot today is perhaps the only AI that is actually assisting with something productive.
Microsoft, besides maybe Google and OpenAI, are the only ones that are actually exploring the practical usefulness of AIs. Other kiddies like Sonnet and whatnot are still chasing meaningless numbers and benchmark scores; that sort of stuff may appeal to high school kids or the immature, but burning billions of dollars and energy resources just to sound like a cool kid?
So, fun thing.. LinkedIn doesn't use Copilot.
I recently created a course for LinkedIn Learning on using generative AI for creating SDKs[0]. When I was onsite with them to record it, I found my GitHub Copilot calls kept failing.. with a network error. Wha?
Turns out that LinkedIn doesn't allow people onsite to connect to Copilot, so I had to put my MiFi in the window and connect to that to do my work. It's wild.
Btw, I love working with LinkedIn and have 15+ courses with them in the last decade. This is the only issue I've ever had.. but it was the least expected one.
0: https://www.linkedin.com/learning/build-with-ai-building-bet...
> LinkedIn doesn't use Copilot
They definitely use it for full-time SWEs
Source: I work there
I don't know, I feel this is the wrong level to place the AI at this moment. Chat-based AI programming (such as Aider) offers more control, while being almost as convenient.
God save the juniors...
Anthropic just announced the same thing for Claude Code, same day: https://docs.anthropic.com/en/docs/claude-code/github-action...
And Google's version: https://jules.google
Is Copilot a classic case of slow megacorp gets outflanked by more creative and unhindered newcomers (ie Cursor)?
It seems Copilot could have really owned the vibe coding space. But that didn’t happen. I wonder why? Lots of ideas gummed up in organizational inefficiencies, etc?
This is a direct threat to Cursor. The smarter the models get, the less often programmers really need to dig into an IDE, even one with AI in it. Give it a couple of years and there will be a lot of projects that were done just by assigning tasks where no one even opened Cursor or anything.
On another note: https://github.com/github/dmca/pull/17700 GitHub's automated auto-merged DMCA sync PRs get automated Copilot reviews for every single one.
AMAZING
The worst thing about LLMs getting commoditized and becoming cheaper is seeing slop like this pollute every meaningful discussion on the internet
So can I switch this to high contrast Black on White on mobile instead? I cannot read any of this (in the bright sunlight where I am) without pulling it through a reader app. People do get why books and other reading materials are not published grey on black, right?
I wonder what the coding agent story will be for bespoke hardware. For instance I'd like to test some things out on a specific GPU which isn't available on GitHub. Can I configure my own runners and hope for the best? What about a bespoke microcontroller?
I go back and forth between ChatGPT and Copilot in VS Code. It really makes the grammar guessing much easier in objc. It's not as good on libraries and nonexistent for 3rd party libraries, but maybe that's because I don't challenge it enough. It makes tons of flow and grammar errors which are so easy to spot that I end up using the code most of the time after a small correction. I'm optimistic about the future especially since this is only costing me $10 a month. I have dozens of iOS apps to update. All of them are basically productivity apps that I use and sell so double plus good.
How does that compare to using agent mode in VS Code? Is the main difference that the files are being edited remotely instead of on your own machine, or is there something different about the AI powering the remote agent compared to the local one?
> Copilot coding agent is rolling out to GitHub Mobile users on iOS and Android, as well as GitHub CLI.
Wait, is this going to pollute the `gh` tool? Please tell me this isn't happening.
Don't worry - this is 100% opt in. We've just added the ability to assign Copilot to an issue from `gh issue edit` and other similar commands.
(Source: I'm on the product team for Copilot coding agent.)
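Presumably something along these lines, for the CLI-curious (`--add-assignee` is the existing `gh issue edit` flag; the exact Copilot assignee handle here is a guess on my part):

    gh issue edit 123 --add-assignee "@copilot"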
ubuntu@pc:~$ gh --help
Sure! How can I help you?
In hindsight it was a mistake that Google killed Google Code. Then again, I guess they wouldn't have put enough effort into it to develop into a real GitHub alternative.
Now Microsoft sits on a goldmine of source code and has the ability to offer AI integration even to private repositories. I can upload my code into a private repo and discuss it with an AI.
The only thing Google can counter with would be to build tools which developers install locally, but even then I guess that the integration would be limited.
And considering that Microsoft owns the "coding OS" VS Code, it makes Google look even worse. Let's see what they come up with tomorrow at Google I/O, but I doubt that it will be a serious competition for Microsoft. Maybe for OpenAI, if they're smart, but not for Microsoft.
You win some you lose some. Google could have continued with Google code. Microsoft could've continued with their phone OS. It is difficult to know when to hold and when to fold.
Gemini has some GitHub integrations
https://developers.google.com/gemini-code-assist/docs/review...
Google Cloud has a pre-GA product called "Secure Source Manager" that looks like a fork of Gitea: https://cloud.google.com/secure-source-manager/docs/overview
Definitely not Google Code, but better than Cloud Source Repositories.
Or they'll just buy Cursor
It could be an amazing product. But the aggressive marketing approach from Microsoft plastering "CoPilot" everywhere makes me want to try every alternative.
UX-wise...
I kind of love the idea that all of this works in the familiar flow of raising an issue and having a magic coder swoop in and making a pull request.
At the same time, I have been spoiled by Cursor. I feel I would end up preferring that the magic coder is right there with me in the IDE where I can run things and make adjustments without having to do a followup request or comment on a line.
I'm honestly surprised by so much hate. IMHO it's more important to look at 1) the progress we've made + what this can potentially do in 5 years and 2) how much it's already helping people write code than dismissing it based on its current state.
Lately, vscode updates are all about copilot
This is quite alarming: https://www.cursor.com/security
And this one too: https://docs.github.com/en/site-policy/privacy-policies/gith...
I have been so far disappointed by Copilot's offerings. It's just not good enough for anything valuable. I don't want you to write my getters and setters and call it a day.
I think we expected disappointment with this one. (I expected it at least)[0]
But the upgraded Copilot was just in response to Cursor and Windsurf.
We'll see.
[0] https://news.ycombinator.com/item?id=43904611
Looks like their GitHub Copilot Workspace.
https://githubnext.com/projects/copilot-workspace
I love Copilot in VSCode. I have it set to use Claude most of the time, but it lets you pick your fav LLM for it to use. I just open the files I'm going to refactor, type into the chat window what I want done, click 'accept' on every code change it recommends in its answer, causing VSCode to auto-merge the changes into my code. Couldn't possibly be simpler. Then I scrutinize and test. If anything went wrong I just use GitLens to roll back the change, but that's very rare.
Especially now that Copilot supports MCP I can plug in my own custom "Tools" (i.e. Function calling done by the AI Agent), and I have everything I need. Never even bothered trying Cursor or Windsurf, which i'm sure are great too, but _mainly_ since they're just forks of VSCode, as the IDE.
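For anyone wondering what "plug in my own custom Tools via MCP" looks like in practice, here's a minimal sketch using the MCP Python SDK's FastMCP helper (server and tool names are made up; you still have to register the server in VS Code's MCP settings, which isn't shown here):

    # Minimal MCP server exposing one custom tool (illustrative names).
    # Uses the official MCP Python SDK: pip install "mcp[cli]"
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("my-tools")

    @mcp.tool()
    def count_todos(path: str) -> int:
        """Count TODO markers in a file so the agent can report on them."""
        with open(path, encoding="utf-8") as f:
            return sum(line.count("TODO") for line in f)

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default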
Try doing https://taoofmac.com/space/blog/2025/05/13/2230, you'll have some fun.
I've come to the same conclusions mentioned in most of that and done most of that already. I was an early-adopter of LLM tech, and have my own coding agent system, written in python. Soon I'm about to port those tools over to MCP so that I can just use VSCode for most everything, and never even need my Gradio Chatbot that I wrote to learn how to write tools, and use tools.
My favorite tool that I've written is one that simply lets me specify named blocks by name, in a prompt, and AI figures out how to use the tool to read each block. A named block is defined like:
    # block_begin MyBlock
    ...lines of code
    # block_end
So I can just embed those blocks around the code rather than pasting code into prompts.
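For illustration, a stripped-down version of a block reader like that might look like the following (not the poster's actual tool, just a sketch; the marker syntax is taken from the example above):

    # Sketch: find a named block delimited by "# block_begin <Name>" / "# block_end"
    # anywhere under a source tree and return its contents.
    import pathlib

    def read_block(name: str, root: str = ".") -> str:
        for path in pathlib.Path(root).rglob("*.py"):
            collecting = False
            block = []
            for line in path.read_text().splitlines():
                stripped = line.strip()
                if stripped.startswith("# block_begin") and stripped.split()[-1] == name:
                    collecting = True
                    continue
                if stripped.startswith("# block_end") and collecting:
                    return "\n".join(block)
                if collecting:
                    block.append(line)
        return ""  # block not found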
Have you tried the agent mode instead of the ask mode? With just a bit more prompting, it does a pretty good job of finding the files it needs to use on its own. Then again, I've only used it in smaller projects so larger ones might need more manual guidance.
I assumed I was using 'Agent mode' but now that you mentioned it, I checked and you're right I've been in 'Ask mode' instead. oops. So thanks for the tip!
I'm looking forward to seeing how Agent Mode is better. Copilot has been such a great experience so far I haven't tried to keep up with every little new feature they add, and I've fallen behind.
I find agent mode much more powerful as it can search your code base for further reference and even has access to other systems (I haven't seen exactly what level of access it has; I'm guessing it isn't full access to the web, but it can access certain info repositories). I do find it sometimes a little over eager to do instead of explain, so Ask mode is still useful when you want explanations. It also appears that Agent has the search capabilities while Ask does not, but it might also be something recently added to both and I just don't recall it from being in Ask mode, as I'm used to the past when it wasn't present.
Yeah in Ask mode, it shows me the code it's proposing, and explains it, and I have to click an icon to get it to merge the change into my code tentatively and then click "Keep" to make it actually merge into the code permanently. I kind of like that workflow a lot, but I guess you're saying Agent Mode just slams the changes into the code. I may or may not like that. Often I pick only parts of what it's done. Thanks for the tips.
Kicking the can down the road. So we can all produce more code faster, but there is no silver bullet. Most of my time isn't spent writing the code anyway.
GitHub had this exact feature late last year itself, perhaps under a slightly different name.
I think you're probably thinking of Copilot Workspace (<https://github.blog/news-insights/product-news/github-copilo...>).
Copilot Workspace could take a task, implement it and create a PR - but it had a linear, highly structured flow, and wasn't deeply integrated into the GitHub tools that developers already use like issues and PRs.
With Copilot coding agent, we're taking all of the great work on Copilot Workspace, and all the learnings and feedback from that project, and integrating it more deeply into GitHub and really leveraging the capabilities of 2025's models, which allow the agent to be more fluid, asynchronous and autonomous.
(Source: I'm the product lead for Copilot coding agent.)
Are you thinking if Copilot Workspaces?
That seemed to drop off the Github changelog after February. I’m wondering if that team got reallocated to the copilot agent.
Probably. Also this new feature seems like an expansion/refinement of Copilot Workspaces to better fit the classic Github UX: "assign an issue to Copilot to get a PR" sounds exactly like the workflow Copilot Workspaces wanted to have when it grew up.
So far, i am VERY unimpressed by this. It gets everything completely wrong and tells me lies and completely false information about my code. Cursor is 100000000x better.
How good do your test suite and code base have to be for the agent to verify the fix properly, including testing things that can be broken elsewhere?
Which model does it use? Will this let me select which model to use? I have seen a big difference in the type of code that different models produce, although their prompts may be to blame/credit in part.
I assume you can select whichever one you want (GPT-4o, o3-mini, Claude 3.5, 3.7, 3.7 thinking, Gemini 2.0 Flash, GPT-4.1 and the previews o1, Gemini 2.5 Pro and o4-mini), subject to the pricing multipliers they announced recently [0].
Edit: From the TFA: Using the agent consumes GitHub Actions minutes and Copilot premium requests, starting from entitlements included with your plan.
[0] https://docs.github.com/en/copilot/managing-copilot/monitori...
At the moment, we're using Claude 3.7 Sonnet - but we're keeping our options open to experiment with other models and potentially bring in a model picker.
(Source: I'm on the product team for Copilot coding agent.)
Do you at least control the prompt?
In my experience using Claude Sonnet 3.7 in GitHub Copilot extension in VSCode, the model produced hideously verbose code, completely unnecessary stuff. GPT-4.1 was a breath of fresh air.
Nice
Check in unreviewed slop straight into the codebase. Awesome.
Copilot pushes its work to a branch and creates a pull request, and then it's up to you to review its work, approve and merge.
Copilot literally can't push directly to the default branch - we don't give it the ability to do that - precisely because we believe that all AI-generated code (just like human generated code) should be carefully reviewed before it goes to production.
(Source: I'm the product lead for Copilot coding agent.)
I'm waiting for the first unicorn that uses just vibe coding.
I expect it to be a security nightmare
And why would that matter?
> Once Copilot is done, it’ll tag you for review. You can ask Copilot to make changes by leaving comments in the pull request.
To me, this reads like it'll be a good junior and open up a PR with its changes, letting you (the issue author) review and merge. Of course, you can just hit "merge" without looking at the changes, but then it's kinda on you when unreviewed stuff ends up in main.
Management: "Why aren't you going faster now that the AI generates all the code and we fired half the dev team?"
A good junior has strong communication skills, humility, asks many good questions, has imagination, and a tremendous amount of human potential.
Has a point of view, a clear motive, the ability to think holistically about things that are hard to digitize, gets mad and cleans up a bunch of stuff absolutely correctly because they're finally just "sick of all of this shit", or conservatively isolates legacy code, studying it and creating buffering wrappers for the new system in pieces as the legacy issues are mitigated with a long term strategy. Each move is discussed with their peers. etc etc etc. Thank you for advocating sanity!
Now developers can produce 20x the slop and refactor at 5x speed.
So a 4x slowdown?
In my experience in VSCode, Claude 3.7 produced more unsolicited slop, whereas GPT-4.1 didn't. Claude aggressively paid attention to type compatibility. Each model would have its strengths.
[dead]