chirsz 3 hours ago

The behavior of C macros is actually described by a piece of pseudocode from Dave Prosser that is not part of the standard:

* https://www.spinellis.gr/blog/20060626/

* https://www.spinellis.gr/pubs/jrnl/2006-DDJ-Finessing/html/S...

* https://gcc.gnu.org/legacy-ml/gcc-prs/2001-q1/msg00495.html

  • viega 2 hours ago

    Wow, I'm not sure I've ever seen this (or if I did, it was 20 years ago).

    And I was definitely looking for this kind of history while I was writing. Perhaps my google skills have decayed... or google... or both!

    Thanks very much.

fuhsnn 3 hours ago

I wonder if the author is aware of the __VA_TAIL__ proposal[1]. It covers similar ground and is IMO very well thought out, but unfortunately it was not accepted into C2Y (judging from committee meeting minutes).

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3307.htm

  • viega 3 hours ago

    Yes, I know that it was not accepted, but do not have any color on why not. It's well thought out; but I do not think the semantics are self-evident to the average C programmer who already finds the preprocessor inscrutable.

pjsg 3 hours ago

I wept when the author mentioned implementing SHA256 in macros.

  • viega 3 hours ago

    LOL, I suffered so you didn't have to.

dandersch 4 hours ago

Related: The Preprocessor Iceberg https://jadlevesque.github.io/PPMP-Iceberg/

There you can find a recursive macro expansion implementation (as a gcc hack) that fits on a slide:

  #2""3
  
  #define PRAGMA(...) _Pragma(#__VA_ARGS__)
  #define REVIVE(m) PRAGMA(push_macro(#m))PRAGMA(pop_macro(#m))
  #define DEC(n,...) (__VA_ARGS__)
  #define FX(f,x) REVIVE(FX) f x
  #define HOW_MANY_ARGS(...) REVIVE(HOW_MANY_ARGS) \
      __VA_OPT__(+1 FX(HOW_MANY_ARGS, DEC(__VA_ARGS__)))
  
  int main () {
      printf("%i", HOW_MANY_ARGS(1,2,3,4,5)); // 5
  }

It sounds like the one in the article works for more compilers, but there doesn't seem to be a copy-pasteable example anywhere to check for myself. Also, the "Our GitHub Org" link on the site just links to github.com.

  • viega 3 hours ago

    Author of the article here.

    Absolutely, the code box under the ascii art is a complete implementation, just paste that in a C file, and then use `H4X0R_VA_COUNT(...)`.

    Or, you could follow the link to my typed variadic arguments article (from which this post forked off). The repo there is: https://codeberg.org/h4x0r/vargs

    • viega 3 hours ago

      And yes, GCC extensions are often going to be adopted in clang, but generally not the broader world of C and C++ compilers. Everything in my article conforms to the standard.

      • fuhsnn 2 hours ago

        I played with a lot of preprocessor implementations and did my own (redesigned chibicc's expansion algorithm), not many of them even have paint-blue behavior exactly right (the standard text is vague, to me it was more "matching GCC's" than "conforming to standard").

        • viega 2 hours ago

          That's interesting. I agree with you that the standards text is pretty vague. I think that's why other attempts to show how to do this kind of thing don't get deep enough on the semantics, and why I adopted a "try it and see" strategy.

          I do try to avoid this kind of thing unless necessary, so I don't have experience as to where the different compilers will fall down on different corner cases. I'd find it very interesting though, so please do share if you kept any record or have any memory!

stevefan1999 3 hours ago

I used to write a preprocessor until I noticed that kind of thing... I stopped writing it after that.

russfink 3 hours ago

Is this a DoS risk - code that sends your build chain into an infinite loop?

  • sltkr 36 minutes ago

    From a DoS risk perspective there is no practical difference between an infinite loop, or a finite but arbitrarily large loop, which was always possible.

    For example, this doesn't work:

        #define DOUBLE(x) DOUBLE(x) DOUBLE(x)
        DOUBLE(x)
    
    That would only expand once and then stop because of the rule against repeated expansion. But nothing prevents you from unrolling the first few recursive expansions, e.g.:

        #define DOUBLE1(x) x x
        #define DOUBLE2(x) DOUBLE1(x) DOUBLE1(x)
        #define DOUBLE3(x) DOUBLE2(x) DOUBLE2(x)
        #define DOUBLE4(x) DOUBLE3(x) DOUBLE3(x)
        DOUBLE4(x)
    
    This will generate 2^4 = 16 copies of x. Add 60 more lines to generate 2^64 copies of x. While 2^64 is technically a finite number, for all practical purposes it might as well be infinite.

  • saghm 3 hours ago

    Without any specific implementation of a constraint it certainly can happen, although I'm not totally sure that it's something to be concerned about in terms of a DoS as much as a nuisance when writing code with a bug in it; if you're including malicious code, there are probably much worse things it could do if it actually builds properly instead of just spinning indefinitely.

    Rust's macros are recursive intentionally, and the compiler implements a recursion limit that IIRC defaults to 128 (it was 64 in older releases), at which point it will error out and mention that you need to increase it with an attribute in the code if you need it to be higher. This isn't just for macros though, as I've seen it get triggered before with the compiler attempting to resolve deeply nested generics, so it seems plausible to me that C compilers might already have some sort of internal check for this. At the very least, C++ templates certainly can get pretty deeply nested, and given that the major C compilers are pretty closely related to their C++ counterparts, maybe this is something that exists in the shared part of the compiler logic.

    • viega 2 hours ago

      C++ also has constexpr functions, which can be recursive.

      All code can have bugs, error out and die.

      There are lots of good reasons to run code at compile time, most commonly to generate code, especially tedious and error-prone code. If the language doesn't have good built-in facilities to do that, then people will write separate programs as part of the build, which adds system complexity, which is, in my experience, worse for C than for most other languages.

      If a language can remove that build complexity, and the semantics are clear enough to the average programmer, that's a win. The second condition matters, though: Nim's macro system, for example, was originally highly appealing (and easy) to me as a compiler guy, until I saw how other people find even simple examples completely opaque, worse than C macros.

    • WalterBright 2 hours ago

      D doesn't have macros, quite deliberately.

      What it does have are two features:

      1. compile time evaluation of functions - meaning you can write ordinary D code and execute it at compile time, including handling strings

      2. a "mixin" statement that has a string as an argument, and the string is compiled as if it were D source code, and that code replaces the mixin statement, and is compiled as usual

      Simple and easy.

  • viega 3 hours ago

    No. Other modern languages have strong compile-time execution capabilities, including Zig, Rust and C++. And my understanding is that C is looking to move in that direction, though as with C++, macros will not go away.

hyperhello 4 hours ago

Can I use this technique to expand MACRO(a,b,c,…) into something like F(a,b,c…); G(a,b,c…)?

  • viega 3 hours ago

    That's just:

        #define MACRO(...) F(__VA_ARGS__); G(__VA_ARGS__)

    The technique in the article is more often used to type check the individual parameters, or wrap a function call around them individually, etc.

    • hyperhello 3 hours ago

      Ok. How about into F(a,"a");F(b,"b");etc.

      The problem being automating enums and their names in one call. Like MACRO(a,b,c) and getting a map from a to "a".

      • viega 3 hours ago

        100%, that's definitely easy to do once you understand the technique.

        • hyperhello 3 hours ago

          Please?

          • viega 2 hours ago

            I'm on my phone, but if you start with the top 8 lines in the code box under the ascii art, you'll get an implementation of `H4X0R_MAP()`; the bottom two lines are an example, and you can just write yourself a body that produces one term. Only thing you need to know beyond that is the stringify operator.

          • viega 2 hours ago

            And I should say, if you want to apply the same transformation to arguments twice, but call F() separately from G() per your starting example, you'd just apply your map twice in your top-level macro.

            • WalterBright 2 hours ago

              > if you want to apply the same transformation to arguments twice, but call F() separately from G() per your starting example, you'd just apply your map twice in your top-level macro

              My brain just blew a fuse.

WalterBright 2 hours ago

Imagine trying to implement the C preprocessor. I had to write it from scratch 3 times before it worked 100%.

  • viega 2 hours ago

    Wow, you are a braver person than I. Well done.