It's very impressive to see "realistic" graphics on the N64. The demo reminds me of "ICO" for the PS2.
I've always wondered if it would be possible to create an SDK to abstract the N64 graphics hardware and expose some modern primitives, lighting, shading, tools to bake lighting as this demo does, etc. The N64 has some pretty unique hardware for its generation; more details on the hardware are here on Copetti.org:
https://www.copetti.org/writings/consoles/nintendo-64/
Note that the N64 was designed by SGI, and seeing as how influential SGI was for 3D graphics, I sort of assume the reverse: that the N64 probably has the most standard hardware of its generation. I would be vaguely surprised if there was not an OpenGL library for it.
However, there are two large caveats: 1. you have to think of the system as a graphics card with a CPU bolted on, and 2. the graphics system is directly exposed.
Graphics chip architectures end up being an ugly, hateful, incompatible mess, and as such the vendors of said accelerators generally tend to avoid publishing reference documents for them, preferring to publish intermediate APIs instead: things like OpenGL, DirectX, CUDA, Vulkan. This is mainly so that under the hood they can keep them an incompatible mess (if you never publish a reference, you never have to have hardware backwards compatibility; the upside is they can create novel designs, the downside is no one can use them directly). So when you do get direct access to them, as in that generation of game console, you sort of instinctively recoil in horror.
Footnote on graphics influence: OpenGL came out of SGI, and Nvidia was founded by ex-SGI engineers.
> that the N64 probably has the most standard hardware of its generation
The Reality Coprocessor (or RCP) doesn't look like any graphics cards that previously came out of SGI. Despite the marketing, it is not a shrunk down SGI workstation.
It approaches the problem in very different ways and is actually more advanced in many ways. SGI workstations had strictly fixed-function pixel pipelines, but the RCP's pixel pipeline is semi-programmable. People often describe it as "highly configurable" instead of programmable, but it was the start of what led to modern pixel shaders. The RCP could do many things in a single pass that would require multiple passes of blending on an SGI workstation.
And later SGI graphics cards don't seem to have taken advantage of these innovations either. SGI hired a bunch of new engineers (with experience in embedded systems) to create the N64, and then once the project was finished they made them redundant. The new technology created by that team never had a chance to influence the rest of SGI. I get the impression that SGI was afraid such low-cost GPUs would cannibalise their high-end workstation market.
BTW, the console that looks most like a shrunk-down 90s SGI workstation is actually Sony's PlayStation 2: a fixed-function pixel pipeline with a huge amount of blending performance to facilitate complex multi-pass blending effects. Though SGI wouldn't have let programmers have access to the Vector Units and DMAs like Sony did; SGI would have abstracted it all away with OpenGL.
------------------
But in a way, you are kind of right. The N64 was the most forward-looking console of that era, and the one that ended up the closest to modern GPUs. Just not for the reason you suggest.
Instead, some of the ex-SGI employees who worked on the N64 created their own company called ArtX. They were originally planning to create a PC graphics card, but ended up with the contract to first create the GameCube for Nintendo (the GameCube design shows clear signs of engineers overcompensating for flaws in the N64 design). Before they could finish, ArtX was bought by ATI, becoming ATI's west-coast design division, and the plans for a PC version of that GPU were scrapped.
After finishing the GameCube, that team went on to design the R3xx series of GPUs for ATI (Radeon 9700, etc).
The R3xx is more noteworthy for having a huge influence on Microsoft's DirectX 9.0 standard, which is basically the start of modern GPUs.
So in many ways, the N64 is a direct predecessor to DirectX 9.0.
> The GameCube design shows clear signs of engineers overcompensating for flaws in the N64 design
I haven't programmed for either console. Which features show this in what sense?
Both use a unified memory architecture, where the GPU and CPU share the same pool of memory.
On the N64, the CPU always ends up bottlenecked by memory latency. The RAM latency is quite high to start with: the CPU sits idle for ~40 cycles whenever it misses the cache, assuming the RCP is idle. If the RCP is not idle, contention can sometimes push that well over 150 cycles.
Kaze Emanuar has a bunch of videos (like this one https://www.youtube.com/watch?v=t_rzYnXEQlE) going into detail about this flaw.
The GameCube fixed this flaw in multiple ways. They picked a CPU with a much better cache subsystem: the PowerPC 750 had multi-way caches instead of direct-mapped ones, and a quite large L2 cache. Their customisations added special instructions to stream graphics commands without polluting the caches, resulting in far fewer cache misses.
And when it did miss the cache, the latency to main memory was under 20 cycles (despite the GameCube's CPU running at 5x the clock speed). The engineers picked main memory that was super low latency.
To fix the issue of bus contention, they created a complex bus arbitration scheme and gave CPU reads the highest priority. The GameCube also has much less traffic on the bus to start with, because many components were moved out of the unified memory.
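To put those cycle counts in wall-clock terms, here's a quick back-of-the-envelope sketch (the clock speeds are my assumption, not from the comment above: ~93.75 MHz for the N64's VR4300, ~485 MHz for the GameCube's Gekko):

    /* Rough wall-clock cost of a cache miss on each machine. */
    #include <stdio.h>

    int main(void) {
        const double n64_mhz = 93.75, gc_mhz = 485.0;
        printf("N64 miss, RCP idle : %4.0f ns\n", 40.0  / n64_mhz * 1000.0);
        printf("N64 miss, contended: %4.0f ns\n", 150.0 / n64_mhz * 1000.0);
        printf("GC  miss           : %4.0f ns\n", 20.0  / gc_mhz  * 1000.0);
        return 0;
    }

That works out to roughly 430 ns vs 1600 ns vs 40 ns: about an order of magnitude less time lost per miss, before even counting the lower miss rate.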
---------------------------
The N64 famously had only 4KB of TMEM (texture memory). Textures had to fit in just 4KB, and to enable mipmapping, they had to fit in half that. This led to most games on the N64 using very small textures stretched over very large surfaces with bilinear filtering, and kind of gave N64 games a distinctive design language.
Once again, the engineers fixed this flaw in two ways. First, they made TMEM work as a cache, so textures didn't have to fit inside it. Second, they bumped the size of TMEM from 4KB all the way to 1MB, which was massive overkill, way bigger than any other GPU of the era. Even today's GPUs only have ~64KB of cache for textures.
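For a feel of how small 4KB is, here's a rough texel-budget sketch (the format details are from memory and simplified; real limits also depend on line alignment, and colour-indexed formats reserve half of TMEM for the palette):

    /* Approximate N64 TMEM texel budgets, ignoring alignment quirks. */
    #include <stdio.h>

    #define TMEM_BYTES 4096

    int main(void) {
        printf("RGBA16: %d texels (e.g. 64x32, or 32x32 with mipmaps)\n",
               TMEM_BYTES / 2);            /* 2 bytes per texel            */
        printf("CI4   : %d texels (e.g. 64x64), palette in the other 2KB\n",
               (TMEM_BYTES / 2) * 2);      /* 2 texels per byte, half TMEM */
        return 0;
    }

Either way, you're talking about textures smaller than a modern desktop icon being stretched across entire walls.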
---------------------------
The fillrate of the N64 was quite low, especially when using the depth buffer and/or doing blending.
So the GameCube got a dedicated 2MB of memory (embedded DRAM) for its framebuffer. Now rendering doesn't touch main memory at all. The depth buffer is now essentially free (no reason not to enable it), and blending is more or less free too.
Rasterisation was one of the major causes of bus contention on the N64, so this embedded framebuffer has the side effect of solving the bus contention problem too.
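A rough size check on that 2MB figure (the maximums here are my recollection: about 640x528 pixels with 24-bit colour plus a 24-bit depth value per pixel):

    /* Why ~2MB of embedded DRAM covers a full colour + depth framebuffer. */
    #include <stdio.h>

    int main(void) {
        long pixels = 640L * 528L;
        long bytes  = pixels * (3 + 3);    /* 24-bit colour + 24-bit depth */
        printf("%ld bytes (~%.2f MB)\n", bytes, bytes / (1024.0 * 1024.0));
        return 0;                          /* ~2,027,520 bytes, ~1.93 MB   */
    }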
---------------------------
On the N64, the RSP was used for both vertex processing and sound processing. Not exactly a flaw, as it saved on hardware, but it did mean any time spent processing sound was time that couldn't be spent rendering graphics.
The GameCube got a dedicated DSP for audio processing. The audio DSP also got its own pool of memory (once again reducing bus contention).
As for vertex processing, that was all moved into fixed-function hardware. (There aren't that many GPUs that did transform and lighting in fixed-function hardware. Earlier GPUs often implemented transform and lighting in DSPs (like the N64's RSP), and the industry was very quickly switching to vertex shaders.)
The RCP was actually two hardware blocks: the RDP, which as you say did the fixed-function (but very flexible) pixel processing, and the RSP, which handled command processing and vertex transformation (and audio!).
The standard API was pretty much OpenGL, generating in-memory command lists that could be sent to the RSP.
However, the RSP was a completely programmable MIPS processor (with SIMD instructions operating in parallel).
One of my favorite tricks in the RDP hardware was that it used the parity bits in the Rambus memory to store coverage bits for anti-aliasing.
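To make the "in-memory command list" point concrete, this is roughly what a static libultra display list looks like (a sketch; the macro names are from gbi.h as I remember them, and details may be slightly off). The gs* macros just pack 64-bit RSP/RDP commands into an array that later gets handed to the RSP:

    #include <ultra64.h>

    static Vtx tri_verts[] = {
        /*    x,   y,  z, flag,  s, t       r,    g,    b,    a   */
        {{{ -64, -64, 0 }, 0, { 0, 0 }, { 0xff, 0x00, 0x00, 0xff }}},
        {{{  64, -64, 0 }, 0, { 0, 0 }, { 0x00, 0xff, 0x00, 0xff }}},
        {{{   0,  64, 0 }, 0, { 0, 0 }, { 0x00, 0x00, 0xff, 0xff }}},
    };

    static Gfx tri_dl[] = {
        gsDPPipeSync(),
        gsDPSetCycleType(G_CYC_1CYCLE),
        gsDPSetCombineMode(G_CC_SHADE, G_CC_SHADE), /* colour combiner */
        gsSPVertex(tri_verts, 3, 0),                /* load 3 verts    */
        gsSP1Triangle(0, 1, 2, 0),                  /* draw a triangle */
        gsSPEndDisplayList(),
    };

It reads a lot like an OpenGL 1.x display list, except you build the buffer yourself and the "driver" is microcode running on the RSP.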
> The standard API was pretty much OpenGL
Good point. The software APIs are where you do see the strong SGI influence. It's not OpenGL, but it's clearly based on their experience with OpenGL. The resulting API is quite a bit better than those of the other 5th gen consoles.
It's only the hardware (especially RDP) that has little direct connection to other SGI hardware.
The hardware folks came mostly from outside SGI and were picked especially because they had worked on cheaper systems before.
Wasn't the GameCube less programmable? I remember reading that most lighting tricks were accomplished with texture tricks.
Super Mario 64 has been decompiled and ported to GL 1.3.
Shadow of the Colossus... https://www.youtube.com/watch?v=xMKtYM8AzC8
That is very impressive for a PS2 game.
And a sequel (prequel?) to ICO, from the same devs
I love how the post, about N64 graphic tricks, ends with the question: "Is this the future?"
The amount of indie N64 development happening right now is wild. The platform is flourishing.
The system has seen a dozen of its most popular games decompiled [1] into readable source files, which enables easy porting to PC without an emulator. It also enables a ton of mods to be written, many of which will run on the original hardware.
There are numerous Zelda fan remakes [2]. Complete games with new dungeons and storylines.
The Mario 64 scene is on fire. Kaze has deeply optimized the game [3], and is building his own engine and sequels. If you like technical deep dives into retro tech, his channel is literally golden.
Folks are making crazy demos for the platform, such as Portal [4], which unfortunately brought Valve's lawyers' attention.
Lost games, such as Rare's Dinosaur Planet [5], have leaked, been brought up to near production ready status, been decompiled, and have seen their own indie resurgence.
[1] https://wiki.deco.mp/index.php/N64
[2] https://m.youtube.com/watch?v=bZl8xKDUryI
[3] https://m.youtube.com/channel/UCuvSqzfO_LV_QzHdmEj84SQ
The whole channel is gold. He has dozens of deep dives like this: https://m.youtube.com/watch?v=DdXLpoNLywg
And his game and engine are beautiful: https://youtu.be/Drame-4ufso
[4] https://m.youtube.com/watch?v=yXzoZ2AfWwg
[5] https://m.youtube.com/watch?v=s0QSiPRmWaI
I'd love Perfect Dark backported to GL 2.1, but sadly some effects require GL 3.3 at minimum.
A 60 fps Perfect Dark with online play would be fun.
Curious - why the desire to have it run on GL 2.1?
I've played multiplayer goldeneye online.
Turns out that perfect precision weapons on a m+kb setup are actually not much fun to play with. The movement is so limited compared to the brutal precision a mouse offers that everything just dies really really fast.
I have an old GL 2.1 netbook which still works and renders the game perfectly fine, minus the GL 3.3 FX's, which are kinda like the framebuffer FX's on the N64, mappable to current-day shaders. Without the GL 3.3 shader effects, menus are unreadable and you lose some translucent effects. If they did a GL 2.1 backport it would be great.
It blows my mind how genius these game engineers were. They dealt with so many limitations and created such imaginative and brilliant solutions.
Limitations demand and produce extraordinary creativity. That's the secret behind pico8 and Animal Well and so many amazing games.
I wish I didn't think of a significantly better architecture for my 2d-pixel-art-game-maker-maker this weekend. Now it'll be another month before I can release it :(
What were the limitations for Animal Well?
- 320 x 180 screen size for starters
- Limited map size
- Limited color palette I think
- and more!
Were those imposed as artistic choices rather than due to hardware limitations? I ask because it shipped on PC and the major consoles, so any limitations seem like they were by choice.
Yeah he talks about how it was a choice he made simply so he could get stuff done and have some end in sight.
Limitations, and, popularity
Popularity comes from utility. Utility comes from the right trade offs. Limitations demand careful trade offs.
The tradeoff was that the N64 was cheap and had Pokemon on it
Cheap? In its generation the Nintendo 64 was the expensive choice. Maybe not because of the console itself (price varied across its lifetime relative to the competition) but because of the cost of the games (and nearly complete lack of piracy).
As for Pokémon, the Nintendo 64 launched in June 1996 and the first Pokémon game was Pokémon Snap released nearly three years after the console in March 1999.
The N64 is older than Pokemon.
Not true, the N64 was released a couple of months after the first Pokémon games on the Game Boy.
Pokémon in Japan came much earlier. Also, the PSX was the cheap choice, given the rampant CD piracy versus the very expensive N64 cartridges.
This is new stuff, not stuff done during the reign of the N64.
Only recently did we figure out how to make Mario64 run at 30fps.
https://news.ycombinator.com/item?id=31075622
Around the end of the PS2's lifetime, some engine dev friends of mine figured out how to do palettized spherical harmonic lighting on the PS2. That was pretty straightforward.
What was tricky was a separate technique to get real cubemaps working on the PS2.
Unfortunately, these came too late to actually ship in any PS2 games. The SH trick might have been used in the Wii game "The Conduit". Same team.
I always thought tri-Ace had shipped SH lighting on PS2, but maybe it was just a demo?
http://research.tri-ace.com/Data/Practical%20Implementation%...
> What was tricky was a separate technique to get real cubemaps working on the PS2.
Any details on that?
If you lay out a cubemap as a 2d texture that looks literally like https://www.turais.de/content/images/size/w1000/2021/05/Stan... it's not hard, given the VU1-based triangle processing (like proto-mesh-shaders 25 years ago), to set the UVs of triangles independently to use the correct square even in the case of dynamic reflections. This doesn't do per-pixel spherical UV normalization. But, with a dense enough mesh, a linear approximation looks good enough.
Except... The triangle UVs will often cross over between multiple squares. With the above texture, it will cross over into the white area and make the white visible on the mesh. So, you fill the white area with a duplicate of the texels from the square that is adjacent on the cube. That won't work for huge triangles that span more than 1.5 squares. But, it's good enough given an appropriate mesh.
Probably would have been better to just use a lat-long projection texture like https://www.turais.de/content/images/size/w1600/2021/05/spru... Or, maybe store the cubemap as independent squares and subdivide any triangles that cross square boundaries.
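For anyone curious what the per-vertex face selection looks like, here's a hedged sketch in plain C (the real thing would be VU1 microcode; the atlas cell table and the face orientation/sign conventions below are hypothetical, since they depend entirely on how the atlas was baked):

    #include <math.h>

    typedef struct { float u, v; } UV;

    /* Map a reflection vector to UVs in a 4x3 "cross" cubemap atlas.      */
    UV cubemap_atlas_uv(float rx, float ry, float rz)
    {
        float ax = fabsf(rx), ay = fabsf(ry), az = fabsf(rz);
        int   face;                  /* 0..5 = +X,-X,+Y,-Y,+Z,-Z            */
        float ma, a, b;              /* dominant magnitude + the minor axes */

        if (ax >= ay && ax >= az) { face = rx > 0 ? 0 : 1; ma = ax; a = ry; b = rz; }
        else if (ay >= az)        { face = ry > 0 ? 2 : 3; ma = ay; a = rx; b = rz; }
        else                      { face = rz > 0 ? 4 : 5; ma = az; a = rx; b = ry; }

        float fu = 0.5f * (a / ma + 1.0f);   /* face-local coords in [0,1]  */
        float fv = 0.5f * (b / ma + 1.0f);

        /* Which cell of the 4-wide, 3-tall atlas each face lives in.       */
        static const int cell[6][2] = { {2,1}, {0,1}, {1,0}, {1,2}, {1,1}, {3,1} };
        UV out = { (cell[face][0] + fu) * (1.0f / 4.0f),
                   (cell[face][1] + fv) * (1.0f / 3.0f) };
        return out;
    }

Because this runs per vertex, the division by the dominant axis is only evaluated at triangle corners, which is exactly where the "dense enough mesh" requirement in the parent comment comes from.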
Interesting, thanks!
I'm sure they were but, as noted, this specifically is 2025 stuff, and demoscene, not gamedev.
While I'm really happy we have faster systems now, there was something fun about having to subvert constraints in games, and it was so satisfying and lovely when you did it right.
HN folks are probably familiar with raster interrupts (https://en.wikipedia.org/wiki/Raster_interrupt) and "racing the beam." I always associated this with the Atari 800. You weren't "supposed" to be able to do stuff like https://youtu.be/GuHqw_3A-vo?t=33, but Display List Interrupts made that possible.
What I didn't know until recently was how much the Atari 2600's games owed to this kind of craziness: https://www.youtube.com/watch?v=sJFnWZH5FXc
It's stuff like this that makes me think that if hardware stopped advancing, we'd still be able to figure out more and more interesting stuff for decades!
Demoscene work like this is impressive. Yet I can't help but notice that it tends toward simpler, emptier scenes: the kind of stuff one might expect in the background or as only a part of a game mechanic. It's as if there just aren't enough resources to really make complete experiences with most of the techniques.
What I find more impressive are efforts like FastDoom or the various Mario-64 optimization projects which squeeze significantly better performance out of old hardware. Sometimes even while adding content and features. Maybe there is a connection between demo sceners and more comprehensive efforts?
We did similar palette-based lighting techniques in our shareware game in the 90s. Basically, arranging the VGA 256-color palette so that each color we supported would have a gradient of N shades of the color. Illumination within each color could then be easily altered by adding or subtracting color indices.
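A minimal sketch of that palette layout and the index arithmetic (the constants and names here are illustrative, not from the original game):

    #include <stdint.h>

    #define SHADES       16     /* brightness levels per colour ramp        */
    #define BASE_COLOURS 16     /* 16 ramps x 16 shades = 256 palette slots */

    /* Brighten or darken a pixel by moving along its colour's ramp.        */
    static uint8_t shade_pixel(uint8_t index, int delta)
    {
        int base  = (index / SHADES) * SHADES;   /* first entry of the ramp */
        int shade = (index % SHADES) + delta;
        if (shade < 0)       shade = 0;          /* clamp within the ramp   */
        if (shade >= SHADES) shade = SHADES - 1;
        return (uint8_t)(base + shade);
    }

The nice part is that the whole lighting model costs a few integer ops per pixel and no palette reloads.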
I miss the PS1 and PS2 era of optimization. Most of those games look amazing uprezzed to 1080p or 4K or more with emulation. Halo 2 era graphics in 4K is all we need imo. Yes, that one is Xbox, but try Halo 2 in classic graphics in Halo MCC. Still looks incredible.
GT3 heatwave summarizes it well.
"I showed a demo of GT3 that showed the Seattle course at sunset with the heat rising off the ground and shimmering. You can’t re-create that heat haze effect on the PS3 because the read-modify-write just isn’t as fast as when we were using the PS2. There are things like that."
https://old.reddit.com/r/ps2/comments/1cktw88/gran_turismos_...
https://youtu.be/ybi9SdroCTA?t=4103
It's not trying to simulate a real heatwave the way new engines like UE5 do, which just tanks fps. It uses "tricks" instead. And honestly, looking at RTX tanking frame rates, I would rather have these cheap tricks.
A 299MHz MIPS runs this:
Shadow of the Colossus... https://www.youtube.com/watch?v=xMKtYM8AzC8
GoW2 https://youtu.be/IpKLwIIdvuk?si=TjifKmlYsUuvhk0F&t=970
FFXII https://youtu.be/NytHoYOs_4M?si=jE1Fxy40khEvV6Bn&t=51
GT4 https://www.youtube.com/watch?v=F6lZIxk_h9g (THE BOOTSCREEN crying)
Black (Renderware was a crazy engine) https://youtu.be/bZBjcwyq7fQ?si=Pev5ifpksJm4X6Oi&t=356
Valkyrie profile 2 https://youtu.be/9ScjO4NuUtA?si=Z29cR-hLsT2pnP2I&t=38
Rogue Galaxy https://youtu.be/iR1evzyl-7Q?si=fldm3-NnuFxOITMn&t=624
Burnout 3 https://www.youtube.com/watch?v=_r5r0nE1sA4
Jak and Daxter, Ratchet.
For GC - RE4, Metroid, The Zeldas... ofc. Looks crazy good.
I kneel.
> A 299MHz MIPS runs this:
Sorta. The GoW2 video was captured on PCSX2 and likely benefited from upscaling and other niceties in that clip. Didn't look through the rest of them. Either way, GoW2 was an incredible achievement on PS2.
With the PS2 you are right. With the PSX... so-so. Yes, it could match maybe a Pentium 90, almost a 100, but an MMX Pentium with a 3dfx card would stomp it and be on par with the N64, if not better.
MIPS CPUs are amazing; they can do wonders at low clock speeds. Just look at the PSP, or SGI's IRIX machines.
Also, the PS2 "GPU" is not the same as the R4k CPU. BTW, on the PS2... the Deus Ex port sucked balls compared to the PC port; it couldn't fully handle the Unreal engine.
Yes, the PS2 did crazy FX, but with really small levels for the mentioned port; bear in mind DX was almost 'open world' for a huge chunk of the game.
> With the PSX... so-so. Yes, it could match maybe a Pentium 90, almost a 100, but an MMX Pentium with a 3dfx card would stomp it
A Pentium is much faster than the PlayStation's MIPS CPU for game logic, and a 3dfx card's 50 MPixels/s fillrate matches the PlayStation's 60 MPixels/s. The Pentium FPU, though, is no match for the PlayStation's GTE at 90-300K triangles per second, meaning you would have to rely on CPU power alone for geometry processing (like the contemporary Bleem did), resulting in 166-233MHz Pentium minimum requirements. MMX would have been of no help here; it was barely used, and then only in a few games for audio effects.
Bleem is an emulator; it emulates the architecture, it's not a virtualizer. 233 MHz to emulate the 33 MHz PSX seems reasonable; Windows 95/98 takes up a good chunk of the CPU by itself. But you forgot something.
The PSX "GPU" just worked with integers and that's it. Any decent compiler such as GCC, with flags like -ffast-math, would let you emulate both the dead-simple MIPS CPU and the fixed-point GPU (where no floats are used at all) while taking tons of shortcuts. MMX? Ahem, MPEG decoding for videos. If you did things right you could even bypass the BIOS decoding and just call the OS's MPEG decoding DLLs (as PPSSPP does with FFmpeg) and drop the emulation CPU usage to a halt, letting your media player framework do the work for you.
Bleem didn't need a 233MHz Pentium to emulate the 33MHz MIPS CPU; it needed it for the geometry (rotation, scaling). The GTE's 90-300K triangles per second is a LOT. Geometry was the bottleneck of PC games in the mid-nineties. For example, the contemporary Quake was heavily optimized to operate on as little geometry as possible (BSP), rendering up to ~1000 triangles per frame while only ever touching maybe 10K? triangles in the active Potentially Visible Sets (PVS) at any given time (I don't want to research this down to instrumenting Quake code or looking at map data; Google results suggest PVS leaves are as small as hundreds of triangles). The PlayStation 1, on the other hand, DGAF and could rotate/scale/light whole levels on every frame with the raw power of the GTE.
MMX was meant for anything you would normally use a traditional DSP for (also fixed point). Intel envisioned software modems and audio processing; in reality it was criminally underused and fell into the 'too much effort for too little gain' hole. Intel's big marketing win was paying Ubi Soft a cool $1mil for the POD "Designed for MMX" ad right on the box https://www.mobygames.com/game/644/pod/cover/group-3790/cove... while the game implements _one optional audio filter_ using MMX. Microsoft also didn't like Intel's Native Signal Processing (NSP) initiative and killed it https://www.theregister.com/1998/11/11/microsoft_said_drop_n...
MP3 you could decode on a Pentium ~100, so why bother; MPEG, a Pentium ~150 will play flawlessly as long as the graphics card can scale it in hardware. I would love to see the speed difference decoding MPEG with ffmpeg on a Pentium 166 with and without MMX. A contemporary study shows up to 2x speedup in critical sections of image processing algorithms but only marginal gains for the mp3/mpeg cases https://www.cs.cmu.edu/~barbic/cs-740/mmx_project.html
>drop the emulation CPU usage to a halt
The PlayStation 1 doesn't support MPEG.
Now, could you implement the GTE with MMX? Certainly yes, but again, why bother when a 166-233MHz CPU is already enough to accomplish the same thing with the integer unit alone.
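For what it's worth, the fit is fairly natural: the GTE's rotation matrices are 16-bit fixed point (1.3.12, if I remember right), which is exactly what pmaddwd chews through. A purely illustrative sketch, not claiming Bleem or anyone else did it this way: one row of a GTE-style 3x3 matrix-vector multiply in 4.12 fixed point with MMX intrinsics:

    #include <mmintrin.h>   /* MMX intrinsics */
    #include <stdint.h>

    /* row and vec each hold 4 x int16 lanes (4th lane padded with 0).      */
    static int32_t row_dot_4_12(__m64 row, __m64 vec)
    {
        __m64   prod = _mm_madd_pi16(row, vec);   /* pmaddwd: 2 partial sums */
        int32_t lo   = _mm_cvtsi64_si32(prod);
        int32_t hi   = _mm_cvtsi64_si32(_mm_srli_si64(prod, 32));
        return (lo + hi) >> 12;                   /* drop the 4.12 fraction  */
    }
    /* Callers must issue _mm_empty() before touching the FPU afterwards,   */
    /* which is part of why MMX was such a pain to use in practice.         */

Whether that would ever have beaten a plain integer implementation on a Pentium of that era is another question entirely.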
I still think Halo 3 looks a lot better than some modern games. Stuff like blur, bloom, and all that grass and foliage pop-in does not in fact look good; it looks worse than just turning all of that off. And I can't appreciate a high-polygon-count model when the game is a high-speed FPS, so what's the point of that either? Halo 3's texture resolution is fine to my eye. I don't think I would notice textures twice or 4x the size. The only thing I'd notice is the hardware demands.