From the Burrow

The long path to shader debugging

2017-09-18 09:32:42 +0000

Writing shaders (in lisp or otherwise) is fun, however debugging them is not. Where on the CPU we get exceptions or error codes, on the gpu we get silence and undefined behavior. I really felt this when trying (and failing) to implement procedural terrain generation on the livestream. I tried to add additional outputs so that I could inspect the values but it was very easy to make a mistake and change the behavior of the shader..or worse to forget it was there and waste time debugging a side effect from the instrumentation. I need a more reliable way to get values back to the CPU. Luckily CEPL has some great places we can hide this logic.

Quick recap, in CEPL we define GPU functions and then compose them into a pipeline using defpipeline-g:

(defpipeline-g some-pipeline ()
  (vertex-stage :vec4)
  (fragment-stage :vec2))

This is a macro that generates a function called some-pipeline that does all the wrangler to make the gl draw call. You then use it by using map-g

(map-g #'some-pipeline vertex-data)

This is another macro that expands into some plumbing and (ultimately) a call to the some-pipeline function.

Putting aside other details what we have here is 2 places we can inject code, one in the function body and one at the function call-site. This gives us tonnes of leverage.

My goal is to take some gpu-function like this:

(defun-g qkern ((tc :vec2) &uniform (tex :sampler-2d) (offset :vec2))
  (+ (* (texture tex (- tc offset)) 0.3125)
     (* (texture tex tc) 0.375)
     (* (texture tex (+ tc offset)) 0.3125)))

And add calls to some function we will call peek.

(defun-g qkern ((tc :vec2) &uniform (tex :sampler-2d) (offset :vec2))
  (+ (peek (* (texture tex (peek (- tc offset))) 0.3125))
     (* (texture tex tc) 0.375)
     (* (texture tex (+ tc offset)) 0.3125)))

Peek will capture the value at that point and make it available for inspection from the CPU side of your program.

The way we can do it is to:

  • compile the shader normally (we need to do this anyway)
  • inspect the AST for calls to peek and the types of the argument
  • create a new version of the shader with peek replaced with the instrumenting code

For example:

(defun-g qkern ((tc :vec2) &uniform (tex :sampler-2d) (offset :vec2))
  (let (((dbg-0 :vec2))
        ((dbg-1 :vec4)))
    (+ (setf dbg-1 (* (texture tex (setf dbg-0 (- tc offset))) 0.3125))
       (* (texture tex tc) 0.375)
       (* (texture tex (+ tc offset)) 0.3125))
    (values dbg-0 dbg-1)))

This code will work mostly the same way except that it will be returning the captured values instead of the original one. I say ‘mostly’ as now the code that doesnt contribute to the captured values is essentially dead code and it is likely that the GLSL compiler will strip chunks of it.

So now we have an augmented shader stage as well as the original, defpipeline-g can generate, compile and store these and on each map-g it can make 2 draw calls. First the debug one capturing the results using transform-feedback (for the vertex stages) and FBOs for the fragment stage. Because map-g is also a macro we use it to implicitly pass the thread-local ‘CEPL Context’ object to the pipeline function. This lets us write debug values into a ‘scratch’ buffer stored on the context making the whole process transparent.

With this data available we can then come up with nice ways to visualize it. Just dumping it to the REPL will usually be a bad move as a single peek in a fragment shader is going to result in a value for every fragment, which (at best) means 2073600 values for a 1920x1080 render target.

There are a lot of details to work out to get this feature to work well[0], however it could be a real boost in getting real data[1] back from these pipelines and can work on all GL versions CEPL supports.

Seeya next week, Peace.

[0]: transform feedback only works from the last implemented vertex stage, so if you have vertex, tessellation & geom stages, only geom can write to the transform feedback buffer. [1]: Another option was to compile the lisp like shader language to regular lisp. However implementing the GLSL standard library exactly is hard and it’s impossible to capture all the gpu/manufacturer specific quirks.

More small steps

2017-09-11 09:28:06 +0000

Over the weekend I got a little lisping done and was working on something that has been rolling around my head for a couple of years.

During the standardization process of lisp as well as agreeing on what would go in, there were also things cut. Some of those things have become de facto standards as all the implementations ship them, however some seem rather fundamental.

One of the more fundamental ones that didn’t make it was the idea of introspectable (and extensible) environment objects.

The high level view goes something like this: An environment object is a set of lexical bindings, having access to this (and any metadata about those bindings) would allow you to do more semantic analysis of the code. Given that any macro is allowed access to the environment object when evaluated this would allow a macro to expand differently depending on the data in the environment.

For example let’s say that we use the environment to store static type information in the environment; we could then potentially optimize certain function calls within that scope using this information (like using static dispatch on a generic function call).

Now a while back stassats kindly shared a snippet which allows you to essentially define a variable who’s value is available during macro expansion. Over the weekend I’ve been playing with this to provide a way to allow the following.

(defmacro your-macro-here (&environment env x y)
  (with-ext-env (env)
    (augment-environment env :foo 10)
    `(+ x y)))

So you can wrap the code in your macro in with-ext-env and this lets you get access to a user-extensible environment object. We would then provide functions (like augment-environment) to modify the environment, in the above code to store the value 10 against the key :foo however we could use this for type info.

The downsides are that we don’t get all the data that was potentially available in the proposed feature. I’d really like to have access to all the standard declarations made as well as our additional ones.

Luckily it’s possible to make a new CL package with modified versions of let, labels, etc and in those to capture the metadata and make it available to our extensible environment.

With this we may be able to make something that convincingly does the job of a extensible macro environment. I have made a prototype of the meat (the passing of the environment itself) and so next is it wrap the other things into a package and then see if it is useful.

Other than this I’ve been poking around a little with Unity. It’s fun to see how it’s done in the big leagues and to see where ones approach aligns and diverges from a larger player’s philosophy.

That’s all for now,

Seeya!

Rolling forward

2017-09-08 11:24:57 +0000

Once again I don’t have much to report but things are going ok.

The streams are still going and still going well, this week was using a cute approach from http://nullprogram.com/blog/2014/06/01/ to make voronoi diagrams. You can see that stream here

I also revisited my bindings to the Newton Dynamics physics engine. Last time I had tested it seem they were abysmally slow. Luckily a quick review showed me that the ‘max fps’ for the simulation was set too low and that to something sensible made a world of difference. Some profiling also revealed that the bindings have a non-trivial amount of overhead which I believe I can remove by declaring the types and turning up the performance optimizations.

That’s all for now Ciao

Back from wherever

2017-08-30 11:18:37 +0000

2 days back I started getting into coding in the evenings again. For a while my brain has not been into it but it looks like I might be back at last.

So far I havent done much but tonight I’ll be doing another stream where I will try to implement boids.

That’s all for now, except that if you are an old fan of Command & Conquer you can get the whole game here for free http://nyerguds.arsaneus-design.com/cnc95upd/cc95p106/ . It looks sketchy but it’s legit and DAMN it’s fun :D

Peace all

Scrabbling back onto the wagon

2017-08-10 09:07:31 +0000

I fucked up for a few weeks by not doing these. Sorry about that.

Update will be easy as I have been in ‘absorb phase’ for a few weeks and so I’ve not been coding much in my free time.

Aside from watching a few films and playing a few games my intake has mainly been around the ‘data oriented design’ space. I’ve been rewatching these:

code::dive conference 2014 - Scott Meyers: Cpu Caches and Why You Care CppCon 2014: Chandler Carruth - Efficiency with Algorithms, Performance with Data Structures CppCon 2014: Mike Acton - Data-Oriented Design and C++

And then

code::dive 2016 conference – Chandler Carruth – Making C++ easier, faster and safer (part 1) code::dive 2016 conference – Chandler Carruth –Making C++ easier, faster, safer (part 2) On “simple” Optimizations - Chandler Carruth - Secret Lightning Talks - Meeting C++ 2016 2013 Keynote: Chandler Carruth: Optimizing the Emergent Structures of C++ Things that Matter - Scott Meyers | DConf2017

I’ve also been reading the wonderful What every programmer should know about memory which is poor in name but outstanding in content. It’s really for programmers who need to get everything out of the machine and folks who need to understand why they aren’t. I’m not finished yet but its already been very helpful.

From that my interest in assembly programming was growing again. The primary reason is that lisp has a disassemble function that lets you see what your function got compiled to. Here is a destructive ‘multiply a vector3 by a float’ function:

CL-USER> (disassemble #'v3-n:*s)
; disassembly for RTG-MATH.VECTOR3.NON-CONSING:*S
; Size: 51 bytes. Origin: #x22966965
; 65:       F30F105201       MOVSS XMM2, [RDX+1]              ; no-arg-parsing entry point
; 6A:       F30F59D1         MULSS XMM2, XMM1
; 6E:       F30F115201       MOVSS [RDX+1], XMM2
; 73:       F30F105205       MOVSS XMM2, [RDX+5]
; 78:       F30F59D1         MULSS XMM2, XMM1
; 7C:       F30F115205       MOVSS [RDX+5], XMM2
; 81:       F30F105209       MOVSS XMM2, [RDX+9]
; 86:       F30F59CA         MULSS XMM1, XMM2
; 8A:       F30F114A09       MOVSS [RDX+9], XMM1
; 8F:       488BE5           MOV RSP, RBP
; 92:       F8               CLC
; 93:       5D               POP RBP
; 94:       C3               RET
; 95:       0F0B10           BREAK 16                         ; Invalid argument count trap

Cool, but naturally not much good if I can’t read it. So that is my driving factor: Understanding this ↑↑↑↑.

I have been repeatedly advised to start with something simpler that x64 asm, but i just cant find any motivation, when it comes down to it I dont want to program for the c64 any more, and arm/mips holds no appeal until I have a need for it. So against better judgement I picked up the AMD64 Architecture Programmer’s Manual Volume 1 and started reading.. and it’s nice. I quickly run into things I dont get of course but then it’s off to youtube again where kupula’s series of x86_64 Linux Assembly tutorials gave me a nice soft intro.

This is a clearly the start of a looooong road, but I’m in no rush, and everything I learn is directly helping me grok that disassembly above.

I think that’s it. I’m still streaming and still struggling with the procedural erosion stuff..turns out there are more bugs in the paper than expected :

But that’s for another day

Seeya!

Post Holiday

2017-07-18 10:28:25 +0000

I haven’t written for a while as I had a two week holiday and decided to take a break from most things except streaming. It was lovely and I got so much done that this can only really be a recap:

Performance

I made a profiler for CEPL that is plenty fast enough to use in real-time projects. I then used some macro ‘magic’ to instrument every function in CEPL, this let me target the things that were slowest in CEPL. I could happily fill a post on just this, but I will spare you that for now. Needless to say, CEPL is faster now.

I am contributing to a project called sketch by porting it to CEPL and I was really annoyed at how it was doing 90fps on one stress test and the CEPL branch was doing 20. After the perfomance work the CEPL branch was at 36fps but still sucking compared to the original branch. In a fit of confusion I commented out the rendering code and it only went up to 40fps..at which point I realized that, on master, I had been recording fps from a function that was called 3 times a frame :D so the 90fps was actually 30!

The result was pretty cool though as the embarrassment had probably pushed be to do more than I would have done otherwise.

GL Coverage

I have added CEPL support for:

  • scissor & scissor arrays
  • stencil buffers, textures & fbo bindings
  • Color & Stencil write masks
  • Multisample

and also fixed a pile of small bugs.

Streaming

Streaming is going well although removing the ‘fail privately’ from programming is something I’m still getting used to. (I LOVE programming for letting me fail so much without it all being on record)

This wednesday I’m going to try and do a bit of procedural terrain erosion which should be cool!

FBX

FBX is a proprietary format for 3D scenes which is pretty much an industry standard. It is annoyingly proprietary and the binary format for the latest version is not known. There is a zero-cost (but not open source) library from autodesk for working with fbx files but it’s a very c++ library so using it from the FFI of most languages is not an option.

Because this is a pita I’ve decided to make the simplest solution I can think of, use the proprietary library to dump the fbx file to a bunch of flat binary files with known layout. This will make it trivial to work with from any language which can read bytes from a file. I’m specifically avoiding a ‘smarter’ solution as there seem to be a bunch of projects out there in various states of neglect and so I am making something that, when autodesk inevitably make a new version of the format, I can change without much thought.

The effect on my life

It’s odd to be in this place, 5 years of on/off work on CEPL and this thing is starting to look somewhat complete. There is still a bunch of bugfixes and newer GL features to add but the original vision is pretty much there. This seems to have freed up the mind to start turning inwards again and I’ve been in a bit of a slump for the last week. These slumps are something I have become a bit more aware of over the last year and so have become a little better at noticing.. Still, as this is about struggles as well as successes and as the rest of this post is listing progress it seemed right to mention this.

Peace folks

input given

2017-06-23 09:36:19 +0000

I’m late writing this week but there has been progress.

So after the PBR fail I knew I had to put some energy in different places for a week as I needed content for the stream. I knew dealing with input was coming soon and my input system was ugly, so that was an obvious candidate.

This is probably the 4th attempt I’ve made of making an event system. The originals were to varying degrees attempts on the callback/reactive input systems but they suffered a few key problems:

  • Callbacks can be hard to reason about when there are enough of them
  • Subscribing to something means that the provider now holds a reference to the subscribee, which means it cant be freed if you forget to detach it (this is an issue as we experiment in the repl a lot so throwing things away should be easy)
  • Input propagation can end up driving the per frame execution (more on this in a minute)
  • In the versions where I made immutable events the allocation costs per frame could be very high[0]

The third one was particularly tricky, and it only became apparent to me when I had some specific use cases:

The first one was with mouse position events. The question is, which event was the last one for that frame. Let’s say I’m positioning something based on mouse position and that causes a bunch of work in other parts of the game. If I receive 10 move events in a frame I don’t want to do that work 10 times so there is going to be some kind of caching and I need to know when I have received the last event. This is a bit artificial and there are plenty of strategies around it but caching is something that ends up appearing a lot in the event propagating approach.

Next was when I had made a fairly strict entity component system and was trying it out in a little game. In this system you processed the components in batches, so you would process all of the location components and then all of the renderable components, etc. This posed a problem as events needed to be delivered to components but events were pumped at the beginning of the frame and components were processed later. I didn’t want to break the model so again I fell back to caching.

I needed something different and so the latest version, skitter, is much simpler.

First some design goals:

  • The api is pull based[1]
  • There must only be allocations happening when new input-sources (like mice, gamepads) are added or when new controls (like buttons, scroll wheels) are added to input sources.
  • skitter shouldn’t need to know about any specific input api, it’s only dealing in its own state.
  • It should be possible to both redefine controls live OR tell the system to make them static and optimize the crap out of them.

In the system you now define a control like this:

;;          name ↓↓      lisp data type ↓↓     ↓↓ initial value
(define-control wheel (:static t) single-float 0.0)

This is like defining a type in the input system, you give it a name, tell it the kind of data it holds and it’s initial value.

the :static t bit means this will be statically typed and will not be able to be changed live

this one is slightly different:

(define-control relative2 (:static t) vec2 (v! 0 0) :decays t)

Here we have told it to ‘decay’. This means that each frame the value returns to the initial value. This is because you don’t get an event from (all/any?) systems to tell you that something has stopped happening.

We can now make an input source

(define-input-source mouse ()
  (pos position2)
  (move relative2)
  (wheel wheel2)
  (button boolean-state *))

Here a mouse has an absolute position, a relative movement, a possibly 2d scroll wheel and buttons (we dont know how many so we say *)

We can then call (mouse-pos mouse-obj) to get the position.

You can then write code that takes the events from your systems (sdl2, qt, glfw, etc) and puts them into the source by using functions like (skitter:set-mouse-pos mouse-obj timestamp new-position)

Bam cross system event management. It’s dirt simple to use and the hairy logic is all handled by a few macros internally.

BUt we have missed one thing that callbacks were good for, events over time. Reactive approaches have this awesome way of composing streams of events into new events and, whilst I can’t have that, I do want something. The caching in our system poses a problem as we now only have the latest event. So rather that bringing the events to the processing code we will bring the processing to the event (or something :D).

What we do is add a logicial-control to our input-source. Maybe we want double-click for our mouse, we will make a kind of control with it’s own internal state that sets itself to true when it has seen two events within a certain timeframe:

(define-logical-control (double-click :type boolean :decays t)
    ((last-press 0 integer))
  ((button boolean-state)
   (if (< (- timestamp last-press) 10)
	   (progn
		 (setf last-press 0)
		 (fire t))
	   (setf last-press timestamp))))

This is made up & untested code, however how it should work is that we have new kind of control called double-click, it’s state is boolean. It has a private field called last-press which holds the time since the last press. It depends on the boolean-state of another control and internally we give this thing the name ‘button’. The if is going to get evaluated every time one of the dependent controls (button in this case) is updated. fire is the function you call to in a logical-control to update your own state.

What is nice with this is that you can then define attack combos as little state-machines, put them in logical controls and attach them to the controller object. You can now see the state of Ryu’s Shoryuken move in the same way you would check if a button is being held down.

The logical-control is still very much a WIP but I’m happy with the direction it is going.

That’s all for input for now, I have been doing other stuff this week like making a tablet app which lets you use sliders and pads to send data to lisp [2] and of course more streaming[3] but it’s time for me to go do other stuff.

This weekend I am going to have a go at making a little RPG engine with all this stuff. Should be fun!

[0] you could of course cache event objects, but then you need n of each type of event object (because we may dispatch on type) and we can’t make them truly immutable.

[1] I made a concession that you can have callbacks as they are used internally and it hopefully means people won’t try and hack their own in on top of the system.

[2] tablet app

[3]

Swing and a miss

2017-06-14 11:17:50 +0000

So this last weekend I worked on PBR again as there was a wonderful new tutorial out. The good news is that it cleared up a lot of points of confusion for me. The bad news is my version is still incorrect :(

I have been through every damn line of glsl to make sure that the PBR implementation itself matched the tutorial..which leaves the major possibility that it was something else all along; that some part of the deferred pass is incorrect.

It would explain a lot but also be crazy annoying.

The other possibility of course is that my implementation of PBR doesn’t match the tutorial but Im having an increasingly hard time believing that.

Other than that, streaming is going well and I am doing another one tonight, being forced to learn something well enough to explain it is good stuff.

Anyhoo, that’s this week, hopefully next time it will be better news :D

ugh

Ughhh

2017-06-06 10:25:39 +0000

I’ve been slogging through some really boring stuff these last few days. Boring but necessary.

People have been asking for a stream on using Varjo and so that’s what I’m doing tomorrow. However I’ve been more and more bothered by the fact I would do this stream and, within a month, it would be obsolete as knew I wanted to change how things were structured. This really meant I had to bite the bullet and get it done.

It’s done now but these changes affected every project I work on so naturally testing took some time. The good news is that all of these changes are in the release branches for the various projects and so will ship with the next quicklisp cycle.

Next up was documentation for CEPL. I’ve been shipping a bunch of new goodness recently so plenty of things either needed documenting or tweaking. Also the documentation generator I use now supports markdown so I went through everything reformatting and editing things.. that was terrifically boring..I finished that around 4am on Sunday. You can find the result here

Then I was informed by the lovely quicklisp folks that a couple of my libraries werent building..damn. I hadnt tested those in the rush of everything else. That took an hour to fix but it’s good now.

I also wrote the documentation for my cl-soil wrapper library (MORE BORING!) which can be found here

Making good product is hard man, I had a 3 day weekend and feel tired and a bit grumpy (oh poor me <tiny violin> :p) but at least tomorrows stream will make sense.

Except.

I may have worked out how to have &rest arguments (varargs in other languages) in Vari so I may try and hack that in tonight..which means more testing :D I’m a glutton for punishment.

There were also some other fixes this week but not interesting enough to be worth writing about now.

Until next week, Ciao.

2 days late again

2017-06-01 11:12:49 +0000

But there is news at least!

Varjo

I changed varjo’s if form so that if certain conditions are right it will emit a ternary operator instead of a full if statement in the glsl

(glsl-code
  (translate
   (make-stage :vertex '((a :int)) nil '(:450)
			   '((let ((x (if (< a 10)
							   a
							   (- a))))
				   (v! 1 2 3 4))))))
"// vertex-stage
#version 450

layout(location = 0)  in int A;

void main()
{
    int X = (A < 10) ? A : (- A);
    gl_Position = vec4(float(1),float(2),float(3),float(4));
    return;
}
"

Aside from this I fixed a few bugs and made little tweaks such as; if you use -> in a name in lisp it becomes _to_ in glsl. A tiny touch but it makes the generated code nicer to read.

Nineveh

This last week I added color space conversion gpu-functions and also functions for generating mesh data for common primitives (sphere, box, cone, etc)

Streaming

Here are my last 2 streams. I’m getting more used to this each time and it’s lovely seeing a few regulars in the chat, this week was especially lovely as there were lots of questions. Not much else to say on this other than I’m gonna try and keep up the ‘stream every Wednesday’ goal.

Ciao