From the Burrow

Lisping Furiously

2018-03-25 19:23:09 +0000

This week has been pretty productive.

It started with looking into a user issue where compile times were revoltingly slow. It turns out they were generated very deeply nested chains of ifs and my code that was handling indenting was outstandingly bad, almost textbook example of how to make slow code. Anyway after getting that fixed up I spent a while scraping back milliseconds here and there.. there’s a lot of bad code in that compiler.

Speaking of that, I knock Varjo a bunch, and I’d never make it like this again, but as a vessel for learning it’s been amazing, most issues I hear about in compilers now I can at least hook somewhere in my head. It’s always ‘oh so that’s how real people do this’, very cool stuff. Also, like php, varjo is still unreasonably still providing me with piles of value; what it does is still what I wanted something to do.

After that I was looking into user extensible sequences and ended up hitting a wall in implementing something like extensible-sequences from sbcl. I really want map & reduce in Vari, but in static languages this seems to mean some kind of iterator.

It would really help to have something like interfaces but I can’t stand the idea of not being able to say how another user’s type satisfies the interface, so I started looking into adding traits :)

I was able to hack in the basics and implement a very shoddy map and could get stuff like this:

TESTS> (glsl-code
        (compile-vert () :410 nil
          (let* ((a (vector 1.0 2.0 3.0 4.0))
                 (b (mapseq #'(sin :float) a)))
            (vec4 (aref b 0)))))

"// vertex-stage
#version 410

void main()
{
    float[4] A = float[4](1.0f, 2.0f, 3.0f, 4.0f);
    float[4] RESULT = float[4](0.0f, 0.0f, 0.0f, 0.0f);
    int LIMIT = A.length();
    for (int STATE = 0; (STATE < LIMIT); STATE = (STATE++))
    {
        float ELEM = A[STATE];
        RESULT[STATE] = sin(ELEM);
    }
    float[4] B = RESULT;
    vec4 g_GEXPR0_767 = vec4(B[0]);
    gl_Position = g_GEXPR0_767;
    return;
}

but the way array types are handled in Varjo right now is pretty hard-coded when I had it making separate functions for the for loop each function was specific to the size of the array. So 1 function for int[4], 1 function for int[100] etc.. not great.

So I put that down for a little bit and, after working on a bunch of issues resulting in unnecessary (but valid) code in the GLSL, I started looking at documentation.

This one is hard. People want docs, but they also don’t want the api to break. However the project is beta and documenting things reveals bugs & mistakes. So then you have to either document something you hate and know you will change tomorrow, or change it and document once.. For a project that is purely being kept alive by my own level of satisfaction it has to be the second; so I got coding again.

Luckily the generation of the reference docs was made much easier due to Staple which has a fantastically extensible api. The fact I was able to hack it into doing what I wanted (with some albeit truly dreadful code) was a dream.

So now we have these:

Bloody exhausting.

The Vari docs are a mix of GLSL wiki text with Vari overload information & handwritten doc strings for the stuff from the Common Lisp portion of the api.

That’s all for now. On holiday for a week and I’m going to try get as much lisp stuff out of my head as possible. I want my lisp projects to be in a place where I’m focusing on fixes & enhancements, rather than features for when I start my new job.

Peace

It's been a long time

2018-03-12 12:45:12 +0000

.. how have you been?

  • glados

I fell out of the habit of writing these over christmas so it’s reall time for me to start this up again.

Currently I am reading ____ in preparation for my new job but I have also had some time for lisp.

On the lisp side I have been giving a bit more time to Varjo as there is a lot of completeness & stability things I need to work on over there. I’ve hacked a few extra sanity checks for qualifiers, however the checks are scattered around and it feels sucky so I really need to do a cleanup pass.

In good news I’ve also made a sweep through the CL spec looking for things that are worth adding to Vari. This resulted in this list and I’ve been able to add whole bunch of things. This really helps Vari feel closer to Common Lisp.

One interesting issue though is around clashes in the spec; lets take round for example. In CL round rounds to the nearest even, in GLSL it rounds in an implementation defined direction. GLSL does provide roundEven which more closely matches CL’s behavior. So the conundrum is, do we defined Vari’s round using roundEven or round. One way we may trip up CL programmers, the other way we trip up GLSL programmers, and also anyone porting code from GLSL to Vari without being aware of those rules. We can of course provide a fast-round function, but this doesnt neccessarily help with the issue of discoverability or ‘least surprise’.

I’m leaning towards keeping the GLSL meaning however, simply as we are already not writing true CL and our dialect makes sacrifices in the name of performance already. Maybe this is ok.

That’s all I’ve got for now, seeya all next week.

p.s. No stream this week as ferris is giving a rust talk here in Oslo.

p.p.s I also feel the ‘I haven’t been making lil-bits-of-lisp videos in ages’ guilt so I need to get back to that soon.

Raining Features

2017-11-27 09:08:01 +0000

Every now and again people ask about if CEPL will support compute and I’ve always said that it would happen for years. The reason is that, because compute is not part of the GL’s draw pipeline, I thought there would be a tonne of compute specific changes that would need to be made. Turns out I was wrong.

Wednesday I was triaging some tickets in the CEPL repo and saw the one for compute. I was about to ignore it when I thought that it would be nice to read through the GL wiki to see just how horrendous a job it would be. 5 minutes later I’m rather disconcerted as it looked easy.. temptingly easy.

‘Luckily’ however I really want SSBOs for writing out of compute and that will be hard .. looks at gl wiki again .. Shit.

Ok so SSBOs turned out to be a much smaller feature than expected as well. So I gave myself 24 hours, from late Friday to late Saturday to implement as many new features (sanely) as I could.

Here are the results:

Sync Objects

GL only has one of these, the fence. CEPL now has support for this too.

You make a fence with make-gpu-fence

(setf some-fence (make-gpu-fence))

and then you can wait on the fence

(wait-on-gpu-fence some-fence)

optionally with a timeout

(wait-on-gpu-fence some-fence 10000)

also optionally flushing

(wait-on-gpu-fence some-fence 10000 t)

Or you can simply check if the fence has signalled

(gpu-fence-signalled-p some-fence)

Query Objects

GL has a range of queries you can use, we have exposed them as structs you can create as follows:

(make-timestamp-query)
(make-samples-passed-query)
(make-any-samples-passed-query)
(make-any-samples-passed-conservative-query)
(make-primitives-generated-query)
(make-transform-feedback-primitives-written-query)
(make-time-elapsed-query)

To begin querying into the object you need to make the query active. This is done with with-gpu-query-bound

(with-gpu-query-bound (some-query)
  ..)

After the scope of with-gpu-query-bound the message to stop querying is in the gpu’s queue, however the results are not available immediately. To check if the results are ready you can use gpu-query-result-available-p or you can use some of the options to pull-gpu-query-result, let’s look at that function now.

To get the results to lisp we use pull-gpu-query-result. When called with just a query object it will block until the results are ready:

(pull-gpu-query-result some-query)

We can also say not to wait and CEPL will try to pull the results immediately, if they are not ready it will return nil as the second return value

(pull-gpu-query-result some-query nil) ;; the nil here means don't wait

Compute

To use compute you simply make a gpu function which takes no non-uniform arguments and always returns (values) (a void function in C nomenclature) and then make a gpu pipeline that only uses that function.

(defstruct-g bah
  (data (:int 100)))

(defun-g yay-compute (&uniform (woop bah :ssbo))
  (declare (local-size :x 1 :y 1 :z 1))
  (setf (aref (bah-data woop) (int (x gl-work-group-id)))
        (int (x gl-work-group-id)))
  (values))

(defpipeline-g test-compute ()
  :compute yay-compute)

You can the map-g over this like any other pipeline..

(map-g #'test-compute (make-compute-space 10)
      :woop *ssbo*)

..with one little difference. Instead of taking a stream of vertices we now take a compute space. This specify the number of ‘groups’ that will be working on the problem. The value has up to 3 dimensions so (make-compute-space 10 10 10) is valid.

We also soften the requirements around gpu-function names for the compute stage. Usually you have to specify the full name of a gpu function due to possible overloading e.g. (saturate :vec3) however as compute shaders can only take uniforms, and we don’t offer overloading based on uniforms, there can only be one with a given name. Because of this we allow yay-compute instead of (yay-compute).

SSBOs

The eagle eyed of you will have noticed the :ssbo qualifier in the woop uniform argument. SSBOs give you storage you can write into from a compute shader. Their api is almost identical to that of UBOs so I copied-pasted that code in CEPL and got SSBOs working. This code will most likely be unified again once I have fixed some details with binding however for now we have something that works.

This means we can take our struct definition from before:

(defstruct-g bah
  (data (:int 100)))

and make a gpu-array

(setf *data* (make-gpu-array nil :dimensions 1 :element-type 'bah))

and then make an SSBO from that

(setf *ssbo* (make-ssbo *data*))

And that’s it, ready to pass to our compute shader.

.Phew.

Yeah. So all of that was awesome, I’m really glad to have a feature land that I wasnt expecting to add for a couple more years. Of course there are bugs, the most immediately obvious is that when I tried the example above I was getting odd gaps in the data in my SSBO

TEST> (pull-g *data*)
(((0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6
   0 0 0 7 0 0 0 8 0 0 0 9 0 0 0 0 0 0 0 0 0 0 0 0 0
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)))

The reason for this that I didnt understand layout in GL properly. This also means CEPL doesnt handle it properly and that I have a bug :) (this bug)[https://github.com/cbaggers/cepl/issues/193].

Fixing this is interesting as it means that, unless we force you to make a different type for each layout e.g.

(defstruct-g bah140 (:layout :std140)
  (data (:int 100)))

(defstruct-g bah430 (:layout :std140)
  (data (:int 100)))

..which feels ugly, we would need to support multiple layouts for each type. Which means the accessor functions in lisp would need to switch on this fact dynamically. That sounds slow to me when trying to process a load of foreign data quickly.

I also have a nagging feeling that the current way we marshal struct elements from c-arrays is not ideal.

This things together make me think I will be making some very breaking changes to CEPL’s data marshaling for the start of 2018.

This stuff needs to be done so it’s better we rip the band-aid off whilst we have very few known users.

News on that as it progresses.

Yay

All in all though, a great 24 hours. I’m currently learning how to extract std140 layout info from varjo types so that will likely be what I work on next Saturday.

Peace

Little bits of progress

2017-11-20 09:49:00 +0000

Hi again! This last week has gone pretty well. Shared contexts have landed in CEPL master, the only host that supports them right now is SDL2 although I want to make a PR to CEPL.GLFW as it should be easy to support there also. Glop is proving a little harder as we really need to update the OSX support, I started poking at it but it’s gonna take a while and I’ve got a bunch of stuff on my plate right now.

I started looking at multi-draw & indirect rendering in GL as I think these are the features I want to implemented next. However I immediately ran into existing CEPL bugs in defstruct so I think the next few weeks are going to be spent cleaning that up and fixing the struct related issues from github.

AAAAges back I promised beginner tutorials for common lisp and I totally failed to deliver. I had been hoping the Atom support would get good enough that we could use that in the videos. Alas that project seems to have slowed down recently[0] and my guilt finally reached the point that I had to put out something. To that end I have started making a whole bunch of little videos on random bits of common lisp. Although it doesnt achieve what I’d really like to do with proper tutorials, I hope it will help enough people on their journey into the language.

That’s all for now, Peace.

[0] although I’m still praying it get’s finished

Transform Feedback

2017-11-13 09:10:35 +0000

Ah it feels good to have some meat for this week’s writeup. In short transform feedback has landed in CEPL and will ship in the next[0] quicklisp release.

What is it?

Transform feedback is a feature that allows you to write data out from one of the vertex stages into a VBO as well as passing it on to the next stage. It also opens up the possibility of not having a fragment shader at all and just using your vertex shader like the function is a gpu based map function. For a good example of it’s use check out this great tutorial by the little grasshopper.

How is it exposed in CEPL?

In your gpu-function you simply add an additional qualifier to one or more of your outputs, like this:

(defun-g mtri-vert ((position :vec4) &uniform (pos :vec2))
  (values (:feedback (+ position (v! pos 0 0)))
		  (v! 0.1 0 1)))

Here we see that the gl_position from this stage will be captured, now for the cpu side. First we make a gpu-array to write the data into.

(setf *feedback-vec4*
	  (make-gpu-array nil :element-type :vec4 :dimensions 100))

And then we make a transform feedback stream and attach our array. (transform feedback streams can have multiple arrays attached as we will see soon)

(setf *tfs*
	  (make-transform-feedback-stream *feedback-vec4*))

And finally we can use it. Assuming the gpu function above was used as the vertex stage in a pipeline called some-pipeline then the code will look like this:

(with-transform-feedback (*tfs*)
  (map-g #'prog-1 *vertex-stream* :pos (v! -0.1 0)))

And that’s it! now the first result from mtri-vert will be written into the gpu-array in *feedback-vec4* and you can pull back the values like this:

`(pull-g *feedback-vec4*)`

If you add the feedback modifier to multiple outputs then they will all be interleaved into the gpu-array. However you might want to write them into seperate arrays, this can be done by providing a ‘group number’ to the :feedback qualifier.

(defun-g mtri-vert ((position :vec4) &uniform (pos :vec2))
  (values ((:feedback 1) (+ position (v! pos 0 0)))
		  ((:feedback 0) (v! 0.1 0 1))))

making another gpu-array

(setf *feedback-vec3*
	  (make-gpu-array nil :element-type :vec3 :dimensions 10))

and binding both arrays to a transform feedback stream

(setf *tfs*
	  (make-transform-feedback-stream *feedback-vec3*
									  *feedback-vec4*))

You can also use the same pipeline multiple times within the scope of with-transform-feedback

(with-transform-feedback (*tfs*)
  (map-g #'prog-1 *vertex-stream* :pos (v! -0.1 0))
  (map-g #'prog-1 *vertex-stream* :pos (v! 0.3 0.28)))

CEPL is pretty good at catching and explaining cases where GL will throw an error such as: not enough vbos (gpu-arrays) bound for the number of feedback targets -or- 2 different pipelines called within the scope of with-transform-feedback

More stuff

During this I ran into some aggravating issues relating to transform feedback and recompilation of pipelines, it was annoying to the point that I rewrote a lot of the code behind the defpipeline-g macro. The short version of this is that the code emitted is no longer a top-level closure and also that CEPL now has ways of avoiding recompilation when it can be proved that the gpu-functions in use havent changed.

I also found out that in some cases defvar with type declarations is faster than the captured values from a top level closure, even when they are typed. See here for a test you can run on your machine to see if you get the same kind of results.[1]

Shipping it

Like I said this code is in the branch to be picked up by the next quicklisp release. This feature will certainly have it’s corner cases and bugs but I’m happy to see this out and to have one less missing feature from CEPL.

Future work on transform feedback includes using the transform feedback objects introduced in GLv4 to allow for more advanced interleaving options and also nesting of the with-transform-feedback forms.

Next on the list for this month is shared contexts. More on that next week!

Ciao

[0] late november or early december [1] testing on my mac has given different results, use defvar and packing data in a struct is faster than multiple defvars but slower than the top level closure. Luckily it’s still on the order of microseconds a frame (assuming 5000 calls per frame) but measurable. It’s interesting to see as packed in defvar was faster on my linux desktop ¯\_(ツ)_/¯ I’ll show more data when I have it.

Multiple Contexts are ALIVE..kinda

2017-11-07 08:56:13 +0000

Allo again!

Its now November which means it’s NanoWrimo time again, each year I like to participate in spirit by picking a couple of features for projects I’m working on and hammer them out. This is well timed as I haven’t had much time for adding new things to CEPL recently.

The features I want to have by the end of the month are decent multi-context support and single stage pipelines.

Multi-Context

This stuff we have already talked about but I have bit the bullet and got coding at last. I have support for non-shared contexts now but not shared ones yet, the various hosts have mildly-annoyingly different approaches to abstracting this so I’m working on finding the sweet spot right now.

Regarding the defpipeline threading issues from last week I did in the end opt for a small array of program-ids per pipeline indexed by the cepl-context id. I can make this fast enough and the cost is constant so that feels like the right call for now. CEPL & Varjo were never written with thread safety in mind so it’s going to be a while before I can do a real review and work out what the approach should be, for now its a very ‘throw some mutexes in and hope’ situation, but it’s fine…we’ll get there :p

One side note is that all lambda-pipelines in CEPL are tied to their thread so don’t need the indirection mentioned above :)

Single Stage Pipelines

This is going to be fun. Up until now if you wanted to render a fullscreen quad you needed to:

  • make a gpu-array holding the quad data
  • make a stream for that gpu-array
  • make a pipeline with:
  • a vertex shader to put the points in clip space
  • a fragment shader

The annoying thing is that the fragment shader was the only bit you really cared about. Luckily it turns out there is a way to do this with geometry shaders and no gpu-array and it should be portable for all GL versions CEPL supports. So I’m going to prototype this out on the stream on wednesday and, assuming it works, I’ll make this into a CEPL feature soon.

That covers making pipelines with only a fragment shader of course but what about with only a vertex stage? Well transform feedback buffers are something I’ve wanted for a while and so I’m going to look into supporting those. This is cool as you can then use the vertex stage for data processing kinda like a big pmap. This could be handy when you data is already in a gpu-array.

Future things

With transform feedback support a few possibilities open up. The first is that with some dark magic we could run a number of variants of the pipeline using transform feedback to ‘log’ values from the shaders, this gives us opportunities for debugging that weren’t there before.

Another tempting idea (which is also easier) is to allow users to call a gpu-function directly. This will

  • make a temporary pipeline
  • make a temporary gpu-array and stream with just the arguments given (1 element)
  • run the code on the gpu capturing the result with a temporary transform-feedback buffer or fbo
  • convert the values to lisp values
  • dispose of are the temporaries

The effect is to allow people to run gpu code from the repl for the purposes of rapid prototyping. It obviously is useless in production because of all the overhead but being able to iterate in the repl with stuff like this could really be great.

That’ll do pig

Right, time to go.

Seeya soon

Mutliple Contexts

2017-10-18 11:41:55 +0000

This last weekend I put a little time into multi-context support in CEPL.

CEPL has it’s own context (cepl-context) class that holds both the gl-context handle[0] and also state that is cached to improve performance. cepl-contexts are passed implicitly[1] down the stack and are tied to a single thread.

Most of the work was just finding simple errors in my code an shoring them up, but I did find one tricky case and that was in pipelines. So a pipeline is usually defined in a top level declaration like so:

(defpipeline my-pipeline ()
  :vertex some-gpu-function
  :fragment some-other-gpu-function)

And this generates all the bootstrapping to compile the gpu functions, get the GL program-id, etc. However that program-id is a GL resource and belongs to a single GL context. As it is right now it’ll be the context that calls this pipeline first..ew.

So how to tackle this? We could create one program-id per context, however this means either looking up the program-id based on the context per call in a pipeline local cache..or looking up the program-id in a context local cache based on the pipeline. Neither is great, as extra lookups per call are something we should be avoiding.

Another option is to have shared GL contexts. This is nice anyway as it means we can share textures/buffers/etc between threads which I think is a nice default behavior. However even with this solution there are still issues with pipelines.

The state of a gl program object is naturally shared between the two threads too, that state includes which uniforms are bound, so if two threads try to use the same pipeline with different uniforms then we are in a fun data-racey land again.

This seem to lead back to the ‘gl program per gl context’ thing again. I’ll ponder this some more but I think it’s the only real option.

Happy to hear suggestions too,

I think that’s all for now

Peace

[0] in the future I expect I will allow multiple GL contexts per CEPL context [1] or explicitly if you prefer

Small things

2017-10-11 09:02:48 +0000

This last week hasn’t seen much exciting code so there isn’t too much to write up.

I’m still dreaming up some way to wrangle data in my games in a way that maximizes performance whilst keeping live redefinition in tact, however this isn’t even fully formed in my head yet so there is no code to show or even speak of. However I’ve been increasingly interested in relational databases recently. The fact that you only define the layout of your table data and queries, and that the system just works out what other passes as intermediate data-structures it needs to work best is pretty sweet. You can get a free book on mssql query optimizer here.

CppCon is also out, here are a few good talks I’ve been watching so far:

I’ve also just had a book on Garbage Collection delivered. YAY! It’s another one of those amazing computer systems where you get to directly impact people, but without having the deal with horrible human factors (like unicode & dates & BLEEEEGHH). I’m pretty stoked to work through this book.

Other than this researchy stuff I’ve still been streaming. Last week we played with a physics engine and tonight we are going to implement chromatic aberration :) I’m pretty happy with where the streaming has been going, the nerve wracking part of the process these days is finding things I can do in the two hours rather than the stream itself.

That’ll do for now, seeya next week

Not much this time

2017-10-02 09:12:21 +0000

My lack of focus over the weekend was disappointing so I haven’t got much to report. The one thing I did get done however was to finish adding types to my WIP lisp bindings for the newton-dynamics physics engine. This was motivated by the fact that although I had got the basics working a while back, I had seen some overhead from the lisp code; that should be minimized now.

I think I might try using the physics bindings on this week’s stream. Could be fun.

Other than that I’ve been reading and procrastinating. This book is now in my ‘to read’ list, I have no desire to make a proper database but I’m super interested in how their query planner/optimizers work.

That’s all for now, seeya!

Sketch

2017-09-25 08:58:26 +0000

This weekend I put a bit of time into Sketch which I, to my shame, have not worked on in a while. Sketch is a lovely project by Vydd which looks to sit in a similar place to processing, but in the lisp world.

A while back I was approach to look into porting it to CEPL so we could have the shader development process of CEPL in Sketch. We started by monkey-patching CEPL in which provided a fantastic test case for performance and resulted in some big refactoring and wins back in July.

Sketch was previously built on the excellent sdl2kit but there aren’t enough hooks in the projects to have them work together yet so I’m currently replacing the bootstrapping. I stripped down a bunch of code and have a test which shows things are rendering so that’s a start. However CEPL’s support for multiple contexts is untested so this project is really gonna force me to implement that well which is AWESOME. Incidentally sketch was the project that forced me to add CEPL’s multi window support (which will also get more robust as I port this).

Other than that I’m busy with other projects and ideas that may become stuff in the future, I’ve got so much to learn :) This last week has seen me binging on xerox parc related research talks (mainly smalltalk stuff) which has been building up a nice healthy level of dissatisfaction. I have proto-ideas rocking around with big ol’ gaps in their narratives, so I’m just pushing a load of chunks of software dna into my head in the hope of some aberrant collision will result in some useful mental genesis will occur. TLDR feed brain hope to shit ideas.

That’ll do for this post.

Seeya!

Mastodon