Internally we have been calling this bug “popcorn dice”, and it’s been driving me crazy for a while.
As we have our own wrapper around Unity’s new physics engine, I had assumed that I was just ‘driving’ it incorrectly. So today, I sat down and read how Unity’s ECS hooks into the physics. I had hoped the mistake was how I was loading in the physics-body data, or when reading it out after the physics step. Alas, no such luck.
I saw that they kept track of their fixed steps in a slightly different way than I did. It should have resulted in identical behavior, but just to be sure, I followed their example. Again no dice (har har).
I read that damn code forwards and backward and finally remembered that I had a test scene that had been surprisingly stable this whole time. It’s the one I showed in an update a few weeks back. It looked like this:
So what could this possibly have in common with the ECS tests I’d been doing, but that was also different from the main board scene. I suddenly got a hunch and couldn’t help but laugh when it proved to be correct.
The floor was too big.
YUP. So we’d been lazy and just made the board 6000x6000 units in size a looooong time ago and promptly forgot about it. The classic physics engine (bless its soul) had been doing a great job, and so we’d never seen the issue. However, with the new system, if you have mesh colliders interacting with a single box collider more than ~2000 units across, it gets very unstable and starts popcorning like we see above.
This find is a huge relief. TaleSpire obviously relies on the dice behavior being reliable and, if it were a problem with the engine itself, I’d have no idea of how to fix it. This should be trivial. We can make a smaller collider and just teleport it around where we need it.
Alright, that’s a great place to end the day.
I’ll seeya all tomorrow with something else!
Updates from the land of code.
The batching/rendering code has reached where we need it to for the Early Access. We see a three times speedup on some of our stress tests, and while there are very clear ways to get more, this endeavor has already left us two months behind schedule, so we need to press on into other features. That said, it is lovely to see all this work pay off, and I’m very excited by what we can do to get the rest of the perf that is still on the table.
During this, I hit one very annoying bug. It seems that Unity does not handle writing into IndirectArguments buffers properly on all GPUs. The GPU-frustum-culling system I wrote recently worked great on my home PC but failed silently and strangely on my laptop. I ended up using a more naive approach and having the culling compute shaders pushing values into an append-buffer and using the CopyCount method to transfer the final length to the IndirectArguments buffer afterward. This still kept things on the GPU but significantly increased the number of compute shader dispatches and buffers required to work. The good news is that it does open up some opportunities for removing some indirection in the vertex shaders, which should give some performance wins later. I think I should also be able to make a system on top of this that can reduce the number of final draw calls, but we’ll see how that goes.
Ree and I did some digging into the physics instabilities we have been seeing. Along with some simple mistakes, we remembered that the Unity.Physics API allows you to swap out the engine used behind the scenes, and that Havok has such a package. We had heard rumors that it was just plug-and-play, but when is programming ever that simple?
It was that simple. To be honest, I was floored. I changed maybe ten lines of code, and TaleSpire was using Havok. We saw that, while the stability was a little better, we were getting uncannily similar issues to using Unity’s DotsPhysics. This made us confident that the problem was how my code is driving the physics rather than the instability coming from the engine.
It should be noted that DotsPhysics was significantly faster than Havok in the stress test we threw at it. TaleSpire doesn’t have many dynamic objects, but we have hundreds of thousands of small static objects. DotsPhysics seems to be heading a very compelling direction, and I think it will carve out a very successful niche for itself.
We then upgraded to the latest version of Unity, which, in turn, allowed us to move to the latest builds of various packages, including physics. The improvements in DotsPhysics was again, notable. It seems that our strange blend of using old and new systems in Unity has paid off well as we haven’t yet run into any painful bugs on this new build. Unless anything major happens to make us have to downgrade, 2020.2 will be the version of Unity we will ship TaleSpire using.
We also merged master into the branch with all the new tech. This includes the props system. To let prop development be done in parallel, Ree has been serializing the props separately from the tile data. Now that props are shaping up well, we can see what data is needed and so we are in a good place to bring props into the same system as tiles. This will let sync, undo/redo, and all the other normal stuff work with props too.
A short time back, discussions with the community on how line-of-sight should work led me to re-think how that system should be implemented. Yesterday, Ree helped me prototype one possible version of this. It relied on us not using the BatchRendererGroup in ShadowsOnly mode but instead discarding fragments in passes where we didn’t want to write to depth. It worked well in small scenes, but we took a significant FPS hit in our stress-test. I’m happy that we found this out very quickly and can now start the version that is a little more work but will not put additional costs on unrelated systems. That version will require us to dispatch the DrawMeshInstanced calls to render the scene to individual faces of the cubemap. We will use jobs to handle the selection & per-face culling of the tiles for these draw-calls.
Aside from this, I’ve continued work on the character controller. It’s not there yet, but we are relatively confident we can fix up the remaining issues, so it’s currently a lower priority.
That wraps up everything for last Wednesday to now. My next immediate task is to hunt down what I’m doing wrong in Physics as currently, it’s the biggest thing that stops us from play-testing this branch.
Hope you folks are doing good. Peace.
 It could be that the IndirectArguments issue was a graphics API limitation rather than a Unity one. I’d need to do more reading to find out.
Hey folks, I was meant to write this in the morning, but code is distracting :P
Yesterday I worked on dice physics. I had been concerned by the behavior of dice and wanted to find a good path forward.
To be able to see what was going on, I made this test scene. I can reset the dice by clicking the reset ‘button.’
This is one place where Unity absolutely excels. Being able to knock up tests and tooling in minutes so often can help you make headway on otherwise stubborn issues.
There are a few points that I find interesting:
DotsPhysics and ClassicPhysics do not line up in behavior. This is really important as this means that I shouldn’t expect my conversion routines to give the same results as classic as mine are directly based on the ones built into DotsPhysics.
The BouncePhysics wrapper around DotsPhysics lines up with the behavior of DotsPhysics. In that, the way they bounce and come to rest is similar. This lends support to the point above and gives some evidence that the conversion is ok.
DotsPhysics falls faster. This is because they don’t handle time properly, so faster framerate means faster simulation.
Mine falls at the same speed as ClassicPhysics. This is very important. It suggests my fixed-timestep code and use of Unity’s simulations values (like gravity) are in the right ballpark.
Once I realized that DotsPhysics doesn’t give the same bounce, drag, etc, for the same input values, I felt comfortable playing with values to find something that felt right. This clip is not final behavior but shows how a couple of minutes of tweaking can help.
Afterward, I set up some ramps and D12s, so we had a decent place to start testing when we sit down to dial this in for real.
I also saw that my interpolation code was incorrect. I found some issues, but I was struggling to match all of Unity’s behavior. I then noticed that we don’t use interpolation in the current build and so this stopped being a priority. However, it still took me a couple of hours to quit playing with it. It’s so tempting to keep going when you know you have the math right, and it’s the surrounding plumbing that is the issue.
Anyhoo that was that. Today I’ve been working on the character controller again, results are promising, but I’ll leave that until tomorrow.
With lights under control, I’ve turned my evil eye back to physics. Since I replaced the physics engine, gm-blocks and hide-volumes have not been working. This was expected, but it was time to fix them.
For sanity, I’m going to call Unity’s old physics engine ClassicPhysics, their new one DotsPhysics, and our wrapper BouncePhysics.
Hide volumes make use of the fact that, in ClassicPhysics, we could resize colliders on the fly. We needed something similar in our new system. First up was a different problem, however. When we convert Unity’s colliders to the form required by DotsPhysics, we remove the components. This means there isn’t an obvious place for the programmer to set the values. This meant BouncePhysics needed its own version of these components.
I knocked together a Sphere, Cylinder, Capsule, and Box collider components. I modified the BouncePhysicsBody to walk the hierarchy, converting the ClassicPhysics components to their BouncePhysics equivalent, and registering all these new colliders.
During this, I noticed some issues in how my code handled baking of scale when the collider is subject to multiple transformations. This sucked for a good few hours, but I was able to find code that showed how to do it properly, and I got it fixed.
With that done, I was able to hook up the hide volumes and get them working again \o/
Next, gm-blocks. There I found an interesting bug when deselecting them. First, the gm-block would be destroyed, and as that happened, it would unregister itself from BouncePhysics. The issue arose because, in DotsPhysics, you reset and pass all the RigidBodys every frame. This means that although I had unregistered the object from BouncePhysics, the RigidBody was still in the DotsPhysics collision world. I can’t just remove that one body (the only option is Reset, which clears everything). Instead, I now set a flag inside the RigidBody to state that it is unregistered and then modified the collision queries to check for this flag when collecting results. This all takes place after the updates to motion have been calculated, so there is no risk of something bouncing off this unregistered body before the next frame, and by then, it will have been cleaned up.
Of course, this check (which is essentially a comparison of a ulong) is not free. I will probably have the system only use those new collectors on frames where something has been unregistered. Luckily in TaleSpire, we don’t have much churn in the physics objects.
During this, it made sense to clean up some of the physics code in general. Until now, I had made the class that holds the board data responsible for the physics ‘world’. This wasn’t ideal, and the physics related code was very out of place. I decided to centralize it in the BouncePhysics classes. This does mean that, for now, we only allow only one board to be using physics at a time (which could matter when changing boards), but the trade-off was that I could make the API much more like ClassicPhysics. The API design is important because, as soon as I merge this branch, Ree needs to be able to get up to speed quickly. The more familiar it is, the better.
In amongst this, there was the usual smattering of bugs, confusion, and other things that slow one down. But overall, it’s felt good to have some forward momentum.
Next, I really need to find out why dice in BouncePhysics are not behaving like they did in ClassicPhysics. It could be an issue in the conversion routines or, more likely, mistakes in the code than manages fixed-timestep and interpolation. I’ll set up some test scenes where I can compare everything side-by-side, and we’ll see what we can find out.
Alright, that’s all from me for now. Seeya later
Since our move away from GameObjects, we have had to make our own batching, culling, and animation systems. One that I had left for a while was lighting, so it was time to tackle that.
First off, the deeply annoying news. Unlike with the BatchRendererGroup, which gave us a way to avoid GameObjects and populate the data from multiple threads, there is no similar thing for lights. In fact, when I looked into Unity’s ECS, they were just spawning GameObjects for the lights and updating their transform each frame to be in line with the entities that supposedly contained them.
This is a pain, but it’s the only option, so we needed something similar.
As always, the fact that TaleSpire is a user-generated-content game means we have a bunch of complexity. We don’t know in advance how many lights are about to be spawned. Worse, because we are back using GameObjects, we know that spawning too many per frame will cause serious fps issues. This means we need a progressive spawning approach and pools for reusing light that that been destroyed.
Also, spaghet scripts are allowed to update light color, intensity, position, and rotation on a per-frame basis. Now position and rotation updates can be jobified if we put the light’s Transform in a TransformAccessArray. However, color and intensity can only be set from the main thread… joy >:[
On each change to a zone, we rebuild all the batches, including laying out the lights again. We have jobs to:
- find out what tiles and pros are using lights
- collate the numbers of each kind of light
- write the details for each instance of each kind of light (in parallel) into various collections.
We push all of the current lights back into the pools for reuse and, over the course of multiple frames, spawn new lights or pull them from the pools.
Dynamic lights are updated by Spaghet scripts. If there is a change to the intensity or color, we push the index of the modified light into a NativeQueue. That queue is consumed on the main thread where we can apply the changes.
Getting all of this to play nice has taken me the last 3-4 days. There is still plenty of room for improving performance, but I need to profile using real-world scenes before making any more changes. Luckily we have fantastic community sites full of slabs I can use :D
Alright, the next stop is looking at little things that broke on the move to the new physics engine.
Seeya in the next log (which will hopefully take much less than a week this time :P)
This one is interesting to write as it’s fairly downbeat. However, it is part and parcel of making stuff, so let’s do it.
As mentioned in previous posts, we have known for a while that TaleSpire would not scale to bigger boards using their GameObject abstraction. There was a lot of song and dance from Unity about their upcoming ECS, and while it looked pretty promising, we miscalculated the timeframe it would take Unity to bring it to life. That meant that this year, as we tested again, we saw we needed a custom solution. That’s what I’ve been working on.
Now I have tried hard to always be clear about when the issues we have run into have been Unity’s fault and when they have been ours. In general, the responsibilities have been ours as we are using packages marked experimental and not in general recommended for shipping. However, today’s principal issue is, I believe, not on us.
To move away from using GameObjects we have been building on the BatchRendererGroup. This object lets you specify what needs to be drawn and then provide the matrices to lay these instances out in world-space. The matrix array you fill is a NativeArray and, as such, is fully compatible with the job system. This meant that we have been able to parallelize the population of these arrays, which has resulted in the massive improvements in the performance spawning tiles we have seen in previous dev-logs.
When you are using instanced rendering, you need to somehow get the per-instance data to your shaders. Traditionally in Unity, this is done using MaterialPropertyBlocks. You can associate a MaterialPropertyBlock with an object and, when it renders, that data is made available to your shader. When using BatchRendererGroup, this requirement seems to be filled by using GetBatchVectorArray (and co).
Now here is my portion of the fuckup. When I couldn’t get this to work in my early testing, I put it down to me not being too hot with Unity’s shaders, and that I would work it out later. Unlike the new stuff in Unity, the BatchRendererGroup is not marked as experimental as so I had the same confidence in the system that I do for other released parts of Unity. I put a few questions up on the forums and got back to work. I rigged up a simple system where I looked up the data from ComputeBuffers using the instance-id as the index; this works fine without culling enabled.
The problem is that GetBatchVectorArray does not (as far as I can tell) work with the classic rendering system in Unity. This is not stated in the documentation, but in the docs for an experimental package called the ‘Hybrid Renderer,’ it mentions some things that can be construed to mean it only works with the new scriptable rendering pipelines.
UPDATE: While writing this log, I’ve seen that my bug report against this method has been accepted. I will keep you posted on how this goes.
This sucks. Worse, we know the old version won’t work; we are months into the rewrite with no way to go back. Frankly, it was rather distressing.
Now those of you who know a little about Unity might be wondering if we could just make something custom using DrawMeshInstanced or DrawMeshInstancedIndirect. The problem is that things drawn with those methods are not culled, and this is a problem as realtime lights/shadows require rendering the scene from multiple vantage points. Without culling, you end up rendering everything for each camera regardless of the direction it’s facing. This massively hurts performance. The BatchRendererGroup was excellent in this regard as it allowed us to implement the culling jobs ourselves.
One thing that the BatchRendererGroup does have is a flag that lets you specify that you only want to use it to render shadows. Another little detail is that TaleSpire’s shadows don’t need to respond to the tile drop-in/out animation we implement in the vertex shader. This gives us the possibility to keep using the BatchRendererGroup for shadows and keep the culling advantages.
However, that still leaves actually drawing the scene. The only thing I could think of was that I’d need to implement this myself. I’d use all the batching code we have written to layout data on the GPU, I’d write a GPU-frustum culling implementation that matched the one we use on the CPU side, and use DrawMeshInstancedIndirect to dispatch the draw calls.
Now I’ve never written this kind of thing before, but it felt like the only feasible option. Over a few days, I got this written and was finally able to cull our scenes again without horrible flickering.
Now the scary thing is that this was not part of the plan. It’s cost us a lot of time and how it scales is currently unknown. What I do know is now we have a whole pile of extra complexity to manage and improve. Not great.
However, there are some up-sides:
Currently, we have to use the same meshes for rendering, occlusion checks, and shadow rendering. I wanted to use different meshes for shadows and occlusion checks as this allows us to use lower-poly meshes (helping performance) and close tiny cracks the fog of war shouldn’t be able to see through. I was implementing that as part of the GPU-occlusion system. When I delayed that feature to after the Early Access release, we delayed the ‘separate shadow mesh’ feature too. With our new system, we can trivially use different meshes, so these improvements are now on the cards for Early Access.
As mentioned recently, most of Unity’s API has a max of 500 instances per batch when using non-uniform scaling. DrawMeshInstancedIndirect does not as all the data is already on the GPU. This means we have the potential to have much larger batch sizes. Currently, we batch on a per-zone basis, so we are a little limited, but it will not be hard to take zones that have not changed in a while and combine their batches. It’s something I’ll look into when I get back to improving performance.
Thirdly, we now have a bunch of tile/prop data on the GPU that we can easily write compute-shaders to operate on. This gives us more options for future developments.
So all in all, this has been pretty horrible. I’m very grateful that the community is so supportive and that the Eldritch Foundry update was so well received. That was a serious boost in a tough couple of weeks.
However, we are still very far from done. By my estimates, I was at least a month behind my personal goals before this escapade, so I’m not sure where we are now. Luckily we’ve been vague with deadlines, but we will need to sit down and look at the plan again.
What a year!
This dev log is long, so I’m gonna leave talking about lighting to the next post.
 This is important as, traditionally, most of Unity’ apis are only valid to call from the main thread.
 There is such a lot of work to do for Early Access that often I can’t afford to stress on issues that can be worked out another day. The important thing is to be making progress somewhere as often another day I’ll have had time for the solution to come to me.
 Culling means some instances are not being drawn and so the instance-id for a given tile can be different from frame to frame.
A quick tip when dealing with BatchRendererGroup. The instance limit per batch is 1023, but the UNITY_INSTANCED_ARRAY_SIZE limitation still applies, and so on desktops, this will likely cut your max batch size down to 500. This matters if you rely on instance-ids. I saw some very odd stuff when a single batch went over 511 instances; what was happening was that Unity was splitting the batch in two, one for the first 511 instances and one for the last one. This meant the instance-id for the 512th instance was 0 rather than 511, and this naturally resulted in incorrect behavior when using it to index into a buffer.
That was an afternoon of pain explained after a quick trip to the frame profiler, I was so concerned that my batching code was wrong that I didn’t consider the instance-ids not being what I expected. I can’t wait to know more of Unity’s internals as these issues are such a pain and seem avoidable.
Another node that I added last week is the random node. It can produce random floats, int, and vectors of each. It uses a simple stateless random function I’ve used in shaders before, which means it would be terrible for security, but is perfectly fine for visuals. The node has an option called ‘Update Mode’ that might be worth mentioning.
By default, if you give the node the same input (the seed), you are going to get the same result on the output. You will probably feed time into the input to get different results each frame. However, if you are using global time, then the output will be the same for all instances of this script running on all assets this frame. That might not be what you are going for! This is where the ‘Update Mode’ comes into play. You can set it to global, which gives the behavior described above, or you can pick one of two ‘local’ options, which both modify the seed in a way that will give different results. The ‘Local’ option means that it mixes in the tile’s unique id, so now the result will be different from other tiles running the same script. Note, though, that each asset within a tile can have a script, and maybe you don’t want those assets to get the same result from the same node. To handle this case, you have ‘Asset Local’, which also mixes the asset’s index into the seed along with the tile’s id.
Alright, back to work. Seeya!
 Unity’s use of constant-buffers for its object-to/from-world matrices and constant-buffers have a max guaranteed size of 65k. See section 1.4 over here for some interesting details.
Hey folks. For the last few days, I’ve been looking at batching again. Unity has an interesting limitation in that the max batch size is 1023. This seems to be an artifact of some decision long ago, however, we need to make sure we don’t try and make batches bigger than this. Making our code respect this is a delicate operation as it’s very multi-threaded code and the access to the data are often not protected by Unity’s safety systems (for performance reasons). This means I’ve been taking it very slowly, it’s going well, and I hope to have it finished soon.
As that is quite a short update, here is something from last week.
There is a node in Spaghet for choosing a mesh from a list. We use this for the stop-motion style effect of the fire. It looks like this in the graph.
While fixing bugs in the implementation, I had to fix things both on the TaleWeaver and TaleSpire sides, and the process got very tedious. At some point, I decided to re-prioritize work on the modding tools and hooked up Spaghet such that it works as a live previous in TaleWeaver. Here it is in action. Notice that every time we save, the changes are immediately applied.
This sped up the process a lot! I was able to find some mistakes in my animation packing code and fix those before getting back to batching.
Alright, that’s the lot for this update.
Hi everyone, last week was intense but productive. I’ve been down at Ree & Heckbo’s place, working on the engine. We had hoped to start integrating our branches, but my stuff still was not ready, so instead, we just kept working, making use of the collaboration advantages that being in the same room gives.
As there has been a bunch of stuff happening, I’m gonna spread it over a few posts.
Let’s start with something random: line-of-sight.
Line of sight checks came up in conversation in the community recently. The issue was that people have issues because one creature can block the line of sight to other creatures behind it. This is one of those things that sounds like it makes sense but just doesn’t feel right in-game. Any gaming using miniatures is always an approximation of the fantasy setting, and so sticking to it like it represented reality often doesn’t make sense.
Let’s take a concrete example. A giant is blocking the view to a goblin behind its leg. This sounds fine, but imagine the scene as it would play out in ‘real-life’. The giant would be towering above you, and as it walks, you catch glimpses of the goblin as it darts about trying to get a shot on your comrades. This fidelity is lost if we treat the table as ground truth.
Instead, it’s better to ignore iter-creature relationships when checking line-of-sight.
This poses a minor implementation issue that is worth exploring. Back in the alpha, we checked line-of-sight using collision rays. This worked, but it was inaccurate as you can only perform so many checks per frame. Back then, we checked less than 10 positions per creature, and both false positives and negatives were common. People rightly asked for a more accurate line of sight.
In response, we changed the approach to rendering the whole scene from the creature in question’s perspective. We rendered tiles and creatures into a cubemap using colors as IDs. We then used compute shaders to gather the results and report back what colors (and thus creatures) we visible. As you can see, this introduces the issue that any creature that is entirely obscured by another creature will be considered invisible.
To fix this limitation, we are going to need something new. The idea is to render the scene without creatures into the cubemap, then we will render all the creatures, but they will not write their depths (in-fact, we will probably discard all their fragments). This means that we should get a fragment shader call for every fragment in each creature that isn’t discarded by depth. We can then write the visibility info from the fragment shader into the result buffer, hopefully giving us the data we need.
This will also help fog-of-war, which uses the same visibility cubemap. (more on fog-of-war another day).
I’m gonna jump back into code now, more dev-logs coming when I take breaks!
Ree has been working full steam on the flying mechanic, and it’s looking pretty damn nice. You should get your hands on it very-soon™ if all goes well.
Work on assets has been going well too. We have some good stuff we hope to show there soon-ish also.
Before the weekend, I tried to get more dice working. However, the D12 was spinning relentlessly, so I just put that down for a bit. I feel like we are past the big hurdles with that part of the physics integration, so it should be a case of just hunting down those issues.
I also started on porting the animated fire to the new system. This requires some relatively simple changes to the batcher but also means we need to recreate some scripts in spaghet. Here is the noodly goodness as it stands now.
This will definitely be cleaner when we support collapsing subgraphs into function-nodes, but it’s not too bad even now. We are sampling a few curves in order to:
- pick the active mesh
- rotate the object which holds the light
- adjust the light’s intensity
For the curious, this compiles down to a short sequence of operations.
I still need to switch from local time to global time and add a random-number node to add some timing variation between each fire instance. I believe I still need to do a little work to handle some cross-frame state, but this hopefully won’t throw any spanners in the works.
Tomorrow I’ll pick up where I left off and try to get this lot running in TaleSpire. Except for lights, which are their own can of worms.