Today in “I’ve not been working on line-of-sight but…”, we’ll be talking about things that ended up higher priority than starting on line-of-sight :P
On Friday, I was chatting to Ree about the next tasks on the list, and he suggested that it could be helpful to convert the boards from the beta format sooner rather than waiting for the per-zone-sync. At first, I balked at this, but after a sleep, I realized that much of the work would be valid even when we come to rewrite the new persistence system.
And so that’s where I’ve been. Format conversion is always sensitive, but Ι took the opportunity to simplify how we handle it. Now, each time we need to make a new version, we’ll:
- copy all the current types into a new numbered ‘legacy’ namespace
- strip out everything beyond the deserializing code
- write a converter from that version to the latest version.
It’s dumb and involves more leg work than fancier solutions, but it works. It’s reassuring to know that you aren’t accidentally messing with something needed by an older version of the board you still need to support when making a change. We’ll see how this feels to work with, but I’ve tried a couple of other approaches before and have found them lacking. Let’s see how Ι do with this one.
After getting conversion working, I realized the codebase was in a good place for some cleanup. I had a bunch of naming (of variables and types) that needed updating, and there is not likely to be another good time to do it before the Early Access. So Ι threw a few hours at this.
I noticed a case where, on photon failing to connect, we wouldn’t retry, essentially leaving you stuck with no board. I’ve put a simple retry in on my branch and will cherrypick this into master tonight. This will be available in the next release.
During testing of board format conversions, I found a case where the physics wrapper would incorrectly schedule some jobs. This is now also fixed.
That’s the lot for now. Back soon with more
Today I started out thinking I’d be working on line-of-sight, but instead, I dug into some issues we’ve had with builds since the Unity version upgrade.
First, one was that the dummy-asset resource wasn’t loading reliably on game start. This is very odd, and it feels like it’s a Unity bug. As we are currently using the 2020.2 beta, this is entirely possible. For now, I’ve worked around it by simply exposing a SerializableField for the prefab, which, to be honest, is probably a better way to do it anyway.
Next, I noticed that even though we had supposedly moved away from our old JSON asset format, I hadn’t converted the music asset files. This work began on the TaleWeaver side, and it was straightforward to get the music info into the new binary asset format. As I was upgrading the TaleSpire side to use this new data, I was able to do some cleanup, which totally separated the old asset-data classes from TaleSpire and so now TaleWeaver can own those. This is nice as both sides can evolve their formats in response to their own needs.
I then went on the hunt for an error that occurred when deserializing hide volumes. The TLDR was that, due to having moved to C# 8, a different method overload was being selected for a specific call. It took me a little while to spot, but it was a very easy fix. The C# change that caused this is actually the one I’ve been most excited for over the last year or two, unmanaged-constructed-types.
The code cleanup was really enjoyable, and, given that it was nearly the end of the day, I decided to do some more cleanup that had been waiting for the engine to get to this point. In this case, it was on the manager that handles asset loading and the manager that handles tiles and props. They were a bit too bound together, and their respected roles weren’t always clear enough. The changes themselves aren’t worth enumerating, but I’m happy with the result.
Tomorrow I might look at line-of-sight, or I might look at laying the groundwork for the new board format and the code to handle upgrading the current boards.
Once again, today was mostly spent on physics. Having solved the dice ‘popcorning,’ we talked about yesterday, I was left with an issue that dice would slide around as if they were on ice. It looked like this:
I had assumed this was material related and, sure enough, I found several places where I had not been setting the materials properly when generating colliders. However, even with those fixed, the problem still remained.
This led me into a whole bunch of tests and tweaks which weren’t getting me very far. However, after some of the previously mentioned material fixes, I saw that things worked correctly on one of my test scenes. That scene was only using GameObjects with driven by BouncePhysics wrapped around Unity.Physics. The sliding must have been related to tile colliders, which are managed by the board and written into the physics engine separately. However, I had already set a material for all these colliders.
At some point, I finally noticed that the material I was using for the tiles (
Unity.Physics.Material.Default) had a FrictionCombinePolicy than in the material I was using for other objects. For BouncePhysics, I was using a material converted from the default material found in Unity’s classic physics engine. The result of that conversion used ‘arithmetic mean’ for friction combination, whereas the
Unity.Physics.Material.Default used geometric mean. I switched the tiles to use the converted classic material, and the sliding stopped.
This is good, of course, but what a silly mistake. I had just assumed the two would play well together.
Anyhow for completeness, here are the dice. They won’t behave like this when they ship. This is just to show the difference from the gif above.
With that done, I knew the next task was to rewrite line-of-sight, but that seemed like a pretty overwhelming way to end the day, and so I decided to play with something much more straightforward.
I’ve really wanted to experiment with adding a custom URL scheme for TaleSpire. This would mean opening a link that looked like
talespire://<something> would open TaleSpire and do something useful. My first thought was that you could get a URL for a specific point in a particular board in your campaign. You could then share that with your party or even use it in external tools (World Anvil comes to mind), so you could link points in your documents directly to the TaleSpire maps that represent them. Also, if we support links out of TaleSpire, we could get a very basic but rather compelling way to move in both directions, which isn’t tied to any specific tool or system.
To add a custom URL scheme to Windows, I needed to write some registry keys. Steam seems to support this, but when I tested the registry options of the install script, I saw only partially made registry subkey trees. This was disappointing, but they also support executing other processes as part of setup. I assumed they would need to run as administrator, so I made a little c# script to add the required registry entries. This worked great. Here it is in action:
Note: I have chopped out a bit of the middle of this video as loading takes much longer in a developer build, and it’s boring to watch
I’m not going to do any more work on this for now, as there are many more important things to do, but this was a nice change of pace for a few hours.
That’s all for now.
Internally we have been calling this bug “popcorn dice”, and it’s been driving me crazy for a while.
As we have our own wrapper around Unity’s new physics engine, I had assumed that I was just ‘driving’ it incorrectly. So today, I sat down and read how Unity’s ECS hooks into the physics. I had hoped the mistake was how I was loading in the physics-body data, or when reading it out after the physics step. Alas, no such luck.
I saw that they kept track of their fixed steps in a slightly different way than I did. It should have resulted in identical behavior, but just to be sure, I followed their example. Again no dice (har har).
I read that damn code forwards and backward and finally remembered that I had a test scene that had been surprisingly stable this whole time. It’s the one I showed in an update a few weeks back. It looked like this:
So what could this possibly have in common with the ECS tests I’d been doing, but that was also different from the main board scene. I suddenly got a hunch and couldn’t help but laugh when it proved to be correct.
The floor was too big.
YUP. So we’d been lazy and just made the board 6000x6000 units in size a looooong time ago and promptly forgot about it. The classic physics engine (bless its soul) had been doing a great job, and so we’d never seen the issue. However, with the new system, if you have mesh colliders interacting with a single box collider more than ~2000 units across, it gets very unstable and starts popcorning like we see above.
This find is a huge relief. TaleSpire obviously relies on the dice behavior being reliable and, if it were a problem with the engine itself, I’d have no idea of how to fix it. This should be trivial. We can make a smaller collider and just teleport it around where we need it.
Alright, that’s a great place to end the day.
I’ll seeya all tomorrow with something else!
Updates from the land of code.
The batching/rendering code has reached where we need it to for the Early Access. We see a three times speedup on some of our stress tests, and while there are very clear ways to get more, this endeavor has already left us two months behind schedule, so we need to press on into other features. That said, it is lovely to see all this work pay off, and I’m very excited by what we can do to get the rest of the perf that is still on the table.
During this, I hit one very annoying bug. It seems that Unity does not handle writing into IndirectArguments buffers properly on all GPUs. The GPU-frustum-culling system I wrote recently worked great on my home PC but failed silently and strangely on my laptop. I ended up using a more naive approach and having the culling compute shaders pushing values into an append-buffer and using the CopyCount method to transfer the final length to the IndirectArguments buffer afterward. This still kept things on the GPU but significantly increased the number of compute shader dispatches and buffers required to work. The good news is that it does open up some opportunities for removing some indirection in the vertex shaders, which should give some performance wins later. I think I should also be able to make a system on top of this that can reduce the number of final draw calls, but we’ll see how that goes.
Ree and I did some digging into the physics instabilities we have been seeing. Along with some simple mistakes, we remembered that the Unity.Physics API allows you to swap out the engine used behind the scenes, and that Havok has such a package. We had heard rumors that it was just plug-and-play, but when is programming ever that simple?
It was that simple. To be honest, I was floored. I changed maybe ten lines of code, and TaleSpire was using Havok. We saw that, while the stability was a little better, we were getting uncannily similar issues to using Unity’s DotsPhysics. This made us confident that the problem was how my code is driving the physics rather than the instability coming from the engine.
It should be noted that DotsPhysics was significantly faster than Havok in the stress test we threw at it. TaleSpire doesn’t have many dynamic objects, but we have hundreds of thousands of small static objects. DotsPhysics seems to be heading a very compelling direction, and I think it will carve out a very successful niche for itself.
We then upgraded to the latest version of Unity, which, in turn, allowed us to move to the latest builds of various packages, including physics. The improvements in DotsPhysics was again, notable. It seems that our strange blend of using old and new systems in Unity has paid off well as we haven’t yet run into any painful bugs on this new build. Unless anything major happens to make us have to downgrade, 2020.2 will be the version of Unity we will ship TaleSpire using.
We also merged master into the branch with all the new tech. This includes the props system. To let prop development be done in parallel, Ree has been serializing the props separately from the tile data. Now that props are shaping up well, we can see what data is needed and so we are in a good place to bring props into the same system as tiles. This will let sync, undo/redo, and all the other normal stuff work with props too.
A short time back, discussions with the community on how line-of-sight should work led me to re-think how that system should be implemented. Yesterday, Ree helped me prototype one possible version of this. It relied on us not using the BatchRendererGroup in ShadowsOnly mode but instead discarding fragments in passes where we didn’t want to write to depth. It worked well in small scenes, but we took a significant FPS hit in our stress-test. I’m happy that we found this out very quickly and can now start the version that is a little more work but will not put additional costs on unrelated systems. That version will require us to dispatch the DrawMeshInstanced calls to render the scene to individual faces of the cubemap. We will use jobs to handle the selection & per-face culling of the tiles for these draw-calls.
Aside from this, I’ve continued work on the character controller. It’s not there yet, but we are relatively confident we can fix up the remaining issues, so it’s currently a lower priority.
That wraps up everything for last Wednesday to now. My next immediate task is to hunt down what I’m doing wrong in Physics as currently, it’s the biggest thing that stops us from play-testing this branch.
Hope you folks are doing good. Peace.
 It could be that the IndirectArguments issue was a graphics API limitation rather than a Unity one. I’d need to do more reading to find out.
Hey folks, I was meant to write this in the morning, but code is distracting :P
Yesterday I worked on dice physics. I had been concerned by the behavior of dice and wanted to find a good path forward.
To be able to see what was going on, I made this test scene. I can reset the dice by clicking the reset ‘button.’
This is one place where Unity absolutely excels. Being able to knock up tests and tooling in minutes so often can help you make headway on otherwise stubborn issues.
There are a few points that I find interesting:
DotsPhysics and ClassicPhysics do not line up in behavior. This is really important as this means that I shouldn’t expect my conversion routines to give the same results as classic as mine are directly based on the ones built into DotsPhysics.
The BouncePhysics wrapper around DotsPhysics lines up with the behavior of DotsPhysics. In that, the way they bounce and come to rest is similar. This lends support to the point above and gives some evidence that the conversion is ok.
DotsPhysics falls faster. This is because they don’t handle time properly, so faster framerate means faster simulation.
Mine falls at the same speed as ClassicPhysics. This is very important. It suggests my fixed-timestep code and use of Unity’s simulations values (like gravity) are in the right ballpark.
Once I realized that DotsPhysics doesn’t give the same bounce, drag, etc, for the same input values, I felt comfortable playing with values to find something that felt right. This clip is not final behavior but shows how a couple of minutes of tweaking can help.
Afterward, I set up some ramps and D12s, so we had a decent place to start testing when we sit down to dial this in for real.
I also saw that my interpolation code was incorrect. I found some issues, but I was struggling to match all of Unity’s behavior. I then noticed that we don’t use interpolation in the current build and so this stopped being a priority. However, it still took me a couple of hours to quit playing with it. It’s so tempting to keep going when you know you have the math right, and it’s the surrounding plumbing that is the issue.
Anyhoo that was that. Today I’ve been working on the character controller again, results are promising, but I’ll leave that until tomorrow.
With lights under control, I’ve turned my evil eye back to physics. Since I replaced the physics engine, gm-blocks and hide-volumes have not been working. This was expected, but it was time to fix them.
For sanity, I’m going to call Unity’s old physics engine ClassicPhysics, their new one DotsPhysics, and our wrapper BouncePhysics.
Hide volumes make use of the fact that, in ClassicPhysics, we could resize colliders on the fly. We needed something similar in our new system. First up was a different problem, however. When we convert Unity’s colliders to the form required by DotsPhysics, we remove the components. This means there isn’t an obvious place for the programmer to set the values. This meant BouncePhysics needed its own version of these components.
I knocked together a Sphere, Cylinder, Capsule, and Box collider components. I modified the BouncePhysicsBody to walk the hierarchy, converting the ClassicPhysics components to their BouncePhysics equivalent, and registering all these new colliders.
During this, I noticed some issues in how my code handled baking of scale when the collider is subject to multiple transformations. This sucked for a good few hours, but I was able to find code that showed how to do it properly, and I got it fixed.
With that done, I was able to hook up the hide volumes and get them working again \o/
Next, gm-blocks. There I found an interesting bug when deselecting them. First, the gm-block would be destroyed, and as that happened, it would unregister itself from BouncePhysics. The issue arose because, in DotsPhysics, you reset and pass all the RigidBodys every frame. This means that although I had unregistered the object from BouncePhysics, the RigidBody was still in the DotsPhysics collision world. I can’t just remove that one body (the only option is Reset, which clears everything). Instead, I now set a flag inside the RigidBody to state that it is unregistered and then modified the collision queries to check for this flag when collecting results. This all takes place after the updates to motion have been calculated, so there is no risk of something bouncing off this unregistered body before the next frame, and by then, it will have been cleaned up.
Of course, this check (which is essentially a comparison of a ulong) is not free. I will probably have the system only use those new collectors on frames where something has been unregistered. Luckily in TaleSpire, we don’t have much churn in the physics objects.
During this, it made sense to clean up some of the physics code in general. Until now, I had made the class that holds the board data responsible for the physics ‘world’. This wasn’t ideal, and the physics related code was very out of place. I decided to centralize it in the BouncePhysics classes. This does mean that, for now, we only allow only one board to be using physics at a time (which could matter when changing boards), but the trade-off was that I could make the API much more like ClassicPhysics. The API design is important because, as soon as I merge this branch, Ree needs to be able to get up to speed quickly. The more familiar it is, the better.
In amongst this, there was the usual smattering of bugs, confusion, and other things that slow one down. But overall, it’s felt good to have some forward momentum.
Next, I really need to find out why dice in BouncePhysics are not behaving like they did in ClassicPhysics. It could be an issue in the conversion routines or, more likely, mistakes in the code than manages fixed-timestep and interpolation. I’ll set up some test scenes where I can compare everything side-by-side, and we’ll see what we can find out.
Alright, that’s all from me for now. Seeya later
Since our move away from GameObjects, we have had to make our own batching, culling, and animation systems. One that I had left for a while was lighting, so it was time to tackle that.
First off, the deeply annoying news. Unlike with the BatchRendererGroup, which gave us a way to avoid GameObjects and populate the data from multiple threads, there is no similar thing for lights. In fact, when I looked into Unity’s ECS, they were just spawning GameObjects for the lights and updating their transform each frame to be in line with the entities that supposedly contained them.
This is a pain, but it’s the only option, so we needed something similar.
As always, the fact that TaleSpire is a user-generated-content game means we have a bunch of complexity. We don’t know in advance how many lights are about to be spawned. Worse, because we are back using GameObjects, we know that spawning too many per frame will cause serious fps issues. This means we need a progressive spawning approach and pools for reusing light that that been destroyed.
Also, spaghet scripts are allowed to update light color, intensity, position, and rotation on a per-frame basis. Now position and rotation updates can be jobified if we put the light’s Transform in a TransformAccessArray. However, color and intensity can only be set from the main thread… joy >:[
On each change to a zone, we rebuild all the batches, including laying out the lights again. We have jobs to:
- find out what tiles and pros are using lights
- collate the numbers of each kind of light
- write the details for each instance of each kind of light (in parallel) into various collections.
We push all of the current lights back into the pools for reuse and, over the course of multiple frames, spawn new lights or pull them from the pools.
Dynamic lights are updated by Spaghet scripts. If there is a change to the intensity or color, we push the index of the modified light into a NativeQueue. That queue is consumed on the main thread where we can apply the changes.
Getting all of this to play nice has taken me the last 3-4 days. There is still plenty of room for improving performance, but I need to profile using real-world scenes before making any more changes. Luckily we have fantastic community sites full of slabs I can use :D
Alright, the next stop is looking at little things that broke on the move to the new physics engine.
Seeya in the next log (which will hopefully take much less than a week this time :P)
This one is interesting to write as it’s fairly downbeat. However, it is part and parcel of making stuff, so let’s do it.
As mentioned in previous posts, we have known for a while that TaleSpire would not scale to bigger boards using their GameObject abstraction. There was a lot of song and dance from Unity about their upcoming ECS, and while it looked pretty promising, we miscalculated the timeframe it would take Unity to bring it to life. That meant that this year, as we tested again, we saw we needed a custom solution. That’s what I’ve been working on.
Now I have tried hard to always be clear about when the issues we have run into have been Unity’s fault and when they have been ours. In general, the responsibilities have been ours as we are using packages marked experimental and not in general recommended for shipping. However, today’s principal issue is, I believe, not on us.
To move away from using GameObjects we have been building on the BatchRendererGroup. This object lets you specify what needs to be drawn and then provide the matrices to lay these instances out in world-space. The matrix array you fill is a NativeArray and, as such, is fully compatible with the job system. This meant that we have been able to parallelize the population of these arrays, which has resulted in the massive improvements in the performance spawning tiles we have seen in previous dev-logs.
When you are using instanced rendering, you need to somehow get the per-instance data to your shaders. Traditionally in Unity, this is done using MaterialPropertyBlocks. You can associate a MaterialPropertyBlock with an object and, when it renders, that data is made available to your shader. When using BatchRendererGroup, this requirement seems to be filled by using GetBatchVectorArray (and co).
Now here is my portion of the fuckup. When I couldn’t get this to work in my early testing, I put it down to me not being too hot with Unity’s shaders, and that I would work it out later. Unlike the new stuff in Unity, the BatchRendererGroup is not marked as experimental as so I had the same confidence in the system that I do for other released parts of Unity. I put a few questions up on the forums and got back to work. I rigged up a simple system where I looked up the data from ComputeBuffers using the instance-id as the index; this works fine without culling enabled.
The problem is that GetBatchVectorArray does not (as far as I can tell) work with the classic rendering system in Unity. This is not stated in the documentation, but in the docs for an experimental package called the ‘Hybrid Renderer,’ it mentions some things that can be construed to mean it only works with the new scriptable rendering pipelines.
UPDATE: While writing this log, I’ve seen that my bug report against this method has been accepted. I will keep you posted on how this goes.
This sucks. Worse, we know the old version won’t work; we are months into the rewrite with no way to go back. Frankly, it was rather distressing.
Now those of you who know a little about Unity might be wondering if we could just make something custom using DrawMeshInstanced or DrawMeshInstancedIndirect. The problem is that things drawn with those methods are not culled, and this is a problem as realtime lights/shadows require rendering the scene from multiple vantage points. Without culling, you end up rendering everything for each camera regardless of the direction it’s facing. This massively hurts performance. The BatchRendererGroup was excellent in this regard as it allowed us to implement the culling jobs ourselves.
One thing that the BatchRendererGroup does have is a flag that lets you specify that you only want to use it to render shadows. Another little detail is that TaleSpire’s shadows don’t need to respond to the tile drop-in/out animation we implement in the vertex shader. This gives us the possibility to keep using the BatchRendererGroup for shadows and keep the culling advantages.
However, that still leaves actually drawing the scene. The only thing I could think of was that I’d need to implement this myself. I’d use all the batching code we have written to layout data on the GPU, I’d write a GPU-frustum culling implementation that matched the one we use on the CPU side, and use DrawMeshInstancedIndirect to dispatch the draw calls.
Now I’ve never written this kind of thing before, but it felt like the only feasible option. Over a few days, I got this written and was finally able to cull our scenes again without horrible flickering.
Now the scary thing is that this was not part of the plan. It’s cost us a lot of time and how it scales is currently unknown. What I do know is now we have a whole pile of extra complexity to manage and improve. Not great.
However, there are some up-sides:
Currently, we have to use the same meshes for rendering, occlusion checks, and shadow rendering. I wanted to use different meshes for shadows and occlusion checks as this allows us to use lower-poly meshes (helping performance) and close tiny cracks the fog of war shouldn’t be able to see through. I was implementing that as part of the GPU-occlusion system. When I delayed that feature to after the Early Access release, we delayed the ‘separate shadow mesh’ feature too. With our new system, we can trivially use different meshes, so these improvements are now on the cards for Early Access.
As mentioned recently, most of Unity’s API has a max of 500 instances per batch when using non-uniform scaling. DrawMeshInstancedIndirect does not as all the data is already on the GPU. This means we have the potential to have much larger batch sizes. Currently, we batch on a per-zone basis, so we are a little limited, but it will not be hard to take zones that have not changed in a while and combine their batches. It’s something I’ll look into when I get back to improving performance.
Thirdly, we now have a bunch of tile/prop data on the GPU that we can easily write compute-shaders to operate on. This gives us more options for future developments.
So all in all, this has been pretty horrible. I’m very grateful that the community is so supportive and that the Eldritch Foundry update was so well received. That was a serious boost in a tough couple of weeks.
However, we are still very far from done. By my estimates, I was at least a month behind my personal goals before this escapade, so I’m not sure where we are now. Luckily we’ve been vague with deadlines, but we will need to sit down and look at the plan again.
What a year!
This dev log is long, so I’m gonna leave talking about lighting to the next post.
 This is important as, traditionally, most of Unity’ apis are only valid to call from the main thread.
 There is such a lot of work to do for Early Access that often I can’t afford to stress on issues that can be worked out another day. The important thing is to be making progress somewhere as often another day I’ll have had time for the solution to come to me.
 Culling means some instances are not being drawn and so the instance-id for a given tile can be different from frame to frame.
A quick tip when dealing with BatchRendererGroup. The instance limit per batch is 1023, but the UNITY_INSTANCED_ARRAY_SIZE limitation still applies, and so on desktops, this will likely cut your max batch size down to 500. This matters if you rely on instance-ids. I saw some very odd stuff when a single batch went over 511 instances; what was happening was that Unity was splitting the batch in two, one for the first 511 instances and one for the last one. This meant the instance-id for the 512th instance was 0 rather than 511, and this naturally resulted in incorrect behavior when using it to index into a buffer.
That was an afternoon of pain explained after a quick trip to the frame profiler, I was so concerned that my batching code was wrong that I didn’t consider the instance-ids not being what I expected. I can’t wait to know more of Unity’s internals as these issues are such a pain and seem avoidable.
Another node that I added last week is the random node. It can produce random floats, int, and vectors of each. It uses a simple stateless random function I’ve used in shaders before, which means it would be terrible for security, but is perfectly fine for visuals. The node has an option called ‘Update Mode’ that might be worth mentioning.
By default, if you give the node the same input (the seed), you are going to get the same result on the output. You will probably feed time into the input to get different results each frame. However, if you are using global time, then the output will be the same for all instances of this script running on all assets this frame. That might not be what you are going for! This is where the ‘Update Mode’ comes into play. You can set it to global, which gives the behavior described above, or you can pick one of two ‘local’ options, which both modify the seed in a way that will give different results. The ‘Local’ option means that it mixes in the tile’s unique id, so now the result will be different from other tiles running the same script. Note, though, that each asset within a tile can have a script, and maybe you don’t want those assets to get the same result from the same node. To handle this case, you have ‘Asset Local’, which also mixes the asset’s index into the seed along with the tile’s id.
Alright, back to work. Seeya!
 Unity’s use of constant-buffers for its object-to/from-world matrices and constant-buffers have a max guaranteed size of 65k. See section 1.4 over here for some interesting details.