This week has been dedicated to research and planning, and it’s been fantastic.
We have a lot of things that need to be done by Early Access. Between our plans, user requests, and polish, we definitely have our work cut out for us.
Luckily, some features are also polish, probably none more so than performance. We’ve known for a long time that we would need to take some big steps to tackle performance issues. TaleSpire is a user-generated-content game, which brings challenges not present in many other games. These typically revolve around the fact that the scene can change dramatically at any time, so you cannot use techniques that rely on precalculating lots of things at build time.
There are tools we’ve mentioned wanting to look into, the big two being Unity’s new ECS and GPU occlusion culling.
We started by performing some quick and dirty tests with entities by adding an entity conversion component to the tiles being spawned. We happily noted that in our first scene, the time spent on rendering noticeably dropped. There were some cases, however, where it didn’t seem so cut and dried, and so we did some digging. It certainly feels like there has been a significant performance regression in the hybrid renderer since the ‘Mega City’ demo Unity was showing off last year. Based on the code, this seems centered around material properties. This was a bit disheartening but did teach me about the BatchRendererGroup class, which is likely going to become very important to us this year.
Now, before folks get up in arms against Unity: this system is in preview, it’s not shipping quality, and the version in Unity 2020 apparently already has significant performance improvements. This is just how software development is. We are just inside the sausage factory here.
Now, it’s very true that, given the same engineers, a general system will often be slower than one made for a specific task. And from my reading, it seems that, in TaleSpire’s case, we have opportunities to cache more heavily and get speedups based on assumptions we can make about our own data usage. We will dig into this over the coming months.
Another similar system is Unity’s new stateless physics system. From our tests, we see very promising speedups compared to the current system. Once again, I believe there are places where we can feed the system in a way that limits the impact of some of the work it has to do when starting the physics step for each frame.
Now, changing the physics system means changing everything that uses the old system. That means board building tools, dice, and creatures, just to name three. And, let’s face it, those three are a lot of what TaleSpire is :D We are going to have to work very aggressively on this to get TS back into a playable state as quickly as possible, but it’s still going to take time.
If we can achieve both of these changes, we can eliminate the time it takes to spawn tiles. This means that we can remove the progressive-tile-load system, which is one of the most complicated parts of TaleSpire. Its whole job was to mitigate the fact that Unity’s previous object-oriented approach meant we could not spawn many objects per frame. That, in turn, meant that what you saw and the actual state of the board data were necessarily out of sync. We managed that, but, as mentioned, making it robust took a lot of work. I’d love to remove it.
So that’s all on the entities side. I’ve left out a lot of details and complexity, but I’m very optimistic that we can see some big wins here.
GPU Occlusion Culling
Occlusion culling is the act of not drawing things hidden from view (occluded) by other objects in the scene. It’s a very powerful concept but a reasonably complicated one, which often requires a lot of number crunching. Many games do this ‘offline’: the level is analyzed by a program that precalculates what is visible from where. This is done when building the levels, and the results are used at runtime. As mentioned above, offline approaches are not ones TaleSpire can use, so what can we do? More recently, as GPUs have improved, people have worked out ways to achieve realtime versions of this on the GPU. It is very common in TaleSpire that people decorate the insides of builds, and none of those tiles usually need to be drawn when the camera is outside the building. In short, we might get substantial wins from occlusion culling.
I’d started implementing a GPU occlusion culler outside Unity before, but I lacked the Unity knowledge to know how to do it there without writing my own render pipeline. After seeing BatchRendererGroup, a bunch of stuff clicked, and now I’m pretty sure I know how to do this. As with my previous experiments, I’m basing my work on this blog post, and yesterday I wrote a MonoBehaviour that computes the hierarchical depth (Hi-Z) mip chain for a camera.
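The core of a Hi-Z chain is a simple reduction: each mip texel stores the maximum depth of the 2x2 block beneath it, so one coarse texel gives a conservative occlusion bound for a whole screen region. The real version is a compute shader, but here is a toy CPU sketch of just that reduction (names and layout are ours, purely for illustration):

```python
# Illustrative CPU sketch of a hierarchical-Z (Hi-Z) mip chain build.
# The real thing runs as a compute shader; this just shows the reduction.

def build_hi_z(depth, size):
    """depth: flat list of size*size depth values (0 = near, 1 = far).
    size must be a power of two. Returns a list of mips, finest first."""
    mips = [depth]
    while size > 1:
        prev, half = mips[-1], size // 2
        mip = [0.0] * (half * half)
        for y in range(half):
            for x in range(half):
                # Max over the 2x2 footprint keeps the bound conservative:
                # if the farthest depth under this texel still occludes an
                # object, then every finer sample under it does too.
                mip[y * half + x] = max(
                    prev[(2 * y) * size + 2 * x],
                    prev[(2 * y) * size + 2 * x + 1],
                    prev[(2 * y + 1) * size + 2 * x],
                    prev[(2 * y + 1) * size + 2 * x + 1],
                )
        mips.append(mip)
        size = half
    return mips
```

An occlusion test can then compare an object’s nearest depth against a single texel from a suitably coarse mip instead of many fine-grained samples.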
I’ve got a few things on my todo list for this week, but as soon as possible, I want to write a working test version of this so I can explore how far we can take this.
One side note from this is that we will be moving to low-poly meshes for all occluders. This means line-of-sight, fog-of-war, and shadows should all get a speed boost.
As well as the research above, we spent a good deal of time exploring ideas for the following game systems.
We have an initial idea for a system that we hope will balance a decent game experience with compact data representation. No details on it today, but Ree is currently prototyping to see if the idea will feel good enough for a v1 in the Early Access.
The prop system has been close to release at least three times over the last two years. We sat down and started by discussing the implementation details (as it’s going to have to work with all the changes mentioned above). However, as we talked, we grew increasingly concerned that the attachment-point-based approach might not be good enough for the kinds of cases we had seen watching people build. We went back and forth on this for a bit and realized that we needed to prototype the tooling and see how it would end up feeling. The last 20% of making something often takes 80% of the time, so it’s far better to convince ourselves now than to build the wrong thing.
Locking the Unity version
Our modding system relies on Unity’s prefab format. Once people can make mods, we need to make sure that we don’t break them by upgrading to a Unity version that uses an updated prefab format. To us, this means we need to pick a Unity version and stick with it for a long time. Naturally, that means we can’t take advantage of newer Unity features or fixes, so we have to be sure that the version we choose is something we can stick with. We’ll keep you posted on this too.
I’d like to talk on stream a bit about this stuff and a selection of the feature requests. We won’t have timelines for the requests, but we can at least address what we see going into TS and what won’t. At that time, I’ll release the list of feature requests. I’m not going to do it before then, as the notes need cleaning up, and that work takes a lot of time.
Alright, that’s it for now,
Today I finished collating all the entries in the #feature-requests channel on the TaleSpire discord. With that done, @Ree and I went through them to work out what we wanted to do for each. What was cool was that there weren’t many major surprises. Most were about systems that we still have plenty of plans around improving. I’ll probably need to organize a dev-stream at some point so we can go through some of them and talk about the plans for TaleSpire in general. However, I’m not scheduling that today.
The rest of this week will be planning and prototypes around other major systems we need. The big ones are terrain, water, and the rendering changes we need to get the performance we all crave :)
Hope this finds you well. Back with more tomorrow
Work goes well. @Ree has been looking back into the emote system work he did previously. Once that is up to scratch, we can get the data & sync side worked out, and then hopefully, it’s just testing and tweaks before we ship.
Today I started with a few hours of going through all the posts in the #feature-request channel in discord, making the big list of things to discuss internally. Starting on Wednesday, @Ree and I are taking some days to plan and hack on some experiments to set the direction for what we need to achieve over the coming year.
Inspired by an item on that list, I’ve been revisiting my old NDI tests to see how much work it would take to get this shipped experimentally. I can pick the source from NDI and get the texture, but I’m currently fighting the UI side. Once that is doing what I want, I’ll need to handle some strangeness I saw when the call reconnects before looking at sync. I’m not yet sure how the sources are named on each client, so I’m not sure how to synchronize the feed information for each client. Regardless, we’ll work it out or, if it takes too long, it’ll go back on the shelf for now.
That’s all for today. I’ll put out a bugfix tomorrow as today I was made aware of a bug that was introduced to line-of-sight in the last release.
A quick update tonight.
Work continues on the fog of war. I’ve been fixing bugs and implementing the code that handles cleaning up fog-of-war jobs when transitioning boards. I still think there are some memory bugs, but it’s all going in the right direction.
Ree’s work on the keyboard movement has gone well. We’ve found a couple of small bugs in the line-of-sight and creature-scaling code, so we’ll get those fixed up asap.
p.s. A quick video showing what the 30-unit radius of the fog-of-war update looks like
TLDR: The first prototype of 3D fog of war started working about an hour ago. Check it out:
BIG OL’ CAVEAT: This is not the final mesh or shader used on the fog. This just shows that the raw transforms can work.
A lot of the leg work over the last days has been managing the pools of cubemaps, buffers and such used in the fog of war system so that we never allocate more often than we absolutely have to. We also have been making sure that no step blocks for longer than necessary and that we have something we can easily tune.
So here is the very rough rundown:
- When you place a creature that you control, the scene (within a 30-unit radius) is rendered into a cubemap holding creature ids and distances from the observer.
- The ids are used by the line-of-sight system to accurately determine which creatures are visible to the creature you just placed.
- The cubemap (and some other data) is kicked over to the fog-of-war system, which works out every cell (a 1x1x1 unit volume) visible to the observer.
- It packs this data into a buffer that is then applied to the zone’s fog mask.
- The updated fog mask is handed over to the mesher, which generates a relatively low-poly mesh for the fog and hands it to the zone to display.
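The reveal step in the rundown above boils down to a distance comparison: a cell is visible when it is both within the 30-unit radius and closer to the observer than the first occluder along its viewing direction. The real version is a compute shader sampling the cubemap; here is a toy CPU sketch where the cubemap lookup is faked by quantizing each direction to its dominant axis (all names are illustrative, not our actual code):

```python
import math

# Toy sketch of the fog-reveal test. occluder_dist stands in for the
# cubemap: one "first occluder" distance per face (6 faces).

def face_of(dx, dy, dz):
    """Quantize a direction to one of six cube faces (dominant axis)."""
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if ax >= ay and ax >= az:
        return 0 if dx >= 0 else 1
    if ay >= az:
        return 2 if dy >= 0 else 3
    return 4 if dz >= 0 else 5

def reveal_cells(observer, cells, occluder_dist, radius=30.0):
    """Return a bit mask with bit i set when cells[i] is visible:
    in range, and in front of the occluder along its direction."""
    mask = 0
    ox, oy, oz = observer
    for i, (cx, cy, cz) in enumerate(cells):
        dx, dy, dz = cx - ox, cy - oy, cz - oz
        d = math.sqrt(dx * dx + dy * dy + dz * dz)
        if d <= radius and d <= occluder_dist[face_of(dx, dy, dz)]:
            mask |= 1 << i  # nothing blocks this cell: reveal it
    return mask
```

On the GPU, every cell gets the same comparison with no branching between cells, which is what makes the step so parallel-friendly.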
Multiple of these can be done at once. The line-of-sight and fog-of-war updates that rely on processing the data in the cubemap are all on the GPU, and any other step that does any kind of data processing is done in jobs dispatched over multiple cores. I’ve not profiled this yet, but it’s feeling ok, and we have the tools to make this quick.
Right now, I’m just stoked this is starting to work. Tomorrow I’ll be bug-fixing, and after that, I’ll start work on the network sync for these updates. I’ll write a dev-log on that problem as it’s a fun one too.
We are also getting closer to the creature-scaling update. We found a bug in the keyboard movement that we need to get fixed before we ship. We’ll keep you posted on that.
For now, I’m gonna go poke this system some more :)
p.s. Here is a pic of a fog mesh. You can see that the mesher does an ok job of cutting down the number of polys from the worst case.
Today has been spent testing and fixing the backend changes for creature scaling and hooking the creature scale feature into the data model.
The scaling is feeling great, Ree’s done great work, and I’m very excited to see this ship. There are still bugs to be fixed, so we’ll keep working on those.
The server side has now been patched, and so everything is ready to go.
The first thing I’ll be doing tomorrow is adding some code to detect mismatched versions of the client. This will stop a person with an older version of the client from connecting to a session hosted by someone with a newer version (and vice versa). This is important as the board-sync format can change and is not compatible between different versions of the client.
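Since the sync format has no upgrade path between client versions, the gate is a strict equality check in both directions. A minimal sketch of that logic (function names and messages are hypothetical, not TaleSpire’s actual code):

```python
# Hypothetical sketch of a client-version gate. The board-sync format
# must match exactly; both older AND newer clients get turned away.

def can_join(host_sync_version, client_sync_version):
    """Allow a connection only when host and joiner speak the exact
    same board-sync format."""
    return host_sync_version == client_sync_version

def rejection_message(host_sync_version, client_sync_version):
    """Pick a helpful message depending on which side is behind."""
    if client_sync_version < host_sync_version:
        return "Your client is out of date; please update."
    if client_sync_version > host_sync_version:
        return "The host's client is out of date."
    return None  # versions match; no rejection needed
```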
That’s the lot for today
Please note that this format is not the same as the board persistence format, for which format-upgrade methods are written.
Hi again :)
Today went reasonably well, although it was mostly getting my new laptop set up.
I finally tried out i3 and decided that, right now, it’s not for me, and I’ve gone back to using stumpwm. I’ll probably spend more time getting used to i3’s approach to splitting in the future. Its stacks and workspaces did seem very cool, although I should experiment with stumpwm’s groups feature first.
Getting my server dev environment set up was pretty painless (yay docker), although I currently have an issue where TaleSpire isn’t uploading boards to my local minio server. I’m not sure what the issue is, so I’ll need to do more testing tomorrow. I’ve prepared the patches for the DB and erlang server, so I now need to test TS with these changes. With that done, I’ll be able to push to production, and then we’ll be ready for creature scaling.
Ree’s been working hard on fixing all manner of issues relating to creature control at low (and high) fps. Getting that stable is a heck of a project, but it’s paying off.
To follow up on the log from earlier today, it does seem that the latest patch does fix the SSE4 issue, which is great! We’ll be able to close a bunch of tickets from GitHub real soon :)
Have a good night folks, Back with more tomorrow.
Heya folks, time for another log.
We have a content update dropping soon, but also in there is a potential fix for those who have been unable to play due to their machine not supporting SSE4. This limitation was due to older versions of the Burst compiler not supporting fallback to earlier SIMD instruction sets. We had wanted to upgrade the Burst compiler package previously, but we found a bug in that package that blocked us. We [reported it to Unity not long back](https://issuetracker.unity3d.com/issues/the-sizeof-of-struct-with-structlayout-returns-a-smaller-value-in-the-burst-compiled-code-compared-to-the-mono-compiled-code), and they have already shipped a fix! With the packages updated, we should now support fallback to SSE2 (which every x64 machine supports).
My laptop, after holding on for nine years, finally started breaking the other day. My new one has just arrived. That means I’m going to lose at least a day to swearing at windows for being such a massive, invasive, pain in the ass and setting up Linux so I can get server work done in a more agreeable environment.
I’ve also started on the server changes to support creature scale. These are going well, and I will begin integrating the changes on the client-side soon.
Alright, that’ll do for now. Back to the laptop for me.
A short one this time. Between two Norwegian national holidays and my own brain, I’ve been struggling with focus. Progress has slowly continued on the fog of war. I’ve written the compute shader, but I still need to do some plumbing, so I’ve nothing gif-worthy yet.
Ree’s work on the creature controller is wrapping up, so that will ship soon. There is a slew of fixes, and it includes the bulk of the work for creature scaling. So once the first update has shipped (hopefully early next week), I’ll jump tasks to implement the sync and persistence for scaling.
The art team has also been working away, so you can expect a new asset update soon!
I’m going to spend the rest of today around the community, so if you are around, seeya there.
Hi folks, another day of slow but steady progress on the fog of war. What I did today was hook up the code that queues up mesher jobs when the underlying fog data changes. I can now select a region, press a key, and the fog volume data for all the zones in the region is updated. The presentation for those zones then enqueues a job that reads the volume data and generates a minecraft-style mesh that will (in time) be used for fog. Since the selection tools in this test work with cuboids, the fog only follows that shape, but we can see that the resulting mesh is reasonable (lower-poly compared to the worst case), and I can see that the jobs are being run in parallel. So far, so good.
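The reason a minecraft-style mesh comes out lower-poly than the worst case is face culling: a quad is only emitted where a filled fog cell borders an empty one, so all interior faces between filled cells vanish. A toy sketch of that idea (the real mesher runs in Unity jobs; these names are ours):

```python
# Illustrative sketch of minecraft-style face culling for a voxel mesh.
# Worst case is 6 quads per filled cell; shared interior faces are dropped.

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
              (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def visible_faces(solid):
    """solid: set of (x, y, z) filled cells.
    Returns the quads to mesh as (cell, face_normal) pairs."""
    quads = []
    for cell in solid:
        x, y, z = cell
        for n in NEIGHBOURS:
            # Only emit a face where a filled cell meets an empty one.
            if (x + n[0], y + n[1], z + n[2]) not in solid:
                quads.append((cell, n))
    return quads
```

For example, two cubes side by side produce 10 quads instead of the worst-case 12, and the savings grow quickly for large filled volumes.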
Now this is done, I can write the code that takes the cubemaps captured from the line-of-sight system (which handles hiding creatures) and passes them to a new compute shader that will make an update mask of which areas are visible. I’ll be doing this on the GPU as:
- The data I need is already on the GPU, which means no shifting it around.
- The approach I’ll be trying is trivial to parallelize without divergence and so should be well suited to running there.
From the CPU side, we will then wait on a GPU fence (which tells us when the compute shader has finished), read the result from the compute buffer, and apply it to the fog data in the zone. This change will then trigger the code we wrote today, and we should see the new mesh, which will hopefully have removed all parts that were in view. It will absolutely be rough to begin with, but with the full round trip written, we can move on to playing with it and making it better.
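The CPU-side "apply it to the fog data" step is essentially a bitwise OR: the readback gives a packed "revealed this update" mask, which is folded into the zone’s persistent fog mask, and the zone only needs remeshing if anything actually changed. A minimal sketch under those assumptions (names are illustrative, not our actual API):

```python
# Sketch of folding a GPU-readback update mask into a zone's fog mask.
# Both masks are ints with one bit per 1x1x1 cell; a set bit = revealed.

def apply_update_mask(zone_fog_mask, update_mask):
    """Return (new_mask, changed). 'changed' tells the caller whether
    the mesher job needs to be re-queued for this zone."""
    new_mask = zone_fog_mask | update_mask
    return new_mask, new_mask != zone_fog_mask  # only remesh on change
```

Because fog only ever gets revealed by this step, OR is enough; no bits are cleared, so repeated updates are idempotent and cheap to skip.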
I’ll go into more detail on the approach once I’ve got it working as it should be pretty simple.
Alright, seeya around.