Happy new year!
We will be performing server upgrades at 11:00am UTC on Monday, the 3rd of January. You can see what time that is in your region here https://www.timeanddate.com/worldclock/fixedtime.html?msg=TaleSpire+Sever+Maintenance&iso=20220103T12&p1=187&ah=2
We have scheduled two hours for the maintenance, but if all goes smoothly, it will take much less than that. During the update, the servers will be unavailable or unreliable.
The upgrades aim to fix a logging issue we have had for a while and add the code for the upcoming HeroForge integration.
Have a great weekend folks.
p.s. Please note, this does not mean that HeroForge support will be available immediately after this upgrade. This is just one step in the process. We will keep you posted with news as it develops.
Today has been focused on the upcoming HeroForge integration.
I’ve tested the server code with the current build, and all is looking good so far. I will put out an announcement scheduling the server upgrade as soon as we have decided when that will be. This will also land some other bug fixes I’ve been waiting on, so I’m eager for this.
I’ve also started merging the master branch into the HeroForge branch. Both have significant changes, and so it’s quite a beast. I could see this taking me a day or two.
Also, the new year is here, so work will be a little broken up around that.
That’s all the news for today,
It’s my first day back to work, and I’m happily settling back in. I only tackled two things today:
- I added some options for use in the Unity Editor to allow username + password login instead of Steam. This lets us test interactions between GM & player clients more easily on one machine
- I fixed a regression where the names of hidden creatures had become visible to players again.
Tomorrow I’ll probably tackle another smaller bug, but I’d like to focus on getting the next server patch ready. That one lands fixes and the bulk of the HeroForge backend code. It’s working with my HF branch of TaleSpire, but I know there are some places where the master branch doesn’t play nice with it.
That’s all for now. Hope you had a lovely Christmas.
Since last time I’ve worked on the polymorph feature, which is going quite nicely.
We still need UX, better transitions, etc. But the direction feels good.
Photon suffered what seems to be another attack. As the attacks affect specific regions, we pushed a patch to TaleSpire to redirect users to different regions. It is currently just a simple lookup table, but it was the fastest way to progress. The next step on that issue is making server changes to allow region remapping per campaign session.
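For illustration, a minimal sketch of such a remapping table, plus the planned per-session override, might look like the following. The region codes, the mappings, and the `resolve_region` name are hypothetical and are not TaleSpire's or Photon's real configuration.

```python
# Illustrative only: these region codes and mappings are assumptions,
# not TaleSpire's real table.
DEFAULT_REMAP = {"eu": "us", "asia": "jp"}


def resolve_region(requested, session_overrides=None):
    """Return the region a client should connect to.

    A per-session override (the planned server-side change) wins over the
    global table shipped in the client; unmapped regions pass through unchanged.
    """
    if session_overrides and requested in session_overrides:
        return session_overrides[requested]
    return DEFAULT_REMAP.get(requested, requested)
```

Keeping the override per session would let the server steer a single campaign away from an affected region without shipping a new client build.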
A couple of weeks back, I did some performance work to improve the handling of large boards. I pushed a build of this to Steam for some internal testing. An issue was quickly found in hide-volumes, so I’ve fixed that and pushed another build so we can continue the testing. After that, we’ll merge that branch and ship it to you folks.
I also found a bug where hide-volumes aren’t saved when placed in a zone with no tiles or props. This should be an easy fix.
Continuing the subject of bug fixing, I am spending this next week fixing reported bugs. I have no strict criteria for which I’ll be tackling. If I can replicate the problem, I’ll try fixing it :)
That’s all for now. I hope you have a good day.
I’ve continued my HeroForge work and got to the point where spawning HeroForge minis works. Which is great. In this video, you will see both the asset-pack files appearing after conversion and the creature being spawned.
So that I didn’t get distracted by UI, I just knocked up an inspector in Unity to list and spawn the linked minis.
The next problem is that this code has a data race. It only works if the HeroForge mini information comes back from the server before the board loads. We could try delaying everything, but this is a symptom of a larger issue: how to handle missing creatures.
Modding and HeroForge integration both expose this problem. Up until now, asset packs have had two parts: an index, which loads fast and gives us creature info, and asset-bundles, which hold the meshes, textures, etc., and load more slowly. We were okay with asset-bundles loading relatively slowly because we built systems that could work with only the data from the index. However, we can no longer rely on the index being available, or at least not in a timely fashion.
To this end, we have to take our code one step further. Instead of just handling the delayed arrival of information from bundles, now we need to handle even more basic information being missing. Things like default-scale, name of the kind of creature, head-position, and many more.
Head position is actually a critical one as, without it, we can’t do line-of-sight checks or fog-of-war updates. We have to be very careful here to minimize the chances of differing results on different clients. I have some ideas, but I’ll save them for another day.
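One simple way to picture "handling missing basic information" is a lookup that falls back to a fixed placeholder whenever the real creature data hasn't arrived. This is not the approach hinted at above (those ideas are saved for another day); the `CreatureInfo` fields and placeholder values are hypothetical. The property it illustrates is that the fallback is a constant, so every client computes the same thing from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreatureInfo:
    name: str
    default_scale: float
    head_height: float  # height of the "eyes" above the base, used for line-of-sight
    is_placeholder: bool = False


# Hypothetical placeholder values. Every client must use the same constants,
# otherwise vision and fog-of-war could diverge between clients.
PLACEHOLDER = CreatureInfo(name="Unknown creature", default_scale=1.0,
                           head_height=1.0, is_placeholder=True)


def creature_info(index, asset_id):
    """Look up creature info, falling back to a deterministic placeholder
    when the asset-pack index (or HeroForge data) has not arrived yet."""
    info = index.get(asset_id)
    return info if info is not None else PLACEHOLDER
```

A real system would also need to re-run vision once the real head position arrives, at the same point on every client, which this sketch does not attempt.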
Today I am doing the big refactor to the creature management, spawning, and vision code. Hopefully this goes smoothly.
Until next time,
Today I’ve continued work on HeroForge, but first, let’s have a little multiplatform update.
The 10Gb ethernet gear arrived and worked brilliantly out of the box. The reason we bought these is two-fold:
- Our M1 iMacs only have 8GB of RAM, and that makes for a poor dev environment
- We already have a nice setup on our Windows machines. Given that we can cross-compile, it makes sense to build for Mac from there.
The big problem with building on Windows, however, is getting the result to the Mac. Even though it’s not big by modern standards, TaleSpire still takes a while to copy over a 1Gb network. In my experience so far, that delay is enough to hamper mental flow.
Re-enter the 10GbE kit! I have now got the Windows box and the Mac directly connected via a DAC cable, and I can copy an 8GB file between them faster than I can copy it from HDD to SSD on the same machine.
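As a back-of-the-envelope check on why the link matters, the best-case transfer time is just size divided by bandwidth. This sketch assumes decimal units (1 GB = 8 Gb) and ignores protocol overhead and disk speed, so real copies will be slower.

```python
def ideal_transfer_seconds(size_gb, link_gbit_per_s):
    """Best-case seconds to move size_gb gigabytes over a link_gbit_per_s link.

    Ignores protocol overhead and disk throughput, which real copies pay for.
    """
    size_gbit = size_gb * 8  # bytes -> bits (decimal units)
    return size_gbit / link_gbit_per_s
```

For an 8 GB file that works out to 64 seconds at 1Gb/s versus 6.4 seconds at 10Gb/s. At 10Gb/s the link can move about 1.25 GB/s, more than a typical hard drive can read, so the disk rather than the network becomes the bottleneck.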
I tested building directly to the Mac from Unity, and there was no slowdown compared to building locally. This test was the Windows build, however. So the next step was to start experimenting with builds for Apple Silicon.
The TLDR is there are still things to work out. For some reason, while building for Mac locally works, building directly to the remote folder fails silently. Next, my build scripts are not producing the dll containing the Burst compiled code, whereas building from the Unity GUI does.
These aren’t too worrying, though. Soon enough, we’ll have something that allows for fast iteration across these platforms. If it holds up under actual use, we’ll order some more for the team members who need them.
Oh, and Unity’s profiler works great across the link too!
Steam support has been back in touch and answered my remaining queries. The key points are:
- Steam’s store has no feature for us to treat running with Proton as the official Linux build (as I understand it)
- It is “not feasible” to bundle Proton with TaleSpire as a means to make a ‘native’ Linux build, which will show in the store as such.
What that means is that new Linux players will be buying the Windows version. Which is the same as it is now. Naturally, this works fine, but it’s a little less official than we would like.
We will simply have to make sure our messaging makes it clear what we officially support. It looks like the store page makes that easy.
Alright, back to the real work of the day.
The focus of today’s work has been two classes.
- HeroForgeDownloadManager: Which is responsible for downloading minis, converting them for TaleSpire, and writing the resulting asset packs to disk.
- HeroForgeManager: Which communicates with HeroForge (via our servers) and handles information about which minis are attached to the campaign.
The complexity comes from the number of moving parts:
- Anyone can add or remove minis from their HeroForge account
- They give/revoke TaleSpire access to talk to their HeroForge account
- They can give/revoke specific campaigns access to specific minis
- This information needs to be available on all clients regardless of whether they are actively playing or arrive later.
- The assets will download at different rates for different people
- There are systems (like fog-of-war) that need specific information about creatures to be on all clients at the same time to ensure that all clients compute the same results
My work today has been focused on making sure the flow of information guarantees that the systems have what they need at the correct times. So that, for example, the FoW system can correctly update vision for a creature even if the asset itself has not yet been downloaded.
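One way to picture the split described above is to keep the data every client must agree on (attached to the campaign and synced via the server) separate from each client's own download progress. The names below (`CampaignMinis`, `Download`, `head_height`) are hypothetical. This is not TaleSpire's `HeroForgeManager` or `HeroForgeDownloadManager` code, just a minimal sketch of the idea.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Download(Enum):
    PENDING = auto()
    DOWNLOADING = auto()
    CONVERTING = auto()
    READY = auto()


@dataclass
class CampaignMinis:
    # Shared state: identical on every client, delivered via the server.
    shared: dict = field(default_factory=dict)  # mini_id -> head_height
    # Local state: how far this client has got with the asset itself.
    local: dict = field(default_factory=dict)   # mini_id -> Download

    def on_mini_attached(self, mini_id, head_height):
        self.shared[mini_id] = head_height
        self.local.setdefault(mini_id, Download.PENDING)

    def on_mini_revoked(self, mini_id):
        self.shared.pop(mini_id, None)
        self.local.pop(mini_id, None)

    def head_height_for_vision(self, mini_id):
        # Fog-of-war reads shared data only, so every client gets the same
        # answer regardless of how far its download has progressed.
        return self.shared[mini_id]

    def renderable(self, mini_id):
        return self.local.get(mini_id) is Download.READY
```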
It’s going well. I’ve got probably half a day more of wiring things up before I can start testing with real data and find all the places that are totally wrong :D 
I’m pretty tired now, so I’m calling it a night.
Have a great weekend folks!
 Unity complains about low memory when trying to do something as simple as profile a dev build while the project is open in the editor. This clearly is unworkable. I don’t blame Unity too much for this, to be honest. 8GB is just not enough for development tools and games running on the same machine.
 The network copy took about 12 seconds, IIRC.
 In fact, it was 8 seconds faster, but I’m assuming that was an anomaly.
 The build appears to succeed, but there is no file.
 This will happen. The interconnectedness of all of this has meant it’s been a while since the code has been running. The compiler can only catch so much!
Before we get started, the number of the dev-logs is not something we sync up with features or other announcements. We aren’t that media-savvy :P
Ok, let’s get into things. In my last log, I was working away on HeroForge integration. However, on Thursday, I nerd sniped myself by watching A Deep Dive into Nanite Virtualized Geometry from Unreal and finally understanding some details of occlusion culling. It was just a little detail about building the early Hi-Z, but it got my mind whirring.
This, in turn, spun me off to this phenomenal talk on clustered shading and shadows by Ola Olsson. Even though we’ve managed to replicate Unity’s lighting approach without GameObjects, it simply doesn’t handle thousands of lights efficiently. This feels like something we should be able to integrate with Unity’s built-in rendering pipeline to improve non-shadow-casting lights.
On the subject of the built-in rendering pipeline, I should take this quick detour. The built-in rendering pipeline (sometimes called the BIRP) is the rendering pipeline that is enabled out of the box in Unity. They have, much more recently, added the SRP (scriptable rendering pipeline) that exposes much more control to developers. However:
- Porting a project to these is non-trivial and would require a lot of engineering and work on both TaleSpire and TaleWeaver
- They are not finished. In fact, SRP has been on hold for a while now as Unity works on the internals of DOTS, which we won’t go into here
Because of this, we are looking at sticking with the BIRP until we simply outgrow it and have to invest in SRP. The good news is that it’s looking like we might be able to do more than I expected before having to take that plunge.
Ok, back to code.
With the clustered shading talk fresh in my mind, I looked around to find more info about the technique. In the process, I found this post, which was a great read on clustering in itself and, I’m sure, will be very valuable in the future.
While musing back on the occlusion culling, it got me thinking about early-z passes in general, and I found this post. I don’t have anything to add about it, but it sparked plenty of ideas off in my head.
While the lights on our tiles and props currently don’t cast shadows, the sun, on the other hand, does, and rendering it is a significant portion of frame-time. Even if we could make our own version and use occlusion culling to optimize the process, I didn’t think we could integrate it without SRP.
After reading these two excellent articles on how Unity handles its cascaded shadow maps, I spotted something that might allow us to hook our own approach in. As shown in Catlike Coding’s article, Unity collects the shadows into a screen-space map after making the cascades. Not only is the shader that does that available, but Unity also gives an option in the project settings to replace it.
I’m hoping that this means we could make a version of this that integrates both the Unity shadow map and our own.
This is, of course, speculation at this point. But it was rather exciting nevertheless.
The above research included plenty I haven’t mentioned, and Friday had come to a close when I wrapped that up.
However, I was in a performance frame of mind, and I had a hankering to see what I could improve. To this end, I broke out the profiler and went looking.
One experiment I wanted to do was on how we use Unity’s Unity.Physics engine. The engine is considered stateless, meaning you have to load in all the rigid-bodies you want to include in the simulation each frame. There is an optimization, however, where you can tell it that the static objects are the same as last time, and thus it doesn’t have to rebuild the BVH (bounding volume hierarchy) or other internal data for statics.
As you can imagine, in a board with hundreds of thousands of tiles, loading that data into the physics engine starts taking a non-trivial amount of time.
The test board I was using was a monster that was shared on the Discord and has many thousands of zones. We spawned a job for each copy, and I was concerned that the overhead was hurting the speed. To that end, I gathered all the pointers and lengths into an array and spawned fewer jobs to do the copying. It didn’t make a big difference, however. What did was calling JobHandle.ScheduleBatchedJobs after scheduling every N jobs. It kept the workers fed well and gave enough of a speedup that I moved on to other things.
The next thing I wanted to test was, very crudely, would only copying the data from nearby zones be a win, given that the physics engine would have to rebuild the internal data for the statics more often? I quickly hacked this in and, to get a nice worst case, I forced the static rebuild on every frame.
I was, as expected, able to see an improvement, even with the rebuild costs. What I rediscovered, though, was how nice double-right-clicking is for zipping around the board. I noticed this because I wasn’t including zones from far away, so I couldn’t click them. This is not a showstopper by any means, but it is something I’d need to keep in mind when making a proper solution.
The proper solution would need to work out which zones need physics, and then minimize the number of times we need to rebuild the statics by being smart about when to drop zones from simulation.
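A solution along those lines could use hysteresis: add zones as soon as something needs them, but only drop a zone after it has gone unneeded for a while, so the statics aren't rebuilt every time the cursor wobbles across a zone boundary. This is a hypothetical sketch; the grace period, the zone ids, and how `wanted` is computed (for example from the cursor, dice, and moving creatures) are all assumptions.

```python
def update_physics_zones(active, wanted, last_wanted_frame, frame, grace_frames):
    """Decide which zones stay in the physics simulation.

    active:            set of zone ids currently loaded into the physics world
    wanted:            set of zone ids needed this frame
    last_wanted_frame: dict zone id -> last frame it was wanted (updated in place)
    Returns (new_active, rebuild_statics).
    """
    for zone in wanted:
        last_wanted_frame[zone] = frame
    # Keep a zone until it has been unwanted for more than grace_frames frames,
    # so brief changes in what is needed don't force a static rebuild.
    new_active = {
        zone for zone in active | wanted
        if frame - last_wanted_frame.get(zone, float("-inf")) <= grace_frames
    }
    return new_active, new_active != active
```

The statics only need rebuilding on frames where the returned set differs from the previous one. This does nothing for the double-right-click case, which would still need handling separately.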
During all this, I finally spotted something which had been bugging me for ages. I had never understood why we needed to copy the statics in every frame when, as far as I could see, they weren’t being modified. On re-reading their code, I finally noticed that they store dynamic and static rigid-bodies in the same array and that statics always came after dynamics. This meant that the indices from the BVH to the static bodies became incorrect whenever the number of dynamic rigid-bodies changed.
The upshot of this is that we can skip copying static objects if the zones have not been modified and the number of dynamic bodies has not changed since the last frame.
This gave a noticeable improvement! In the future, we’d probably want to fork their code and lay the data out in a way that lets us avoid copying more often. I’m also curious if we could develop a way to rebuild portions of the static data instead of all of it. Very exciting if possible.
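The index problem is easier to see with numbers. Assuming a single array laid out as dynamics followed by statics, a static body's index is the dynamic count plus its slot among the statics, so any cached structure holding those indices goes stale when the dynamic count changes. The helpers below are hypothetical (not Unity.Physics API) and just encode the reuse condition described above.

```python
def combined_body_index(static_slot, num_dynamic):
    """Index of a static body in an array laid out as [dynamics..., statics...]."""
    return num_dynamic + static_slot


class StaticCache:
    """Hypothetical helper: decide whether last frame's static bodies (and the
    BVH built over them) can be reused this frame."""

    def __init__(self):
        self.last_dynamic_count = None

    def can_reuse_statics(self, zones_modified, dynamic_count):
        # Statics sit after dynamics, so their indices shift whenever the number
        # of dynamic bodies changes; a cached BVH would then point at wrong bodies.
        reusable = (not zones_modified
                    and dynamic_count == self.last_dynamic_count)
        self.last_dynamic_count = dynamic_count
        return reusable
```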
I remember an aphorism that goes roughly:
If you get a speedup of 100 or 1000 times, you probably didn’t start doing something smart, but instead stopped doing something dumb.
While we are definitely not seeing anything in the realm of 100 times speedups, I think we are still well in the land of ‘stopping doing dumb things’ when it comes to performance.
Over the course of the weekend, I did the following:
- Added an early out to culling when we are given a frustum which will definitely cull all assets
- Changed all Spaghet managers to share the ‘globals’ data for scripts. This meant we only had to update one.
- Noticed that zones checked every frame to see if they had missing assets that had finished loading. The zones now register for updates and unregister once all are loaded.
- Learned that calling GetNativeBufferPtr causes a synchronization with the rendering thread and that we should cache it instead.
- Refactored a bunch of update logic so that zones push data into collections for use in the next frame, rather than us scanning the zones to collect it
- Started using culling info to avoid uploading data for dynamic lights.
And other little things.
The result is that I was able to get a 50% improvement in framerate in my test on the monster board.
Please note: This does not mean 50% improvement in all boards. This is a result of a single test.
The speedups are probably more noticeable in large boards. However, even small improvements in smaller boards are worth it.
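The zone change from the list above (registering for updates instead of every zone polling every frame) is a common pattern. A minimal sketch, with hypothetical names and plain sets standing in for TaleSpire's real data structures:

```python
class PendingAssetRegistry:
    """Zones with missing assets register here; only registered zones are
    checked each frame, instead of polling every zone in the board."""

    def __init__(self):
        self.waiting = {}  # zone id -> set of asset ids still loading

    def register(self, zone_id, missing_assets):
        if missing_assets:
            self.waiting[zone_id] = set(missing_assets)

    def on_assets_loaded(self, loaded):
        """Call once per frame with the asset ids that finished loading.
        Returns the zones that just became complete (and unregisters them)."""
        done = []
        for zone_id, missing in list(self.waiting.items()):
            missing -= loaded
            if not missing:
                del self.waiting[zone_id]
                done.append(zone_id)
        return done
```

Per-frame cost now scales with the number of zones still waiting rather than with the total number of zones in the board.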
Since our announcement that we would support macOS and Linux, I’ve been trying to get info from Steam regarding how best to ship with Proton.
For context, and as itsfoss explains, on Linux, you can choose to enable SteamPlay for ‘supported titles’ or ‘all other titles’.
We inquired about two things:
- How to be a supported title
- Whether we could pin a particular Proton version as the default so we could test against a known version
There has been a lot of back and forth over the last month, but I think we have the info now.
As to the version pinning question, the answer was no.
As to the first question, we got this feedback from support:
No problem, Chris! It’s a confusing piece slightly because of past history.
Before SteamDeck drove a bunch of additional effort on Proton, games did need to be “whitelisted” so to speak, and a customer playing games on Linux could take the step to say “let me try Windows games via proton even if they haven’t been whitelisted”
So, those customer-facing settings in the Steam client are still visible. But because we’ve made so much progress with Proton, the notion of that whitelisting doesn’t really make sense– we don’t update it, and we may phase it out of the Steam Linux client settings altogether at some point.
For you as a developer, there’s no longer a list to be added to– you’re good to go!
This is good and… well, not bad but interesting. We would like to make TaleSpire on Linux as obviously supported as on Windows. However, it seems like it will be behind an option for now, and we don’t have a way to indicate when we officially support it.
It might be a non-issue, as we Linux folks are used to having to jump through some hoops, but it is a little disappointing.
One option would be to bundle Proton with TaleSpire ourselves and ship it as a Linux native game. This would then appear in the store as such. This is a bunch more work, and feedback has indicated that it would still be good to allow people to use their own Proton install if desired.
It’s an interesting situation, to be sure. We are still a little ways off from looking into the glaring Linux-specific bugs, so we don’t have to worry about an official release yet.
We’ll definitely keep you posted.
That’s enough for today. Congrats if you made it all the way to the end!
Hope you have a lovely week, and I’ll be back soon with more dev-logs
 The lighting in the video isn’t great, so I recommend following along with the slides you can find here https://efficientshading.com/2016/07/12/game-technology-brisbane-presentation-2016/
 Not least because there is so little public information about it, and far too much conjecture online.
 Officially available from Unity Hub or here
 This is not the default physics engine in Unity but one which shipped with DOTS. It is stateless and impressively fast given what we throw at it.
 These places include:
- wherever a player is pointing with their cursor
- wherever any dice are
- wherever any moving creatures are
 Physics really shows this to be true. The longer a frame takes, the more time the physics engine has to simulate the next frame, and simulating more time takes more time. This means that speeding up non-physics code can mean the physics is using less time per frame.
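To make that footnote concrete, here is a minimal fixed-timestep accumulator, assuming physics advances in fixed steps to catch up with real time. This is a common scheme; the step size and the `max_steps` cap are illustrative, not Unity's settings.

```python
def physics_steps_for_frame(accumulator, frame_time, fixed_dt, max_steps=8):
    """Return (steps_to_run, new_accumulator) for one frame.

    The longer the previous frame took, the more simulated time has built up,
    so the more fixed steps (and CPU time) physics needs this frame.
    """
    accumulator += frame_time
    steps = 0
    while accumulator >= fixed_dt and steps < max_steps:
        accumulator -= fixed_dt
        steps += 1
    if steps == max_steps:
        accumulator = 0.0  # give up on time we can't catch up on
    return steps, accumulator
```

A frame that took twice as long produces twice the steps, so the next frame's physics cost grows too. The cap is the usual guard against that feedback loop running away.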
Hi folks, time for another dev-log :)
HeroForge work continues
The first task was to replace the use of BlobAssetReference for serializing the HeroForge asset data, as yesterday we discovered that it was far too slow (~100ms on my machine).
That went smoothly and removed the most significant lag in our code. This allowed us to see all the stuff that was still too slow. From there, the bulk of the work was moving things to jobs. And making sure that, whenever we did have to do something on the main thread, we did it fast.
With asset processing/saving improved, I moved on to asset loading. The work was very similar until something cropped up that was rather surprising…
…Unity was taking a very long time to upload the textures. At first, I thought we were trying to access the texture’s data too soon after creation, so I put a delay between those steps. I think it helped a little, but the spike in the perf graph still remained.
I’ll spare you the details of all the tests and head-scratching, but I eventually stumbled on something by accident:
The first spike is the load with slow texture upload. All the little spikes are from running the same code in rapid succession. It seems that, after a delay, the first texture upload is slow, but uploads within a couple dozen frames of each other are fast.
I could make some wild guesses about what is happening, but to be honest, I just don’t know. For now, it’s enough to know that it’s not going to choke when batch processing, but it is rather unsettling.
There are two other spots where the loading code is slower than I’d like.
The first is that opening the asset file takes a couple of milliseconds. We can just move that out to the job that reads from that file.
The second is that the Texture2D constructors take a few milliseconds to run. Unfortunately, there isn’t much I can do about that one as it’s not our code :(
Tomorrow I’ll start moving this into TaleSpire.
About two weeks ago, Unity 2021.2 shipped. This is the first build of the editor to officially support Apple Silicon.
We had already tested TaleSpire in the 2021.2 beta and, on Windows, all looks good. This means we will probably upgrade the project in the coming weeks.
TaleSpire on macOS is still, of course, totally broken. But this was already known, and we covered the details back in this dev-log. We can, however, prepare our development environment. We are currently waiting on a fix to a Rider bug before we can code comfortably there. However, a more ideal solution would be cross-compiling from our primary dev environment on Windows.
I’ve done my fair share of pushing builds across the network to other Windows test machines, and the time it takes to move a few GB really makes iteration times painful. If you’ve done much coding, I’m sure you know just how much difference iteration time makes to flow and thus how fast you can get things done.
Rather than replicate that joy on macOS, we have ordered some Sonnet 10Gb gear and will be testing to see if that makes for a tenable experience.
I’m rather excited about that!
That’s your lot
Ok, that’s all for tonight.
Hope you’ve had a good day.
 I’m a big ol’ emacs nerd and would have loved to stay, but omnisharp or its integration are just too slow. I tried all the usual suspects, but Rider has unfortunately been the best for me by a fair margin so far.
Today my goal was to take the HeroForge asset processing code and turn it into the state machine used in TaleSpire.
This has gone reasonably well. The code is split up into a few steps, and the texture processing jobs seem to be running across many frames as hoped. This should have meant that our code was barely blocking the main thread. However, I got to learn something new instead!
It turns out that CreateBlobAssetReference is slow, really slow, like 100ms to allocate slow. This really surprised me until I realized that, while we load BlobAssetReferences in TaleSpire (and they load fast), we only create them in TaleWeaver, where subsecond delays don’t feel bad.
It’s only a minor issue, though, as we can just knock together our own little format. This will take a few hours tomorrow, and then I can look at the performance again. It remains to be seen how these long-running jobs will interact with the per-frame work that TaleSpire already has to do, but we’ll be able to check that once we get this all hooked up.
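As an illustration of what "our own little format" could mean, here is a hypothetical layout: a fixed header followed by raw arrays that can be read back in one go, without building objects per element. The magic tag, field order, and mesh-only contents are all assumptions. The real format presumably also carries textures and other data.

```python
import struct

MAGIC = b"HFA0"                   # hypothetical file tag
HEADER = struct.Struct("<4sIII")  # magic, version, vertex count, index count


def write_asset(vertices, indices, version=1):
    """vertices: flat list of floats (x, y, z per vertex); indices: list of uint32."""
    if len(vertices) % 3:
        raise ValueError("vertex data must be a multiple of 3 floats")
    header = HEADER.pack(MAGIC, version, len(vertices) // 3, len(indices))
    body = struct.pack(f"<{len(vertices)}f{len(indices)}I", *vertices, *indices)
    return header + body


def read_asset(blob):
    magic, version, vert_count, index_count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not an asset file in this format")
    fmt = f"<{vert_count * 3}f{index_count}I"
    values = struct.unpack_from(fmt, blob, HEADER.size)
    return version, list(values[:vert_count * 3]), list(values[vert_count * 3:])
```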
The really slow part is not our code, though, but GLTFast. On my machine, it’s taking 54ms per frame over 3 frames for it to load the data, and my machine is no slouch. I’ll definitely be looking into what we can quickly do to improve this, but not yet. I want to get all this code into TaleSpire and start wiring things up so I can see the gaps that still need filling.
Although Ree and I usually blog about our own stuff, I’m going to shout out his work today as I find it exciting. Ree’s currently refactoring the camera controller code, which is the kind of nuts-and-bolts improvement that has been stuck on our todo lists for ages. Not only does it make the codebase easier to work with, but it also takes us measurably closer to being able to give you all access to the cutscene tools that we use to make our trailers. The idea naturally being that you would be able to play these to your players in a session.
Alrighty, that’s the lot from me today. Seeya tomorrow.
 Hopefully, I’ve just not read the docs properly, and there’ll be some “skip horribly expensive operation” option. But if not, I’ll have to look at having it create views into the data rather than Unity objects and handle the raw data myself. Then at least we can kick it off to a separate thread and wait for it to finish.
Blimey, what a week.
I’ve been looking into converting HeroForge models into a format we can consume. For our experiments, we were using some APIs that were either only available in the editor or not async and thus would stall the main thread for too long.
We get the assets in GLTF format, so we used GLTFUtility to do the initial conversion. We then needed to:
- pack the meshes together (except the base)
- resize some textures
- pack some textures together
- DXT compress the textures
- Save them in a new format which we can load quickly and asynchronously
The mesh part took a while as I was very unfamiliar with that part of Unity. Handling all possible vertex layouts would be a real pain, so we just rely on the models from HeroForge having a specific structure. This is a safe assumption to start with. Writing some jobs to pack the data into the correct format was simple enough, and then it was on to textures.
We are packing the metallic/gloss map with the occlusion map using a shader. We also use this step to bring the size of these textures down to 1024x1024. To ensure the readback didn’t block, I switched the code to use AsyncGPUReadback.
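The packing pass is a per-pixel channel shuffle, which a shader does on the GPU. For clarity, here is the same idea on the CPU. It assumes the glTF conventions for the inputs (metallic in B and roughness in G of the metallic-roughness texture, occlusion in R of the occlusion texture). The output layout of metallic, occlusion, unused, smoothness is a hypothetical choice, not necessarily TaleSpire's, and the resize to 1024x1024 is left out.

```python
def pack_metallic_occlusion(metal_rough_px, occlusion_px):
    """CPU sketch of a channel-packing pass.

    Inputs are equal-length lists of (r, g, b) tuples with 0..255 values.
    Output is one (metallic, occlusion, 0, smoothness) tuple per pixel,
    where smoothness = 255 - roughness (gloss is the inverse of roughness).
    """
    out = []
    for (_, rough, metal), (occ, _, _) in zip(metal_rough_px, occlusion_px):
        out.append((metal, occ, 0, 255 - rough))
    return out
```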
This did get me wondering, though. GLTFUtility spends a bunch of time, after loading the data, spawning Unity objects for Meshes and Textures. Worse, because it uses Texture.LoadImage, it has to upload the data to the GPU too, which is totally unnecessary for the color and bump maps as we save those almost unchanged.
So I started attempting to modify the library to avoid this and make it more amenable to working with the job system.
Images in the GLTF format are (when embedded in the binary) stored as PNGs in ARGB32 format. LoadImage previously handled that for us, so I added StbImageSharp, tweaked it so as not to use objects, and wired that in instead.
Unfortunately, the further I went, the more little details made it tricky to convert. Even after de-object-orienting enough of the code and making decent progress, I was faced with removing functionality or more extensive rewrites. I was very aware of the time it was taking and of the sunk cost fallacy, and didn’t want to lose more time than I had to. I also noticed that some features in GLTF were not yet supported, and integrating future work would be tricky.
As I was weighing up options, I found GLTFast, another library that supports 100% of the GLTF specification and purports to focus on speed. I had to rejig the whole process anyway, so it was an ok time to swap out the library.
In the last log, I talked about porting stb_dxt. stb_dxt compresses a single 4x4 block of pixels, but you have to write the code to process a whole image (adding padding as required). I wrote a couple of different implementations: one that collected the 4x4 blocks one at a time, and one that collected a full row of blocks before submitting them all. The potential benefit of the latter is that we can read the source data linearly. Even though it looked like I was feeding the data correctly, I was getting incorrect results. After a lot of head-scratching, I swapped out my port of stb_dxt for StbDxtSharp and was able to get some sensible results. This is unfortunate, but I had already reached Friday and didn’t want to waste more time. If we are interested, we can look into this another day.
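Driving a block compressor over a whole image mostly comes down to walking the image in 4x4 tiles and padding the edges. A minimal sketch of the one-block-at-a-time approach follows. Clamping to the last row/column is one common padding choice, and `compress_block` stands in for a real compressor such as stb_dxt's `stb_compress_dxt_block`, which is not reimplemented here.

```python
def iter_blocks(pixels, width, height):
    """Yield (bx, by, block) for every 4x4 block of a row-major image.

    Edge blocks are padded by clamping to the last row/column, so the block
    compressor always receives 16 pixels even when width or height isn't a
    multiple of 4.
    """
    for by in range(0, height, 4):
        for bx in range(0, width, 4):
            block = []
            for y in range(by, by + 4):
                cy = min(y, height - 1)
                for x in range(bx, bx + 4):
                    cx = min(x, width - 1)
                    block.append(pixels[cy * width + cx])
            yield bx // 4, by // 4, block


def compress_image(pixels, width, height, compress_block):
    """Run a 4x4 block compressor over the whole image, in block order."""
    return [compress_block(block) for _, _, block in iter_blocks(pixels, width, height)]
```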
Over the weekend, I did end up prodding the texture code some more. I was curious about generating mipmaps, as the included textures didn’t have any. Even though the standard implementation is just a simple box filter, it’s not something I’ve written myself before, so I did :)
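A box-filter mip chain is short enough to sketch in full. This version works on a single-channel image stored row-major, floors odd dimensions (dropping the last row or column, a common simplification), and stops at 1x1. For sRGB colour textures, averaging would normally be done in linear space; this sketch ignores that.

```python
def downsample_box(pixels, width, height):
    """One mip level: average each 2x2 block of a single-channel image.

    Odd dimensions are floored; once a dimension is 1, the same row or
    column is reused for both samples.
    """
    new_w, new_h = max(1, width // 2), max(1, height // 2)
    out = []
    for y in range(new_h):
        for x in range(new_w):
            x0, y0 = 2 * x, 2 * y
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            total = (pixels[y0 * width + x0] + pixels[y0 * width + x1]
                     + pixels[y1 * width + x0] + pixels[y1 * width + x1])
            out.append((total + 2) // 4)  # rounded integer average
    return out, new_w, new_h


def build_mip_chain(pixels, width, height):
    """Return [(pixels, width, height), ...] from full size down to 1x1."""
    levels = [(pixels, width, height)]
    while width > 1 or height > 1:
        pixels, width, height = downsample_box(pixels, width, height)
        levels.append((pixels, width, height))
    return levels
```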
A bit of profiling of the asset loading shows mixed results. Reading the data from the disk takes many milliseconds, but we’ll make that async so that won’t matter. The odd thing is how long calls to Texture2D.GetRawTextureData are taking. I’m hoping it’s just due to being called right after creating the texture. I’ll try giving it a frame or two and see what it looks like then. The rest of the code is fast and amenable to being run in a job, so it should mean even less work on the main thread.
The processing code is going to need more testing. GLTFast is definitely the part that takes the longest. Once again, the uploading of textures to the GPU seems to be the biggest cost and is something we don’t need it to do… unless, of course, we want to do mipmap generation on the GPU. It’s all a bit of a toss-up and is probably something we’ll just leave until the rest of the HeroForge integration code is hooked up.
So there it is. A week of false starts, frustrations, and progress.
Have a good one folks!