Blimey, what a week.
I’ve been looking into converting heroforge models into a format we can consume. For our experiments, we were using some APIs that were either only available in the editor or not async and thus would stall the main thread for too long.
We get the assets in GLTF format, so we used GLTFUtility to do the initial conversion. We then needed to:
- pack the meshes together (except the base)
- resize some textures
- pack some textures together
- DXT compress the textures
- Save them in a new format which we can load quickly and asynchronously
The mesh part took a while as I was very unfamiliar with that part of Unity. Handling all possible vertex layouts would be a real pain, so we just rely on the models from HeroForge having a specific structure. This is a safe assumption to start with. Writing some jobs to pack the data into the correct format was simple enough, and then it was on to textures.
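The core of the mesh-packing step is concatenating the vertex buffers and re-basing each mesh’s indices by the running vertex count. The real code is a Burst job over Unity mesh data; here is a minimal sketch of the idea in Python, with illustrative names:

```python
# Sketch of the core of mesh packing: concatenate the vertex buffers and
# re-base each mesh's indices by the running vertex count.
def pack_meshes(meshes):
    """meshes: list of (vertices, indices) pairs. Returns one combined pair."""
    packed_vertices = []
    packed_indices = []
    for vertices, indices in meshes:
        base = len(packed_vertices)  # index offset for this mesh
        packed_vertices.extend(vertices)
        packed_indices.extend(i + base for i in indices)
    return packed_vertices, packed_indices
```

With two single-triangle meshes, the second mesh’s indices [0, 1, 2] come out re-based to [3, 4, 5].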
We are packing the metallic/gloss map with the occlusion map using a shader. We also use this step to bring the size of these textures down to 1024x1024. To ensure the readback didn’t block, I switched the code to use AsyncGPUReadback.
This did get me wondering, though. GLTFUtility spends a bunch of time, after loading the data, spawning Unity objects for Meshes and Textures. Worse, because it uses Texture.LoadImage, it has to upload the data to the GPU too, which is totally unnecessary for the color and bump maps, as we save those almost unchanged.
So I started attempting to modify the library to avoid this and make it more amenable to working with the job system.
Images in the GLTF format are (when embedded in the binary) stored as PNGs in ARGB32 format. LoadImage previously handled that for us, so I added StbImageSharp, tweaked it so as not to use objects, and wired that in instead.
Unfortunately, the further I went, the more little details made it tricky to convert. Even after de-object-orienting enough of the code and making decent progress, I was faced with removing functionality or more extensive rewrites. I was very aware of the time it was taking, and of the sunk-cost fallacy, and didn’t want to lose more time than I had to. I also noticed that some features of GLTF were not yet supported, and integrating future work would be tricky.
As I was weighing up options, I found GLTFast, another library, which supports 100% of the GLTF specification and purports to focus on speed. I had to rejig the whole process anyway, so it was an OK time to swap out the library.
In the last log, I talked about porting stb_dxt. stb_dxt performs compression of a single 4x4 block of pixels, but you have to write the code that processes a whole image (adding padding as required). I wrote a couple of different implementations: one that collected the 4x4 blocks one at a time, and one that collected a full row of blocks before submitting them all. The potential benefit of the latter is that we can read the source data linearly. Even though it looked like I was feeding the data in correctly, I was getting incorrect results. After a lot of head-scratching, I swapped out my port of stb_dxt for StbDxtSharp and was able to get some sensible results. This is unfortunate, but I had already reached Friday and didn’t want to waste more time. If we are interested, we can look into this another day.
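For the curious, the driver code’s job is roughly this: walk the image in 4x4 steps, clamping out-of-range samples to the image edge so partial blocks get padded by repetition. A minimal Python sketch of the block-collection logic (the real version is C# feeding stb_dxt, and the names here are mine):

```python
def extract_block(pixels, width, height, bx, by):
    """Collect one 4x4 block starting at (bx, by), clamping out-of-range
    coordinates to the image edge so partial blocks are padded by repetition."""
    block = []
    for y in range(4):
        for x in range(4):
            sx = min(bx + x, width - 1)
            sy = min(by + y, height - 1)
            block.append(pixels[sy * width + sx])
    return block

def iter_blocks(pixels, width, height):
    """Yield 4x4 blocks in row-major order, one per call to the compressor."""
    for by in range(0, height, 4):
        for bx in range(0, width, 4):
            yield extract_block(pixels, width, height, bx, by)
```

A 5x5 image yields four blocks, with the last one entirely padding from the bottom-right pixel.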
Over the weekend, I did end up prodding this code some more. I was curious about generating mipmaps as the textures included didn’t have any. Even though the standard implementation is just a simple box filter, it’s not something I’ve written myself before, so I did :)
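The standard box filter really is that simple: each pixel of the next mip level is the average of a 2x2 quad from the level above. A Python sketch for a single-channel image (assuming power-of-two dimensions for brevity; the real code works on texture data with multiple channels):

```python
def next_mip(pixels, width, height):
    """One mip level down: average each 2x2 quad (simple box filter).
    Assumes power-of-two dimensions of at least 2 for brevity."""
    nw, nh = width // 2, height // 2
    out = []
    for y in range(nh):
        for x in range(nw):
            a = pixels[(2 * y) * width + 2 * x]
            b = pixels[(2 * y) * width + 2 * x + 1]
            c = pixels[(2 * y + 1) * width + 2 * x]
            d = pixels[(2 * y + 1) * width + 2 * x + 1]
            out.append((a + b + c + d) // 4)
    return out, nw, nh
```

Applying this repeatedly until you reach 1x1 gives the full mip chain.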
A bit of profiling of the asset loading shows mixed results. Reading the data from the disk takes many milliseconds, but we’ll make that async so that won’t matter. The odd thing is how long calls to Texture2D.GetRawTextureData are taking. I’m hoping it’s just due to being called right after creating the texture. I’ll try giving it a frame or two and see what it looks like then. The rest of the code is fast and amenable to being run in a job, so it should mean even less work on the main thread.
The processing code is going to need more testing. GLTFast is definitely the part that takes the longest. Once again, the uploading of textures to the GPU seems to be the biggest cost and is something we don’t need it to do… unless, of course, we want to do mipmap generation on the GPU. It’s all a bit of a toss-up and is probably something we’ll just leave until the rest of the HeroForge integration code is hooked up.
So there it is. A week of false starts, frustrations, and progress.
Have a good one folks!
This morning I’m continuing my work on the HeroForge integration.
My current tasks revolve around loading the assets. For Dimension20, we had knocked together an importer which did the job but had two issues that made it unsuitable for use inside TaleSpire:
- It didn’t need to worry about blocking the main thread
- It used some functionality from UnityEditor, which is not available at runtime.
This means we need to get coding :)
The first target to replace was EditorUtility.CompressTexture, which we use to compress the textures to DXT5/DXT1. You might at first think that you could just replace it with Texture2D.Compress, but that doesn’t have the same quality settings and (much worse) is not async.
So I went looking elsewhere. Luckily for us, the fantastic stb single file library project has just what the doctor ordered. stb_dxt is small, fast, and easy to read; however, it is in C, and while I could just include a dll, it looked like it would be easy to port to Burst’s HPC#.
So that’s what I got up to the other day. It was a straightforward task, and now I just need to write the code that drives the compression process.
Today my focus is on converting the format we get from HeroForge into something suitable for TaleSpire. We need to extract what we need, apply compression and store everything in a format suitable for fast async loading.
That’s it from me for now. I’ll be back with more as it develops.
 I’ll probably release that code once I can confirm it’s all working.
 DXT compresses 4x4 pixel clusters, so stb_dxt’s API takes one cluster and appends the results to a buffer. The code to provide the clusters (with correct padding) is not part of stb_dxt.
 A small amount of work still needs to be done on the main thread as creatures use Unity’s GameObjects. However, most of it can be done without blocking.
There has been a lot going on this past week, but just for fun, this morning, I took a few hours to look at performance.
The motivation came from a community member who posted some thoughts on performance issues they were seeing and shared the relatively beefy board they were testing.
I cracked out the profiler and saw what I expected. The frame-time was dominated by shadow rendering. Due to reasons, the shadows are culled on the CPU, and because of the sheer number of tiles, this was taking a long time.
Poking around, I saw lots of things that are already on my todo list for future improvements. What I didn’t expect was to spot something dumb.
BatchRendererGroup’s AddBatch method takes a Bounds which encapsulates all the instances. I had assumed that it would be used during culling to exclude batches that clearly didn’t need culling. However, this wasn’t the case.
Armed with this knowledge, I simply tweaked the culling job to check the bounds for the entire batch first, and only if it intersected the view frustum, to cull the individual instances. Naturally, this had a big effect.
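The shape of that change is: test the whole batch’s bounds against the frustum first, and only fall through to per-instance tests when the batch isn’t trivially rejected. A Python sketch using the standard AABB-vs-plane test (the real version is a Burst-compiled job; names here are illustrative):

```python
def aabb_outside_plane(center, extents, plane):
    """plane is (nx, ny, nz, d); the AABB is fully outside if even its
    most positive vertex along the plane normal is behind the plane."""
    nx, ny, nz, d = plane
    r = extents[0] * abs(nx) + extents[1] * abs(ny) + extents[2] * abs(nz)
    s = center[0] * nx + center[1] * ny + center[2] * nz + d
    return s + r < 0

def cull_batch(batch_bounds, instance_bounds, frustum_planes):
    """Coarse-then-fine culling: if the whole batch's bounds fail any frustum
    plane, skip every instance; otherwise test instances individually.
    Returns the indices of visible instances."""
    center, extents = batch_bounds
    if any(aabb_outside_plane(center, extents, p) for p in frustum_planes):
        return []  # entire batch off-screen: no per-instance work
    return [i for i, (c, e) in enumerate(instance_bounds)
            if not any(aabb_outside_plane(c, e, p) for p in frustum_planes)]
```

The win comes from the early-out: for a batch entirely off-screen, one test replaces potentially thousands of per-instance tests.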
When I first tested the board linked earlier, I was getting ~28fps. After this change, I was getting ~58fps. It dipped in some places in the board but never below 40fps, so this was still a nice win. 
This will go out in a patch later this week.
While I was in the headspace, I also added some coarse culling to dynamic lights. It helped a little (and nudged the test board up to ~60fps), but doing more optimizing can wait for another day. 
Have a good one folks!
 Initially we tried to render all tiles via BatchRendererGroups. This failed, however, due to the (undocumented) fact that BatchRendererGroups were never meant to be used with Unity’s built-in render pipeline, and per-instance data was simply not supported.
To tackle this, we use compute shaders to perform frustum culling and to populate the draw lists for DrawMeshInstancedIndirect. However, when using DrawMeshInstancedIndirect, Unity doesn’t have enough information to do culling for you, and there are no hooks for doing it yourself (in the built-in render pipeline, which we use).
So! We opted for a hybrid monstrosity. Shadows are handled via BatchRendererGroup, and we use our custom code to do the primary rendering. BatchRendererGroup gives us nice hooks to perform culling, and we do this in Burst-compiled jobs.
 This code no-doubt needs optimization too, but that’s for another day.
 Naturally, your mileage will vary. The effect will be most visible in larger boards where a higher percentage of the board is off-screen at any given time. Also, I’m running an AMD threadripper on my dev machine, so it inflates numbers a bit. However, this change will improve performance on all machines regardless of CPU as it’s simply doing less work :)
 The next big candidate for performance improvement is physics. I’m pretty confident that we can be smarter about what assets are involved in the simulation of each frame. Cutting down the number of assets included has the potential to help quite a bit.
This dev-log is an update from just one of the developers. It does not aim to include all the other things that are going on.
Just a quick warning. This is just a regular dev log, no big news in here.
Things have been good recently, but there are a lot of spinning plates to keep track of.
We had a server issue last week that stemmed from a database setting I hadn’t realized was enabled. The setting was to apply minor DB patches automatically. I had missed this, and so when the system obediently upgraded the DB, the server lost connection and got a tad confused.
Naturally, we want to schedule these patches explicitly, so that setting has now been corrected.
Backend isn’t my strongest suit, so I’m reading and looking into the best way to handle this in the future. One likely portion of this is adding a ‘circuit breaker’ in front of the DB connection pool.
Unity has a rather interesting WebRTC package in the works, so I’ve been studying this topic again. We know we’d like to have audio and video chat in the future. Ideally, this would be p2p, but (IIRC) you’d be lucky to get NAT traversal to work for more than 85% of people, so this is usually paired with a TURN server to act as a relay for those folks.
That, of course, means handling the cost of such a server, and more importantly, the fees for the data going through it.
By default, p2p would imply a bidirectional connection between each pair of players. So if you have ten people, you are sending the same data nine times to different places. What many video providers opt for instead is to have servers that mix the feeds for you, so you only have one bidirectional connection. However, naturally, that means you are no longer p2p, and the server’s requirements (and thus costs) are MUCH higher than a simple relay server’s.
Lots to think about here. We’ll likely focus on p2p (with TURN fallback) when we start, but we’ll see how it evolves.
Performance work is never done, and we know that we need to do a lot to support both larger maps and lower-end machines. This past week, my brain latched onto this, so I spent a good deal of time reading papers and slides from various games to learn more about contemporary approaches.
Not much concrete to say about this yet. I know I have a lot to learn :P
We had an internal play session the other day, which was a lot of fun and resulted in another page of ideas of potential improvements.
I’ve been working on HeroForge a bit, too, of course. I’m not satisfied with our backend design, so I need to focus on that more this week.
Hmm yeah, I think that’s most of it. Last week was very research-heavy, so I hope I end up coding more in this one.
The above is, of course, just me. The others have been making all manner of assets for future packs, which is both super exciting to see but also agonizing as it’s not time to show them publicly yet.
I hope you are all well. Have a good one folks!
Hi folks, just a quick log today to say that I’m [Baggers] taking a week off to relax. The rest of the team are still around, of course, so you’ll still be getting your regular scheduled programming :)
The last few weeks have felt great as bookmarks, links, and bugfixes have been shipping. Work on persistent emotes has been picking up again behind the scenes, and the assets in the pipeline are super exciting.
I hope you all have a great week.
Well, naturally, the biggest excitement in my day has been seeing the Dimension20 trailer go public, but code is also progressing, so I should talk about that.
Buuuuut I could watch it one more time :D
RIGHT! Now to business.
I started by looking into markers. Oh, actually, one little detail: soon you will be able to give markers names, at which point they become bookmarks. I’m gonna use those names below, so I thought I should mention that first.
Currently, markers are pulled when you join a specific board, and we only pull the markers for that board. To support campaign-wide bookmark search, we want to pull all of them when you join the campaign and then keep them up to date. This is similar to what we do for unique creatures, so I started reading that code to see how it worked.
What I found was that the unique creature sync code had some legacy cruft and was pulling far more than it needed to. As I was revisiting this code, it felt like time for a bit of a cleanup, so I got busy doing that.
As I was doing that, it gave me a good opportunity to add the backend data for links, which will soon allow you to associate a URL with creatures and markers. So I got stuck in with that too.
Because I was looking at links, it just felt right to think about the upcoming talespire://goto/ links, which will allow you to add a hyperlink to a web page that will open TaleSpire and take you to a specific marker (switching to the correct campaign and board in the process). After thinking about what the first version should be, I added this into the mix.
So now things are getting exciting. I’ve got a first iteration of the board panel made for GMs…
NOTE: I’ll be adding bookmark search soon
We can add names to markers to turn them into bookmarks, and you can get a talespire://goto link from their right-click menu.
I’ve got switching between campaigns working, but I need to do some cleanup on the “login screen to campaign” transition before I can wire everything up.
It would be great to ship all this late next week, but I’m not sure if that’s overly optimistic. We’ll see how the rest of it (and the testing) goes.
Until next time, Peace.
It’s time for another dev log, and things are moving along well.
Since the last patch, I’ve fixed and touched a few things.
Links in and out of boards
We are working on two new and complementary features. The first is a talespire:// URL that takes you to a specific position within a board, and the second allows you to attach a URL to creatures and markers.
Between these, you could do things like include links into boards from campaign management software like WorldAnvil, or link directly from your creature to their D&DBeyond page.
I’m currently working on the database, backend, and TaleSpire patches to make this work.
We have hashes of all board and creature data, but we hadn’t been using them until now. We now hash the data of downloads and use that to validate the download result. This should reduce the cases we’ve seen of invalid cache files.
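The validation itself is the usual pattern: hash the downloaded bytes and compare against the known hash before trusting the cache. A Python sketch of the idea; note that sha256 is an assumption here for illustration, not necessarily the hash we actually use:

```python
import hashlib

def validate_download(data: bytes, expected_hash: str) -> bool:
    """Hash the downloaded bytes and compare against the hash we already
    store for that board/creature data, before writing it to the cache.
    (sha256 is an illustrative choice, not a statement of our actual hash.)"""
    return hashlib.sha256(data).hexdigest() == expected_hash
```

Anything that fails the check gets re-downloaded rather than written into the cache.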
Pixel-perfect camera focus
Moving the camera using double right-click is incredibly common; however, it used to only use physics ray-casts to work out where to move to. This didn’t work well in cases like the portcullis, where the collider definitely should be a single cuboid, but that doesn’t let you pick through the gaps.
I tried using the depth information from the pixel picker to get the position, but the accuracy was too low.
The new approach is a hybrid. We use the pixel picker to identify the thing under the cursor, and then we cast a ray only against the colliders in that object. This gives us the expected result and will be in the next patch.
We have some users with issues connecting to the game. To help with these cases, I’m updating a little tool we made to test and log the connection process. I might end up shipping this with the game, or maybe build it into the game and use command-line arguments to switch to the tester on launch.
I also just have to shout out some work Ree did the other day. To get shadows with reasonable performance in TaleSpire, we misuse BatchRendererGroups. Due to them not being designed to work with Unity’s built-in rendering pipeline, they seem to have a bug where the layer information is not respected. In practice, this means that we end up running culling routines for tiles and props even when the camera has specified that it doesn’t need to include those assets (using layers). Because of this, culling code was running whenever the reflection probe was updating, even though only a couple of objects were being rendered.
Ree made a replacement probe that totally avoids this bug. It’s really cool as the bigger the board, the more time this can save.
This will be shipping in the next patch.
And the rest
As always, there are bugs to fix. One that jumps to mind was that, since the last patch, creatures can be unnamed. In those cases, the name displayed should be the name of the kind of creature. However, that wasn’t happening.
That’s all from me. Of course, the rest of the team is plowing ahead with all sorts of exciting things. It’s gonna be great to see those land :)
Have a good one folks
 We did not know about this limitation when we started as the documentation was very lacking.
First, to business!
At 3am PT, we will be taking down the servers for maintenance (click here for the time in your timezone).
We are scheduling one hour for the work, but it is likely to be less than that.
This patch will complete the changes we needed to make on the backend for upcoming creature features.
Today has gone well
I’ve added a way to change your TaleSpire username…
…and a button to rename campaigns is also complete.
Both of those will be in the next patch to the game.
Ree and I spent a bunch of time testing the current TaleSpire build with the upcoming backend patch. So that should go smoothly. We rediscovered a couple of existing bugs in the process, so we’ll try to get some fixes for those in too.
I’ve also been doing some experiments with erlang so that, hopefully, more of the server updates in the future can be done with zero downtime. We’ll see how that goes :)
I think that is everything. Doing updates properly is slow as hell, but it’s fun to be getting closer.
Have a good one folks.
Here is a fun little tweak that is coming in an update soon.
In TaleSpire, we love the tile-based approach to things. We also love what can be achieved with clipping but, because tiles have similar sizes, it’s easy to cause z-fighting.
Yesterday, Ree suggested I try to add a tiny offset to the positions, so they were less likely to line up. But there is a caveat… we want the offset for a given tile to be the same every time you load. The reason for that is that it would suck if each time you joined the board, things looked very slightly different.
Tiles and props don’t have unique IDs (as that would be way too much data), and we can’t use their index in the arrays (as that changes when boards are modified), but we do have their position.
The position isn’t necessarily unique, though. But when batching, we also know the UUID that identifies the ‘kind’ of thing being batched. This is ideal, so we mix some bits from the id with some bits from the position, feed the result through a bodged noise function, and scale it waaaay down.
This tiny offset is stable and is enough to improve z-fighting in a bunch of cases.
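For anyone wanting to play with the idea, the recipe is: quantize the position, mix in bits of the kind id, run the result through a cheap integer hash, and map it to a tiny range. A Python sketch; all constants here are illustrative, not TaleSpire’s actual ones:

```python
def stable_offset(kind_id_bits, position, scale=1e-4):
    """Derive a tiny, deterministic positional offset from the asset-kind id
    and the (quantized) position. Same inputs always give the same offset,
    so boards look identical on every load."""
    def hash32(h):
        # A quick integer hash (the "bodged noise function").
        h = (h ^ (h >> 16)) * 0x45D9F3B & 0xFFFFFFFF
        h = (h ^ (h >> 16)) * 0x45D9F3B & 0xFFFFFFFF
        return h ^ (h >> 16)

    quantized = [int(round(c * 100)) & 0xFFFFFFFF for c in position]
    seed = kind_id_bits & 0xFFFFFFFF
    out = []
    for axis, q in enumerate(quantized):
        h = hash32(seed ^ q ^ (axis * 0x9E3779B9 & 0xFFFFFFFF))
        out.append(((h / 0xFFFFFFFF) - 0.5) * scale)  # tiny, centered on 0
    return tuple(out)
```

Because the offset is a pure function of kind id and position, rejoining a board reproduces exactly the same jitter.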
The eagle-eyed among you will notice that this doesn’t fix cases where two of the same kind of asset exactly overlap. This is correct and is not something we are looking to improve as it’s not a useful tile configuration anyway.
So that’s that. It’s a blast to occasionally get these little things where a ten minute experiment can give such a cool result.
Aside from this, I’ve also pushed a patch to the database, which allows us to store data for a bunch of upcoming creature features (polymorph and persistent emotes among them).
Hope you are having a good day.
The last few days of work have felt really good. I’ve been back in the flow, having got through the ‘working things out’ stage.
The first thing I did was write the serialization code for the new creature data. With this done, I could hook up the code to upgrade from the old format, refactor a whole bunch of stuff and get creature saving to the backend again.
With that taking shape, I switched to the backend.
We have an API description as erlang data, and we use that to generate both the erlang and c# code for communicating with the server. I extended this generator so it can emit c# structs, as previously it only made classes. I also improved the serializer on the erlang side, so it needs a little less hand-holding.
I then used the generator as I defined the new API for creating and updating unique creatures.
My old code for updating unique creatures was lazy/expedient (you decide :P). I previously pushed all of a creature’s data to the server on any change. This was obviously much more data than was needed in almost all cases, but it worked. This time I made many more entry points, each of which applied smaller changes. This will reduce work for the server and hopefully make things a little faster.
With a draft of the API, I could generate the c# code and then add it to TaleSpire to see if I had missed anything. Once I was satisfied with that, I headed back to erlang to implement the server code.
While working with the database, I hit an error when calling a SQL function where one of the arguments took a user-defined type. It turned out that epgsql, the library I use, didn’t support auto-conversion of erlang data to custom SQL types. It did, of course, have a way for you to add your own ‘codec’ to do this, so it was time for me to learn about that.
The manual was helpful but didn’t provide concrete examples for what I was trying to do. I read the codecs included with the library, and that helped me draft out the basics. What is nice is that you get to write and read the binary data directly, and erlang’s binary pattern matching is wonderful, so I didn’t worry about that part.
What was not clear from the existing codecs was how to pass data for a user-defined type rather than a built-in one. I first tried just sending the data for the two fields of the type, but I got an error along the lines of “-34234324 columns specified, 2 expected”, which told me both that I needed to pass the number of columns and that postgres knew what I was trying to send. A bit of googling led me to the source code for the part of postgres that throws this error. What was great was that at line 520, we can see that it reads a four-byte int to get the number of columns. We can then see exactly what else we need to provide, the most interesting part of which is the type-oid for each field we are sending.
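In other words, the binary format for a composite value is: a four-byte column count, then for each column its type-oid (four bytes), the byte length of its data (four bytes, or -1 for NULL), and then the data itself. A Python sketch of the encoding (the real codec is erlang; the sample oids used below, 23 and 25, are postgres’ built-ins for int4 and text):

```python
import struct

def encode_composite(fields):
    """Encode a PostgreSQL composite (row) value in binary format:
    int32 column count, then per column: int32 type-oid, int32 data
    length (-1 for NULL), and the raw field data."""
    out = struct.pack("!i", len(fields))  # number of columns
    for oid, data in fields:
        if data is None:
            out += struct.pack("!ii", oid, -1)  # NULL: length -1, no payload
        else:
            out += struct.pack("!ii", oid, len(data)) + data
    return out

# A two-column composite: (int4 7, text 'hi')
payload = encode_composite([(23, struct.pack("!i", 7)), (25, b"hi")])
```

This mirrors what postgres’ own record-receive function reads back on the other end.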
At first, I just muddled through as the errors I got told me which type-oids it was expecting. However, this felt very fragile, so I jumped back into the code for the library to see what I was meant to do. What was lovely was that on the init of your codec, they pass you an object you can use to query the type-db for your connection. This let me cache the type ids on init, which was ace.
As I didn’t see an example of this anywhere else, I’ve put up this gist of what I came up with. Maybe it can help someone, or perhaps someone will correct me and show me how it’s done :)
With that hurdle officially jumped, I got back to the slog of writing and testing the code.
I finally got unique-creatures working again, so it was time to start work on how we will upgrade to this new code. This involved the following SQL work:
- Code to apply the changes to the DB
- Code to initialize all the new columns with data from the existing columns (where sensible)
- Make the old unique-creature SQL code forward compatible. This will let you keep playing on an older version without the new and old data going out of sync.
With that DB patch looking promising, it was time to test on my local setup. This means:
- Starting a build of the old DB and server
- Building some stuff and placing some creatures in TaleSpire
- Applying the SQL patch
- Checking that nothing has broken
- Shutting down TaleSpire and the server but leaving the DB and files up.
- Switching TaleSpire and server to the new branches
- Starting up the new TaleSpire and server builds.
- Checking that everything still works
This is now working well, so I’m feeling pretty happy. I still need to finalize some details with Ree when I meet up with him, but maybe we can push the DB patch next week. This gives us what we need to progress on things like polymorph, persistent emotes, 8 stats per creature, and more.
Right, I should get some sleep.
Have a good one!
 All of this has been done on a local build of our backend, so I wasn’t risking messing stuff up.
 Unique creatures are stored in the DB instead of being packed with the non-uniques in a file. So changing what data is stored for creatures means changing both places.
 In some cases, it’s nice not to be allocating another object.
 Seriously, low-level languages need to steal it. There is some general erlang’y ugliness to it, but the idea is excellent.
 I also verified them using select typname, typelem from pg_type.
 Erlang, like many venerable languages, grew up before the modern culture around documentation. Common-lisp felt very similar, and you have to get decent at reading other people’s code to learn how to use many things.
 with a few hacks in TaleSpire as I haven’t finished the polymorph code yet.
 We have to nail down the data we need for persistent emotes.