From the Burrow

TaleSpire Dev Log 280

2021-06-21 21:52:07 +0000

Heya folks,

We are currently working on the last things before we’ll be ready to ship the ‘props on bases’ feature. This is a super exciting one, and I expect it to be used and abused in very interesting ways.

Last night, however, I took a little detour to look at a bug that has been around since the chimera build: lights that turn off when you get close to them.

Our lights attempt to be the same as the standard Unity lights, except without using Unity’s GameObjects for performance reasons. This means that, like Unity’s lights, ours use a different material depending on whether the camera is inside or outside the light’s area of influence[0].
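As a rough illustration of that inside/outside decision, here’s a minimal Python sketch. It assumes a point light whose area of influence is a sphere with a given radius. The `PointLight` type and the material names are hypothetical, not TaleSpire’s or Unity’s actual types.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]

@dataclass
class PointLight:
    # Hypothetical light: world-space centre plus the radius of its
    # spherical area of influence.
    centre: Vec3
    radius: float

def camera_inside(light: PointLight, cam: Vec3) -> bool:
    """True if the camera lies within the light's sphere of influence."""
    d2 = sum((c - l) ** 2 for c, l in zip(cam, light.centre))
    return d2 <= light.radius ** 2

def pick_material(light: PointLight, cam: Vec3) -> str:
    # Hypothetical material names. The key point is that the choice depends
    # on where the camera is, not only on the light itself.
    return "inside_material" if camera_inside(light, cam) else "outside_material"
```

Because the result depends on the camera, anything that caches the material choice has to be refreshed when the camera moves.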

The lights seemed to be turning off because we were not always updating the material at the right time. Let’s get into the weeds a little.

To avoid using GameObjects, we use CommandBuffers to render the light meshes at the right point during the frame. For dynamic lights[1] we have to rebuild the queue each frame[2] as the light’s position or properties are being changed, but for static lights, it’s different. Static lights, by definition, aren’t changing, so we have a separate CommandBuffer for them which we only update when the board is modified.

An important detail here is that you can’t just update an element of the CommandBuffer. It has to be cleared and rebuilt.

This approach has a very dumb mistake in it, though. When the camera moves, the material the light is using might need to be changed. I’m not sure why I didn’t notice that while writing the system, though I expect I was rushing for Early Access and forgot. Regardless, this needed fixing.
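The mistake can be modelled in a few lines. In this hypothetical Python sketch, a cached list stands in for the static-light CommandBuffer. It records which material each light was given when it was built, so it needs rebuilding not only when the board changes but also whenever the camera crosses any light’s boundary. All names are illustrative assumptions.

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]

@dataclass
class StaticLight:
    centre: Vec3
    radius: float

    def contains(self, p: Vec3) -> bool:
        """True if point p is inside this light's sphere of influence."""
        d2 = sum((a - b) ** 2 for a, b in zip(p, self.centre))
        return d2 <= self.radius ** 2

@dataclass
class StaticLightBuffer:
    """Stand-in for a CommandBuffer: it can only be cleared and rebuilt whole."""
    lights: list[StaticLight]
    inside: list[bool] = field(default_factory=list)  # material choice baked in per light
    rebuilds: int = 0

    def rebuild(self, cam: Vec3) -> None:
        # Clear and re-record every light, picking its material for this camera.
        self.inside = [light.contains(cam) for light in self.lights]
        self.rebuilds += 1

    def is_stale(self, cam: Vec3) -> bool:
        # Stale if never built for these lights, or if any light's
        # inside/outside state differs from what was baked in.
        if len(self.inside) != len(self.lights):
            return True
        return any(light.contains(cam) != was_inside
                   for light, was_inside in zip(self.lights, self.inside))

    def update(self, cam: Vec3, board_modified: bool) -> None:
        # Checking only `board_modified` reproduces the bug: a camera that
        # crosses a light's boundary keeps using the stale material.
        if board_modified or self.is_stale(cam):
            self.rebuild(cam)
```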

Internally, the TaleSpire board is split into zones, and we apply operations across these zones in parallel. Each zone communicates with the light-manager to enqueue its lights into the CommandBuffers of lights to be rendered[3]. What I tried to do was have one CommandBuffer for static lights per zone and then only update them depending on the camera position[4].

This worked, but I noticed something annoying when I looked at the profiler. The update time for the static lights was fine, but the rendering took a significant hit.

This is my test scene. It’s 4096 static lights. I’ve not added anything else so as not to confuse things.

[Image: the test scene, 4096 static lights]

Here is what we have for light rendering before the fix (so with one CommandBuffer for static lights):

[Image: light rendering with one CommandBuffer for static lights (old version)]

And here is the same scene, same camera angle, etc, but with one CommandBuffer per zone:

[Image: light rendering with one CommandBuffer per zone (new version)]

Doesn’t that suck? Even though the amount of work to do is the same, the overhead from many CommandBuffers made things take significantly longer[5].

I still hope to ship the fix for the lighting bug this week, but I’m going to have to look at it after we ship the ‘props on bases’ feature, as it’s clear I’ve got more experiments to do.

There is still plenty to try. What I’ll probably start with is switching to three CommandBuffers. One for dynamic lights, one for static lights from zones that have had to update recently, and a final one for static lights from zones that haven’t been updated in a while. This way, we minimize the overhead from CommandBuffers while also minimizing the number of lights being rewritten to the CommandBuffer each frame.
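The three-buffer idea can be sketched as a partitioning step. This Python sketch is hypothetical: the `Zone` type, the bucket names, and the frame-count threshold that decides what counts as “recently” updated are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Zone:
    zone_id: int
    static_lights: list[str]
    last_updated_frame: int

def partition_lights(dynamic_lights: list[str], zones: list[Zone],
                     frame: int, recent_window: int = 60) -> dict[str, list[str]]:
    """Split lights into the three proposed buffers.

    'hot_static' holds lights from zones updated within `recent_window` frames
    (a hypothetical threshold), which may need rewriting often. 'cold_static'
    holds lights from quiet zones, whose buffer can be left alone most frames.
    """
    buckets: dict[str, list[str]] = {
        "dynamic": list(dynamic_lights),
        "hot_static": [],
        "cold_static": [],
    }
    for zone in zones:
        recent = frame - zone.last_updated_frame < recent_window
        buckets["hot_static" if recent else "cold_static"].extend(zone.static_lights)
    return buckets
```

Only the small “hot” bucket would be rewritten frequently, while the number of CommandBuffers stays fixed at three.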

Alright, that’s all for now. Can’t wait to be back with more.

This is gonna be a fun week.

Seeya!

[0] More or less. This is close enough for this discussion.

[1] Dynamic lights are lights being moved or animated by scripts on the tile/prop.

[2] Yup, there are places we can optimize here, but this log skims over that detail as we are focused on static lights.

[3] This isn’t the exact architecture, so don’t sweat these details. We just want to talk about the issues.

[4] Of course, this is really about the camera position in relation to the area influenced by any of the lights in the zone.

[5] I’d recommend not caring too much about the wall-clock time in this case. Of course, 2 ms matters, but this is also running on a fast CPU. What really stings for me is that it was significantly faster before.

TaleSpire Dev Log 279

2021-06-10 14:30:28 +0000

While we work on the ‘props as creatures’ feature we talked about the other day, I’ve switched tasks to look at some bugs hitting folks in the community. To that end, the next patch will have the following:

A fix for a memory leak in copy/paste
A fix for cases where exceptions should have caused TaleSpire to leave the board but didn’t
Fixes in the board loading code to make it more robust
A fix to the campaign upgrader, which was having errors

The campaign upgrade issue was a regression caused by me when I changed how some internal data was structured. I should have double-checked the upgrader before pushing.

I’m also tracking another regression that is causing issues with picking tiles and props when in certain positions. I’ve got a solid idea of where these issues lie, so I’m hopeful that I can get this fixed today.

Once that is done, I’ll push out a patch.

Hope you are all doing well,

Peace.

TaleSpire Dev Log 277

2021-06-01 22:00:22 +0000

Heya folks,

For the last few days, I’ve had an urge to work on bugs, so I put down the creature improvements and dipped back into the issue tracker.

The first bug I ended up fixing was one I spotted while working on other things. It turned out that long error logs were not being truncated when uploaded to the server, which caused the transfer to fail. This was a nice quick one to fix.
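The log doesn’t say what the server’s size limit is or how the truncation was done. Here’s a minimal Python sketch under stated assumptions: a hypothetical 64 KiB limit, keeping the start of the log, and cutting on a UTF-8 character boundary so the uploaded text stays valid.

```python
MAX_LOG_BYTES = 64 * 1024  # hypothetical server-side limit

def truncate_log(text: str, max_bytes: int = MAX_LOG_BYTES) -> bytes:
    """Encode `text` as UTF-8 and cut it to at most `max_bytes` bytes,
    without leaving a partial multi-byte character at the end."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    # Decoding with errors="ignore" drops a trailing incomplete character, if any.
    return data[:max_bytes].decode("utf-8", errors="ignore").encode("utf-8")
```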

Next up, I was curious about a long-time bug where static lights would flicker briefly when placing new static lights[0]. The code was fiddly as the way I was iterating through the assets was good for performance but not for readability. In the end, it came down to some mistakes where I was writing the GPU data for the lights[1].

The last thing I did yesterday was fix a bug in hide-volumes where, if you made them too small in any dimension, they became unclickable. This was simply because the physics engine doesn’t handle boxes whose size in any dimension is zero. I’ve modified the tool so that you can’t make hide volumes that small anymore.
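One simple way to enforce that is to clamp each dimension of the box to a minimum size. This is a hypothetical Python sketch: the minimum value and the function name are assumptions, not the tool’s actual code.

```python
MIN_HIDE_VOLUME_SIZE = 0.1  # hypothetical minimum edge length, in board units

def clamp_hide_volume_size(size: tuple[float, float, float],
                           minimum: float = MIN_HIDE_VOLUME_SIZE) -> tuple[float, ...]:
    """Return the box size with every dimension raised to at least `minimum`,
    so the physics engine never sees a zero-thickness box."""
    return tuple(max(s, minimum) for s in size)
```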

Today I focused on one specific bug. As the excellent report shows, there are cases where copying large slabs caused the game to crash. A board was provided, which, along with the instructions, made it trivial to reproduce the issue. The problem came down to an oversight I made while trying to reuse code.

The batching for slabs is heavily based on the code for batching a single zone. This worked great, but I had missed the fact that the batches allowed a max of 13000 instances per kind of object. This was far more than is needed per zone, but for certain slabs it’s not hard to go over that limit (50x6x50 single grass tiles, for example). To handle this, I wrote a new struct where the internal array did not have this limit at the cost of some additional allocations.
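A rough Python model of the problem: a batch with a fixed capacity of 13000 cannot hold a 50x6x50 slab of a single tile type (15000 instances), while a growable batch can, at the cost of occasional reallocations. The class names and the doubling strategy are assumptions for illustration. Python lists grow on their own, so the fixed version enforces its limit explicitly.

```python
MAX_INSTANCES_PER_KIND = 13_000  # the per-zone limit mentioned above

class FixedBatch:
    """Per-zone style batch: refuses to grow past its capacity."""
    def __init__(self, capacity: int = MAX_INSTANCES_PER_KIND):
        self.capacity = capacity
        self.items: list[int] = []

    def add(self, instance: int) -> None:
        if len(self.items) >= self.capacity:
            raise OverflowError("batch capacity exceeded")
        self.items.append(instance)

class GrowableBatch:
    """Slab style batch: no fixed limit; doubles its backing storage when full."""
    def __init__(self, initial_capacity: int = 1024):
        self._buf: list[int] = [0] * max(1, initial_capacity)
        self.count = 0
        self.reallocations = 0  # the "additional allocations" paid for flexibility

    def add(self, instance: int) -> None:
        if self.count == len(self._buf):
            # Allocate a buffer twice the size and copy the old contents over.
            self._buf = self._buf + [0] * len(self._buf)
            self.reallocations += 1
        self._buf[self.count] = instance
        self.count += 1
```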

All these fixes are now merged, and so they should ship in the next update.

Until next time, Peace.

[0] Static lights are ones that do not animate. The crystals are good examples of this.

[1] I tried to expand on the explanation here, but it got too unwieldy to explain without lots of surrounding code.

TaleSpire Dev Log 276

2021-05-22 19:20:49 +0000

Hi folks!

After the Norwegian national holidays, I poked around fixing the bugs which caused the board corruption that hit some folks a few weeks ago. Part of the fix involved adding an extra piece of runtime data per tile/prop in the board.

This did not affect the size of the saved data, slab size, etc. However, as this did mean an extra 12 bytes of data per tile/prop at runtime, I spent a while looking at reducing that. From now on, we’ll refer to the tile/prop as a placeable.

The data in question was the world position of the origin of the placeable. Positions of props are snapped to the nearest hundredth of a unit, so we don’t need the full range of float values, which means we could opt for a different representation. Half-float does not have the precision we need when values are greater than around 1000, but we could multiply the value by 100 and cast it to an int (essentially storing it as fixed-point). The values happily fit within the range of a short, so that would take the size increase down to 6 bytes.
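The fixed-point idea looks roughly like this in Python. It’s a minimal sketch, not TaleSpire’s code: each component is scaled by 100, rounded, and stored as a signed 16-bit integer, so three components take 6 bytes instead of 12. In this sketch that limits each component to ±327.67 units, and values outside the range raise an error.

```python
import struct

SCALE = 100  # positions snap to the nearest hundredth of a unit

def encode_fixed(value: float) -> int:
    """Convert a snapped position component to a 16-bit fixed-point integer."""
    fixed = round(value * SCALE)
    if not -32768 <= fixed <= 32767:
        raise ValueError(f"{value} is out of range for a 16-bit fixed-point value")
    return fixed

def decode_fixed(fixed: int) -> float:
    return fixed / SCALE

def pack_position(x: float, y: float, z: float) -> bytes:
    # Three little-endian signed shorts: 6 bytes instead of three 4-byte floats.
    return struct.pack("<hhh", encode_fixed(x), encode_fixed(y), encode_fixed(z))
```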

Also, all placeables exist inside zones, so we could store the position relative to the zone. A zone is a cube 16 units across, so we only need 16 * 100 = 1600 values per component, which fits in 11 bits (2^11 = 2048). That’s 33 bits, or ~5 bytes, for the three components.
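The zone-relative variant can be sketched the same way. This assumes each component is stored as an offset within its zone, in [0, 16) units, quantized to hundredths, so each fits in 11 bits and all three fit in 33 bits. The bit layout (x in the lowest bits) is an arbitrary choice for this hypothetical sketch.

```python
ZONE_SIZE = 16          # zone edge length in units
STEPS_PER_UNIT = 100    # hundredth-of-a-unit snapping
BITS = 11               # 2**11 = 2048 >= 16 * 100 = 1600 values
MASK = (1 << BITS) - 1

def pack_local(x: float, y: float, z: float) -> int:
    """Pack a zone-relative position (each component in [0, 16)) into 33 bits."""
    packed = 0
    for i, c in enumerate((x, y, z)):
        q = round(c * STEPS_PER_UNIT)
        if not 0 <= q < ZONE_SIZE * STEPS_PER_UNIT:
            raise ValueError(f"component {c} is outside the zone")
        packed |= q << (i * BITS)
    return packed

def unpack_local(packed: int) -> tuple[float, ...]:
    return tuple(((packed >> (i * BITS)) & MASK) / STEPS_PER_UNIT for i in range(3))
```

Since 33 bits is one bit over a 32-bit word, a byte-aligned layout needs 5 bytes per position.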

I also looked into other runtime values stored per placeable, which could be stored differently now that we have the origin position per placeable. However, I won’t overload this update by going into all that.

After all the poking, I could make things smaller, but it didn’t result in more placeable data per cache line, so I didn’t think the extra costs would be worth it. I’ll definitely re-examine this when I switch to more of an SoA (structure-of-arrays) format for some of the data.

Other than this, Ree and I took some time to plan some changes to the persisted per-creature data. We are adding the data for a bunch of features in one go, even though it’ll take a while to implement the rest of each feature. This is because upgrading the file format takes a lot of care in order to avoid screwing things up.

We will be adding the data for:

Persistent emotes
Four more stats per-creature
The ability to add props to bases and use them as creatures
Polymorph
Extra color data for things like chat bubbles

You’ll be hearing a lot more about those as work continues :)

Have a good one folks!

TaleSpire Dev Log 274

2021-05-12 13:49:09 +0000

Hey again folks!

Progress is slow but steady for me this week.

On Monday, I started adding support for multiple asset-packs to TaleSpire. This is required for future modding support. We knew this was coming, of course, and so a lot of the work had already been done. The main task was deciding how the asset-pack id would be stored in TaleWeaver, writing it into the correct places in the index, and updating TaleSpire to search for packs in a given directory and load them.

The TaleSpire part will still be improved, as we don’t want to load all packs the game can find immediately. Each campaign might be using different packs, and loading unnecessary ones wastes local and GPU memory.

Other than some board restores, my focus for this week is on the layout code in TaleSpire. The runtime data has a known issue that can make it very fragile to mistakes in asset-packs. The wrong change there can currently corrupt boards. It’s been easy enough to tiptoe around it for now as it’s only us providing assets, but this is unacceptable for modding, so it needs to be fixed. [0]

Naturally, this touches a whole mountain of code, so it’s going to keep me busy for a bit, but it’s gonna be a huge relief to get it done.

That’s all for me for today.

Seeya around folks :)

[0] The changes should not have any visible impact to the game or to the board format. It’s only changing how we juggle the data behind the scenes.

TaleSpire Dev Log 273

2021-05-01 19:38:27 +0000

Heya folks.

I spent the end of the week looking into the bug with markers which meant they didn’t spawn when joining the board.

This went well and allowed me to clean up a bit of the implementation behind the scenes. However, this also meant I bumped into a bunch of other bugs related to GM blocks. I’ve got through a few of those now, with only one remaining that is stopping me from shipping[0].

The patch fixing markers will come soon. One thing that will be missing from the initial patch is that published boards won’t include markers. This will be fixed shortly (probably later in the week).

I would also like to start hashing the board state that lives in the database (markers, unique creatures, etc.) and skipping the download if the local cache has the latest data. It’s not critical, but every little helps.
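The hash-and-skip idea can be sketched in Python. The choice of SHA-256, the in-memory cache dictionary, and the `download` callable are all assumptions for illustration, not how TaleSpire’s client or servers actually do it.

```python
import hashlib
from typing import Callable

def board_state_hash(state: bytes) -> str:
    return hashlib.sha256(state).hexdigest()

def sync_board_state(local_cache: dict[str, bytes], board_id: str,
                     server_hash: str, download: Callable[[str], bytes]) -> bytes:
    """Return the board state, downloading only if the cached copy is out of date.

    `download` is a hypothetical callable that fetches the full state from the server.
    """
    cached = local_cache.get(board_id)
    if cached is not None and board_state_hash(cached) == server_hash:
        return cached  # cache is current: skip the download
    fresh = download(board_id)
    local_cache[board_id] = fresh
    return fresh
```

The server only has to send the short hash up front; the full state is transferred when the hashes differ.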

In the last update, I mentioned that I was sent a published-board and some instructions to replicate a nasty board-breaking bug. Once I had this, I was able to repeatedly delete parts of the board and then test if the bug still occurred. I was able to get the large board down to a small chunk of tiles that still triggered the issue. I then started seeing something odd. Occasionally the delete would corrupt the tiles, and occasionally it wouldn’t. I slowed down and took more note of how I was following the steps, and I realized that if the selection box only just enclosed the tiles, the delete didn’t corrupt anything. However, if I used a huge selection box that encompassed the slab, then delete would corrupt the slab. This told me exactly where the problem was.

The board in TaleSpire is divided into 16x16x16-unit zones. When we apply changes to the board, we often do so in parallel across the zones. A delete typically needs to scan through all the tiles in the zone to see which ones intersect the selection bounds. However, if the entire zone is enclosed by the selection bounds, we know that every tile must be too. This allows us to implement delete more efficiently in those cases. The bug was in this optimized version of delete. It wasn’t an exciting bug, just a simple case where I wasn’t incrementing an index at the right time[1], but it was plenty to cause havoc.
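Here’s a simplified Python sketch of the two delete paths. It assumes axis-aligned boxes and that every tile lies inside its zone’s bounds, which is what lets the fast path skip per-tile checks. The `Box` type and function name are hypothetical. The sketch returns a new list instead of compacting an array in place, so it doesn’t model the index bookkeeping where the real bug lived.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]

@dataclass(frozen=True)
class Box:
    lo: Vec3
    hi: Vec3

    def contains(self, other: "Box") -> bool:
        """True if `other` lies entirely within this box."""
        return (all(a <= b for a, b in zip(self.lo, other.lo))
                and all(a >= b for a, b in zip(self.hi, other.hi)))

    def intersects(self, other: "Box") -> bool:
        """True if the boxes overlap on every axis."""
        return all(sl < oh and ol < sh
                   for sl, sh, ol, oh in zip(self.lo, self.hi, other.lo, other.hi))

def delete_in_zone(zone_bounds: Box, tiles: list[Box], selection: Box) -> list[Box]:
    """Return the tiles that survive deleting everything intersecting `selection`."""
    if selection.contains(zone_bounds):
        # Fast path: the whole zone is selected, so every tile in it goes.
        return []
    # General path: test each tile against the selection bounds.
    return [t for t in tiles if not t.intersects(selection)]
```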

I then had to update the tests. First, I added a test for this specific case. But then I updated the fuzzer, as that should have been capable of finding this on its own. The problem was simply that the bounds for the deletes being generated were not large enough in all dimensions to enclose zones[2]. With these fixes, we are now covered if a regression were to reintroduce this problem in the future.

That’s all for now, folks. I’m excited to get the marker fixes out as I’d really like to get working on the marker panel soon.

Peace.

[0] It’s a bug which means that if you modify a sector of a board, then the GM blocks in that sector aren’t synced to other GMs when they join the board.

[1] There was a similar mistake in the undo code for this kind of delete.

[2] The model we fuzz against is a 1D board, which meant that we only ever needed thin selection bounds. It doesn’t hurt to use larger ones, so we do that now.

TaleSpire Dev Log 272

2021-04-27 10:44:57 +0000

Hi everyone,

I had somewhat of an extended weekend as I had forgotten I had some real-life things happening on Monday. All in all, it was lovely to decompress for a few days, and I’m stoked to get started again.

And we are in luck. A wonderful member of the community has managed to create a board that reproduces one of the nastiest bugs we have right now. For a long time, there has been something that can trigger data corruption when cutting a board region and then undoing that cut. However, what has been confounding us is that it has not been possible to replicate the problem consistently. Several lovely folks have got us close, but there was always something that made it very trial and error.

We now have been given a published board with three-step instructions on how to break it.

Needless to say, this now makes this bug tractable. The start will be a lot of trial and error as I try and trim this board down to the smallest thing that still shows the bug. Then I’ll dive into the data and see what is going on.

Board corruption bugs are the ones that upset me more than any others, so I can’t wait to squash this one.

Hope you are all doing well!

Ciao

p.s. In writing this, I’ve already been able to find out that the issues are not related to cut specifically, but delete. I’ve also halved the size of the board that produces the issue. We are on our way :D

TaleSpire Dev Log 271

2021-04-24 05:52:28 +0000

Heya folks,

A quick update for today. It’s been a great week for bug fixes and content.

The art team just landed the ‘Siege of the Cackling Horde’ pack. I saw the trailer the same time as you folks, so I was giddy :D Sometimes being involved in a thing means you have seen the whole messy process of bringing it to life, and it’s hard not to see the flaws. Just seeing the result is wild as you get to be surprised by it all. I love it.

The next bugs I’ll be tackling are around markers, as they are not syncing correctly at the moment. I hope to have them working again early next week.

I’ve also got a server patch to finish, improving how we handle changelogs and client error logging. The latter is just about organizing the data we already have, but it should mean I can track down erroring boards without having to bother as many players.

However, first, it’s the weekend. So I wish you well and will see you next week with more fixes and features!

Peace.

TaleSpire Dev Log 269

2021-04-21 13:25:51 +0000

This dev-log is an update from just one of the developers. It does not aim to include all the other things that are going on.

Hi everyone!

This is my first one of these post-release, and it’s been a weird first week, that’s for sure. In some ways, the game has been much more stable than expected. The servers have held up well, and, from watching twitch, it seems a good bunch of people have been able to play, which is great. This meant I could take the weekend off and start the process of catching up on sleep. This was very welcome after the nervous lack of sleep caused by the approval delay.

Now that I’m a bit more compos mentis, I’m tackling bugs, focussing primarily on any that cause crashes or stop people from joining boards.

One bug in line-of-sight is a bit of a blighter, and I’m now waiting on more information about it. We have players for whom supportedRandomWriteTargetCount is one, which means that when calling SetRandomWriteTarget, you’d think you want to call it with index zero. However, as the docs explain:

The UAV indexing varies a bit between different platforms. On DX11 the first valid UAV index is the number of active render targets. So the common case of single render target the UAV indexing will start from 1. Platforms using automatically translated HLSL shaders will match this behaviour. However, with hand-written GLSL shaders the indexes will match the bindings. On PS4 the indexing starts always from 1 to match the most common case.

However, SetRandomWriteTarget checks if the index is greater than or equal to supportedRandomWriteTargetCount, and if so, throws an out-of-range exception.

I’m fairly sure this means that the one random-write-target that supportedRandomWriteTargetCount is referring to must be the render-target. If so, it means that I’ll need a new approach for line-of-sight on these GPUs. What a pain :D [0]
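A toy model of that reasoning, written in Python rather than Unity C#. It encodes the two rules described above: on DX11 the first valid UAV index equals the number of active render targets, and an index greater than or equal to `supportedRandomWriteTargetCount` is rejected. The function names are hypothetical. It only shows why, if that count includes the render target as suspected, a count of one leaves no usable slot while one render target is bound.

```python
def first_valid_uav_index(active_render_targets: int) -> int:
    # DX11 rule quoted from the docs: UAV slots start after the render targets.
    return active_render_targets

def usable_uav_indices(active_render_targets: int, supported_count: int) -> list[int]:
    """Indices satisfying both rules: at or after the first valid UAV slot,
    and below `supported_count` (anything >= it is rejected)."""
    start = first_valid_uav_index(active_render_targets)
    return list(range(start, supported_count))
```

With one render target and a supported count of one, `usable_uav_indices(1, 1)` is empty: index 0 is taken by the render target, and index 1 fails the range check.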

That aside, the next patch will have (amongst other things) a few small fixes to issues stopping players from joining boards.

Have a good one folks

[0] In future, we will be focusing on supporting lower-end machines. It is its own art, so we are trying to get the game proper functioning well first.

TaleSpire Dev Log 267

2021-04-09 23:40:40 +0000

Heya folks, just a quick one to let you know that work is progressing well. I’m currently fixing the last annoying little bugs in party line-of-sight, and then I’ll be wiring all this up to the fog-of-war code.

I can’t wait to be pushing this all out in an upcoming patch.

Peace.

p.s. Just a reminder that fog-of-war will be super experimental. Both visually and functionally, it is not ready for use in real campaigns. However, it’s gonna be great to have it in your hands so you can start kicking the tires :)

