From the Burrow

TaleSpire Dev Log 211

2020-08-10 15:05:16 +0000

Finally, something decent to show again!

I won’t lie, the last couple of weeks of coding have been hard. I’ve been working on the new animation system, and, as you’ll see, this dragged in a LOT of other systems. While the final solutions are not overly complicated, finding out what they needed took a lot of tries, and the number of systems involved made it hard to keep in my head.

First, though, video time!

The following clip shows 10k animated objects being spawned, all with their own scripts running independently, without dropping below 60fps. They are all in sync as I wanted to see lots of things happening, so I made time loop every 2 seconds, causing the animations to replay.

This might be a bit underwhelming to look at, but I’m so glad to see this. Let’s dive into why.

Scripting

For a long time, we knew we needed to be able to add behaviors to tiles and props. Without hackery, Unity doesn’t let you add more code at runtime, so we can’t just load c# code from asset-packs along with tiles and creatures. This got us looking into scripting languages.

We started looking at LUA. The C VMs are fast, but there are also c# VMs that have excellent integration with other c# datatypes.

Side note:

A lot of the time, my job is looking at something simple and saying the words, “And then the user spawns 10000 of these” (let’s call this the ‘10k-case’). Making 10k of something in TS is very easy, just drag out a 100x100 region. This is why everything in user-generated-content games is more complicated than it seems.

My role is to look at this scaling problem and try to limit the worst-case complexity. The players will always be able to make enough things that the game performs poorly, but we usually can find things to improve the situation.

Looking at various scripting languages, my big concern is what happens when 10k are spawned as:

  • Each needs an unknown and potentially unbounded amount of memory.
  • Each might be used in ways that aren’t thread-safe.
  • Every object created needs managing by the GC.

The last one is especially concerning. If you’ve ever shipped a game in a managed language (like c#), you’ve probably spent time minimizing memory allocations to stop the GC from impacting your frame rate. Without understanding the impact of a script, I was unsure we could make something that we knew would run well.

The next experiment was to use LUA like a scriptable composition system. We would provide performant ‘operators’ and modders would connect them up using a LUA script, which would be invoked less frequently (for state machines it would be on state change).

This was promising, but we are in this weird case where we have made LUA not like LUA anymore. So the people who like LUA and want to use it can’t. And the people who hate LUA still have to use something they won’t like. And we still have the threading questions.

Spaghet

This made me take a day to hack together a simple compiler. It produced a very simple byte-code that we would run on a little VM built on top of Unity’s Job System. By using the job system, the VM code is optimized by LLVM and we can run the scripts on many threads in a way that fits in well with our other systems.
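To make the idea concrete, here is a minimal sketch of a compiler and VM pair of the kind described above, written in Python purely for illustration. The opcodes, the tuple-based expression format, and the function names are all invented for this example; Spaghet’s real byte-code and VM are not shown in this post.

```python
# Hypothetical sketch: a tiny stack-based byte-code and VM.
# Opcode names and the encoding are assumptions for illustration only.
import struct

OP_PUSH, OP_ADD, OP_MUL, OP_HALT = 0, 1, 2, 3

def compile_expr(expr):
    """Compile a nested tuple like ('add', 1.0, ('mul', 2.0, 3.0)) to bytes."""
    out = bytearray()

    def emit(node):
        if isinstance(node, (int, float)):
            out.append(OP_PUSH)
            # The operand is stored inline as a 4-byte little-endian float.
            out.extend(struct.pack("<f", float(node)))
        else:
            op, lhs, rhs = node
            emit(lhs)
            emit(rhs)
            out.append({"add": OP_ADD, "mul": OP_MUL}[op])

    emit(expr)
    out.append(OP_HALT)
    return bytes(out)

def run(code):
    """Interpret the byte-code using a simple operand stack."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == OP_PUSH:
            (value,) = struct.unpack_from("<f", code, pc)
            stack.append(value)
            pc += 4
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OP_HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

program = compile_expr(("add", 1.0, ("mul", 2.0, 3.0)))
result = run(program)
```

The compiled program is just a flat run of bytes with no references to other objects, which is the sort of data that is easy to hand to many worker threads at once.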

Even simple tests showed promise. We split the world into two kinds of scripts:

  • State-machine scripts
  • Realtime scripts

State-machine scripts (currently) run only in response to user input, and they are fully synchronized across the network.

Realtime scripts can run every frame and are not synchronized (as there would be far too many messages). Their programming model is similar to shaders in that the tile’s visual state is recreated from scratch every frame from the input.

The trick now is that each state of a state-machine script can have an associated realtime script. This means that when in that state, the realtime script is run automatically.

For example, for a door, the state only changes (and is synchronized) when the user opens or closes it. However, by animating the door in the realtime scripts, you get the expected animation on all players’ machines at the right times.
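A rough sketch of the door example, assuming a one-second swing and a 90 degree hinge (both numbers, and every name below, are made up for illustration and are not TaleSpire’s API). The only thing that would cross the network is the state transition; the angle is recomputed from scratch each frame from the state and the time since it was entered.

```python
# Hypothetical sketch: synchronized state plus a stateless per-frame script.

def door_closed(t):
    """Realtime script for 'closed': hinge angle t seconds after entering the state."""
    return max(0.0, 90.0 - 90.0 * t)  # swing shut over one second

def door_open(t):
    """Realtime script for 'open': hinge angle t seconds after entering the state."""
    return min(90.0, 90.0 * t)  # swing open over one second

# State-machine script: (state, action) -> next state.
TRANSITIONS = {("closed", "interact"): "open", ("open", "interact"): "closed"}
# Each state has an associated realtime script.
REALTIME = {"closed": door_closed, "open": door_open}

class Door:
    def __init__(self):
        self.state = "closed"
        # Treat the initial state as having been entered long ago.
        self.entered_at = float("-inf")

    def on_action(self, action, now):
        """Runs only on user input; this is the part that would be synchronized."""
        next_state = TRANSITIONS.get((self.state, action))
        if next_state is not None:
            self.state, self.entered_at = next_state, now

    def hinge_angle(self, now):
        """Runs every frame; keeps no state of its own between frames."""
        return REALTIME[self.state](now - self.entered_at)
```

Because `hinge_angle` depends only on the current state and the elapsed time, every machine that receives the same transition at the same time ends up showing the same animation without any per-frame messages.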

The fall of the GameObject

Unity’s traditional approach to making this is all object-oriented. It’s compelling and, in the right hands, is an incredible canvas for creativity. Alas, when performance is critical, object-oriented programming (OOP) is problematic. OOP makes it very hard to control where things are stored in memory, and this is essential when trying to batch jobs together to run quickly. CPUs can do unfathomable amounts of work per frame if they aren’t spending their whole time waiting on things to arrive from RAM.

[Image: relative speed animation]

Memory access latency from cpu caches v ram – Andreas Fredriksson

On top of this, Unity’s GameObjects are (relatively) slow to create, fine for most games, but terrible for our 10k-case.

We worked very hard trying to keep using GameObjects as they have such significant advantages for content creation and experimentation. Arguably, this contributed to the delay of the Beta as working around the speed limitations had some very non-obvious interactions with our building systems. However, it was not to be, and even though the Beta shipped, we continuously see performance issues that we don’t have clear solutions to if we stick with the current approach.

Of course, we’ve known about this issue a long time, and the good news was that an answer was already in production. Unity’s ‘Entities’ system is their new approach to handling things in the game and promised impressive performance boosts. We were very much banking on that solving our issues.

This was a mistake on my part. Naturally, the Entities system is a complicated beast with its own tradeoffs, and Unity is taking its time to get it right. When we finally tested the Beta, it still had some concerning performance characteristics for our use case. That sucked.

The good news was that amongst all this new code, there were many high-performance tools that we could use to make our own system. The problem is that, with GameObjects, we got so many things for free and now we need to handle them ourselves. Here are a few concerns:

  • The new physics engine is excellent, but we use collision casts everywhere so most systems in the game are gonna need some modification
  • There is no culling or batching written for us. We need those.
  • There is no animation system at all
  • There is no way to specify lights except using the classic approach
  • Game feel: This is a terrifying one. Game feel takes ages of tiny tweaks based on the creators’ intuition. If we lose that it’ll take a long time to recreate

We knew we had to just go for it. As tiles and props are where the perf issues lie, if we can make a new system for them, we can afford to keep the other stuff as GameObjects.

Too late now >:)

Anyway, there was nothing to do but to get started. I jumped over to TaleWeaver (our modding tool) and started work on the new data formats for everything we would need.

It was critical that we didn’t mess with our artists’ ability to work. Making games is hard, and everyone needs to be able to work as effectively as possible. Your smart little ideas and solutions must not tread on someone else’s ability to work. To me, this meant analyzing the GameObject representation in TaleWeaver, and writing out our own, TaleSpire specific, version of this data.

Luckily once again, new tools exist in Unity to help. The Entities package has a Blob data system that promised fast and job-safe data encoding in unmanaged memory. I made some tools that let it hook into Unity’s classic asset serialization system and got to work.

I extract animations, compile scripts, flatten transform hierarchies, and other simple stuff and pack the result into our blobs. Most of the slowdown here is split between learning new things in Unity, and working out what we need. We also made atlases from the tile icons.

I replaced Spaghet’s graph UI and iterated on the VM. Again, most of the time is just from working with new tools and trying to find the best ways to do the silly things we need.

On the TaleSpire side, I wrote a new asset database to accommodate the new formats we have made and hooked up enough of that to keep working[0]. I then started work on the batcher.

The batcher is a beast. Each frame we need to tell Unity what to draw and Unity, like any engine, needs that information in some format it can use. Luckily for us, most of our objects don’t move each frame, so if we can get it in the correct format when the board changes, we can keep reusing that data. From now on, we will use the term ‘statically-batchable’ to talk about all the things where we can use this technique. All of the objects where we do need to update the position/rotation/etc per frame we will dub ‘dynamically-batched’.

So I spent a day or so writing the static batcher. Each zone (16x16x16 unit region of space) in the game runs this independently and concurrently. We do not combine batches across zones. This means we make more draw-calls than we could, but this lets us enable and disable zones without recomputing the batches for any other zone.
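As a small illustration of per-zone batching, here is a sketch that maps positions to 16-unit zones and groups instances so that no batch ever spans two zones. Grouping by a `mesh_id` within each zone is an assumption made for the example, as are all the names; the real batcher’s keys and data layout are not described in this post.

```python
# Hypothetical sketch: group instances into per-zone batches.
from collections import defaultdict
import math

ZONE_SIZE = 16  # zone edge length in world units, as described above

def zone_of(position):
    """Map a world-space (x, y, z) position to integer zone coordinates."""
    return tuple(math.floor(c / ZONE_SIZE) for c in position)

def build_static_batches(instances):
    """Group (mesh_id, position) pairs by (zone, mesh_id).

    The zone is part of the key, so a batch never crosses a zone boundary
    and one zone can be rebuilt or disabled without touching the others.
    """
    batches = defaultdict(list)
    for mesh_id, position in instances:
        batches[(zone_of(position), mesh_id)].append(position)
    return batches
```

Two tiles with the same mesh on either side of a zone boundary end up in two batches here, which is the extra draw-call cost mentioned above.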

Dynamically-batched content is harder. I had already decided that the animation system would be directly tied to the scripting system, that is, Spaghet will be animating the objects. That lets us simply say that any part of the tile that can be modified by a script during runtime is dynamically-batched, and the rest are static[1].

I spent some days getting the Spaghet wired up enough to continue. I then made jobs to run the realtime scripts, compute the transforms for the objects and write them into the batches. A fun thing to note is that the number of dynamically batched things is not changing frame to frame. This means that you can do the following:

  • Size a batch with enough room for every dynamic & static instance you need
  • Write the static batch info leaving enough room at the end for the dynamic ones
  • Every frame write the updated dynamically-batched transform into the reserved space

This is nice as you don’t need any extra allocations or draw calls. The downside is the complexity it adds to the system, but hey, we are here for the performance.
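The three steps above can be sketched roughly as follows. A Python list stands in for what would be a fixed-size native buffer, and the class and method names are invented for the example.

```python
# Hypothetical sketch: static slots first, dynamic slots reserved at the end.

class Batch:
    def __init__(self, static_transforms, dynamic_count):
        self.static_count = len(static_transforms)
        # One buffer sized for everything, built only when the board changes.
        self.transforms = list(static_transforms) + [None] * dynamic_count

    def write_dynamic(self, new_transforms):
        """Per-frame update: overwrite only the reserved tail of the buffer."""
        reserved = len(self.transforms) - self.static_count
        if len(new_transforms) != reserved:
            raise ValueError("dynamic instance count must not change between frames")
        # Equal-length slice assignment, so the buffer never grows or shrinks.
        self.transforms[self.static_count:] = new_transforms
```

The static part is written once and left alone; each frame only touches the slots after `static_count`, so the whole thing can still be submitted as a single batch.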

During all this, I kept finding small issues that meant I needed to change the data format to help the engine. This is the right thing to do, but switching back and forth between the projects was tough on the brain.

I’m gonna skip a pile of details for all of our sakes, but the result is that the thing we are now making is capable of getting us where we need to be. It’s not fast enough, or stable, but the tests are finally showing something promising.

  • Wrong thing to conclude: Unity is slow
  • Right thing to conclude: Unity is fast. But, for what we need, the default tools they have to drive are not.

Warning about performance numbers. We can’t make decent measurements of the game yet, I still need to write up a lot of new stuff before we can start getting metrics we can rely on. But what we can guarantee is that TaleSpire is getting faster.

I hope you enjoyed this. I also hope I managed to convey the reason I’ve been less mentally available recently. Things are going to stay ugly for me for most of this month, but I know we will have something cool in the end.

Have a great day.

Peace.

[0] Funnily enough, the old one is still in there now being used by the UI. I’ll drop it when I get to that part of the rewrite.

[1] there are obvious cases where we don’t have to recompute a tile’s transform every frame, for example, when an animation has finished playing. This is true and will be handled by Spaghet scripts knowing when they can go to ‘sleep’. I’ll cover this in another post.

TaleSpire Dev Log 210

2020-08-05 01:47:05 +0000

Hi folks, I skipped yesterday’s log as I was far too deep in code, and to be honest today has been much the same.

The TLDR is that I’m working on TaleWeaver and the modding tools. I’m doing this as we want to move to more directly managing the batching of what is drawn rather than leaving it all to Unity. Changing the way that works requires different data, a lot of which can be packed into the asset when it is exported from TaleWeaver. Changing this batching approach also means we need our own animation system, which is fine but related to the scripting system (Spaghet). Before you know it, we have major changes to scripting, animation, rendering, and more.

It’s been a heck of a lot to keep in my head at once. Every decision impacts how another system works. Complicating this, of course, is this should be at least reasonably performant, so you can’t just use <insert your favorite design-pattern/paradigm/etc> and trust that it will come out rosy :P

All in all, I’ve just not had enough bandwidth to tackle anything else.

Ree has also been busy with the experiments around props. I’m already very excited to see what comes out of that.

Tomorrow I need to take some hours to look into some networking issues some users are having. Sorry for not getting to you sooner, you know who you are.

Warm regards from code-land.

TaleSpire Dev Log 209

2020-08-01 00:21:51 +0000

Phew, what a week. I’ve almost lost track of all the things that have been going on. Amongst it all, more code is getting written.

Ree has been able to get through enough other tasks that he’s got some time to start experiments with props behaviors. Props are one of the most significant missing pieces from the game right now, and whatever behavior we settle on, we are going to have to live with for a while, so it’s good to take our time with this and find something we like.

I’ve been back in TaleWeaver, looking at how board-assets work. Yesterday I wrote a tiny compiler for the state-machine scripts and added code for showing errors in the state-machine script graph.

Today I’ve written code to export the tile information in a new binary format. TaleWeaver now also makes atlases for the tile icons. This should allow TaleSpire to batch more draw calls in the UI, hopefully speeding that up a little bit.

The next stop for me is to take this new format, get it loading into TaleSpire, and rewrite the code that manages looking up tile information. I’ve also had great progress in the design of the system that will run the realtime scripts, so everything is lining up to work well. Just got to keep hammering at it!

Hope this found you well.

Seeya

TaleSpire Dev Log 208

2020-07-28 01:08:09 +0000

Today, working with a total star from the community, we’ve been able to track down one cause of the “Stuck at main menu with a spinning hourglass” bug.

TLDR: Check your proxy settings. You may find that manual proxy is enabled, but the address is blank. This exposes a bug in Mono (a thing Unity relies on), which breaks the code that finds the TaleSpire servers.

Here is what the setting looks like when it’s wrong. We think this might have been set by a Windows update, but we are not sure.

NOTE: This will not fix cases where the spinning hourglass is only sometimes a problem. I would love to look into any hourglass related issue you have though, so please reach out to me (@Baggers) on the TaleSpire discord.

[Image: the problem]

Now for the extended version of the story.

Many backers have never been able to play the TaleSpire beta due to a bizarre bug that means they get stuck waiting for TaleSpire to log in forever. Today I was once again struggling with this issue, so I decided to make a program to test the connection and make logs that would hopefully help us progress. I wanted the code to be as close to TaleSpire’s as possible, so I took the whole game and spent several hours ripping out pieces until I got down to just the essentials for the main menu. I added some logging and ended up with this:

[Image: TsConnectionTest]

With a new weapon in hand, I teamed up with a community member who had experienced the issue. For the next hour, they ran the test app and sent me the logs. I would then modify the app and send them a new build. Together we finally got this trace.

Object reference not set to an instance of an object =>   at System.Net.AutoWebProxyScriptEngine.InitializeRegistryGlobalProxy () [0x0005b] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.AutoWebProxyScriptEngine.GetWebProxyData () [0x00007] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.WebProxy.UnsafeUpdateFromRegistry () [0x0001a] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.WebProxy..ctor (System.Boolean enableAutoproxy) [0x0000d] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.WebProxy.CreateDefaultProxy () [0x00012] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.Configuration.DefaultProxySectionInternal.GetSystemWebProxy () [0x00000] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.Configuration.DefaultProxySectionInternal.GetDefaultProxy_UsingOldMonoCode () [0x00036] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.Configuration.DefaultProxySectionInternal.GetSection () [0x00015] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.WebRequest.get_InternalDefaultWebProxy () [0x00022] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at System.Net.HttpWebRequest..ctor (System.Uri uri) [0x0008d] in <ae22a4e8f83c41d69684ae7f557133d9>:0
  at (wrapper remoting-invoke-with-check) System.Net.HttpWebRequest..ctor(System.Uri)
  at System.Net.Http.HttpClientHandler.CreateWebRequest (System.Net.Http.HttpRequestMessage request) [0x00006] in <7ebf3529ba0e4558a5fa1bc982aa8605>:0
  at System.Net.Http.HttpClientHandler+<SendAsync>d__64.MoveNext () [0x0003e] in <7ebf3529ba0e4558a5fa1bc982aa8605>:0

A quick google took us here https://github.com/mono/mono/issues/10030. It is a bug in Mono which can occur when a user has ProxyEnable in the registry, but no entry for ProxyServer. We ran the following in the command prompt to clarify the issue, but you can see it more easily by just going to Windows’ Proxy settings (hehe the command line is always the first place I end up).

reg query "HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings" /v ProxyEnable
reg query "HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings" /v ProxyServer

Anyhoo, they then disabled the manual proxy setting, clicked ‘Save’ at the bottom of the settings page, and booted up TaleSpire. Instantly it worked as expected!

This is great, but I now need to find a workaround we can put in the code. I’m hoping that HttpClient will allow me to disable the proxy. Then I should be able to look up the keys and force disable the proxy if it’s the issue.
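The check behind that workaround might look something like this sketch. It is written in Python for illustration only (the game’s own fix would live in its C# code and is not shown in this post), and the function names are invented. The registry path and value names are the ones from the `reg query` commands above.

```python
# Hypothetical sketch: detect "manual proxy enabled, but no address set".
INTERNET_SETTINGS = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"

def proxy_config_is_broken(proxy_enable, proxy_server):
    """True when a manual proxy is enabled but the server address is missing or blank."""
    return bool(proxy_enable) and not (proxy_server or "").strip()

def read_proxy_settings():
    """Read ProxyEnable and ProxyServer from the registry (Windows only)."""
    import winreg  # standard library, only importable on Windows

    def query(key, name):
        try:
            return winreg.QueryValueEx(key, name)[0]
        except FileNotFoundError:
            return None  # the value is not present under this key

    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, INTERNET_SETTINGS) as key:
        return query(key, "ProxyEnable"), query(key, "ProxyServer")
```

On Windows, `proxy_config_is_broken(*read_proxy_settings())` would flag the situation described above, at which point the client could be told to skip the system proxy.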

This is almost certainly not an issue in later versions of Unity, as they will probably be using a later version of Mono. This is a perfect candidate for the kind of issue that really makes you want to update the Unity version. However, that is always a bunch of work. We’ll see how this workaround goes.

That’s all I have for tonight. If all goes well I should be back on rendering code tomorrow.

Seeya

TaleSpire Dev Log 207

2020-07-26 00:52:34 +0000

Good evening all. For the last few days, I’ve been struggling with the graph UI and how I want to serialize the scripts. The TLDR is that I’m making progress again, but I’m about two days behind where I wanted to be.

One afternoon was spent trying to understand some odd behavior I was seeing when dragging assets onto nodes in the spaghet graph. I was able to make a decent reproduction of the issue, and the library author was incredibly quick at fixing it.

The graphs themselves are usually stored as ScriptableObjects in Unity. This is ideal for the state-machine scripts, which I want to share between tiles, however, it did not fit my design for the realtime-scripts, which are closely tied to the assets they manipulate. I struggled with the serialization logic here for about a day, and I’m not satisfied with it yet. Regardless I need to make progress with TaleSpire, so I need to crack on.

Today I’ve started work on the UI, which lets you pick the script for a tile and then assign a realtime-script to each state. It’s looking something like this right now.

[Image: setting up behaviors]

Basic, but enough to make progress.

Once I have this information serializing, I will finally start writing the new asset format. That will let me switch back to TaleSpire and rewrite the asset database. It should then be faster and compatible with the job system, which opens up some performance improvement possibilities in the future. However, we won’t be chasing those particular performance improvements, as we need to write the new animation system first.

Back with more real soon.

TaleSpire Dev Log 206

2020-07-22 23:50:29 +0000

Hey folks. I’ve now ported enough of my old Spaghet experiments to the new node graph that I can compile code again.

Here is a useless script:

[Image: spaghet0]

On the left, we have some nodes (the node visuals are still WIP), and on the right Visual Studio’s debugger showing the result from the compiler. I’ve indicated the debug representation as it’s a bit easier to see the generated operations. The actual result is just a byte array, so I haven’t bothered to expand it (spoiler, it’s full of bytes).

I’ve also added some code to check for type errors and to report loops in the graph. This feels like the bare minimum requirements to be able to code in a sane way.

[Image: spaghet1]

[Image: spaghet2]
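For the loop-reporting half of that, a standard depth-first search is enough. The sketch below is a generic version, not the actual Spaghet code: it assumes the graph is given as a dictionary from each node to the nodes its outputs feed into, and it returns the offending path so it can be shown to the user.

```python
# Generic sketch: report a loop in a directed node graph using DFS.

def find_cycle(graph):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic.

    `graph` maps each node to the nodes its outputs feed into.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so we have found a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

It uses recursion, which is fine for hand-built graphs of modest size; a very deep graph would want an explicit stack instead.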

Tomorrow I’ll look into the output nodes. First, I’ll add ones for setting an asset’s transform and another for driving an animation. Both will let you drag the assets in question straight from the Unity hierarchy so that it fits the standard Unity workflow.

That’s all for now. Seeya tomorrow.

TaleSpire Dev Log 205

2020-07-21 23:58:15 +0000

Good evening folks,

It’s been another decent couple of days convincing computers to do things.

Most of my time this week has been spent getting to grips with NodeGraphProcessor, which so far I’m loving. The main developer has been super responsive, making me even happier about sticking with this library.

I didn’t write up yesterday as it was one of those days that you know you need to learn a system very quickly and so you just slam your brain against it all day until you realize why things are the way they are :) A necessary endeavor, but not one that leaves you much in the mood for writing.

Today, however, I’ve been knocking together very simple nodes for the state-machine scripts that tiles can use. The closest thing I have to a visual is this:

[Image: oo nodes]

This graph only expresses the state of the tile (or prop in future), and what actions transition between them. To do something visual, you will need the realtime-scripts.

A quick aside. In TaleSpire, tiles and props are things called BoardAssets. BoardAssets are collections of Assets, which act as a single unit. The BoardAsset holds the information of each Asset required and its local position/rotation/etc within the unit (along with a bunch of other BoardAsset specific data). The cool thing is that a BoardAsset may be made of Assets from totally different asset-packs. This means you will be able to remix the Assets that people like us provide.

By separating the two, we can share the state machine among many tiles easily and then, per-tile, choose the visual behavior that occurs during each state by mapping states to the realtime-scripts which live on the Assets.

Codesharing of realtime-scripts will be done through making functions. These will be sharable as strings so that we can all do the same kind of thing that we do with slabs today. That’s the idea at least :P I’ll be writing that bit soon.

Tomorrow will be focussed on the realtime-scripts. I want to see if I can get the compiler for those working again in the next few days. When that is done, I’ll see what the next logical step is. It’ll probably be writing the state-machine compiler, which will be pretty straightforward.

Ok, enough rambling for now.

Seeya tomorrow!

TaleSpire Dev Log 204

2020-07-20 00:11:50 +0000

On Friday, I got the code written to extract Unity’s animation data and pack it into the AssetBundle file in the way we wanted. This data will be utilized by the upcoming animation system to be used by tiles and props.

Before we get there, and as mentioned in the last dev-log, we need to update the boardAsset format that TaleSpire uses.

For the format rewrite, the first thing to review was what we were going to store. This quickly led me to look at the scripting system again. I tried to write a high-level overview of some of the stuff I’m looking at, but it gets rather vague, so I’ll just talk about something concrete instead.

I am 80% sure I want tile/prop animation to be driven by the scripting system (Rather than tile/prop I’m just gonna say tile from now on for brevity). I already intended to have two kinds of scripts for tiles, state-machine scripts, and realtime-scripts.

The state-machine scripts are driven by user interaction and are synchronized across the network. Realtime scripts run every frame and are unsynchronized. I’ve imagined the realtime scripts to be like shaders in that they’ll run fast, in parallel, and don’t hold their own state frame-to-frame.

I’m looking into being able to specify a realtime script as the behavior for a given state-machine state. So if you are in a given state, the game runs the associated realtime script. The realtime script would set things such as the position/rotation/scale of parts of the tile, parameters of lights, and be able to drive animations.

The last item in that list of examples is the interesting one for today. If I can implement animation via the scripts, it would allow me to flesh out that part of the codebase now, and then look at custom animation code as a performance optimization for later.

Because of all this, I’ve started looking back into Unity’s support for node-graphs, as Spaghet is meant to be a visual scripting language. In the last 11 months, it seems like NodeGraphProcessor has come a long way. I like a lot of what I see there, and so tomorrow I’m going to start digging into how to use it to implement the state-machine scripts.

Once again, not making hard claims about the delivery of specific features is paying off as, if required, I’ll be able to reorder a bunch of work without causing issues. I’m sometimes sorry that this means we can’t give you as many concrete dates as would otherwise be ideal; however, the ability to roll with the punches is really helpful.

Hope you folks have had a good weekend, more news as it happens :)

Seeya

TaleSpire Dev Log 203

2020-07-17 15:02:45 +0000

Phew, this one took a while to get out.

The art team is hard at work, prepping the next big asset patch. Should be another fun one!

On Wednesday, I had some boring life stuff to sort out, so I wasn’t working that day.

On Thursday, Jonny and I had a catchup meeting and roughed out a plan for the animation system for tiles & props. To get working on this, I first need to extract the animation data into a format we can work with from the job system. It took a while to realize that, if you load a prefab from a Unity AssetBundle, the full curve information is no longer available. This is ok, as in TaleWeaver we work with the original prefab files, but it was still a slow journey learning the ins and outs of this part of Unity.

With that worked out, I wrote the first pass on the extractor. This was looking ok, but the standard asset serializer only supports classes, and I want the data in a job-friendly format. Now, Unity does have a serialization scheme they use inside their ECS when serializing Entities to their Scene format. They call these BlobAssets. These are fast, immutable, and pretty much ideal. However, they are not compatible with the AssetBundle serializer we mentioned earlier. To get around this, I have written an adapter that makes this possible. That took a long time, and I still would like to add a bit more data validation. However, I’m happy that I can now write the data in the way I want.

The next task is to rewrite the animation extraction code using this new store. I’ve been going slow here as I need to be able to uniquely identify a given resource (e.g. an animation clip) inside an asset, and I want to make sure that these are stable to the kinds of changes that are made when developing a tile or prop.

With that done, I’ll be writing a new system for packing the board asset data. I’ve looked at this several times when I was investigating Cap’n Proto and FlatBuffers; now that I have a good workflow with BlobAssets, I’m going to use that instead. The current JSON format will continue being used by TaleWeaver to allow the art team to iterate on that format as required; however, when we export an asset pack, we will use the new system instead.

And once all that is finally done, I can start writing the animation system :)

I have also been researching Unity’s new Physics engine again and am slowly designing how we are going to interface with that. It’s going well, but I wanted to get animations working first as some assets (like doors) move colliders using animations.

Alright, until next time.

Peace

TaleSpire Dev Log 202

2020-07-14 20:28:46 +0000

Allo folks,

Some good progress today. The new batcher seems to be laying out tiles correctly now, so I can kind of build again. I haven’t hooked into the new physics engine so I can only build on the build plane currently.

What was nice was I could do some quick performance tests to see what ballpark we are in. I was able to spawn 10000 tiles in one frame without dropping below 60fps, which is very promising. There are things the code doesn’t have to deal with yet, such as animations and physics. However, I’m feeling confident that we’ll be able to get a large speed up in tile spawning over what we have in the beta today.

Here is a very wip, potentially misleading clip :P

Next, I’ll be adding jobs to build the per-zone physics data. It won’t be the code that ships, but like this, it will give me some insights into how the final system might behave.

Seeya folks!
