From the Burrow

TaleSpire Dev Log 153

2020-02-18 17:51:31 +0000

Since the end of last week, my focus has been looking into unknowns on the code side of the project. To that end, I’ve jumped over to the server-side of things.

The first thing to change was that, in the alpha, all the communication with our server (which only handles persistence) was over https. This meant that polling was used in some places, and overall it was too slow. The simplest change to be able to push messages, given our current stack, was to move much of this communication to websockets. After picking a c# library and getting basic communication set up, I needed to do something about all the request-based code that was previously using https.

In order to make the changes on the c# side more iterative, I decided to write a simple request system that sent its messages over the websocket connection. When making the alpha, I had made a code generator in erlang that took a spec (written as an erlang data-structure) and generated erlang and c# code entry-points. This has made keeping both sides in line trivial and, now, it made it easy to update the code as I just had to update the code generator.
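As a rough illustration of what a request system on top of a message-only connection can look like, here is a minimal sketch. It is not the generated TaleSpire code: the `RequestChannel` name, the `id:body` text framing, and the `send` delegate standing in for the websocket library are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch: request/response layered over a channel that only knows
// how to send and receive individual messages.
public sealed class RequestChannel
{
    private readonly Action<string> _send;   // stand-in for the websocket send call
    private readonly Dictionary<int, TaskCompletionSource<string>> _pending =
        new Dictionary<int, TaskCompletionSource<string>>();
    private int _nextId;

    public RequestChannel(Action<string> send) { _send = send; }

    // Tag the outgoing message with a fresh id and remember who is waiting on it.
    public Task<string> Request(string body)
    {
        var tcs = new TaskCompletionSource<string>();
        int id;
        lock (_pending)
        {
            id = _nextId++;
            _pending[id] = tcs;
        }
        _send(id + ":" + body);
        return tcs.Task;
    }

    // Call this for every incoming message. Replies carry the id of their request;
    // anything without a known id is handed to onPush as a server push.
    // Returns true when the message completed a pending request.
    public bool OnMessage(string message, Action<string> onPush)
    {
        int sep = message.IndexOf(':');
        TaskCompletionSource<string> tcs = null;
        if (sep > 0 && int.TryParse(message.Substring(0, sep), out int id))
        {
            lock (_pending)
            {
                if (_pending.TryGetValue(id, out tcs)) _pending.Remove(id);
            }
        }
        if (tcs == null) { onPush(message); return false; }
        tcs.SetResult(message.Substring(sep + 1));
        return true;
    }
}
```

The appeal of this shape is that code written against https-style requests can keep awaiting a reply, while the same connection is free to deliver pushed messages in between.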

With that done, I spent some time fixing bugs from the refactor until I could log in, create boards, etc again. I then switched tasks again to certificates. We had used amazon’s certificate authority for production previously as they were free for AWS infrastructure, and our servers were behind one of amazon’s load-balancers. Now that we are using websockets, we would have to switch to their ‘network load balancer’, which I don’t have any experience with yet. For now, we’ve just picked up a good ol’ fashioned wildcard cert and will work with that. We spread load across servers in the new approach anyway, so using the load-balancer for that is no longer an issue (though there are other benefits). We can look back into their TCP level ‘network load balancer’ at a later date. As an aside, we have previously used letsencrypt in staging, and while it did work well enough, I would need to make some changes to use it in production. This ends up being one of those cases where it’s cheaper to buy something simple that works than to use the free thing.

Also, on the server side, I’ve been reviewing our AMIs and docker images. One question I’m trying to get resolved right now can be found over here. It’s a simple thing, but I have failed to find an official text stating the answer. If you have some expertise, it would be super useful to hear.

Yesterday I started fixing a bug that caused us to assign identical script ids to tiles in different zones of the board when a drag spanned multiple zones. It’s simple enough but still requires some focus not to introduce a dumber bug in the process :p I also fixed a small bug where tiles of only 0.5 units were spawned too close together.

Today I’ll be finishing off the script id issue and then possibly looking at rewriting the unique creature support.


TaleSpire Dev Log 151

2020-02-13 01:55:28 +0000

Today has been a tricky one. I found a significant bug in the building synchronization code that is definitely going to cost me a day or two. Given that I’m already behind on where I wanted to be this is a bit stressful. As I have an understanding of this problem but still have a good few unknowns on the server side I’m going to task switch for a few days and see if I can get a better overview there.

Not a fun update to write but here’s to the next few days regardless.

TaleSpire Dev Log 150

2020-02-10 18:16:38 +0000

Heya folks, yesterday I got the Spaghet scripts that run our doors and chests hooked up and syncing correctly.

Spaghet is our little scripting language we made to deal with the question “if someone drags out a 30x30 slab of scripted tiles, how many resources are we using (both CPU & networking)”.

Scripts come in two kinds:

  • state-machines: Their progress is driven by user interaction and is synchronized across the network.
  • realtime: These run every frame and are unsynchronised.

Each script gets a small chunk of private storage (currently 32 bytes) and runs on top of unity’s job system, which means we can run them concurrently across cores with no GC. We can also trivially pass the state of a script across the network and, with a little book-keeping, reapply it on the other side.

Currently, we are only using 1 script (the one that controls doors and chests), but we have a lot in place to be able to expand on this as we head through the beta. For now, I’m just happy to see it working.

Today after finishing some Spaghet work, I switched to looking again at copy-paste. Here I’m just trying to get it to the point that it is spawning the copied slab as expected without worrying about the feel of the tool. After that, we can look at this as more of a UX task.

Hmm, I think that’s all for now. Back with more tomorrow

TaleSpire Dev Log 149

2020-02-08 12:52:03 +0000

Well, I’ve learned something about myself in this whole process. The closer I get to deadlines, the harder it is to convince myself to write updates when ‘the answers’ aren’t yet available. I’ve been struggling with a bunch of bugs and issues in the last couple of weeks and each time I see that I really should be writing an update I see the unsolved problems and think “If I can just get a bit further I’ll have something more concrete to talk about”. I guess dev logs getting sporadic around releases is just a thing that’s gonna happen for now. We shall see.

That said, hello again! A lot has been happening behind the scenes recently.

One big push on my side has been moving to using ids rather than actual object references for creatures/players/etc across the whole project. So often in the future, the game will be receiving info on things that aren’t loaded on your local client, and it needs to be able to handle that. Luckily this only affects: building, sync, creatures, initiative mode, movement, gm request, and.. yeah well, everything. So, in short, I’ve been rewriting a significant portion of the codebase. It’s gone well, but it’s a lot of stuff to have in the air at once.

I’ve also finished implementing the sync of boards both to the server and between players. For now, it’s the same method as we used in the alpha. We are saving the whole board to a single file we load all at once. Naturally this won’t work for big boards but I need to do some server work before I can add per-zone sync. We could ship the beta with this and then upgrade as we go. This may happen. The most important thing, for now, is that we get something that will get us out the door and is in the right shape to be moved to the new approach.

In fact, so much of the work feels different this time round in that we know what we need to make, but it’s a lot more cognitive weight :D. In the alpha, you make something, and then all the implications come pouring out; this time, you have to build and build with the main comfort being that, at least, you know it’s all needed.

Automated testing has been a lifesaver too. I recently added an option to the tests so that halfway through the test, it will serialize the board, deserialize it as a new board and hook it up like a networked client. It then continues the rest of the random actions and compares the two boards afterward. With these random tests, I’ve been able to find piles of dumb mistakes.

Other stuff that has been worked on in the last few weeks:

  • Rewrote the undo/redo stack and the code that moved asset data between the active and inactive set. It had gotten far too complicated and was impeding progress. The new system is coarser but simple and fast.

  • Ree has been working on a control to allow you to move creatures between floors and across otherwise impassable obstacles easily.

  • Wrote new managers for tracking board, campaign, and network state. Some of the work on campaign state is getting it ready for the coming backend changes, and a pile of it is making sure that our connection to the realtime network stack isn’t so directly tied to whether we are considered in the board or not.

  • (Finally) hooked up single tile delete!

  • We want to support up to 16 clients building. Whilst writing the code that hands out those slots, I realized that we have a lot of what we need for spectator mode. We’ll have to do a bunch of work on the UX side for this but it’s more feasible now.

  • The atmosphere system is under heavy modification by Ree. He’s looking into how music is handled now. While we’ve been warned the first version will be pretty bare-bones, I’m personally excited for where this will go.

  • We have now been able to jobify lots of tasks that only update shader state. This includes tile drop-in and highlighting. These can be much faster now.

  • Moved the creature state to the new board representation. Attacks, stats, death, etc all go via the new system now.

  • Rewrote cutscenes sync

  • Disassembling class hierarchies for assets: Don’t do OOP kids :p More seriously, I had tangled the implementation of creature & tile board assets in a way that made progress harder. I have undone that now and, while there is seemingly more duplicate code, it’s way easier to manage and iterate on (and most of that code has subtleties that mean it shouldn’t be merged anyway)

  • Removed Lua

  • Rewrote how Spaghet scripts index their private state

  • Water prototypes! (see the bottom of this page)

And waaay more. Lots of bugs are being worked on in the mix too.

It’s been a very productive time, but I’m still a good week behind where I wanted to be at this point. For context, the first ‘proper’ board load with the new system was 6 days ago, and the first networked play (with the new system) was yesterday. To be fair, it’s, of course, the case that for those to work tons of other stuff has to be working, so this isn’t too unreasonable. However it’s still a little tight. The best thing to do is work hard and keep you all in the loop.

Until the next one of these, Seeya

p.s. Here’s a quick extra that we are showing off on the Kickstarter this week! More news on this as it happens

TaleSpire Dev Log 148

2020-01-18 13:05:45 +0000

I’ve been having a great time coding down at @Ree’s place, but it has meant I’ve neglected these updates. Let’s fix that.

We met up so we could collaborate on merging our feature branches together. Up until now, I’ve been focused on performance and the data layer, and @Ree has focused on tooling. The goal isn’t to complete the merge, but instead to do any tasks that are easier when we are in the same room.

I’ve successfully dumped all of my code into the project, got it all building, tests passing, etc. The managers for various systems are created now, but the building tools are still using the old code for now.

One massive source of complexity in the old version stemmed from our use of Photon, the 3rd party networking layer (which is great btw). By default, when a person leaves the game Photon automatically clears up all the networked objects they created[0]. This is no good for creatures (which were networked objects) as the other players still need to see them, so we disabled auto-delete of networked objects. This naturally meant we had to handle cleanup, which includes things like dice.

With the new system, we won’t ever have the whole board loaded at once, and this means some creatures won’t be loaded either. This also means we don’t want Photon to be helpful and make sure all networked objects are spawned as soon as someone joins the board. To deal with this, we removed the Photon network component from the creatures and gave a networked ‘handle’ to each player. This is attached to a creature when you select it and synchronizes the transform while the creature is held. On the other end, if the creature is loaded, it seems to behave exactly the same as it did previously. If not, then no worries: once the creature is spawned, the latest position will be sent anyway.

After a misguided prototype, I teamed up with @Ree to work out the handle api, and then everything came together very quickly.

This change actually lets us turn back on auto-delete in Photon and delete a bunch of cleanup code. Removing code is the best.

NOTE: The handle approach may seem super obvious, but previously there had been reasons that it wasn’t a good fit. However, those reasons are out of date now thankfully :)

On the subject of creatures, @Ree has been working on a new creature controller, which is much better suited for moving around these increasingly vertical scenes. It’s based on the excellent Kinematic Character Controller asset for Unity. It’s an impressively stable base to build upon [1]. Our creature control still looks the same (we haven’t switched to slidy tank controls or anything :p), but this lets us handle moving through scenes more reliably than before. You still have the ability to lift the piece and throw it over walls when needed.

With the simplification of the networking code, I looked into upgrading to the latest version of Photon. I got partway through the migration before I hit an undocumented change. The StreamBuffer class used to be a subclass of System.IO.Stream and in the latest version, it isn’t. We have a bunch of code that relies on it being a stream, and so all that will need to be rewritten. This was already underway, but we need to keep the codebase working while all that gets finished and hooked up. For now, that means that I have to give up and do it later. Lost a few hours to this but ah well.

Another thing I have been doing during this merge is ripping out code. Old fog-of-war, line-of-sight, board format upgrading, all that and more got ripped out. There is no point fighting things if they aren’t required to keep tooling development progressing. It also is making it easier to work with or refactor code that used to have to interact with those removed systems.

There has been plenty more than this going on. @Ree is changing how the board grid works; I’ve been working on build speed and assemblies… and so on and so on.

In all, this is going very well. We are plowing through tricky stuff and are ready to get back to working separately again, albeit this time with much more frequent merging as we approach Beta.

I’m taking Sunday as a rest day, but I’ll be back on Monday with more dev news.


[0] strictly, it’s the ones they own, which can be different if ownership has been transferred.

[1] also amazingly cheap for what it is.

TaleSpire Dev Log 147

2020-01-15 08:20:14 +0000

Hey again folks, another quick ‘kept on working’ post :)

Yesterday was spent working on Paste and Deletion of single tiles.

The latter is fun as we don’t keep unique ids for individual tiles, since the memory usage adds up fast. This means we have to pick some network-safe properties of a tile to identify it with. This is slower than looking up an id in a hashmap, of course, but given how rarely it happens, the cost isn’t that egregious. The upside is not spending that memory, and not having to maintain whatever structures would have given us a direct lookup.

The above is why I’ve previously mentioned how late I left it to add the delete-single-tile feature to the new codebase. I wanted to be sure what data I would have available to work with. The less data we store per-tile, the better. So in the ideal case, we’d have very little to work with.

Whilst I’ve coded the hard, data-wrangling parts of paste, I’ve not finished the implementation as it requires hooking into the board-tool system in TaleSpire that @Ree has been writing on his branch. So that will get finished when we merge those branches.

Which leads us to right now. I’m on the train heading to @Ree’s so we can work on that merge. It’s gonna be a gnarly few days, but hopefully, we can get something working by Sunday.

One of the things that has let us work separately like this is that we have not worried about performance on the tooling branch and not worried about tooling on my data branch. We, of course, had to be communicating a lot to make sure neither of us were doing things that couldn’t be made fast/useful after the merge. So far, however, it’s been working pretty well. This means that once they are merged, I can move to start writing the optimized versions of the operations the tooling uses. And poor @Ree has to make my stuff feel nice :p

Alright, that’s the news for now



This didn’t fit above, but I wanted to mention it anyway. UnmanagedMemoryStream is cool!

It’s a c# type that takes a pointer and length in its constructor and gives you a Stream type that is compatible with tons of .net’s existing APIs. This has really come in handy when I’ve wanted to compress data from a NativeArray without unnecessary copying. Check it out!
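Here is a small sketch of that idea: compressing a block of unmanaged memory by wrapping it in an `UnmanagedMemoryStream` and copying it straight into a `DeflateStream`. The `NativeCompress` class and its method names are made up for the example, `Marshal.AllocHGlobal` stands in for the memory behind a NativeArray, and the code needs unsafe blocks enabled to compile. In Unity the pointer would presumably come from something like `NativeArrayUnsafeUtility.GetUnsafeReadOnlyPtr`.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Runtime.InteropServices;

public static class NativeCompress
{
    // Compress `length` bytes starting at `data` without first copying them
    // into a managed byte[].
    public static unsafe byte[] Deflate(byte* data, long length)
    {
        using (var source = new UnmanagedMemoryStream(data, length))
        using (var output = new MemoryStream())
        {
            // leaveOpen keeps `output` usable after the DeflateStream is disposed,
            // and disposing the DeflateStream flushes the final compressed block.
            using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                source.CopyTo(deflate);
            }
            return output.ToArray();
        }
    }

    // Demo helper: puts `payload` into unmanaged memory (a stand-in for the
    // buffer behind a NativeArray) and compresses it from there.
    public static unsafe byte[] CompressFromNativeMemory(byte[] payload)
    {
        IntPtr mem = Marshal.AllocHGlobal(payload.Length);
        try
        {
            Marshal.Copy(payload, 0, mem, payload.Length);
            return Deflate((byte*)mem.ToPointer(), payload.Length);
        }
        finally
        {
            Marshal.FreeHGlobal(mem);
        }
    }

    // Inverse operation, used to check the round trip.
    public static byte[] Inflate(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The stream does not own the memory it wraps, so the buffer has to stay alive (and, for managed memory, pinned) for as long as the stream is in use.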

TaleSpire Dev Log 146

2020-01-13 23:37:03 +0000

Hi again,

Today has mostly been spent on implementing ‘paste’. Although not finished, it is coming along well, and I think I’ve hit most of the surprises the implementation has for me.

I also think I’ve worked out some details to deleting single tiles that mean we can defer shifting data around until later. Amusingly (to me), I’ve left deleting of single tiles until now as it is, in some ways, more complex than deleting a selection of tiles. I’ll probably write a little more about that another day.

This is all rather vague, isn’t it? I’ve been sitting here thinking of more to say about it, but really it’s just been a day of coding and pondering.

Yup… that really is all for today.


TaleSpire Dev Log 145

2020-01-13 00:18:03 +0000

I didn’t write a log yesterday as I was really struggling with undo/redo histories, and it was just too painful!

In short, I was fighting with the behavior and then making sure the board was behaving the same as the simplified model we use for tests. I finally got that behaving this morning and so have turned my attention to copy/paste.

The short version is that I think I almost have Copy done, and so tomorrow morning, I’ll wrap that up and start implementing Paste. The main thing that takes time has been just working out how best to make it fast and then triple checking that it will behave correctly with our deterministic board update system.

The nice thing with how copy works is that you only need to send a fixed number of bytes of information across the wire for any size selection. As operations are applied in order, we send the selection and the point in the history the selection was made, and this results in the same tiles being selected on every client.
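To make the "fixed number of bytes" point concrete, here is a hedged sketch of what such a message could look like. The `CopyRequest` struct, its fields (an axis-aligned selection box plus a history index), and the byte layout are all invented for illustration; the real message format is not described here.

```csharp
using System.IO;

// Hypothetical copy message: a selection box plus the point in the history at
// which the selection was made. Its size does not depend on how many tiles
// the selection covers.
public struct CopyRequest
{
    public const int SizeInBytes = 7 * sizeof(int);

    public int MinX, MinY, MinZ;
    public int MaxX, MaxY, MaxZ;
    public int HistoryIndex;

    public byte[] ToBytes()
    {
        using (var stream = new MemoryStream(SizeInBytes))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(MinX); writer.Write(MinY); writer.Write(MinZ);
            writer.Write(MaxX); writer.Write(MaxY); writer.Write(MaxZ);
            writer.Write(HistoryIndex);
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static CopyRequest FromBytes(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            // Object initializers run in textual order, matching the write order.
            return new CopyRequest
            {
                MinX = reader.ReadInt32(), MinY = reader.ReadInt32(), MinZ = reader.ReadInt32(),
                MaxX = reader.ReadInt32(), MaxY = reader.ReadInt32(), MaxZ = reader.ReadInt32(),
                HistoryIndex = reader.ReadInt32()
            };
        }
    }
}
```

Because every client applies operations in the same order, replaying this selection at `HistoryIndex` would pick out the same tiles everywhere, which is why the tiles themselves never need to be sent.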

An annoying case, however, is pasting slabs of tiles from text strings (like you can find at TalesBazaar). It’s a really powerful way to share content, but all the tile data does have to be sent to each client. It’ll be alright though, it’s just a case of making it as pain-free as possible :)

Ok, I’m getting super tired, and I’m not convinced that what I’m writing is coherent, so I’m gonna get some sleep.

Goodnight folks.

TaleSpire Dev Log 144

2020-01-10 23:13:49 +0000

Today I found and fixed a few nasty bugs, two of which were caught by the automated test generation mentioned yesterday.

I started off by writing the code to handle the undo/redo history. When that looked like it was working I set the automated test generator to make sequences of board changes between 80 and 100 operations long. This very quickly resulted in a series of operations that caused an exception.

The trouble was that it took 90 operations to trigger the bug, which would likely be a huge pain to step through, so I made the world’s dumbest test minimizer.

When I get an error in an automated test I have it dump out the sequence of ops as code I can paste as a new test. This is what it gave me:

public void ManualMix3()
{
    using (var asm = new BoardModelTestAssemblage())
    {
        // Sequence of ops dumped by the automated test generator.
        var actions = new List<VTuple<OpName, int, int>>() {
            VTuple.New(OpName.Add, 13, 0),
            VTuple.New(OpName.Delete, 6, 17),
            VTuple.New(OpName.Add, 56, 1),
            VTuple.New(OpName.Add, 23, 4),
            VTuple.New(OpName.Add, 27, 13),
            VTuple.New(OpName.Delete, 3, 17),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 39, 13),
            VTuple.New(OpName.Delete, 36, 7),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 57, 10),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 34, 15),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 11, 17),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 38, 6),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 0, 19),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 33, 2),
            VTuple.New(OpName.Delete, 45, 17),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 5, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 22, 6),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 38, 3),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 47, 3),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 3, 1),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 14, 12),
            VTuple.New(OpName.Delete, 50, 18),
            VTuple.New(OpName.Add, 13, 16),
            VTuple.New(OpName.Delete, 45, 9),
            VTuple.New(OpName.Add, 42, 12),
            VTuple.New(OpName.Delete, 2, 8),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 34, 10),
            VTuple.New(OpName.Add, 50, 18),
            VTuple.New(OpName.Add, 48, 16),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 13, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 7, 12),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 11, 6),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 52, 7),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 31, 16),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 29, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 55, 14),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 4, 18),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 26, 2),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 47, 17),
            VTuple.New(OpName.Delete, 25, 6),
            VTuple.New(OpName.Delete, 40, 12),
            VTuple.New(OpName.Add, 16, 4),
            VTuple.New(OpName.Delete, 32, 19),
            VTuple.New(OpName.Add, 0, 13),
            VTuple.New(OpName.Add, 52, 5),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 55, 18),
            VTuple.New(OpName.Delete, 1, 19),
            VTuple.New(OpName.Add, 45, 11),
            VTuple.New(OpName.Redo, 0, 0),
        };
        // ... remainder of the generated test not shown
    }
}

I wrote a little function that just removed one operation at random from actions and ran the test again. If it still triggered the error it kept the new, shorter, list and repeated the process. If it didn’t trigger the error it tried a different operation. It was horrendously slow as it was doing it in the dumbest, most brute force way imaginable but... it spat out this:

using (var asm = new BoardModelTestAssemblage())
{
	var actions = new List<VTuple<OpName, int, int>>() {
		VTuple.New(OpName.Add, 39, 13),
		VTuple.New(OpName.Delete, 36, 7),
		VTuple.New(OpName.Undo, 0, 0),
		VTuple.New(OpName.Delete, 34, 15),
		VTuple.New(OpName.Undo, 0, 0),
	};
	// ... remainder of the generated test not shown
}

Much better :D
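For anyone wanting to try the same trick, the brute-force minimizer described above can be sketched roughly like this. It is a reconstruction from the description rather than the actual code: `TestMinimizer`, `Minimize`, and the `stillFails` callback (which should rerun the test on the candidate list and report whether the error still happens) are names chosen for the example.

```csharp
using System;
using System.Collections.Generic;

public static class TestMinimizer
{
    // Repeatedly try dropping one randomly chosen op. Keep the shorter list whenever
    // `stillFails` says the failure still reproduces, and stop once no single
    // removal reproduces it any more.
    public static List<T> Minimize<T>(
        IReadOnlyList<T> ops, Func<IReadOnlyList<T>, bool> stillFails, Random rng)
    {
        var current = new List<T>(ops);
        bool shrunk = true;
        while (shrunk && current.Count > 1)
        {
            shrunk = false;

            // Visit the indices in random order so each op gets one try per pass.
            var order = new List<int>();
            for (int i = 0; i < current.Count; i++) order.Add(i);
            Shuffle(order, rng);

            foreach (int index in order)
            {
                var candidate = new List<T>(current);
                candidate.RemoveAt(index);
                if (stillFails(candidate))
                {
                    current = candidate;   // keep the shorter failing list
                    shrunk = true;
                    break;                 // and start over from it
                }
            }
        }
        return current;
    }

    // Fisher-Yates shuffle.
    private static void Shuffle(List<int> list, Random rng)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}
```

The result is only locally minimal: no single op can be removed without losing the failure, though a shorter failing sequence might still exist. Each attempt reruns the whole test, which is where the horrendous slowness comes from.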

I then simplified the values to make my job even easier and was able to find the bug itself within 15 minutes. It wasn’t one that’s exciting to read about, just some simple iteration mistake when undoing deletes of tiles, BUT it was super cool that we can do this.

If you find this fun look up things like QuickCheck to see how the pros do it :p

With that and some other bugs fixed I am back to data wrangling.

So in all, today went well. I wish I had got further but that’s just how it goes.

Seeya tomorrow

TaleSpire Dev Log 143

2020-01-09 23:58:13 +0000

Today I got a useful portion of the automated testing working. It generates lists of changes (add/delete/undo/redo) and applies them to:

  1. An instance of the actual board data-structure
  2. An instance of a simplified model

There is also a second instance of the actual board data-structure connected to the first via a dummy network interconnect. This means that once we have applied a randomly generated list of changes, we can compare all 3 to see that all the results line up.

This isn’t a substitute for other tests as all could be wrong in the same way, but it’s already allowed me to improve undo-redo and track down a few small bugs.
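The general shape of this kind of test, stripped right down, might look like the sketch below. Everything in it is a toy stand-in: `IBoard`, `DeltaBoard` (playing the role of the real data-structure), `SnapshotModel` (the simplified model), and the single-integer "tiles" are assumptions for the example, and it compares two instances rather than three because there is no network layer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum OpName { Add, Delete, Undo, Redo }

public interface IBoard
{
    void Apply(OpName op, int tile);
    IEnumerable<int> Tiles();   // current contents, in any order
}

// Stand-in for the "real" board: records changes and undoes them by inversion.
public sealed class DeltaBoard : IBoard
{
    private readonly HashSet<int> _tiles = new HashSet<int>();
    private readonly List<(bool added, int tile)> _history = new List<(bool added, int tile)>();
    private int _cursor;   // number of history entries currently applied

    public void Apply(OpName op, int tile)
    {
        switch (op)
        {
            case OpName.Add:
            case OpName.Delete:
                bool add = op == OpName.Add;
                bool changed = add ? _tiles.Add(tile) : _tiles.Remove(tile);
                if (!changed) return;   // edits that change nothing leave history alone
                _history.RemoveRange(_cursor, _history.Count - _cursor);   // drop redo tail
                _history.Add((add, tile));
                _cursor++;
                break;
            case OpName.Undo:
                if (_cursor == 0) return;
                var undone = _history[--_cursor];
                if (undone.added) _tiles.Remove(undone.tile); else _tiles.Add(undone.tile);
                break;
            case OpName.Redo:
                if (_cursor == _history.Count) return;
                var redone = _history[_cursor++];
                if (redone.added) _tiles.Add(redone.tile); else _tiles.Remove(redone.tile);
                break;
        }
    }

    public IEnumerable<int> Tiles() => _tiles;
}

// Simplified model: slow but obviously correct, one full board copy per step.
public sealed class SnapshotModel : IBoard
{
    private readonly List<SortedSet<int>> _states = new List<SortedSet<int>> { new SortedSet<int>() };
    private int _cursor;

    public void Apply(OpName op, int tile)
    {
        if (op == OpName.Undo) { if (_cursor > 0) _cursor--; return; }
        if (op == OpName.Redo) { if (_cursor < _states.Count - 1) _cursor++; return; }
        var next = new SortedSet<int>(_states[_cursor]);
        bool changed = op == OpName.Add ? next.Add(tile) : next.Remove(tile);
        if (!changed) return;
        _states.RemoveRange(_cursor + 1, _states.Count - _cursor - 1);
        _states.Add(next);
        _cursor++;
    }

    public IEnumerable<int> Tiles() => _states[_cursor];
}

public static class RandomBoardTest
{
    // Applies the same random ops to both boards. Returns the index of the first
    // op after which they disagree, or -1 if they always matched.
    public static int FirstMismatch(int seed, int opCount, IBoard real, IBoard model)
    {
        var rng = new Random(seed);
        for (int i = 0; i < opCount; i++)
        {
            var op = (OpName)rng.Next(4);
            int tile = rng.Next(8);
            real.Apply(op, tile);
            model.Apply(op, tile);
            if (!real.Tiles().OrderBy(t => t).SequenceEqual(model.Tiles().OrderBy(t => t)))
                return i;
        }
        return -1;
    }
}
```

Using a fixed seed means any failing sequence can be regenerated exactly, which is what makes it practical to dump and minimize a failing case afterwards.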

I don’t have any automated test case minimization yet, but so far, it’s been relatively easy to minimize the cases that came up.

Next, I need to look at how data is moved to the ‘inactive set’.

Each client has an undo/redo history with a fixed length (currently 50). When the history gets longer than 50, the oldest history event is no longer reachable. This means it cannot be undone. It also means that we don’t necessarily need to store it in the same way as we do for ‘active’ asset tile & prop data. Having a different data set, with a potentially different layout, can let us have optimizations that only make sense on that set.

To be moved to the inactive set, we need to know that the data isn’t going to be modified by an undo/redo still in the active portion of the history. This evening I’ve been implementing the code to handle this. It’s not done yet, but I’m hoping to wrap that up tomorrow morning. I’ll then start testing longer streams of operations to make sure they behave correctly.
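A fixed-length history of this kind can be sketched as below. This is a simplified illustration rather than the real implementation: `BoundedHistory` and the `onRetired` callback are hypothetical names, and the sketch only reports which entry fell off the end. It does not attempt the harder part described above, which is working out when the data behind an entry is safe to move.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-length undo/redo history. Entries pushed out of the window
// can never be undone again, so they are handed to `onRetired`.
public sealed class BoundedHistory<T>
{
    private readonly List<T> _entries = new List<T>();
    private readonly int _capacity;
    private readonly Action<T> _onRetired;
    private int _cursor;   // entries [0, _cursor) are currently applied

    public BoundedHistory(int capacity, Action<T> onRetired)
    {
        _capacity = capacity;
        _onRetired = onRetired;
    }

    public int Count => _entries.Count;

    public void Push(T entry)
    {
        // A new edit discards anything that was undone but not redone.
        _entries.RemoveRange(_cursor, _entries.Count - _cursor);
        _entries.Add(entry);
        _cursor++;

        if (_entries.Count > _capacity)
        {
            T oldest = _entries[0];
            _entries.RemoveAt(0);
            _cursor--;
            _onRetired(oldest);   // e.g. a candidate for the inactive set
        }
    }

    public bool TryUndo(out T entry)
    {
        if (_cursor == 0) { entry = default(T); return false; }
        entry = _entries[--_cursor];
        return true;
    }

    public bool TryRedo(out T entry)
    {
        if (_cursor == _entries.Count) { entry = default(T); return false; }
        entry = _entries[_cursor++];
        return true;
    }
}
```

With a capacity of 50, the 51st push retires the first entry, and from then on every push retires exactly one more.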

I’ll then be back on the bug hunt. Before Christmas, I saw a case where undo/redo started getting me some very odd results (big chunks of tiles missing). I’m not sure if the bug was in the data itself or the code handling the progressive spawning/deletion of the game objects. I’m still hoping the former as that’s way easier to test, but I’m not sure quite how the bugs I’ve fixed the last two days would account for that. Ah well, we’ll see soon enough.

That’s all for tonight,