From the Burrow

TaleSpire Dev Log 148

2020-01-18 13:05:45 +0000

I’ve been having a great time coding down at @Ree’s place, but it has meant I’ve neglected these updates. Let’s fix that.

We met up so we could collaborate on merging our feature branches together. Up until now, I’ve been focused on performance and the data layer, and @Ree has focused on tooling. The goal isn’t to complete the merge, but instead to do any tasks that are easier when we are in the same room.

I’ve successfully dumped all of my code into the project and got it all building, tests passing, etc. The managers for various systems are created now, but the building tools are still using the old code.

One massive source of complexity in the old version stemmed from our use of Photon, the 3rd party networking layer (which is great btw). By default, when a person leaves the game Photon automatically clears up all the networked objects they created[0]. This is no good for creatures (which were networked objects) as the other players still need to see them, so we disabled auto-delete of networked objects. This naturally meant we had to handle cleanup ourselves, including for things like dice.

With the new system, we won’t ever have the whole board loaded at once, and this means some creatures won’t be loaded either. This also means we don’t want Photon to be helpful and make sure all networked objects are spawned as soon as someone joins the board. To deal with this, we removed the Photon network component from the creatures and gave a networked ‘handle’ to each player. This is attached to a creature when you select it and synchronizes the transform while the creature is held. On the other end, if the creature is loaded, it seems to behave exactly the same as it did previously. If not, then no worries: once the creature is spawned, it will have the latest position, as that will be sent anyway.
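To make the receiving side of that concrete, here is a minimal sketch in plain C#. It is not the TaleSpire code and it doesn't touch Photon or Unity at all; `CreatureRegistry`, `Position`, and the integer creature ids are all made up for illustration. The one idea it shows is that a handle update is applied straight away if the creature is loaded, and otherwise remembered so the creature gets the latest position when it spawns.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a transform; the real thing would be a Unity type.
public struct Position
{
    public float X, Y, Z;
    public Position(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Hypothetical per-client bookkeeping for creatures moved via a networked handle.
public sealed class CreatureRegistry
{
    private readonly Dictionary<int, Position> _loaded = new Dictionary<int, Position>();
    private readonly Dictionary<int, Position> _latestKnown = new Dictionary<int, Position>();

    // Called whenever a handle update for some creature arrives from another player.
    public void OnHandleUpdate(int creatureId, Position position)
    {
        _latestKnown[creatureId] = position;   // remembered even if the creature isn't loaded
        if (_loaded.ContainsKey(creatureId))
        {
            _loaded[creatureId] = position;    // loaded creatures follow the handle right away
        }
    }

    // Called when the creature's part of the board gets loaded on this client.
    public void Spawn(int creatureId, Position savedPosition)
    {
        Position latest;
        _loaded[creatureId] = _latestKnown.TryGetValue(creatureId, out latest)
            ? latest
            : savedPosition;
    }

    public bool TryGetPosition(int creatureId, out Position position)
    {
        return _loaded.TryGetValue(creatureId, out position);
    }
}
```

Whether the latest position is cached locally like this or simply re-sent by the owner is an assumption on my part; the post only says it will be sent anyway.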

After a misguided prototype, I teamed up with @Ree to work out the handle api, and then everything came together very quickly.

This change actually lets us turn back on auto-delete in Photon and delete a bunch of cleanup code. Removing code is the best.

NOTE: The handle approach may seem super obvious, but previously there had been reasons that it wasn’t a good fit. However, those reasons are out of date now thankfully :)

On the subject of creatures, @Ree has been working on a new creature controller, which is much better suited for moving around these increasingly vertical scenes. It’s based on the excellent Kinematic Character Controller asset for Unity. It’s an impressively stable base to build upon [1]. Our creature control still looks the same (we haven’t switched to slidy tank controls or anything :p), but this lets us handle moving through scenes more reliably than before. You still have the ability to lift the piece and throw it over walls when needed.

With the simplification of the networking code, I looked into upgrading to the latest version of Photon. I got partway through the migration before I hit an undocumented change. The StreamBuffer class used to be a subclass of System.IO.Stream and in the latest version, it isn’t. We have a bunch of code that relies on it being a stream, and so all that will need to be rewritten. This was already underway, but we need to keep the codebase working while all that gets finished and hooked up. For now, that means that I have to give up and do it later. Lost a few hours to this but ah well.

Another thing I have been doing during this merge is ripping out code. Old fog-of-war, line-of-sight, board format upgrading, all that and more got ripped out. There is no point fighting things if they aren’t required to keep tooling development progressing. It also is making it easier to work with or refactor code that used to have to interact with those removed systems.

There has been plenty more than this going on. @Ree is changing how the board grid works; I’ve been working on build speed and assemblies… and so on and so on.

In all, this is going very well. We are plowing through tricky stuff and are ready to get back to working separately again, albeit this time with much more frequent merging as we approach Beta.

I’m taking Sunday as a rest day, but I’ll be back on Monday with more dev news.

Ciao.

[0] strictly, it’s the ones they own, which can be different if ownership has been transferred.

[1] also amazingly cheap for what it is.

TaleSpire Dev Log 147

2020-01-15 08:20:14 +0000

Hey again folks, another quick ‘kept on working’ post :)

Yesterday was spent working on Paste and Deletion of single tiles.

The latter is fun as we don’t keep unique ids for individual tiles, because the memory usage adds up fast. This means we have to pick some network-safe aspects of a tile and use those to identify it. This is slower than looking up an id in a hashmap, of course, but given how often it happens the cost isn’t that egregious. The upside is not spending that memory, and not having to maintain whatever structures would have given us a direct lookup.
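As a rough illustration of the trade-off (and very much a guess at the shape, not the real data layout), identifying a tile without an id could look like a linear scan that matches on values every client agrees on. `TileRecord` and its fields are hypothetical; the post doesn't say which aspects are actually used.

```csharp
using System.Collections.Generic;

// Hypothetical per-tile data. Note there is no unique id field.
public struct TileRecord
{
    public ushort KindId;    // which asset this tile is (assumed)
    public short X, Y, Z;    // grid position (assumed)
    public byte Rotation;    // rotation step (assumed)
}

public static class TileLookup
{
    // Linear scan: slower than a hashmap lookup by id, but needs no extra
    // memory per tile and no side structure to keep in sync.
    public static int IndexOf(List<TileRecord> tiles, TileRecord wanted)
    {
        for (int i = 0; i < tiles.Count; i++)
        {
            TileRecord t = tiles[i];
            if (t.KindId == wanted.KindId &&
                t.X == wanted.X && t.Y == wanted.Y && t.Z == wanted.Z &&
                t.Rotation == wanted.Rotation)
            {
                return i;
            }
        }
        return -1; // not found
    }
}
```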

The above is why I’ve previously mentioned how late I left it to add the delete-single-tile feature to the new codebase: I wanted to be sure what data I would have available to work with. The less data we store per-tile, the better. So in the ideal case, we’d have very little to work with.

Whilst I’ve coded the hard, data-wrangling parts of paste, I’ve not finished the implementation as it requires hooking into the board-tool system in TaleSpire, which @Ree has been writing on his branch. So that will get finished when we merge those branches.

Which leads us to right now. I’m on the train heading to @Ree’s so we can work on that merge. It’s gonna be a gnarly few days, but hopefully, we can get something working by Sunday.

One of the things that has let us work separately like this is that we have not worried about performance on the tooling branch and not worried about tooling on my data branch. We, of course, had to be communicating a lot to make sure neither of us was doing things that couldn’t be made fast/useful after the merge. So far, however, it’s been working pretty well. This means that once they are merged, I can move to start writing the optimized versions of the operations the tooling uses. And poor @Ree has to make my stuff feel nice :p

Alright, that’s the news for now

Ciao

p.s.

This didn’t fit above, but I wanted to mention it anyway. UnmanagedMemoryStream is cool!

It’s a C# type that takes a pointer and length in its constructor and gives you a Stream type that is compatible with tons of .NET’s existing APIs. This has really come in handy when I’ve wanted to compress data from a NativeArray without unnecessary copying. Check it out!
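For the curious, here is a small sketch of that pattern. It is not the TaleSpire code: it uses `GZipStream` as an example compressor (the post doesn't say which one is used), and `CompressArray` pins a plain `byte[]` as a stand-in for a NativeArray's buffer. In Unity you would get the pointer from the NativeArray instead (I believe via `NativeArrayUnsafeUtility.GetUnsafeReadOnlyPtr`, but treat that as an assumption). It needs unsafe code enabled to compile.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class PointerCompression
{
    // Compresses `length` bytes starting at `source` without first copying
    // them into a managed array. `source` must not be null.
    public static unsafe byte[] Compress(byte* source, long length)
    {
        using (var input = new UnmanagedMemoryStream(source, length))
        using (var output = new MemoryStream())
        {
            // leaveOpen: true so disposing the GZipStream flushes it
            // but keeps `output` readable afterwards.
            using (var gzip = new GZipStream(output, CompressionLevel.Fastest, true))
            {
                input.CopyTo(gzip);
            }
            return output.ToArray();
        }
    }

    // Stand-in for native memory: pin a non-empty managed array and hand
    // its address to Compress.
    public static unsafe byte[] CompressArray(byte[] data)
    {
        fixed (byte* p = data)
        {
            return Compress(p, data.Length);
        }
    }

    // Plain managed decompression, used here only to check the round trip.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The nice part is that `UnmanagedMemoryStream` is just a `Stream`, so `CopyTo` and anything else that takes a stream works on the raw memory directly.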

TaleSpire Dev Log 146

2020-01-13 23:37:03 +0000

Hi again,

Today has mostly been spent on implementing ‘paste’. Although not finished, it is coming along well, and I think I’ve hit most of the surprises the implementation has for me.

I also think I’ve worked out some details to deleting single tiles that mean we can defer shifting data around until later. Amusingly (to me), I’ve left deleting of single tiles until now as it is, in some ways, more complex than deleting a selection of tiles. I’ll probably write a little more about that another day.

This is all rather vague, isn’t it? I’ve been sitting here thinking of more to say about it, but really it’s just been a day of coding and pondering.

Yup… that really is all for today.

Ciao!

TaleSpire Dev Log 145

2020-01-13 00:18:03 +0000

I didn’t write a log yesterday as I was really struggling with undo/redo histories, and it was just too painful!

In short, I was fighting with the behavior and then making sure the board was behaving the same as the simplified model we use for tests. I finally got that behaving this morning and so have turned my attention to copy/paste.

The short version is that I think I almost have Copy done, and so tomorrow morning, I’ll wrap that up and start implementing Paste. The main thing that takes time has been just working out how best to make it fast and then triple checking that it will behave correctly with our deterministic board update system.

The nice thing with how copy works is that you only need to send a fixed number of bytes of information across the wire for any size selection. As operations are applied in order, we send the selection and the point in the history the selection was made, and this results in the same tiles being selected on every client.
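To show why the size doesn't depend on how many tiles are selected, here is a hedged sketch of what such a message might contain. The field names, the axis-aligned box selection, and the 28-byte size are all assumptions for illustration; the post only says that the selection and the point in the history are sent.

```csharp
using System.IO;

// Hypothetical fixed-size 'copy' message: a selection box plus the history
// index at which the selection was made. No tile data is included.
public struct CopySelectionMessage
{
    public const int SizeInBytes = 28; // 6 ints + 1 uint, 4 bytes each

    public int MinX, MinY, MinZ;
    public int MaxX, MaxY, MaxZ;
    public uint HistoryIndex;

    public byte[] ToBytes()
    {
        using (var stream = new MemoryStream(SizeInBytes))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(MinX); writer.Write(MinY); writer.Write(MinZ);
            writer.Write(MaxX); writer.Write(MaxY); writer.Write(MaxZ);
            writer.Write(HistoryIndex);
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static CopySelectionMessage FromBytes(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            var message = new CopySelectionMessage();
            message.MinX = reader.ReadInt32();
            message.MinY = reader.ReadInt32();
            message.MinZ = reader.ReadInt32();
            message.MaxX = reader.ReadInt32();
            message.MaxY = reader.ReadInt32();
            message.MaxZ = reader.ReadInt32();
            message.HistoryIndex = reader.ReadUInt32();
            return message;
        }
    }
}
```

Each client would then resolve the box against its own copy of the board at that history index, which is what makes the result identical everywhere.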

An annoying case, however, is pasting slabs of tiles from text strings (like you can find at TalesBazaar). It’s a really powerful way to share content, but all the tile data does have to be sent to each client. It’ll be alright though, it’s just a case of making it as pain-free as possible :)

Ok, I’m getting super tired, and I’m not convinced that what I’m writing is coherent, so I’m gonna get some sleep.

Goodnight folks.

TaleSpire Dev Log 144

2020-01-10 23:13:49 +0000

Today I found and fixed a few nasty bugs, two of which were caught by the automated test generation mentioned yesterday.

I started off by writing the code to handle the undo/redo history. When that looked like it was working, I set the automated test generator to make sequences of board changes between 80 and 100 operations long. This very quickly resulted in a series of operations that caused an exception.

The trouble was that it took 90 operations to trigger the bug, which would likely be a huge pain to step through, so I made the world’s dumbest test minimizer.

When I get an error in an automated test I have it dump out the sequence of ops as code I can paste as a new test. This is what it gave me:

[Test]
public void ManualMix3()
{
    using (var asm = new BoardModelTestAssemblage())
    {
        var actions = new List<VTuple<OpName, int, int>>() {
            VTuple.New(OpName.Add, 13, 0),
            VTuple.New(OpName.Delete, 6, 17),
            VTuple.New(OpName.Add, 56, 1),
            VTuple.New(OpName.Add, 23, 4),
            VTuple.New(OpName.Add, 27, 13),
            VTuple.New(OpName.Delete, 3, 17),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 39, 13),
            VTuple.New(OpName.Delete, 36, 7),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 57, 10),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 34, 15),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 11, 17),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 38, 6),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 0, 19),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 33, 2),
            VTuple.New(OpName.Delete, 45, 17),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 5, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 22, 6),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 38, 3),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 47, 3),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 3, 1),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 14, 12),
            VTuple.New(OpName.Delete, 50, 18),
            VTuple.New(OpName.Add, 13, 16),
            VTuple.New(OpName.Delete, 45, 9),
            VTuple.New(OpName.Add, 42, 12),
            VTuple.New(OpName.Delete, 2, 8),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 34, 10),
            VTuple.New(OpName.Add, 50, 18),
            VTuple.New(OpName.Add, 48, 16),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 13, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 7, 12),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 11, 6),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 52, 7),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 31, 16),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 29, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 55, 14),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 4, 18),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 26, 2),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 47, 17),
            VTuple.New(OpName.Delete, 25, 6),
            VTuple.New(OpName.Delete, 40, 12),
            VTuple.New(OpName.Add, 16, 4),
            VTuple.New(OpName.Delete, 32, 19),
            VTuple.New(OpName.Add, 0, 13),
            VTuple.New(OpName.Add, 52, 5),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 55, 18),
            VTuple.New(OpName.Delete, 1, 19),
            VTuple.New(OpName.Add, 45, 11),
            VTuple.New(OpName.Redo, 0, 0),
        };
        asm.ApplyActions(actions);
    }
}

I wrote a little function that just removed one operation at random from actions and ran the test again. If it still triggered the error, it kept the new, shorter list and repeated the process. If it didn’t trigger the error, it tried a different operation. It was horrendously slow as it was doing it in the dumbest, most brute-force way imaginable, but... it spat out this:

using (var asm = new BoardModelTestAssemblage())
{
	var actions = new List<VTuple<OpName, int, int>>() {
		VTuple.New(OpName.Add, 39, 13),
		VTuple.New(OpName.Delete, 36, 7),
		VTuple.New(OpName.Undo, 0, 0),
		VTuple.New(OpName.Delete, 34, 15),
		VTuple.New(OpName.Undo, 0, 0),
	};
	asm.ApplyActions(actions);
}

Much better :D
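If you want to try the same trick, here is a minimal sketch of that kind of remove-one-and-retry minimizer. It is a reconstruction from the description above, not the actual function; `stillFails` is a hypothetical callback that should build a fresh test fixture, apply the candidate ops, and report whether the original error still happens.

```csharp
using System;
using System.Collections.Generic;

public static class DumbMinimizer
{
    // Repeatedly drops one randomly chosen op. If the failure survives, keep
    // the shorter list and start over; otherwise try a different op. Stops
    // when no single removal keeps the failure alive.
    public static List<T> Minimize<T>(List<T> failing, Func<List<T>, bool> stillFails, Random rng)
    {
        var current = new List<T>(failing);
        bool shrunk = true;
        while (shrunk && current.Count > 1)
        {
            shrunk = false;
            foreach (int index in ShuffledIndices(current.Count, rng))
            {
                var candidate = new List<T>(current);
                candidate.RemoveAt(index);
                if (stillFails(candidate))
                {
                    current = candidate; // shorter and still broken: keep it
                    shrunk = true;
                    break;               // restart with the shorter list
                }
            }
        }
        return current;
    }

    // Fisher-Yates shuffle of 0..count-1 so each op is tried once per pass.
    private static List<int> ShuffledIndices(int count, Random rng)
    {
        var indices = new List<int>(count);
        for (int i = 0; i < count; i++) indices.Add(i);
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        return indices;
    }
}
```

For the test above, `stillFails` would presumably wrap `asm.ApplyActions(candidate)` on a new `BoardModelTestAssemblage` in a try/catch and return true when the same exception is thrown. It runs the test up to once per remaining op on every pass, hence the horrendous slowness.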

I then simplified the values to make my job even easier and was able to find the bug itself within 15 minutes. It wasn’t one that’s exciting to read about, just some simple iteration mistake when undoing deletes of tiles, BUT it was super cool that we can do this.

If you find this fun, look up things like QuickCheck to see how the pros do it :p

With that and some other bugs fixed I am back to data wrangling.

So in all, today went well. I wish I had got further but that’s just how it goes.

Seeya tomorrow

TaleSpire Dev Log 143

2020-01-09 23:58:13 +0000

Today I got a useful portion of the automated testing working. It generates lists of changes (add/delete/undo/redo) and applies them to:

  1. An instance of the actual board data-structure
  2. An instance of a simplified model

There is also a second instance of the actual board data-structure connected to the first via a dummy network interconnect. This means that once we have applied a randomly generated list of changes, we can compare all 3 to see that all the results line up.

This isn’t a substitute for other tests as all could be wrong in the same way, but it’s already allowed me to improve undo-redo and track down a few small bugs.
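Here is a toy version of the idea, cut down to one 'real' implementation and one model (so no second board or fake network). Everything in it is hypothetical: the board is just a set of tile numbers, `SnapshotModel` is the dumb-but-obvious version, and `CommandBoard` stands in for the real data structure.

```csharp
using System;
using System.Collections.Generic;

public enum OpName { Add, Delete, Undo, Redo }

// Simplified model: stores a full snapshot of the board after every change.
public sealed class SnapshotModel
{
    private readonly List<SortedSet<int>> _states = new List<SortedSet<int>> { new SortedSet<int>() };
    private int _cursor;

    public SortedSet<int> Tiles { get { return _states[_cursor]; } }

    public void Apply(OpName op, int tile)
    {
        if (op == OpName.Undo) { if (_cursor > 0) _cursor--; return; }
        if (op == OpName.Redo) { if (_cursor < _states.Count - 1) _cursor++; return; }
        // A new change throws away anything that could have been redone.
        _states.RemoveRange(_cursor + 1, _states.Count - _cursor - 1);
        var next = new SortedSet<int>(Tiles);
        if (op == OpName.Add) next.Add(tile); else next.Remove(tile);
        _states.Add(next);
        _cursor++;
    }
}

// Stand-in for the real board: mutates in place and records undoable commands.
public sealed class CommandBoard
{
    private readonly HashSet<int> _tiles = new HashSet<int>();
    private readonly Stack<(bool isAdd, int tile, bool changed)> _done = new Stack<(bool, int, bool)>();
    private readonly Stack<(bool isAdd, int tile, bool changed)> _undone = new Stack<(bool, int, bool)>();

    public HashSet<int> Tiles { get { return _tiles; } }

    public void Apply(OpName op, int tile)
    {
        switch (op)
        {
            case OpName.Add:
            case OpName.Delete:
                bool isAdd = op == OpName.Add;
                // Remember whether anything changed so undo doesn't over-correct.
                bool changed = isAdd ? _tiles.Add(tile) : _tiles.Remove(tile);
                _undone.Clear();
                _done.Push((isAdd, tile, changed));
                break;
            case OpName.Undo:
                if (_done.Count == 0) break;
                var u = _done.Pop();
                if (u.changed) Set(u.tile, !u.isAdd);
                _undone.Push(u);
                break;
            case OpName.Redo:
                if (_undone.Count == 0) break;
                var r = _undone.Pop();
                if (r.changed) Set(r.tile, r.isAdd);
                _done.Push(r);
                break;
        }
    }

    private void Set(int tile, bool present)
    {
        if (present) _tiles.Add(tile); else _tiles.Remove(tile);
    }
}

public static class ModelCheck
{
    // Returns the index of the first op after which the two disagree, or -1.
    public static int FirstDivergence(int seed, int opCount)
    {
        var rng = new Random(seed);
        var model = new SnapshotModel();
        var board = new CommandBoard();
        for (int i = 0; i < opCount; i++)
        {
            var op = (OpName)rng.Next(4);
            int tile = rng.Next(8); // small range so ops collide often
            model.Apply(op, tile);
            board.Apply(op, tile);
            if (!model.Tiles.SetEquals(board.Tiles)) return i;
        }
        return -1;
    }
}
```

The model is allowed to be wasteful (a whole snapshot per change) because its only job is to be obviously right.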

I don’t have any automated test case minimization yet, but so far, it’s been relatively easy to minimize the cases that came up.

Next, I need to look at how data is moved to the ‘inactive set’.

Each client has an undo/redo history with a fixed length (currently 50). When the history gets longer than 50, the oldest history event is no longer reachable. This means it cannot be undone. It also means that we don’t necessarily need to store its data in the same way as we do for ‘active’ tile & prop data. Having a different data set, with a potentially different layout, can let us have optimizations that only make sense on that set.

To be moved to the inactive set, we need to know that the data isn’t going to be modified by an undo/redo still in the active portion of the history. This evening I’ve been implementing the code to handle this. It’s not done yet, but I’m hoping to wrap that up tomorrow morning. I’ll then start testing longer streams of operations to make sure they behave correctly.
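The 'falls off the end of the history' part can be sketched like this. `BoundedHistory` is a made-up name and this ignores redo and all the real bookkeeping; it only shows the moment an event becomes unreachable and gets handed back to the caller, which is where a move to the inactive set could be triggered.

```csharp
using System.Collections.Generic;

// Hypothetical fixed-length history: pushing beyond the capacity retires
// the oldest event, which can then never be undone.
public sealed class BoundedHistory<T>
{
    private readonly LinkedList<T> _active = new LinkedList<T>();
    private readonly int _capacity;

    public BoundedHistory(int capacity)
    {
        _capacity = capacity;
    }

    public int Count { get { return _active.Count; } }

    // Returns true and sets `retired` when an old event fell out of the history.
    public bool Push(T historyEvent, out T retired)
    {
        _active.AddLast(historyEvent);
        if (_active.Count > _capacity)
        {
            retired = _active.First.Value;
            _active.RemoveFirst();
            return true;
        }
        retired = default(T);
        return false;
    }
}
```

With a capacity of 50, the 51st push is the first one that returns a retired event.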

I’ll then be back on the bug hunt. Before Christmas, I saw a case where undo/redo started giving me some very odd results (big chunks of tiles missing). I’m not sure if the bug was in the data itself or the code handling the progressive spawning/deletion of the game objects. I’m still hoping the former as that’s way easier to test, but I’m not sure quite how the bugs I’ve fixed the last two days would account for that. Ah well, we’ll see soon enough.

That’s all for tonight,

Ciao

TaleSpire Dev Log 142

2020-01-08 21:14:58 +0000

Today is a short but sweet one.

Ree has been hunting down bugs in the new building tools and I’ve been working on the automated test suite which we are using to validate the building code[0].

That and of course the pre-orders are out, so we have been keeping an eye on things there.

Righto, I’m gonna get some sleep.

Ciao!

[0] I’ll probably put a little more detailed info out on that another day

TaleSpire Dev Log 140

2020-01-03 14:44:39 +0000

Today I’ve continued on server work:

I rewrote the sign-in code to promote the connection to a secure websocket if sign-in succeeds.

I’ve started wiring up erlbus so that the client will subscribe to the campaign and board and receive messages broadcast by other clients or the server.

One interesting note with that is that when messing with erlbus from the REPL you might run into a badrpc error when trying out ebus:messages. This was happening as ebus:messages uses rpc:multicall behind the scenes and it wasn’t finding the function on one of the nodes. The node with the issue was the one I was connecting my remote shell from as, naturally, that doesn’t have the app compiled. The quick fix is just to use -hidden in your erl arguments, as multicall contacts [node() | nodes()] by default and when you are hidden you won’t show up in nodes() unless that behavior is specifically requested.

That’s the lot for today (other than the usual misc bug fixes). I’m flying back to Norway on Monday so most of the day will be a write-off due to travel. I may get a little done on Sunday but tomorrow I’ll be visiting some friends I haven’t seen in a few years (which is gonna be great!).

Ciao!

TaleSpire Dev Log 139

2020-01-02 17:38:36 +0000

Over the break, I’ve put down the front-end code to begin getting a handle on the changes that will be made to the backend. Most of this has been focused on reading, here are some of the things I’ve been dipping into:

Reading things

  • https://erlang.org/doc/man/gen_event.html some of the http handling code got a bit centralized and was slowing me down when making changes. I hadn’t used gen_events before as I found them a bit confusing. They are somewhat different than gen_server and gen_fsm as you don’t implement the manager but instead just the callbacks. It’s pretty neat once you grok it, though.
  • http://blog.differentpla.net/blog/2014/11/07/erlang-sup-event/ one part of understanding gen_event is knowing how to include it in your supervision trees. This covers those bits that mostly seem left out of other tutorials.
  • https://github.com/cabol/erlbus/ erlbus is an important piece of the changes I’m making, and so I’ve been getting familiar with its approach. It also contains one of the more lucid examples of using websockets in erlang, which was helpful in other tests.
  • https://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections An inspiring read. Little bits and bobs scattered throughout that were useful. Also, it’s just nice to see what certain pieces we use can potentially deliver (although we naturally aren’t doing this).
  • https://github.com/uwiger/gproc/blob/master/doc/erlang07-wiger.pdf This paper describes the evolution and technical realities that led to gproc which is something I intend to use in place of the standard global (across multiple erlang nodes) process registry.
  • http://erlang.org/doc/apps/stdlib/stdlib.pdf this naturally is WIP. Skimming the standard lib of any language is a great way to find out things you didn’t know were there. Avoiding redundant work is nearly always a blessing.
  • https://www.amazon.com/WebRTC-Cookbook-Andrii-Sergiienko/dp/1783284455 I’ve been skimming bits of this again as it’s likely to form part of the first implementation of the p2p voice & video chat
  • https://ninenines.eu/docs/ dear god ninenines is the best. Cowboy, gun, ranch. All fantastic quality, super robust and well documented. I have no idea how I’d function without their stuff.

Making things

As we use Photon for realtime networking in TaleSpire, we only need to focus on lower frequency events, such as persistence, for now. The alpha used a hacked together REST’ish api, which did the job but has all the expected issues (e.g. having to poll for changes). We are switching to using websockets for the Beta as it’s a relatively incremental step, will let us overcome some shortcomings, and still makes sense given the scale we will be at. It, too, will be a target for replacement in the future, but that is a subject for another day.

All of the server api the game uses is described by a data-structure that we then generate erlang & c# code from. I’ve updated the generator to create websocket handlers rather than the http/s ones. In doing so I’ve also been undoing some of the code that centralized some of the http management.

The last thing I did was update the code that polls to see which domain it is at before attempting to pick the right certificates and start serving over https.

The next step will be to rewrite the session handling code as it currently assumes a stateless connection.

Until next time,

Peace.

TaleSpire Dev Log 138

2019-12-06 09:09:52 +0000

I’m currently writing this on the train home,

I’ve been down working with Jonny for the last time (in person) this year, and it’s been a great few days. We made a bunch of progress in different places, so let’s natter about that.

Tiles

First off, we were looking at tile placement and control. We may have found a nice change which could help with some tricky cases our alpha testers would get in when working with walls. My ultra vague language is due to it being so early in the prototype phase that we are not ready to talk about it yet. This is not the first time in the last 4 months when we have had a fix that play revealed to be worse than the initial problem. When we are more confident, we’ll do a full writeup (or maybe a short video) to go through the different things (Just placing tiles has surprising nuances to it :D).

Meatier News

As that is all arm waving and no substance, here is some real news: You will be able to place creature miniatures off-grid. This has been play-tested for a while internally now and, although it does introduce some challenges where it comes to UI/messaging, it does feel pretty cool. For those who have wrangled narrow corridors in the alpha, this should feel rather freeing :p

Do note that we are still keeping the tiles to the grid, however. Without this limitation, the worst-case complexity for various systems becomes totally unmanageable. Remember that doubling the length (or resolution) of the side of a cube increases the volume 8 times. So if you have anything that operates over the volume like say, fog of war or line of sight, your day would have gotten much worse (I know, I go on about this in almost every post).

The Right Questions

I also think I know how cross-zone pathfinding will work now. I’ve been struggling a little with how to maintain a single graph as zones are loaded in and out. Jonny did me the simple favor of asking if it could be computed on demand. Building the nav-mesh from scratch each time a creature is picked up sounds like a lot of work, but it actually reduces the complexity a lot and opens up some opportunities for parallelization. We just need to keep the per-zone input data sorted in a way that helps this on-demand process as much as possible, and that bit is easier. More on this when I have tested it.

Monster Branch Hunter

We have also started the long process of getting everything merged into master. The part of this that required us to work closely was moving to Unity assemblies for the core project as suddenly their ‘magic folders’ stop working[0]. This means reorganizing the whole codebase, which wasn’t too bad[1]. However, this can make for seriously ugly merge conflicts and so it was best to stop development for a couple of hours, get the change merged in, and test on both our machines.

Testing

I’ve also started working on another approach to ironing out bugs from the board code. It’s very much inspired by my uninformed scanning of property-based testing. We make a dirt-simple but accurate model of the code we wish to test, and then we generate random streams of operations that are applied to both the real version and the model, and we compare the results. The model gets to ignore all details of multithreading, being performant, etc. Its only job is to be understandable and give the behavior and results intended from the real version. In previous tests, I had already made a fake network interconnect so I could apply operations to one board and make sure the other board ended up with exactly the same result after sync. So now I can wire things up like this:

    [Randomly generated board operations]
             /              \
            |                |
            v                v
    [Simple Model]     [Real Board 0] <--fake network interconnect--> [Real Board 1] 

We feed the model and ‘Real Board 0’ the same operations, but then compare the results for all three boards. This gives us a pretty decent amount of testing for almost no additional work.

I will stress that this isn’t property-based testing. I want to learn it, but I don’t have the time to take on a new technique right now. However, I know that even this limited approach can give good results, and I think I’ll be able to add some very basic test-case minimization too.

Christmas Approaches

And with that, it’s almost the end of the year. There is more to come, but I’m going to be posting a bit less until January as I’ve managed to schedule everything to happen in one month, and apparently it’s arrived. For the next week, I’m on holiday, and then I’ll be heading to the UK for Christmas. During my time in the UK I’ll be working some evenings on server stuff, so expect a very significant shift in content then!

I hope this finds you well folks.

Peace.

[0] by default, Unity automatically makes assemblies that separate game and editor code based on special folder names. It’s handy but naturally goes away when you decide to handle assemblies yourself.

[1] There are a bunch of ways to lay out your project, but we found it simplest to make 2 folders, ‘Runtime’ and ‘Editors’, and move all game code to the former and all the editor code to the latter. The directory structure on both sides is mirrored. We have a little helper menu for writing the boilerplate code for inspector editors, so we updated that to respect the new structure and added a right-click menu entry to jump to the editor folder if it existed.