From the Burrow

TaleSpire Dev Log 147

2020-01-15 08:20:14 +0000

Hey again folks, another quick ‘kept on working’ post :)

Yesterday was spent working on Paste and Deletion of single tiles.

The latter is fun because we don’t keep unique ids for individual tiles; the memory usage adds up fast. This means we have to pick some network-safe properties of a tile and use those to identify it. This is slower than looking up an id in a hashmap, of course, but given how often it happens, the cost isn’t that egregious. The upside is not spending that memory, and not having to maintain whatever structures would have given us a direct lookup.

The above is why I’ve previously mentioned how late I left it to add the delete-single-tile feature to the new codebase: I wanted to be sure what data I would have available to work with. The less data we store per-tile, the better. So in the ideal case, we’d have very little to work with.
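A minimal sketch of that trade-off, not the actual TaleSpire code. It assumes a tile can be identified by data every client already agrees on; the `TileData` fields (kind, grid position, rotation) and the `TileLookup` helper are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-tile record: no unique id is stored, only data that is
// assumed to be identical on every client.
public readonly struct TileData
{
    public readonly ushort Kind;     // which asset this tile is (assumed)
    public readonly short X, Y, Z;   // grid position (assumed)
    public readonly byte Rotation;   // quarter turns, 0..3 (assumed)

    public TileData(ushort kind, short x, short y, short z, byte rotation)
    {
        Kind = kind; X = x; Y = y; Z = z; Rotation = rotation;
    }

    public bool Matches(in TileData other) =>
        Kind == other.Kind && X == other.X && Y == other.Y &&
        Z == other.Z && Rotation == other.Rotation;
}

public static class TileLookup
{
    // Linear scan: O(n) per lookup instead of O(1) for a hashmap, but there
    // is no id field per tile and no lookup table to keep in sync.
    public static int IndexOf(List<TileData> tiles, in TileData wanted)
    {
        for (int i = 0; i < tiles.Count; i++)
        {
            if (tiles[i].Matches(wanted)) return i;
        }
        return -1; // not found
    }
}
```

Calling `TileLookup.IndexOf(tiles, wanted)` returns the index of the first matching tile, or -1 if none matches. The scan costs time on each delete, but nothing extra is stored per tile.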

Whilst I’ve coded the hard, data-wrangling parts of paste, I’ve not finished the implementation, as it requires hooking into the board-tool system in TaleSpire, which @Ree has been writing on his branch. So that will get finished when we merge those branches.

Which leads us to right now. I’m on the train heading to @Ree’s so we can work on that merge. It’s gonna be a gnarly few days, but hopefully, we can get something working by Sunday.

One of the things that has let us work separately like this is that we have not worried about performance on the tooling branch and not worried about tooling on my data branch. We, of course, had to be communicating a lot to make sure neither of us was doing things that couldn’t be made fast/useful after the merge. So far, however, it’s been working pretty well. This means that once they are merged, I can move to start writing the optimized versions of the operations the tooling uses. And poor @Ree has to make my stuff feel nice :p

Alright, that’s the news for now

Ciao

p.s.

This didn’t fit above, but I wanted to mention it anyway. UnmanagedMemoryStream is cool!

It’s a C# type that takes a pointer and length in its constructor and gives you a Stream type that is compatible with tons of .NET’s existing APIs. This has really come in handy when I’ve wanted to compress data from a NativeArray without unnecessary copying. Check it out!
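A small sketch of the idea, under stated assumptions. `NativeCompress` and its methods are hypothetical names, GZip is just one possible compressor, and a pinned managed array stands in for the native memory. In Unity the pointer would presumably come from the NativeArray itself, for example via `NativeArrayUnsafeUtility.GetUnsafeReadOnlyPtr`. The code needs unsafe blocks enabled.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class NativeCompress
{
    // Compresses `length` bytes starting at `source` without first copying
    // them into a managed byte[]. Requires AllowUnsafeBlocks in the project.
    public static unsafe byte[] Compress(byte* source, long length)
    {
        using (var input = new UnmanagedMemoryStream(source, length))
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionLevel.Fastest, leaveOpen: true))
            {
                input.CopyTo(gzip);
            } // disposing the GZipStream writes the final compressed block
            return output.ToArray();
        }
    }

    // Stand-in for native memory: pins a non-empty managed array to get a
    // pointer. This helper exists only so the sketch runs outside Unity.
    public static unsafe byte[] CompressPinned(byte[] data)
    {
        fixed (byte* p = data)
        {
            return Compress(p, data.Length);
        }
    }
}
```

The stream only wraps the pointer, so the compressor reads straight from the source memory. The only managed allocation of note is the compressed output.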

TaleSpire Dev Log 146

2020-01-13 23:37:03 +0000

Hi again,

Today has mostly been spent on implementing ‘paste’. Although not finished, it is coming along well, and I think I’ve hit most of the surprises the implementation has for me.

I also think I’ve worked out some details of deleting single tiles that mean we can defer shifting data around until later. Amusingly (to me), I’ve left deleting single tiles until now as it is, in some ways, more complex than deleting a selection of tiles. I’ll probably write a little more about that another day.

This is all rather vague, isn’t it? I’ve been sitting here thinking of more to say about it, but really it’s just been a day of coding and pondering.

Yup… that really is all for today.

Ciao!

TaleSpire Dev Log 145

2020-01-13 00:18:03 +0000

I didn’t write a log yesterday as I was really struggling with undo/redo histories, and it was just too painful!

In short, I was fighting with the behavior and then making sure the board was behaving the same as the simplified model we use for tests. I finally got that behaving this morning and so have turned my attention to copy/paste.

The short version is that I think I almost have Copy done, and so tomorrow morning, I’ll wrap that up and start implementing Paste. The main thing that takes time has been just working out how best to make it fast and then triple checking that it will behave correctly with our deterministic board update system.

The nice thing with how copy works is that you only need to send a fixed number of bytes of information across the wire for any size selection. As operations are applied in order, we send the selection and the point in the history the selection was made, and this results in the same tiles being selected on every client.

An annoying case, however, is pasting slabs of tiles from text strings (like you can find at TalesBazaar). It’s a really powerful way to share content, but all the tile data does have to be sent to each client. It’ll be alright though, it’s just a case of making it as pain-free as possible :)
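To make the fixed-size copy message concrete, here is a hedged sketch. It assumes the selection is an axis-aligned box and that the point in the history is a 32-bit index; `CopyMessage` and its layout are hypothetical, not the real wire format.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical copy message: the size is the same no matter how many tiles
// the selection covers, because the tiles themselves are never sent.
public readonly struct CopyMessage
{
    public const int SizeInBytes = 6 * sizeof(int) + sizeof(uint); // 28 bytes

    public readonly int MinX, MinY, MinZ, MaxX, MaxY, MaxZ; // selection box (assumed shape)
    public readonly uint HistoryIndex; // point in the ordered history when the selection was made

    public CopyMessage(int minX, int minY, int minZ, int maxX, int maxY, int maxZ, uint historyIndex)
    {
        MinX = minX; MinY = minY; MinZ = minZ;
        MaxX = maxX; MaxY = maxY; MaxZ = maxZ;
        HistoryIndex = historyIndex;
    }

    public byte[] ToBytes()
    {
        var bytes = new byte[SizeInBytes];
        int[] corners = { MinX, MinY, MinZ, MaxX, MaxY, MaxZ };
        for (int i = 0; i < corners.Length; i++)
        {
            BinaryPrimitives.WriteInt32LittleEndian(bytes.AsSpan(i * 4, 4), corners[i]);
        }
        BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(24, 4), HistoryIndex);
        return bytes;
    }

    public static CopyMessage FromBytes(ReadOnlySpan<byte> b)
    {
        return new CopyMessage(
            BinaryPrimitives.ReadInt32LittleEndian(b.Slice(0, 4)),
            BinaryPrimitives.ReadInt32LittleEndian(b.Slice(4, 4)),
            BinaryPrimitives.ReadInt32LittleEndian(b.Slice(8, 4)),
            BinaryPrimitives.ReadInt32LittleEndian(b.Slice(12, 4)),
            BinaryPrimitives.ReadInt32LittleEndian(b.Slice(16, 4)),
            BinaryPrimitives.ReadInt32LittleEndian(b.Slice(20, 4)),
            BinaryPrimitives.ReadUInt32LittleEndian(b.Slice(24, 4)));
    }
}
```

Each client that applies operations in the same order can resolve the same box against the same point in its history, which is why the tiles never need to travel. Pasting a slab from a text string has no such shortcut, since the receiving clients have never seen those tiles.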

Ok, I’m getting super tired, and I’m not convinced that what I’m writing is coherent, so I’m gonna get some sleep.

Goodnight folks.

TaleSpire Dev Log 144

2020-01-10 23:13:49 +0000

Today I found and fixed a few nasty bugs, two of which were caught by the automated test generation mentioned yesterday.

I started off by writing the code to handle the undo/redo history. When that looked like it was working, I set the automated test generator to make sequences of board changes between 80 and 100 operations long. This very quickly resulted in a series of operations that caused an exception.

The trouble was that it took 90 operations to trigger the bug, which would likely be a huge pain to step through, so I made the world’s dumbest test minimizer.

When I get an error in an automated test, I have it dump out the sequence of ops as code I can paste as a new test. This is what it gave me:

[Test]
public void ManualMix3()
{
    using (var asm = new BoardModelTestAssemblage())
    {
        var actions = new List<VTuple<OpName, int, int>>() {
            VTuple.New(OpName.Add, 13, 0),
            VTuple.New(OpName.Delete, 6, 17),
            VTuple.New(OpName.Add, 56, 1),
            VTuple.New(OpName.Add, 23, 4),
            VTuple.New(OpName.Add, 27, 13),
            VTuple.New(OpName.Delete, 3, 17),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 39, 13),
            VTuple.New(OpName.Delete, 36, 7),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 57, 10),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 34, 15),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 11, 17),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 38, 6),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 0, 19),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 33, 2),
            VTuple.New(OpName.Delete, 45, 17),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 5, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 22, 6),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 38, 3),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 47, 3),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 3, 1),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 14, 12),
            VTuple.New(OpName.Delete, 50, 18),
            VTuple.New(OpName.Add, 13, 16),
            VTuple.New(OpName.Delete, 45, 9),
            VTuple.New(OpName.Add, 42, 12),
            VTuple.New(OpName.Delete, 2, 8),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 34, 10),
            VTuple.New(OpName.Add, 50, 18),
            VTuple.New(OpName.Add, 48, 16),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 13, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Delete, 7, 12),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 11, 6),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 52, 7),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 31, 16),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 29, 1),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 55, 14),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 4, 18),
            VTuple.New(OpName.Redo, 0, 0),
            VTuple.New(OpName.Add, 26, 2),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Add, 47, 17),
            VTuple.New(OpName.Delete, 25, 6),
            VTuple.New(OpName.Delete, 40, 12),
            VTuple.New(OpName.Add, 16, 4),
            VTuple.New(OpName.Delete, 32, 19),
            VTuple.New(OpName.Add, 0, 13),
            VTuple.New(OpName.Add, 52, 5),
            VTuple.New(OpName.Undo, 0, 0),
            VTuple.New(OpName.Delete, 55, 18),
            VTuple.New(OpName.Delete, 1, 19),
            VTuple.New(OpName.Add, 45, 11),
            VTuple.New(OpName.Redo, 0, 0),
        };
        asm.ApplyActions(actions);
    }
}

I wrote a little function that just removed one operation from actions at random and ran the test again. If it still triggered the error, it kept the new, shorter list and repeated the process. If it didn’t trigger the error, it tried a different operation. It was horrendously slow as it was doing it in the dumbest, most brute-force way imaginable, but it spat out this:

using (var asm = new BoardModelTestAssemblage())
{
	var actions = new List<VTuple<OpName, int, int>>() {
		VTuple.New(OpName.Add, 39, 13),
		VTuple.New(OpName.Delete, 36, 7),
		VTuple.New(OpName.Undo, 0, 0),
		VTuple.New(OpName.Delete, 34, 15),
		VTuple.New(OpName.Undo, 0, 0),
	};
	asm.ApplyActions(actions);
}

Much better :D

I then simplified the values to make my job even easier and was able to find the bug itself within 15 minutes. It wasn’t one that’s exciting to read about, just some simple iteration mistake when undoing deletes of tiles, BUT it was super cool that we can do this.

If you find this fun, look up things like QuickCheck to see how the pros do it :p
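The brute-force minimizer described above can be sketched roughly like this. It is a reconstruction from the description, not the original function: `DumbMinimizer`, `Minimize` and the `stillFails` callback are hypothetical names, and the callback is assumed to rerun the test on a candidate list and report whether the error still occurs.

```csharp
using System;
using System.Collections.Generic;

public static class DumbMinimizer
{
    // Removes one operation at a time, in random order, and keeps the shorter
    // list whenever `stillFails` says the error is still triggered. Stops when
    // no single removal keeps the failure alive.
    public static List<T> Minimize<T>(
        IReadOnlyList<T> failing,
        Func<IReadOnlyList<T>, bool> stillFails,
        Random rng)
    {
        var current = new List<T>(failing);
        bool shrunk = true;
        while (shrunk)
        {
            shrunk = false;

            // Visit the candidate indices in a random order.
            var order = new List<int>();
            for (int i = 0; i < current.Count; i++) order.Add(i);
            Shuffle(order, rng);

            foreach (int index in order)
            {
                var candidate = new List<T>(current);
                candidate.RemoveAt(index);
                if (stillFails(candidate))
                {
                    current = candidate; // keep the shorter list and start over
                    shrunk = true;
                    break;
                }
            }
        }
        return current;
    }

    // Fisher-Yates shuffle.
    static void Shuffle(List<int> list, Random rng)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}
```

The result is only minimal in the sense that removing any single remaining operation makes the failure disappear. Every attempt reruns the whole test, which is where the slowness comes from.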

With that and some other bugs fixed I am back to data wrangling.

So in all, today went well. I wish I had got further but that’s just how it goes.

Seeya tomorrow

TaleSpire Dev Log 143

2020-01-09 23:58:13 +0000

Today I got a useful portion of the automated testing working. It generates lists of changes (add/delete/undo/redo) and applies them to:

  1. An instance of the actual board data-structure
  2. An instance of a simplified model

There is also a second instance of the actual board data-structure connected to the first via a dummy network interconnect. This means that once we have applied a randomly generated list of changes, we can compare all 3 to see that all the results line up.

This isn’t a substitute for other tests as all could be wrong in the same way, but it’s already allowed me to improve undo-redo and track down a few small bugs.

I don’t have any automated test case minimization yet, but so far, it’s been relatively easy to minimize the cases that came up.

Next, I need to look at how data is moved to the ‘inactive set’.

Each client has an undo/redo history with a fixed length (currently 50). When the history gets longer than 50, the oldest history event is no longer reachable, which means it cannot be undone. It also means that we don’t necessarily need to store it in the same way as we do for ‘active’ asset tile & prop data. Having a different data set, with a potentially different layout, can let us have optimizations that only make sense on that set.
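A minimal sketch of a fixed-length history, to show where events become unreachable. `BoundedHistory` is a hypothetical type, and it assumes that recording a new event discards anything waiting to be redone, which the post does not state. What happens to a retired event afterwards (such as moving its data to the inactive set) is left out.

```csharp
using System;
using System.Collections.Generic;

public sealed class BoundedHistory<T>
{
    readonly int capacity;
    readonly List<T> events = new List<T>(); // oldest first
    int cursor; // number of applied events; events[cursor..] can be redone

    public BoundedHistory(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    public int Count => events.Count;

    // Records a new event. Returns true and sets `retired` when the oldest
    // event fell off the end of the history and can no longer be undone.
    public bool Push(T ev, out T retired)
    {
        // Assumption: a new event discards the redoable tail.
        events.RemoveRange(cursor, events.Count - cursor);
        events.Add(ev);
        cursor = events.Count;

        if (events.Count > capacity)
        {
            retired = events[0];
            events.RemoveAt(0);
            cursor--;
            return true;
        }
        retired = default(T);
        return false;
    }

    public bool TryUndo(out T ev)
    {
        if (cursor == 0) { ev = default(T); return false; }
        ev = events[--cursor];
        return true;
    }

    public bool TryRedo(out T ev)
    {
        if (cursor == events.Count) { ev = default(T); return false; }
        ev = events[cursor++];
        return true;
    }
}
```

With a capacity of 50, the 51st `Push` hands back the first event as `retired`. From that moment no undo or redo can touch it.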

To be moved to the inactive set, we need to know that the data isn’t going to be modified by an undo/redo still in the active portion of the history. This evening I’ve been implementing the code to handle this. It’s not done yet, but I’m hoping to wrap that up tomorrow morning. I’ll then start testing longer streams of operations to make sure they behave correctly.

I’ll then be back on the bug hunt. Before Christmas, I saw a case where undo/redo started giving me some very odd results (big chunks of tiles missing). I’m not sure if the bug was in the data itself or the code handling the progressive spawning/deletion of the game objects. I’m still hoping the former as that’s way easier to test, but I’m not sure quite how the bugs I’ve fixed the last two days would account for that. Ah well, we’ll see soon enough.

Thanks all for tonight,

Ciao

TaleSpire Dev Log 142

2020-01-08 21:14:58 +0000

Today is a short but sweet one.

Ree has been hunting down bugs in the new building tools and I’ve been working on the automated test suite which we are using to validate the building code[0].

That and of course the pre-orders are out, so we have been keeping an eye on things there.

Righto, I’m gonna get some sleep.

Ciao!

[0] I’ll probably put a little more detailed info out on that another day

TaleSpire Dev Log 140

2020-01-03 14:44:39 +0000

Today I’ve continued on server work:

I rewrote the sign-in code to promote the connection to a secure websocket if sign-in succeeds.

I’ve started wiring up erlbus so that the client will subscribe to the campaign and board and receive messages broadcast by other clients or the server.

One interesting note with that is that when messing with erlbus from the REPL you might run into a badrpc error when trying out ebus:messages. This was happening as ebus:messages uses rpc:multicall behind the scenes and it wasn’t finding the function on one of the nodes. The node with the issue was the one I was connecting my remote shell from as, naturally, that doesn’t have the app compiled. The quick fix is just to use -hidden in your erl arguments, as multicall contacts [node() | nodes()] by default and when you are hidden you won’t show up in nodes() unless that behavior is specifically requested.

That’s the lot for today (other than the usual misc bug fixes). I’m flying back to Norway on Monday so most of the day will be a write-off due to travel. I may get a little done on Sunday but tomorrow I’ll be visiting some friends I haven’t seen in a few years (which is gonna be great!).

Ciao!

TaleSpire Dev Log 139

2020-01-02 17:38:36 +0000

Over the break, I’ve put down the front-end code to begin getting a handle on the changes that will be made to the backend. Most of this has been focused on reading, here are some of the things I’ve been dipping into:

Reading things

  • https://erlang.org/doc/man/gen_event.html some of the http handling code got a bit centralized and was slowing me down when making changes. I hadn’t used gen_events before as I found them a bit confusing. They are somewhat different from gen_server and gen_fsm as you don’t implement the manager but instead just the callbacks. It’s pretty neat once you grok it, though.
  • http://blog.differentpla.net/blog/2014/11/07/erlang-sup-event/ one part of understanding gen_event is knowing how to include it in your supervision trees. This covers those bits that mostly seem left out of other tutorials.
  • https://github.com/cabol/erlbus/ erlbus is an important piece of the changes I’m making, and so I’ve been getting familiar with its approach. It also contains one of the more lucid examples of using websockets in erlang, which was helpful in other tests.
  • https://www.phoenixframework.org/blog/the-road-to-2-million-websocket-connections An inspiring read. Little bits and bobs scattered throughout that were useful. Also, it’s just nice to see what certain pieces we use can potentially deliver (although we naturally aren’t doing this).
  • https://github.com/uwiger/gproc/blob/master/doc/erlang07-wiger.pdf This paper describes the evolution and technical realities that led to gproc which is something I intend to use in place of the standard global (across multiple erlang nodes) process registry.
  • http://erlang.org/doc/apps/stdlib/stdlib.pdf this naturally is WIP. Skimming the standard lib of any language is a great way to find out things you didn’t know were there; avoiding redundant work is nearly always a blessing.
  • https://www.amazon.com/WebRTC-Cookbook-Andrii-Sergiienko/dp/1783284455 I’ve been skimming bits of this again as it’s likely to form part of the first implementation of the p2p voice & video chat
  • https://ninenines.eu/docs/ dear god ninenines is the best. Cowboy, gun, ranch. All fantastic quality, super robust and well documented. I have no idea how I’d function without their stuff.

Making things

As we use Photon for realtime networking in TaleSpire, we only need to focus on lower frequency events, such as persistence, for now. The alpha used a hacked together REST’ish api, which did the job but has all the expected issues (e.g. having to poll for changes). We are switching to using websockets for the Beta as it’s a relatively incremental step, will let us overcome some shortcomings, and still makes sense given the scale we will be at. It, too, will be a target for replacement in the future, but that is a subject for another day.

All of the server api the game uses is described by a data-structure that we then generate erlang & c# code from. I’ve updated the generator to create websocket handlers rather than the http/s ones. In doing so I’ve also been undoing some of the code that centralized some of the http management.

The last thing I did was update the code that polls to see which domain it is at before attempting to pick the right certificates and start serving over https.

The next step will be to rewrite the session handling code as it currently assumes a stateless connection.

Until next time,

Peace.

TaleSpire Dev Log 138

2019-12-06 09:09:52 +0000

I’m currently writing this on the train home.

I’ve been down working with Jonny for the last time (in person) this year, and it’s been a great few days. We made a bunch of progress in different places, so let’s natter about that.

Tiles

First off, we were looking at tile placement and control. We may have found a nice change which could help with some tricky cases our alpha testers would get into when working with walls. My ultra vague language is due to it being so early in the prototype phase that we are not ready to talk about it yet. This is not the first time in the last 4 months that we have had a fix that play revealed to be worse than the initial problem. When we are more confident, we’ll do a full writeup (or maybe a short video) to go through the different things (just placing tiles has surprising nuances to it :D).

Meatier News

As that is all arm waving and no substance, here is some real news: You will be able to place creature miniatures off-grid. This has been play-tested for a while internally now and, although it does introduce some challenges where it comes to UI/messaging, it does feel pretty cool. For those who have wrangled narrow corridors in the alpha, this should feel rather freeing :p

Do note that we are still keeping the tiles to the grid, however. Without this limitation, the worst-case complexity for various systems becomes totally unmanageable. Remember that doubling the length (or resolution) of the side of a cube increases the volume 8 times. So if you have anything that operates over the volume like say, fog of war or line of sight, your day would have gotten much worse (I know, I go on about this in almost every post).

The Right Questions

I also think I know how cross-zone pathfinding will work now. I’ve been struggling a little with how to maintain a single graph as zones are loaded in and out. Jonny did me the simple favor of asking if it could be computed on demand. Building the nav-mesh from scratch each time a creature is picked up sounds like a lot of work, but it actually reduces the complexity a lot and opens up some opportunities for parallelization. We just need to keep the per-zone input data sorted in a way that helps this on-demand process as much as possible, and that bit is easier. More on this when I have tested it.

Monster Branch Hunter

We have also started the long process of getting everything merged into master. The part of this that required us to work closely was moving to Unity assemblies for the core project as suddenly their ‘magic folders’ stop working[0]. This means reorganizing the whole codebase, which wasn’t too bad[1]. However, this can make for seriously ugly merge conflicts and so it was best to stop development for a couple of hours, get the change merged in, and test on both our machines.

Testing

I’ve also started working on another approach to ironing out bugs from the board code. It’s very much inspired by my uninformed scanning of property-based testing. We make a dirt-simple but accurate model of the code we wish to test, and then we generate random streams of operations that are applied to both the real version and the model, and we compare the results. The model gets to ignore all details of multithreading, being performant, etc. Its only job is to be understandable and give the behavior and results intended from the real version. In previous tests, I had already made a fake network interconnect so I could apply operations to one board and make sure the other board ended up with exactly the same result after sync. So now I can wire things up like this:

    [Randomly generated board operations]
             /              \
            |                |
            v                v
    [Simple Model]     [Real Board 0] <--fake network interconnect--> [Real Board 1] 

We feed the model and ‘Real Board 0’ the same operations, but then compare the results for all three boards. This gives us a pretty decent amount of testing for almost no additional work.
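The wiring in the diagram can be sketched in a few lines. Everything here is a toy stand-in: `ModelBoard`, `RealBoard` and `ModelCheck` are hypothetical, the boards only track a set of tile numbers, and the ‘fake network interconnect’ is reduced to forwarding each operation to a peer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum OpName { Add, Delete }

// Dirt-simple model: clarity over speed.
public sealed class ModelBoard
{
    readonly List<int> tiles = new List<int>();

    public void Apply(OpName op, int tile)
    {
        if (op == OpName.Add) { if (!tiles.Contains(tile)) tiles.Add(tile); }
        else tiles.Remove(tile);
    }

    public List<int> Snapshot() => tiles.OrderBy(t => t).ToList();
}

// Stand-in for the real, optimized board. It forwards every operation to
// Peer, which plays the role of the fake network interconnect.
public sealed class RealBoard
{
    readonly HashSet<int> tiles = new HashSet<int>();
    public RealBoard Peer; // may be null

    public void Apply(OpName op, int tile)
    {
        if (op == OpName.Add) tiles.Add(tile); else tiles.Remove(tile);
        Peer?.Apply(op, tile);
    }

    public List<int> Snapshot() => tiles.OrderBy(t => t).ToList();
}

public static class ModelCheck
{
    // Returns true when all three boards agree after `count` random operations.
    public static bool Run(int seed, int count)
    {
        var rng = new Random(seed);
        var model = new ModelBoard();
        var board1 = new RealBoard();
        var board0 = new RealBoard { Peer = board1 };

        for (int i = 0; i < count; i++)
        {
            OpName op = rng.Next(2) == 0 ? OpName.Add : OpName.Delete;
            int tile = rng.Next(20);
            model.Apply(op, tile);  // the model and board 0 get the same op
            board0.Apply(op, tile); // board 1 only hears about it via Peer
        }

        var expected = model.Snapshot();
        return expected.SequenceEqual(board0.Snapshot())
            && expected.SequenceEqual(board1.Snapshot());
    }
}
```

Running `ModelCheck.Run` over many seeds gives a lot of coverage for very little code. A failing seed reproduces the same operation stream every time.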

I will stress that this isn’t property-based testing. I want to learn it, but I don’t have the time to take on a new technique right now. However, I know that even this limited approach can give good results, and I think I’ll be able to add some very basic test-case minimization too.

Christmas Approaches

And with that, it’s almost the end of the year. There is more to come, but I’m going to be posting a bit less until January as I’ve managed to schedule everything to happen in one month, and apparently it’s arrived. For the next week, I’m on holiday, and then I’ll be heading to the UK for Christmas. During my time in the UK I’ll be working some evenings on server stuff, so expect a very significant shift in content then!

I hope this finds you well folks.

Peace.

[0] by default, Unity automatically makes assemblies that separate game and editor code based on special folder names. It’s handy but naturally goes away when you decide to handle assemblies yourself.

[1] There are a bunch of ways to lay out your project, but we found it simplest to make 2 folders, ‘Runtime’ and ‘Editors’, and move all game code to the former and all the editor code to the latter. The directory structure on both sides is mirrored. We have a little helper menu for writing the boilerplate code for inspector editors, so we updated that to respect the new structure and added a right-click menu entry to jump to the editor folder if it existed.

TaleSpire Dev Log 137

2019-12-04 16:03:31 +0000

Hey! I best recap the last two days before even more get away from me.

Monday

Monday was spent debugging some of my uses of unsafe code. Unity’s Job system has some great tools to detect race conditions, but it also lets you disable them if you have a use case that demands it. This is a wonderful attitude, and I’m very grateful for that. However, boy can I corrupt some memory with this stuff.

The first thing was debugging a new NativeCollection I had made. This is a Stack with a fixed max size that can have multiple concurrent writers or readers (but not both at the same time). I use this to hand out and return large numbers of Ids to a limited resource, and to do so from concurrent jobs.

The id, in one case, is an index into a big array that stores state that is later pushed to the GPU for use in the shaders. Each visible tile has one entry in this array and gives that spot back when its presentation is destroyed. This can happen with large numbers of tiles across multiple zones simultaneously.
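One way such a stack could work is sketched below in plain C# rather than as a Unity NativeCollection. `IdStack` is a hypothetical name and the real container may be built quite differently. The key assumption is the one stated above: many threads may pop at once, or many may push at once, but the two phases never overlap.

```csharp
using System;
using System.Threading;

// Fixed-capacity stack of ids. Safe for many concurrent poppers OR many
// concurrent pushers, but the two phases must not overlap.
public sealed class IdStack
{
    readonly int[] items;
    int count;

    public IdStack(int capacity)
    {
        items = new int[capacity];
        // Start full: every id 0..capacity-1 is available, with 0 on top.
        for (int i = 0; i < capacity; i++) items[i] = capacity - 1 - i;
        count = capacity;
    }

    public int Count => Volatile.Read(ref count);

    public bool TryPop(out int id)
    {
        // Each popper atomically claims its own slot.
        int slot = Interlocked.Decrement(ref count);
        if (slot < 0)
        {
            Interlocked.Increment(ref count); // undo: the stack was empty
            id = -1;
            return false;
        }
        id = items[slot];
        return true;
    }

    public bool TryPush(int id)
    {
        // Each pusher atomically claims its own slot.
        int slot = Interlocked.Increment(ref count) - 1;
        if (slot >= items.Length)
        {
            Interlocked.Decrement(ref count); // undo: the stack was full
            return false;
        }
        items[slot] = id;
        return true;
    }
}
```

Because a claimed slot belongs to exactly one thread within a phase, no lock is needed. Mixing pushes and pops concurrently would break that guarantee, since a popper could read a slot before its pusher has written it.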

Because each tile gets a unique index into that array, we know that they will be the only thing accessing it, so we don’t want Unity to protect us from race conditions that would occur otherwise[0]. This means we add the [NativeDisableContainerSafetyRestriction] attribute.

That, of course, allows you to really make a mess, and sure enough, I did :p.

In this case, it wasn’t due to that attribute, however. I was using MemCpyReplicate to set the default shader state across a portion of the array and I may have incremented the pointer to the struct rather than the pointer into the array. So I happily tried to copy invalid portions of memory into the array.

I was actually super lucky that this caused Unity itself to crash very consistently. These kinds of bugs are horrifying if they stay under the radar.

With that done, all tests passed again, and I got back to work.

Tuesday -> Wednesday morning

Tuesday started with traveling to visit @Ree. It’s always great when we get to do this as certain kinds of tasks are so much easier.

In a related subject.. yesterday I was working on the line of sight shader :)

This starts with rendering the scene into a cubemap from the head of the creature you have just placed. Each thing rendered is colored based on an id (zero for all occluders and >0 for creatures or potentially other points of interest). We then run a compute shader to sample the cubemap and aggregate which ids are in there.

To start with, we want to store 32bit ids, which meant using a floating-point texture format. This seemed to work, but every time I read the value from the compute shader, the value was always clamped between 0 and 1.

It’s my first time doing this in Unity, so I spent ages trying to find out what I’d done wrong until I gave up and ran RenderDoc on it. I should remember to do this first, as it turned out Unity was rendering the faces in non-float textures and then copying the data over.

RenderToCubemap doesn’t seem to warn about this, but @Ree got me on the right path by saying we should try a RenderTexture instead. RenderToCubemap has an overload for this, and there is even this interesting line in the docstring for the overload that takes the cubemap directly:

If you want a realtime-updated cubemap, use RenderToCubemap variant that uses a RenderTexture with a cubemap dimension, see below.

I’m guessing that this is a hint to the intermediate copies (and maybe a temporary FBO too?).

After moving over to that, I started getting values higher than one out of the texture YAY! The values are wrong, but frankly, I don’t care about it today. The pieces of the puzzle have all been started, and I can work on those from home. It’s best to use this time I have over here to touch on as many other things as possible.

The rest of today

There have been more chats working out details of performance around tile hide and cutaway and much musing on how to merge our branches in a way that doesn’t slow us down.

That’s my next task, to update the main dev branch to use Unity’s assemblies and begin moving in the new data model code.

We have to be very careful though, as it’s soon Christmas and we really mustn’t block either of us from being able to work, as we won’t be available to help each other during that time.

That’s all for now. Seeya!

[0] We aren’t, in this case, too concerned about false sharing or other forms of contention that may hurt us. This will get more of a review later, however.
