From the Burrow

TaleSpire Dev Log 164

2020-03-18 01:51:20 +0000

Heya folks,

Time for another little update on what I’m poking at.

This week I’m focusing on server-side work, and the first thing I needed to look at was the new server setup. Previously the servers themselves were in a private subnet with the load-balancer being the only thing that was public-facing. With the move to websockets, I’m no longer using Amazon’s load-balancer, and so now the servers are public-facing again.

I needed a way for messages to be delivered between servers, as players might be connected to separate instances in the same AZ. This should be made easy by Erlang’s distribution system. However, we also want to make sure that we don’t accidentally connect servers that shouldn’t be connected.

The first answer to this is to use Erlang’s “cookie” setting. The cookie simply stops any two Erlang nodes from connecting if their cookies don’t match. Knowing that, we set the cookie in the settings file before we push the build to AWS. We also generate a random name for the node at that point. It’s a bit clumsy but will do the job for now.
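Here is a rough Python sketch of that build step. It is not TaleSpire's actual tooling: the function names, the `ts` prefix, and the idea that the settings file uses `vm.args`-style lines are all assumptions made for illustration. The `-name` and `-setcookie` flags themselves are the standard ones accepted by `erl`.

```python
import secrets

def make_node_settings(cookie, prefix="ts"):
    """Build per-build node settings: the shared cookie plus a random node name.

    `prefix` and the dict layout are hypothetical, chosen for this sketch.
    """
    # A random suffix keeps two builds from ending up with the same node name.
    suffix = secrets.token_hex(4)  # 8 hex characters
    return {"name": f"{prefix}_{suffix}", "cookie": cookie}

def render_vm_args(settings, host):
    """Render the settings as vm.args-style lines (assumed file format)."""
    return f"-name {settings['name']}@{host}\n-setcookie {settings['cookie']}\n"
```

Nodes started from files that share a cookie can connect to each other; a node carrying a different cookie is refused.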

One wrinkle is that our REPL is just another Erlang node, and so its cookie must match the server’s. To enable this, I wrote some elisp to look at the settings file of the node we are connecting to and extract the details we need to initiate the connection. With this done, we keep the ability to connect and quickly query details or push changes if required.
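The real helper is written in elisp; the sketch below shows the same idea in Python. It assumes the settings file uses `vm.args`-style lines, and the function names and the `repl@127.0.0.1` node name are hypothetical. The `-remsh` flag is the standard way to attach an `erl` shell to a running node.

```python
def parse_vm_args(text):
    """Pull the node name and cookie out of vm.args-style text (assumed format)."""
    details = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, comments, and flags that take no argument.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if parts[0] == "-name":
            details["name"] = parts[1]
        elif parts[0] == "-setcookie":
            details["cookie"] = parts[1]
    return details

def remsh_command(details, repl_name="repl@127.0.0.1"):
    """Build the argument list for a remote shell that shares the target's cookie."""
    return ["erl", "-name", repl_name,
            "-setcookie", details["cookie"],
            "-remsh", details["name"]]
```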

I’ve also spent a little time fixing bugs on the client but nothing of significant interest right now.

The next step is to set up the production database, get two production servers set up, and check that the messages being sent across erlbus are being delivered to both nodes. I’ll also need to set up something simple for node discovery. Hopefully, that won’t take long.

Just a small Corona-related update to wrap up. I’m definitely starting to come down with something, but I have no idea how much it’s going to interfere with development yet. Hopefully, it won’t, but I’ll keep you posted.

I hope you are doing well,

Back soon with more

TaleSpire Dev Log 163

2020-03-07 00:02:23 +0000

Hi again folks!

Work is going well. For the last two days, I’ve been focusing on handling network failures and testing board sync.

Sometimes you just lose connection for a moment; in that case, we want to get you back into the game as fast as possible. It’s quite likely that, since the client dropped out, nothing significant on the board has changed. In those cases, we can quickly confirm we are still up to date, make a couple of small tweaks and carry on as if nothing happened.

If there have been changes to the state of the board that aren’t trivial to reconstruct, we simply reload the board, which will bring your client back to the state it should be in.
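One way to picture the "are we still up to date?" check is to compare a cheap summary of the board on both sides. This is only a sketch: the post doesn't say how TaleSpire performs the check, so the digest over an operation log, the function names, and the two outcomes are all assumptions.

```python
import hashlib

def board_digest(ops):
    """Hash a list of operation strings into one short fingerprint."""
    h = hashlib.sha256()
    for op in ops:
        h.update(op.encode("utf-8"))
        h.update(b"\n")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()

def on_reconnect(local_ops, remote_count, remote_digest):
    """Return "resume" if nothing changed while we were away, else "reload"."""
    if len(local_ops) == remote_count and board_digest(local_ops) == remote_digest:
        return "resume"
    return "reload"
```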

Another thing that can happen when someone leaves is host migration. When you have multiple people playing in TaleSpire, one client is the ‘host’. The host does the book-keeping that decides the order in which all changes to the board happen. When that client leaves, that job moves to another client.

However, what if there were messages that were sent to the host before it dropped out? Did they make it? If they did, have they made it back to the clients yet? How do we know when to check? Then, if we are the sender of a lost message, we need to undo the change it made. Getting that nailed down was another of today’s tasks. I have a first version in, but it could do with more testing.
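A simplified way to think about the sender's side is sketched below. It assumes the sender applies its change locally straight away and treats it as confirmed only once the host echoes it back; after a migration, anything still unconfirmed is treated as lost and undone. The class, its methods, and that "unconfirmed means lost" rule are illustrative assumptions, not how TaleSpire actually decides.

```python
class Client:
    def __init__(self):
        self.board = []    # (message id, change) pairs, in the order applied
        self.pending = {}  # message id -> change, sent but not yet echoed back
        self.next_id = 0

    def send_change(self, change):
        """Apply a change locally and remember it until the host confirms it."""
        msg_id = self.next_id
        self.next_id += 1
        self.board.append((msg_id, change))
        self.pending[msg_id] = change
        return msg_id

    def on_host_echo(self, msg_id):
        """The host ordered our message and sent it back: it is now confirmed."""
        self.pending.pop(msg_id, None)

    def on_host_migrated(self):
        """Undo every change the old host never confirmed; return what was lost."""
        lost_ids = set(self.pending)
        lost = list(self.pending.values())
        self.board = [(i, ch) for i, ch in self.board if i not in lost_ids]
        self.pending.clear()
        return lost
```

A real implementation also has to cover the case where the old host did order a message but the echo hadn't arrived yet, which this sketch ignores.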

I also found some smaller bugs in board sync, which are now fixed, and I squashed a few other little things on the way.

Next, I’m finishing off the changes to board-sync to make it handle failures more gracefully, and I also need to have a look at pasting of slabs saved as strings as I think that is misbehaving right now.

Until next time, Ciao!

p.s. I really was tempted to go into more detail about these fixes, but doing so would require a bunch more context, and I’d like to work on a couple more things before calling it a night.

TaleSpire Dev Log 162

2020-03-04 10:02:35 +0000

Howdy folks,

Alright, so yesterday I was working on a few bugs in the code that handles the Unity GameObjects of the tiles (I’ll be referring to this as the ‘presentation’, as opposed to the data-model, which is separate).

I did a little work, so that undo/redo reuses portions of the List that holds the game objects. It wasn’t too tricky, but as the presentation adds/removes assets progressively over several frames, I had to take that into account.

I fixed a bug where rotated slabs weren’t being pasted correctly because the paste was considering the wrong zones. This was simply because, when considering what zones to paste to, I hadn’t rotated the bounds of the slab.
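To see why the rotation matters, here is a small Python sketch on a 2D grid. The zone size, the function names, and the assumption that slabs rotate in quarter turns about their minimum corner are all made up for the example; the real code works with Unity bounds.

```python
ZONE_SIZE = 16  # hypothetical zone edge length, in cells

def rotated_size(size_x, size_z, quarter_turns):
    """A 90 or 270 degree turn swaps the footprint's x and z extents."""
    if quarter_turns % 2:
        return (size_z, size_x)
    return (size_x, size_z)

def zones_for_paste(origin_x, origin_z, size_x, size_z, quarter_turns):
    """Return the set of (zone_x, zone_z) cells the rotated slab overlaps."""
    w, d = rotated_size(size_x, size_z, quarter_turns)
    zx0, zx1 = origin_x // ZONE_SIZE, (origin_x + w - 1) // ZONE_SIZE
    zz0, zz1 = origin_z // ZONE_SIZE, (origin_z + d - 1) // ZONE_SIZE
    return {(zx, zz)
            for zx in range(zx0, zx1 + 1)
            for zz in range(zz0, zz1 + 1)}
```

A long, thin slab of 40 by 4 cells pasted at the origin touches three zones along x when unrotated, but three zones along z after a quarter turn. Using the unrotated bounds would pick the wrong zones entirely.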

I fixed a few bugs relating to how, when we have updated the state of a script, we locate the asset so we can update the visuals. In the end, I’ve opted to have a per-zone presentation hashmap. This is a bit annoying as I wanted to avoid maintaining yet another data structure if we could instead navigate to the asset another way (even if it was a bit less efficient); however, this will get us out the door.

Fixed a bug where progressive loading was skipping operations that it was meant to apply to the presentation. This was a dumb mistake of just spuriously advancing through the queue in two places. Probably a screw up during refactoring.
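The shape of the fix, sketched in Python: give the queue exactly one place where it advances, so every operation taken off it is also applied. The names and the per-frame budget are assumptions for illustration; the real code is Unity C#.

```python
from collections import deque

def apply_some(queue, presentation, budget):
    """Apply up to `budget` queued operations this frame; return how many ran."""
    applied = 0
    while queue and applied < budget:
        op = queue.popleft()     # the only place the queue advances
        presentation.append(op)  # stand-in for updating the visuals
        applied += 1
    return applied
```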

Added a bunch of asserts performing sanity checks on the data-model. These are only included in the debug build, so I get crashes nearer the cause of the issue, rather than always having to infer what screwed up.

There was also other stuff, but it gets too into the weeds to cover in a daily dev-log.

Today I’ll be going through a tonne of the code handling the network connections and board synchronization and trying to add sensible behaviors to the failure cases. For example, when we join and try to pull the board from another player:

  • what if they aren’t ready to sync
  • what if they are already uploading the board
  • what if they start, but then that fails
  • what if they never receive the message to sync.
  • etc

It needn’t be too fancy, but we need to have some logic dictating what to do, how many times to retry, and so on.
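As a rough idea of what that logic could look like, here is a Python sketch. The status strings, the retry limit, and the rule of moving on to the next peer after a hard failure are all assumptions for the example, not the behavior TaleSpire ships with.

```python
RETRYABLE = {"not_ready", "busy"}  # the peer may succeed if asked again

def pull_board(peers, request_sync, max_attempts=3):
    """Try each peer in turn; return the peer that delivered the board, or None.

    `request_sync(peer)` is a hypothetical callback returning one of
    "ok", "not_ready", "busy", "failed", or "timeout".
    """
    for peer in peers:
        for _ in range(max_attempts):
            status = request_sync(peer)
            if status == "ok":
                return peer
            if status not in RETRYABLE:
                break  # "failed" or "timeout": give up on this peer
    return None
```

A real version would also wait between attempts rather than asking again immediately.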

That’s all for now, Seeya

TaleSpire Dev Log 161

2020-03-02 00:05:24 +0000

Things have been going really well.

The rewrites to fix the fundamental bugs are done, and I’ve been chasing down lots of regular bugs, mistakes, typos, and other fuckups :)

It feels really nice to be back in a place where each issue is somewhat contained. A bunch are, naturally, related to the changes made for the bug fix; however, as I’ve tracked each one down, it’s been more “Oh, of course” or “Why am I such a muppet”, rather than “Ah shit”. PROGRESS!

My plan for this next week is twofold. Firstly I’m going to spend some time finding and squashing the most obvious bugs, and then I’ll look at getting merged into master.

It’s looking likely that Ree and I are going to be able to meet up again soon. That’s going to be a bunch of fun as we’ll get to work on a lot of very user-facing stuff, and we’ll be inching closer to the build we’ll ship for the beta. Can’t wait!

Ok, that’s all for now.

Have a good one.

TaleSpire Dev Log 160

2020-02-29 14:33:09 +0000

I forgot to write this last night, so here is the update from yesterday.

I’ve rewritten copy and paste. I don’t have proper test coverage of these yet, so I will need to look into that. First, though, I want to work through the obvious bugs that show up through play. This should keep me busy for a day.


TaleSpire Dev Log 159

2020-02-28 00:32:33 +0000

Today work progressed pretty well. I’ve updated the code that handles tile deletion and have moved over to fixing up tests. Copy and paste still need to be updated, but I’m more likely to find fundamental issues in the code I already have, so I definitely want to get that fixed up first.

I’m fixing up small mistakes from the refactor, and once that is done (and all current tests pass), I am going to start adding tests to try and find bugs when multiple builders’ operations interleave. Ideally, I want to use my automated testing code to find the bugs, but there are some limits here.

To perform automated tests, I made a very simplified implementation of the board and its core operations; we call this the model. I then take the model, an instance of the actual board class, and a stream of random board operations. I apply the stream of operations to both the model and the board and then compare the state of each to ensure we got the same results.
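In miniature, the approach looks something like the Python sketch below. Both classes are stand-ins invented for the example: `ModelBoard` plays the simplified model and `ZonedBoard` plays the "real" board with a more involved layout. The real test harness lives in the Unity project and covers far more operations.

```python
import random

class ModelBoard:
    """Deliberately naive reference: one flat set of tile ids."""
    def __init__(self):
        self.tiles = set()
    def add(self, tile):
        self.tiles.add(tile)
    def delete(self, tile):
        self.tiles.discard(tile)
    def snapshot(self):
        return sorted(self.tiles)

class ZonedBoard:
    """Stand-in for the real board: tiles are bucketed into zones."""
    def __init__(self, zone_size=8):
        self.zone_size = zone_size
        self.zones = {}
    def add(self, tile):
        self.zones.setdefault(tile // self.zone_size, set()).add(tile)
    def delete(self, tile):
        zone = self.zones.get(tile // self.zone_size)
        if zone is not None:
            zone.discard(tile)
    def snapshot(self):
        return sorted(t for zone in self.zones.values() for t in zone)

def random_ops(seed, count, max_tile=64):
    """Produce a reproducible stream of (operation name, tile id) pairs."""
    rng = random.Random(seed)
    return [(rng.choice(["add", "delete"]), rng.randrange(max_tile))
            for _ in range(count)]

def apply_ops(board, ops):
    for name, tile in ops:
        getattr(board, name)(tile)
    return board

def agrees(seed, count=200):
    """Run the same stream through both boards and compare the end states."""
    ops = random_ops(seed, count)
    return (apply_ops(ModelBoard(), ops).snapshot()
            == apply_ops(ZonedBoard(), ops).snapshot())
```

Seeding the generator means any failing stream can be replayed exactly while hunting the bug.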

The model, as mentioned, is very simplified. One simplification is that it doesn’t support multiple builders. This means that while it’s great for testing against one board, it’s not so useful for testing against two boards that are being manipulated by separate players.

We still can get some useful data, though. If we take a board and have two players add and delete things, it should be the same as if a single player performed those same actions in the same order. This means for those operations, we can use our model. However, undo/redo operate on a player’s own action history and so can’t be modeled the same way.
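The add/delete case can be sketched like this in Python: merge the two players' operations into the order the host settled on, then feed that single stream to the model as if one player had done it all. The `order` string and the function names are hypothetical conveniences for the example.

```python
def apply_ops(tiles, ops):
    """Apply (operation name, tile id) pairs to a set of tile ids."""
    for name, tile in ops:
        if name == "add":
            tiles.add(tile)
        else:
            tiles.discard(tile)
    return tiles

def interleave(player_a, player_b, order):
    """Merge two players' op lists following `order`, e.g. "abba"."""
    a, b = iter(player_a), iter(player_b)
    return [next(a) if who == "a" else next(b) for who in order]
```

With player A doing add 1 then delete 2, and player B doing add 2 then add 3, the order "abba" leaves tiles 1 and 3, while "aabb" leaves 1, 2, and 3. The model predicts both results once it is given the merged stream.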

Regardless of the limitation, it’s still useful to use the automated tester, so I’m definitely going to implement the add/delete case mentioned above.

I should have that working tomorrow if all goes well.

Seeya then!

TaleSpire Dev Log 158

2020-02-26 17:11:37 +0000

Hi folks.

Today has gone well. I feel like the code that handles adding tiles and the undo/redo of ‘add tiles’ has taken shape nicely. Whilst I’m not stoked about the increased complexity, I’m glad to see the code shaping up in the right ways.

I’m going to start on the updated version of tile deletion tomorrow. It seems like it will yield without too much issue, but I’m prepared for something to appear and confound me :)

I’m leaving paste for last. In theory, it shares a lot with the code that handles adding of assets, but previously there have been some tricky corner cases there. Time will tell.

This is one of those fixes that can’t be meaningfully tested until I update all the tile operations, so I’ll slog through it for now. I can’t wait to update the automated tests to scour this thing for bugs.

Seeya tomorrow

TaleSpire Dev Log 157

2020-02-25 23:52:41 +0000

A good time-adjusted evening to you all,

Today I’ve carried on with the bug-fixing. The day started out with more planning, as my initial plan didn’t cover some cases. I then spent the afternoon working on the code that handles adding assets. This seems to be going well so far.

Tomorrow I’ll continue with the ‘add assets’ code and then look at delete. I hope to finish that on Thursday and get onto copy/paste.


TaleSpire Dev Log 156

2020-02-24 19:31:11 +0000

It’s a nice quick update today.

I have set aside this week to work through the nasty bugs that delayed the release. I spent today wandering around my flat rambling about possible approaches, pausing to scribble furiously on a tablet, and then repeating those steps for 9 hours.

Between that and a few chats with Ree, I feel like I’ve got something that should work. Tomorrow I’ll start implementing and see if I’ve missed anything.

Seeya then!

TaleSpire Dev Log 154

2020-02-22 12:40:24 +0000

Heya folks,

As planned, I started off Wednesday working on a bug that meant scripts were not getting unique ids. This resulted in you clicking a door to open it, but instead, a chest in another room would open. Cute, but obviously broken :p

As I chased this down, I became more and more annoyed with how centralized all the state for these state-machines was. Originally I had started implementing the real-time scripts (ones that recompute their state every frame and aren’t synchronized over the network). There I wanted the data to be packed together so I could quickly iterate over large numbers of them. We didn’t need the real-time scripts to be ready for the beta launch, and so I had switched my focus to the state-machine scripts (used for doors, chests, etc). I reused a lot of code from the real-time scripts, and part of what we inherited was how their private data is stored. By storing this all together, syncing a zone also meant syncing this store. This worked but didn’t feel great. One result of this is that you end up having to download all the state for all the scripts in the whole board to be able to have the doors in one zone work properly.

As the delay has given us a little extra time, this became a great candidate for a refactor as it makes things better for the beta launch and also makes per-zone-sync (which is coming after beta launch) easier to implement.

If this has raised your eyebrows, that is fine. This is a slippery notion that a few of us were discussing on the discord the other day. It’s actually possible for a delay to make a second delay more likely as you now don’t have to ignore fundamental issues and just crank out something that works. You can suddenly address those issues, but in doing so, you will inevitably run into new edge cases, and bugs. Obviously, those new issues are now in your pile of things to do before the delayed release.

In this case, I’m hoping it’s worth it.

With the state for the scripts moved to the zones, I turned to creatures. We have two kinds of creatures in TaleSpire, normal and unique.

When a creature is made unique, it can be moved between boards in a campaign, and it has a little extra data associated with it. Traditionally uniques were the only creatures stored in the database; non-uniques were saved into the board.

The issue with non-uniques being saved in the board is a question of how much needs to be synced. If their data is stored on the zone they are in, then simply rotating your character means you have to sync that whole zone (which may contain a thousand other tiles). If you decide to store creatures together outside of zones, then loading a single zone requires pulling all the data for all the creatures in the board, and there could be thousands of creatures in the whole board.

For now, we have instead moved non-uniques to the database too. This lets us sync a single creature at a time and allows us to filter what we pull based on where they are in the world.

Being in the DB also gives us opportunities for tooling that could let GMs query info about creatures across the whole board without having to pull it all first.

Lastly, I finally spent the time to tweak the server and client, so that I can host the whole backend on my laptop and connect to it from Unity. This is awesome as I can iterate on both parts at the same time without having to push dev servers to AWS and wait on that. This cost me about a day but, in my opinion, it’s been totally worth it.

Right, now I’m going to go chill out for the weekend. Next week I’m planning to dedicate all my time to fixing the bug in board sync that delayed the beta.

Seeya all on Monday!