Today work progressed pretty well. I’ve updated the code that handles tile deletion and have moved over to fixing up tests. Copy and paste still need to be updated, but I’m more likely to find fundamental issues in the code I already have, so I definitely want to get that fixed up first.
I’m fixing up small mistakes from the refactor, and once that is done (and all current tests pass), I am going to start adding tests to try to find bugs when multiple builders’ operations interleave. Ideally, I want to use my automated testing code to find the bugs, but there are some limits here.
To perform automated tests, I made a very simplified implementation of the board and its core operations; we call this the model. I then take the model, an instance of the actual board class, and a stream of random board operations. I apply the stream of operations to both the model and the board, then compare the state of each to ensure we got the same results.
The model, as mentioned, is very simplified. One simplification is that it doesn’t support multiple builders. This means that while it’s great for testing against one board, it’s not so useful for testing against two boards that are being manipulated by separate players.
We still can get some useful data, though. If we take a board and have two players add and delete things, it should be the same as if a single player performed those same actions in the same order. This means for those operations, we can use our model. However, undo/redo operate on a player’s own action history and so can’t be modeled the same way.
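As a rough illustration of the approach, here is a minimal sketch of that model-vs-board comparison for the add/delete case (Python for illustration only; the real project is C#, and every class and method name here is hypothetical):

```python
import random

class ModelBoard:
    """The simplified single-player model: just tracks which tiles exist."""
    def __init__(self):
        self.tiles = set()

    def add(self, tile_id):
        self.tiles.add(tile_id)

    def delete(self, tile_id):
        self.tiles.discard(tile_id)

class RealBoard:
    """Stand-in for the actual board class under test."""
    def __init__(self):
        self._tiles = {}

    def apply(self, player, op, tile_id):
        if op == "add":
            self._tiles[tile_id] = player
        else:
            self._tiles.pop(tile_id, None)

    def tile_ids(self):
        return set(self._tiles)

def run_random_test(seed, num_ops=1000):
    rng = random.Random(seed)
    model, board = ModelBoard(), RealBoard()
    for _ in range(num_ops):
        player = rng.choice(["player_a", "player_b"])
        op = rng.choice(["add", "delete"])
        tile_id = rng.randrange(100)
        # Two players' interleaved adds/deletes should equal one player
        # doing the same ops in the same order, so the single-player
        # model just sees the flattened stream of operations.
        getattr(model, op)(tile_id)
        board.apply(player, op, tile_id)
    # Compare the final states; any divergence is a bug to investigate.
    assert model.tiles == board.tile_ids(), f"divergence with seed {seed}"
    return model.tiles
```

Undo/redo is exactly what this sketch cannot cover, since the single-player model has no notion of per-player action histories.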
Regardless of the limitation, it’s still useful to use the automated tester, so I’m definitely going to implement the add/delete case mentioned above.
I should have that working tomorrow if all goes well.
Today has gone well, I feel like the code that handles adding tiles and the undo/redo of ‘add tiles’ has taken shape well. Whilst I’m not stoked about increased complexity, I’m glad to see code shaping up in the right ways.
I’m going to start on the updated version of tile deletion tomorrow. It seems like it will yield without too much issue, but I’m prepared for something to appear and confound me :)
I’m leaving paste for last. In theory, it shares a lot with the code that handles adding of assets, but previously there have been some tricky corner cases there. Time will tell.
This is one of those fixes that can’t be meaningfully tested until I update all the tile operations, so I’ll slog through it for now. I can’t wait to update the automated tests to scour this thing for bugs.
A good time-adjusted evening to you all,
Today I’ve carried on with the bug-fixing. The day started out with more planning, as my initial plan didn’t cover some cases. I then spent the afternoon working on the code that handles adding assets. This seems to be going well so far.
Tomorrow I’ll continue with the ‘add assets’ code and then look at delete. I hope to finish that on Thursday and get onto copy/paste.
It’s a nice quick update today.
I have set aside this week to work through the nasty bugs that delayed the release. I spent today wandering around my flat rambling about possible approaches, pausing to scribble furiously on a tablet, and then repeating those steps for 9 hours.
Between that and a few chats with Ree, I feel like I’ve got something that should work. Tomorrow I’ll start implementing and see if I’ve missed anything.
As planned, I started off Wednesday working on a bug where scripts were not getting unique ids. This manifested as you clicking a door to open it, but instead, a chest in another room would open. Cute, but obviously broken :p
As I chased this down, I became more and more annoyed with how centralized all the state for these state-machines was. Originally, I had started implementing the real-time scripts (ones that recompute their state every frame and aren’t synchronized over the network). There I wanted the data packed together so I could quickly iterate over large numbers of them. We didn’t need the real-time scripts to be ready for the beta launch, and so I had switched my focus to the state-machine scripts (used for doors, chests, etc). I reused a lot of code from the real-time scripts, and part of what we inherited was how their private data is stored. By storing it all together, syncing a zone also meant syncing this store. This worked but didn’t feel great. One result was that you ended up having to download the state for every script in the whole board just to have the doors in one zone work properly.
As the delay has given us a little extra time, this became a great candidate for a refactor as it makes things better for the beta launch and also makes per-zone-sync (which is coming after beta launch) easier to implement.
If this has raised your eyebrows, that is fine. This is a slippery notion that a few of us were discussing on the discord the other day. It’s actually possible for a delay to make a second delay more likely: you no longer have to ignore fundamental issues and just crank out something that works. You can suddenly address those issues, but in doing so, you will inevitably run into new edge cases and bugs. Obviously, those new issues are now in your pile of things to do before the delayed release.
In this case, I’m hoping it’s worth it.
With the state for the scripts moved to the zones, I turned to creatures. We have two kinds of creatures in TaleSpire, normal and unique.
When a creature is made unique, it can be moved between boards in a campaign, and it has a little extra data associated with it. Traditionally, uniques were the only creatures stored in the database; non-uniques were saved into the board.
The issue with non-uniques being saved in the board is a question of how much needs to be synced. If their data is stored on the zone they are in, then simply rotating your character means you have to sync that whole zone (which may contain a thousand other tiles). If you decide to store creatures together outside of zones, then loading a single zone requires pulling all the data for all the creatures in the board, and there could be thousands of creatures in the whole board.
For now, we have instead moved non-uniques to the database too. This lets us sync a single creature at a time and allows us to filter what we pull based on where they are in the world.
Being in the DB also gives us opportunities for tooling that could let GMs query info about creatures across the whole board without having to pull it all first.
Lastly, I finally spent the time to tweak the server and client so that I can host the whole backend on my laptop and connect to it from Unity. This is awesome, as I can iterate on both parts at the same time without having to push dev servers to AWS and wait on that. This cost me about a day, but, in my opinion, it’s been totally worth it.
Right, now I’m going to go chill out for the weekend. Next week I’m planning to dedicate all my time to fixing the bug in board sync that delayed the beta.
Seeya all on Monday!
Since the end of last week, my focus has been looking into unknowns on the code side of the project. To that end, I’ve jumped over to the server-side of things.
The first change was that, in the alpha, all the communication with our servers (which only handle persistence) was over HTTPS. This meant that polling was used in some places, and overall it was too slow. The simplest change that would let us push messages, given our current stack, was to move much of this communication to websockets. After picking a C# library and getting basic communication set up, I needed to do something about all the request-based code that was previously using HTTPS.
In order to make the changes on the C# side more iterative, I decided to write a simple request system that sent its messages over the websocket connection. When making the alpha, I had made a code generator in Erlang that took a spec (written as an Erlang data structure) and generated Erlang and C# entry points. This has made keeping both sides in line trivial and, now, it made updating the code easy, as I just had to update the code generator.
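A spec-driven generator of this kind might look something like the following minimal sketch (Python standing in for the Erlang original; the spec format, the request names, and the `send_request` hook are all invented for illustration):

```python
# Hypothetical spec; the real one is an Erlang data structure that
# generates both Erlang and C# entry points from a single source.
SPEC = [
    {"name": "create_board", "args": ["campaign_id", "board_name"]},
    {"name": "delete_board", "args": ["board_id"]},
]

def generate_client_stub(spec):
    """Emit one client-side method per request in the spec, all of which
    funnel through a single send_request(...) over the websocket. Because
    both sides are generated from the same spec, they can't drift apart."""
    lines = ["class GeneratedClient:"]
    for req in spec:
        args = ", ".join(req["args"])
        lines.append(f"    def {req['name']}(self, {args}):")
        lines.append(
            f"        return self.send_request({req['name']!r}, [{args}])"
        )
    return "\n".join(lines)

print(generate_client_stub(SPEC))
```

The payoff is exactly what the post describes: changing the transport (HTTPS to websockets) only required changing the generator, not every call site by hand.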
With that done, I spent some time fixing bugs from the refactor until I could log in, create boards, etc. again. I then switched tasks again, to certificates. We had previously used Amazon’s certificate authority for production, as it was free for AWS infrastructure, and our servers were behind one of Amazon’s load-balancers. Now that we are using websockets, we would have to switch to their ‘network load balancer’, which I don’t have any experience with yet. For now, we’ve just picked up a good ol’ fashioned wildcard cert and will work with that. We spread load across servers in the new approach anyway, so using the load-balancer for that is no longer an issue (though there are other benefits). We can look back into their TCP-level ‘network load balancer’ at a later date. As an aside, we have previously used Let’s Encrypt in staging, and while it did work well enough, I would need to make some changes to use it in production. This ends up being one of those cases where it’s cheaper to buy something simple that works than to use the free thing.
Also, on the server side, I’ve been reviewing our AMIs and docker images. One question I’m trying to get resolved right now can be found over here. It’s a simple thing, but I have failed to find an official text stating the answer. If you have some expertise, it would be super useful to hear.
Yesterday I started fixing a bug that caused us to assign identical script ids to tiles in different zones of the board when a drag spanned multiple zones. It’s simple enough but still requires some focus not to introduce a dumber bug in the process :p I also fixed a small bug where, for tiles only 0.5 units wide, we spawned them too close together.
Today I’ll be finishing off the script Id issue and then possibly looking at rewriting the unique creature support.
Today has been a tricky one. I found a significant bug in the building synchronization code that is definitely going to cost me a day or two. Given that I’m already behind where I wanted to be, this is a bit stressful. As I have an understanding of this problem but still have a good few unknowns on the server side, I’m going to task-switch for a few days and see if I can get a better overview there.
Not a fun update to write but here’s to the next few days regardless.
Heya folks, yesterday I got the Spaghet scripts that run our doors and chests hooked up and syncing correctly.
Spaghet is our little scripting language, made to deal with the question “if someone drags out a 30x30 slab of scripted tiles, how many resources (both CPU & networking) are we using?”
Scripts come in two kinds:
- state-machines: Their progress is driven by user interaction and is synchronized across the network.
- realtime: Run every frame and are unsynchronized.
Each script gets a small chunk of private storage (currently 32 bytes) and runs on top of Unity’s job system, which means we can run them concurrently across cores with no GC. We can also trivially pass the state of a script across the network and, with a little book-keeping, reapply it on the other side.
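For a sense of what fixed-size private storage buys you, here is a hedged sketch (Python for illustration; the real implementation runs on Unity’s job system in C#, and all names and the exact layout here are assumptions beyond the 32-byte figure from the post):

```python
import struct

SLOT_SIZE = 32  # bytes of private state per script, as in the post

class ScriptStateStore:
    """Illustrative sketch: pack each script's private state into a fixed
    32-byte slot inside one contiguous buffer. Iteration is cache-friendly,
    and any slot can be shipped over the network as raw bytes and written
    straight into the same index on the other side."""
    def __init__(self, max_scripts):
        self.buffer = bytearray(SLOT_SIZE * max_scripts)

    def write(self, script_index, state: bytes):
        assert len(state) <= SLOT_SIZE, "state must fit the fixed slot"
        offset = script_index * SLOT_SIZE
        self.buffer[offset:offset + len(state)] = state

    def read(self, script_index) -> bytes:
        offset = script_index * SLOT_SIZE
        return bytes(self.buffer[offset:offset + SLOT_SIZE])

    def sync_slot(self, script_index) -> bytes:
        # The raw slot is exactly what would be sent across the network;
        # the receiver just writes it into the matching slot.
        return self.read(script_index)
```

A hypothetical door script might pack its state as, say, `struct.pack("<i?", current_state, is_open)`, which is well under the 32-byte budget.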
Currently, we are only using 1 script (the one that controls doors and chests), but we have a lot in place to be able to expand on this as we head through the beta. For now, I’m just happy to see it working.
Today after finishing some Spaghet work, I switched to looking again at copy-paste. Here I’m just trying to get it to the point that it is spawning the copied slab as expected without worrying about the feel of the tool. After that, we can look at this as more of a UX task.
Hmm, I think that’s all for now. Back with more tomorrow
Well, I’ve learned something about myself in this whole process. The closer I get to deadlines, the harder it is to convince myself to write updates when ‘the answers’ aren’t yet available. I’ve been struggling with a bunch of bugs and issues in the last couple of weeks and each time I see that I really should be writing an update I see the unsolved problems and think “If I can just get a bit further I’ll have something more concrete to talk about”. I guess dev logs getting sporadic around releases is just a thing that’s gonna happen for now. We shall see.
That said, hello again! A lot has been happening behind the scenes recently.
One big push on my side has been moving to ids rather than actual object references for creatures/players/etc across the whole project. So often in the future, the game will be receiving info on things that aren’t loaded on your local client, and it needs to be able to handle that. Luckily this only affects: building, sync, creatures, initiative mode, movement, gm requests, and.. yeah, well, everything. So, in short, I’ve been rewriting a significant portion of the codebase. It’s gone well, but it’s a lot of stuff to have in the air at once.
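The id-instead-of-reference pattern being described can be sketched like this (an illustrative Python sketch, not TaleSpire’s actual code; all names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Creature:
    creature_id: int
    name: str

class CreatureRegistry:
    """Systems hold creature ids, never object references, and resolve
    them on demand. An unloaded creature resolves to None, and callers
    must handle that case instead of assuming the object exists."""
    def __init__(self):
        self._loaded = {}

    def load(self, creature):
        self._loaded[creature.creature_id] = creature

    def unload(self, creature_id):
        self._loaded.pop(creature_id, None)

    def resolve(self, creature_id) -> Optional[Creature]:
        return self._loaded.get(creature_id)

def apply_damage(registry, creature_id, amount):
    """Example consumer: works purely in ids, so it tolerates the target
    not being loaded on this client."""
    creature = registry.resolve(creature_id)
    if creature is None:
        # Target isn't loaded locally; skip (or queue) rather than crash.
        return False
    # ... apply damage to the loaded creature here ...
    return True
```

The win is exactly the one the post points at: code can receive messages about things the local client hasn’t loaded yet without falling over on a dangling reference.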
I’ve also finished implementing the sync of boards both to the server and between players. For now, it’s the same method as we used in the alpha: we save the whole board to a single file that we load all at once. Naturally, this won’t work for big boards, but I need to do some server work before I can add per-zone sync. We could ship the beta with this and then upgrade as we go. This may happen. The most important thing, for now, is that we get something that will get us out the door and is in the right shape to be moved to the new approach.
In fact, so much of the work feels different this time round in that we know what we need to make, but it’s a lot more cognitive weight :D. In the alpha, you make something, and then all the implications come pouring out; this time, you have to build and build with the main comfort being that, at least, you know it’s all needed.
Automated testing has been a lifesaver too. I recently added an option to the tests so that halfway through the test, it will serialize the board, deserialize it as a new board and hook it up like a networked client. It then continues the rest of the random actions and compares the two boards afterward. With these random tests, I’ve been able to find piles of dumb mistakes.
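That mid-test snapshot trick might look roughly like this (illustrative Python; the real boards, serialization format, and networking are far more involved, and all names here are invented):

```python
import json
import random

class Board:
    """Toy board: serialization here is just JSON of the tile-id set."""
    def __init__(self, tiles=None):
        self.tiles = set(tiles or [])

    def apply(self, op, tile_id):
        (self.tiles.add if op == "add" else self.tiles.discard)(tile_id)

    def serialize(self):
        return json.dumps(sorted(self.tiles))

    @classmethod
    def deserialize(cls, data):
        return cls(json.loads(data))

def run_split_test(seed, num_ops=1000):
    rng = random.Random(seed)
    ops = [(rng.choice(["add", "delete"]), rng.randrange(100))
           for _ in range(num_ops)]
    board = Board()
    for op, tid in ops[:num_ops // 2]:
        board.apply(op, tid)
    # Halfway through: snapshot the board, restore it as if it were a
    # freshly joined networked client, then feed the remaining random
    # ops to both and compare the final states.
    clone = Board.deserialize(board.serialize())
    for op, tid in ops[num_ops // 2:]:
        board.apply(op, tid)
        clone.apply(op, tid)
    assert board.tiles == clone.tiles, f"divergence with seed {seed}"
    return True
```

Any serialization bug or state the snapshot misses shows up as a divergence between the two boards at the end of the run.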
Other stuff that has been worked on in the last few weeks:
Rewrote the undo/redo stack and the code that moves asset data between the active and inactive sets. It had gotten far too complicated and was impeding progress. The new system is coarser but simple and fast.
Ree has been working on a control to allow you to move creatures between floors and across otherwise impassable obstacles easily.
Wrote new managers for tracking board, campaign, and network state. Some of the work on campaign state is getting it ready for the coming backend changes, and a pile of it is making sure that our connection to the realtime network stack isn’t so directly tied to whether we are considered in the board or not.
(Finally) hooked up single tile delete!
We want to support up to 16 clients building. Whilst writing the code that hands out those slots, I realized that we have a lot of what we need for spectator mode. We’ll have to do a bunch of work on the UX side for this but it’s more feasible now.
The atmosphere system is under heavy modification by Ree. He’s looking into how music is handled now. While we’ve been warned the first version will be pretty bare-bones, I’m personally excited for where this will go.
We have now been able to jobify lots of tasks that only update shader state. This includes tile drop-in and highlighting. These can be much faster now.
Moved the creature state to the new board representation. Attacks, stats, death, etc all go via the new system now.
Rewrote cutscenes sync
Disassembling class hierarchies for assets: Don’t do OOP kids :p More seriously, I had tangled the implementation of creature & tile board assets in a way that made progress harder. I have undone that now and, while there is seemingly more duplicate code, it’s way easier to manage and iterate on (and most of that code has subtleties that mean it shouldn’t be merged anyway)
Rewrote how Spaghet scripts index their private state
Water prototypes! (see the bottom of this page)
And waaay more. Lots of bugs are being worked on in the mix too.
It’s been a very productive time, but I’m still a good week behind where I wanted to be at this point. For context, the first ‘proper’ board load with the new system was 6 days ago, and the first networked play (with the new system) was yesterday. To be fair, it’s, of course, the case that for those to work, tons of other stuff has to be working, so this isn’t too unreasonable. However, it’s still a little tight. The best thing to do is work hard and keep you all in the loop.
Until the next one of these, Seeya
p.s. Here’s a quick extra that we are showing off on the Kickstarter this week! More news on this as it happens
I’ve been having a great time coding down at @Ree’s place, but it has meant I’ve neglected these updates. Let’s fix that.
We met up so we could collaborate on merging our feature branches together. Up until now, I’ve been focused on performance and the data layer, and @Ree on tooling. The goal isn’t to complete the merge, but instead to do any tasks that are easier when we are in the same room.
I’ve successfully dumped all of my code into the project, got it all building, tests passing, etc. The managers for various systems are created now, but the building tools are still using the old code for now.
One massive source of complexity in the old version stemmed from our use of Photon, the 3rd party networking layer (which is great btw). By default, when a person leaves the game Photon automatically clears up all the networked objects they created. This is no good for creatures (which were networked objects) as the other players still need to see them, so we disabled auto-delete of networked objects. This naturally meant we had to handle cleanup, which includes things like dice.
With the new system, we won’t ever have the whole board loaded at once, and this means some creatures won’t be loaded either. This also means we don’t want Photon to be helpful and make sure all networked objects are spawned as soon as someone joins the board. To deal with this, we removed the Photon network component from the creatures and gave a networked ‘handle’ to each player. This is attached to a creature when you select it and synchronizes the transform while the creature is held. On the other end, if the creature is loaded, it behaves exactly the same as it did previously. If not, then no worries: once the creature is spawned, the latest position will be sent to it anyway.
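The handle idea can be sketched like so (illustrative Python, not the actual Photon/C# code; all names and the dict-based ‘loaded creatures’ lookup are assumptions):

```python
class PlayerHandle:
    """Each player owns one networked handle. Grabbing a creature attaches
    it to the handle, and transform updates flow through the handle while
    the creature is held, rather than through a per-creature network
    component."""
    def __init__(self, player_id):
        self.player_id = player_id
        self.held_creature_id = None
        self.transform = None

    def grab(self, creature_id):
        self.held_creature_id = creature_id

    def release(self):
        self.held_creature_id = None

    def receive_transform(self, transform, loaded_creatures):
        # Called when an update arrives over the network. If the creature
        # is loaded locally, move it; if not, drop the update, since the
        # creature will get the latest position when it eventually spawns.
        self.transform = transform
        creature = loaded_creatures.get(self.held_creature_id)
        if creature is not None:
            creature["position"] = transform
        return creature is not None
```

Because only the handles are networked objects, per-creature network components (and their cleanup) disappear, which is what lets Photon's auto-delete come back on.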
After a misguided prototype, I teamed up with @Ree to work out the handle api, and then everything came together very quickly.
This change actually lets us turn back on auto-delete in Photon and delete a bunch of cleanup code. Removing code is the best.
NOTE: The handle approach may seem super obvious, but previously there had been reasons that it wasn’t a good fit. However, those reasons are out of date now thankfully :)
On the subject of creatures, @Ree has been working on a new creature controller, which is much better suited to moving around these increasingly vertical scenes. It’s based on the excellent Kinematic Character Controller asset for Unity, an impressively stable base to build upon. Our creature control still looks the same (we haven’t switched to slidy tank controls or anything :p), but this lets us handle moving through scenes more reliably than before. You still have the ability to lift the piece and throw it over walls when needed.
With the simplification of the networking code, I looked into upgrading to the latest version of Photon. I got partway through the migration before I hit an undocumented change: the StreamBuffer class used to be a subclass of System.IO.Stream, and in the latest version, it isn’t. We have a bunch of code that relies on it being a stream, and so all of that will need to be rewritten. That rewrite was already underway, but we need to keep the codebase working while it gets finished and hooked up. For now, that means I have to give up on the upgrade and do it later. Lost a few hours to this, but ah well.
Another thing I have been doing during this merge is ripping out code. Old fog-of-war, line-of-sight, board format upgrading: all that and more got ripped out. There is no point fighting things that aren’t required to keep tooling development progressing. It’s also making it easier to work with and refactor code that used to have to interact with those removed systems.
There has been plenty more than this going on. @Ree is changing how the board grid works; I’ve been working on build speed and assemblies… and so on and so on.
All in all, this is going very well. We are plowing through tricky stuff and are ready to get back to working separately again, albeit this time with much more frequent merging as we approach the beta.
I’m taking Sunday as a rest day, but I’ll be back on Monday with more dev news.
Footnotes: (1) Strictly, it’s the ones they own, which can be different if ownership has been transferred. (2) It’s also amazingly cheap for what it is.