This Week In Veloren 137
This week, we break down how the 0.11 release party went and examine server metrics.
- AngelOnFira, TWiV Editor
Contributor Work
Thanks to this week's contributors, @zesterer, @Sam, @Capucho, @xMAC94x, @imbris, and @Pfau!
This week, 0.11 was released! The release party was the largest yet, with 181 players on the server at once. You can read the blog post with the changelog here. If you haven't already, check out the trailer:
During the release party, some devs chatted about the new changes, as well as some of the more technical details of the launch. You can check that out here:
Analyzing the release party by @AngelOnFira
Images by @YuriMomo
This release party was by far the best yet from the perspective of the infrastructure team.
In previous releases, our server started having major issues as we reached our record player counts. There were also several issues that only showed up with that many players online, such as deadlocks when players connected, or certain systems taking longer each tick and causing a snowball effect.
With this release, however, we did not experience any of these issues. The largest problem we found was that our entity sync system was taking too much time, since it hasn't been parallelized properly yet. That didn't cause larger issues during the release party, though. With 181 players online, the server was still operating nominally!
We ran the release party on a 48-core compute server in a Hetzner datacenter. For the last release party, we used a 32-core server, and throughout that party we were still struggling under the load of several systems. It's hard to say how much the larger server improved performance overall, but it clearly helped with several of the heavier systems.
In the image above, we can see that the heaviest systems are terrain and physics. To run at 30 ticks per second, each tick is supposed to take around 33 milliseconds. Terrain is taking 523 ms, while physics is taking 252 ms. These numbers are larger than a single tick because they add up the work done across all 48 cores, rather than measuring the time that actually elapses.
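To make those numbers a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes the work is spread perfectly evenly over all 48 cores, which is an idealization and not something the metrics tell us, and the function names are made up for illustration.

```python
# Rough estimate of per-core time from the summed system timings.
# Assumption: work is split perfectly evenly across all cores.

TICK_RATE_HZ = 30   # target ticks per second
CORES = 48          # cores on the release party server


def tick_budget_ms(tick_rate_hz: float) -> float:
    """Time available for one tick, in milliseconds."""
    return 1000.0 / tick_rate_hz


def per_core_ms(summed_ms: float, cores: int) -> float:
    """Idealized time per core if the summed work is divided evenly."""
    return summed_ms / cores


budget = tick_budget_ms(TICK_RATE_HZ)
summed_timings_ms = {"terrain": 523.0, "physics": 252.0}

for name, summed in summed_timings_ms.items():
    share = per_core_ms(summed, CORES)
    print(
        f"{name}: {summed:.0f} ms summed, "
        f"roughly {share:.1f} ms per core "
        f"({share / budget:.0%} of the {budget:.1f} ms tick budget)"
    )
```

Under that even-split assumption, both systems fit inside a single tick, which lines up with the server running nominally. In practice the split won't be perfectly even, so treat the output as a lower bound rather than a measurement.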
Above, we get a more detailed breakdown of the "slow systems" that are taking up time on the main thread. We can see that the times for the terrain and physics systems are quite small, since they're parallelizing nicely. Entity sync, however, is much higher, and it becomes a larger problem later in the release party. This should be fixed before the next large server party.
Above, we can see our bandwidth usage. At peak, we were transmitting around 40 megabytes per second. This is about a third of what the cloud server can offer with its gigabit connection, but we shouldn't expect to be able to reach that limit, as cloud instances share the uplink with other VMs on the same host machine. Going forward, bandwidth is something we'll work on improving, both to take load off the server and to make things easier for clients.
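For anyone who wants to check the "about a third" figure, here is a small sketch of the arithmetic. It uses decimal units (1 megabyte = 8 megabits, gigabit = 1000 Mbit/s). The per-player number additionally assumes that the bandwidth peak happened while all 181 players were online, which the graph alone doesn't confirm.

```python
# Back-of-the-envelope bandwidth numbers for the release party.
# Assumptions: decimal units, and peak bandwidth coinciding with peak players.

PEAK_MBYTES_PER_S = 40.0    # peak outgoing traffic, megabytes per second
LINK_MBITS_PER_S = 1000.0   # nominal gigabit uplink, megabits per second
PEAK_PLAYERS = 181          # record player count from the release party


def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8.0


peak_mbits = mbytes_to_mbits(PEAK_MBYTES_PER_S)
link_fraction = peak_mbits / LINK_MBITS_PER_S
per_player_kbytes = PEAK_MBYTES_PER_S * 1000.0 / PEAK_PLAYERS

print(f"peak traffic: {peak_mbits:.0f} Mbit/s")
print(f"share of a gigabit link: {link_fraction:.0%}")
print(f"average per player: roughly {per_player_kbytes:.0f} kB/s")
```

The per-player figure is only an average. Actual usage per client will vary a lot depending on where the player is and how many entities are nearby.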