Archive for July, 2009

Patchday @ public.beta (1.21)

July 30th, 2009 14 comments

The gravestone overview

We are updating the beta world to 1.21.

The changes are:

  • fixed the ranking table for L99 sorting
  • fort battles distribute XP and items
  • reports for fort battles no longer show the stats; these are now shown in the fort instead (crosslinks available)
  • items are distributed according to the money brought into the battle (e.g. if $10,000 is in the battle, you might get an item worth that value divided by the number of survivors)
  • log failed logins (currently not displayed in the settings screen)
  • duelable again after dying in the fort (no more 48h protection)
  • telegrams to player names containing special characters (e.g. Hungarian á) now work
  • using email addresses with a ccTLD (com.uk) is now allowed

A nice strategic video of a fort battle where a lot of players are online and interacting, really awesome: strategical fort battle on YouTube

ccTLD

Categories: Uncategorized Tags:

Houston, we fixed the glitch!

July 28th, 2009 Comments off

Maybe some of you already know this bug, but it has not occurred too often, so I will first tell you the beginning of the story. Some time after we rolled out the fort battles globally, we got reports about hanging fort battles. The timer counts down to 0 seconds and then the game just hangs. The next round is never triggered, and because of this nobody could move anywhere, neither inside nor outside the battles. So this was of course a serious problem. We had already set up some Nagios monitoring scripts that told us right away when a battle began to hang, but we were forced to restart the Java server (that seemed to work, so it was something non-deterministic). And as we already blogged, there is currently no journaling implemented, so a restart of the server equals a complete battle reset. Don’t worry: I’m currently working on the journaling 😛
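
As a rough illustration of what such a monitoring check has to decide, here is a minimal Python sketch. It is not the Nagios script mentioned above: the `BattleState` fields, the grace period and the way the state is obtained are assumptions made for the example. Only the exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
from dataclasses import dataclass
from typing import Tuple

NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


@dataclass
class BattleState:
    """Hypothetical snapshot of one running fort battle."""
    battle_id: int
    round_number: int
    round_deadline: float  # Unix time at which the current round should end


def check_battle(state: BattleState, now: float,
                 grace_seconds: float = 30.0) -> Tuple[int, str]:
    """Return a Nagios-style (exit code, message) pair for one battle.

    A battle counts as hanging when its round timer has run out but the
    next round has still not started after the grace period.
    """
    overdue = now - state.round_deadline
    if overdue > grace_seconds:
        return NAGIOS_CRITICAL, (
            f"CRITICAL - battle {state.battle_id} stuck in round "
            f"{state.round_number}, {overdue:.0f}s overdue"
        )
    return NAGIOS_OK, (
        f"OK - battle {state.battle_id} round {state.round_number} on schedule"
    )
```

A check like this only reports the symptom. It cannot tell a deadlock from a thread that has died, which is exactly the distinction that mattered later in this story.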

RL deadlock

We can proudly announce that the bug with hanging fort battles should be fixed now. Because it was a non-deterministic problem, it was a real pain in the ass to find, but that was more because it was some kind of “nested problem”. First of all we looked through the logs and searched for any kind of suspicious entries, but we didn’t find anything, and that surprised us a bit. Our first guess was a deadlock, so we watched out for blocked threads. For all those who don’t know what a “deadlock” is, just take a look at the picture on the right and it should be bloody clear 😀

We invested some time in trying to reproduce this bug, but without any success. So we had to wait until the symptoms reappeared on one of the running daemons to perform some checks and trace analysis on the running application. We thought this would be no problem with our monitoring tools, but it was the opposite. Every time the monitoring notified us about an occurrence, a sysadmin was faster and restarted the server, so it was already running again when we logged in 😀 After we asked them not to do anything so that we could perform our analysis, the problem didn’t want to reappear for quite a while.

The visual explanation of our problem.

In the meantime we had implemented a watchdog that periodically checks the running application for deadlocked threads or monitors. And then the long-awaited moment came: just after we restarted some of the Java servers for a quickfix update, we got two crashes at the same time. The first thing we did was call the sysadmin staff and remind them not to do anything, not too exhausting a wish I guess 😀 We were pretty sure we would grep some messages about deadlocks out of the logs, but what did we see? Yes, nothing! 🙂 This was strange indeed, so I used jstack to get some information about the current threads, their states and their stack traces, but the output confirmed that no thread was deadlocked, so: WTF!

By painfully going through the code manually and tracing what the issue could be, we arrived at the right suspicion, which was then confirmed by staring at the server screen: the master thread was not there, not running, nothing, so it had died. But WTF: this should have been logged, so something was really messed up. Sometimes the simplest explanation is the right one: the error was never written to the log file. Indeed, after checking I found out that the stderr pipe was not logged, so I fixed the init script (with some kind of sub-shell piping) and we ended up with a ConcurrentModificationException. This was exactly the opposite of what we had guessed at first: we had no deadlocks, we just hadn’t synchronized enough 😀

Shortly afterwards we tracked the problem down (thanks to the now existing log entries) to just one small block of code that we now synchronize, and the queasy adventure ended abruptly. But hey, just look at the good part:

  1. our logging is now working just fine
  2. we are prepared for deadlocks
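
The kind of fix described in this post, guarding one small block of shared state with a lock, might look like this in Python. This is only a sketch under assumptions: the post does not say which collection was affected, so `PlayerRegistry` and its methods are invented for illustration.

```python
import threading
from typing import Dict, Tuple

Position = Tuple[int, int]


class PlayerRegistry:
    """Hypothetical shared collection that several threads read and write."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._positions: Dict[str, Position] = {}

    def move(self, player: str, position: Position) -> None:
        """Store a player's position while holding the lock."""
        with self._lock:
            self._positions[player] = position

    def remove(self, player: str) -> None:
        """Forget a player while holding the lock."""
        with self._lock:
            self._positions.pop(player, None)

    def snapshot(self) -> Dict[str, Position]:
        """Copy under the lock so callers can iterate without holding it."""
        with self._lock:
            return dict(self._positions)
```

Iterating over a copy keeps the locked section short, so the round logic never walks the collection while another thread is changing it.
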
    Categories: Uncategorized Tags:

    Optimizing Fortbattles

    July 27th, 2009 4 comments

    The Flash client for the fort battles was developed in an extremely short time (compared to the rest) and, because of this, it is not as efficient as it could be. I fixed the worst issue in the client’s drawing algorithms about 2 weeks ago: the sectors.

    Sectors and onlinestate of players

    The reason was the way the sectors were drawn: for each cell, the algorithm checked the borders of the sector, and if the cell was next to a different sector, a border was drawn on that side of the cell. So this check was done 4 times per cell, once for each edge of the rectangle, and for every rectangle on the map. This is a stable and simple approach, but also one of the slowest.
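
The per-cell approach can be sketched as follows. This is an illustrative Python version, not the ActionScript from the client; the grid layout (`grid[y][x]` holding a sector id) and the choice to draw a border at the map edge are assumptions.

```python
from typing import List, Tuple

Grid = List[List[int]]  # grid[y][x] holds the sector id of that cell

# (dx, dy, side name) for the four neighbours of a cell
NEIGHBOURS = [(0, -1, "top"), (1, 0, "right"), (0, 1, "bottom"), (-1, 0, "left")]


def border_sides(grid: Grid) -> List[Tuple[int, int, str]]:
    """Return (x, y, side) for every cell side that needs a border line.

    A side gets a border when the neighbouring cell belongs to another
    sector or lies outside the map.
    """
    height, width = len(grid), len(grid[0])
    borders = []
    for y in range(height):
        for x in range(width):
            for dx, dy, side in NEIGHBOURS:
                nx, ny = x + dx, y + dy
                outside = not (0 <= nx < width and 0 <= ny < height)
                if outside or grid[ny][nx] != grid[y][x]:
                    borders.append((x, y, side))
    return borders
```

With about 600 to 700 cells this means roughly 2,400 to 2,800 neighbour checks on every redraw, and each border between two sectors is found twice, once from either side.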

    In order to fix that, I changed the approach of drawing the sectors: instead of checking each cell one by one (about 600-700 cells), I wanted to draw each sector one by one (there are about 50 sectors). Therefore, I first needed to determine the sector boundaries. This can be done ahead of time while the map is loading, so the algorithm for detecting these boundaries can be somewhat complex as long as the resulting description of the boundaries is easy to use afterwards. Detecting the outline is in general a simple algorithm, also known as the left-hand rule [1]: if you are in a maze and you want to find all possible ways, you can simply stretch out your hand and keep it on the wall. That way, you traverse the complete maze, at least if it is an ordinary maze. This excludes mazes with a center part whose walls are not connected to the walls at the entry, but for my problem this doesn’t matter.

    Actually, I first tried an even simpler approach, but that would have worked only for convex sector shapes, and we have a couple of concave sectors (if you don’t know the meaning of concave and convex, you might look it up here [2]).
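
For readers who want to experiment, here is a small Python sketch that computes such an outline once, ahead of drawing. Note that it is not the left-hand walk itself but a simpler edge-chaining variant: every exposed cell side becomes a directed edge, and the edges are then followed from corner to corner. It assumes a sector is a single connected region without holes and without cells that touch only diagonally; it also handles concave shapes.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]   # (x, y) cell coordinate, y grows downwards
Point = Tuple[int, int]  # (x, y) corner coordinate on the cell grid


def trace_outline(cells: Set[Cell]) -> List[Point]:
    """Return the corner points of a sector's outline as one closed loop.

    Assumes a non-empty, connected sector without holes and without
    cells that touch only diagonally.
    """
    next_point: Dict[Point, Point] = {}
    for x, y in cells:
        # Each exposed side becomes a directed edge, clockwise around the cell.
        if (x, y - 1) not in cells:
            next_point[(x, y)] = (x + 1, y)          # top
        if (x + 1, y) not in cells:
            next_point[(x + 1, y)] = (x + 1, y + 1)  # right
        if (x, y + 1) not in cells:
            next_point[(x + 1, y + 1)] = (x, y + 1)  # bottom
        if (x - 1, y) not in cells:
            next_point[(x, y + 1)] = (x, y)          # left
    start = min(next_point)
    outline = [start]
    current = next_point[start]
    while current != start:
        outline.append(current)
        current = next_point[current]
    return outline
```

For an L-shaped sector made of the cells (0, 0), (1, 0) and (0, 1) this yields eight points, two of which lie in the middle of a straight line.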

    I often find graphics programming very nice because even the bugs can look pretty:

    Wrongly implemented left hand rule

    So what went wrong here was that my “left hand” left the “wall”. Quite wrong. After a while the result looked like this:

    The correctly found sector outlines

    Which looks more correct. It also looks funny because I used a small trick to detect problems in my border tracing: during the implementation it happened that my algorithm did not notice that it was tracing a part it had visited before, so my border description contained redundant information.

    To see these multiple lines, I drew the outlines with random vector offsets at each point. This gave the output such a scribbled look. That way you can also see that I had already stripped away the points between the corners: if a line stretches over a couple of cells, the initial algorithm would record a point at each cell corner along the way. These points are not required to draw the sector, so I tweaked the algorithm to reduce the number of drawn vertices:

    Removing the unneeded points when drawing a straight line saves time during drawing.

    If the algorithm still had these unneeded points, they would have become visible when drawing the outlines with random vector offsets.
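
Stripping the points between the corners can be done with a simple collinearity test. The sketch below is a Python illustration rather than the client code; it assumes the outline is a closed loop of grid points in drawing order.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def strip_collinear(outline: List[Point]) -> List[Point]:
    """Drop points of a closed outline that lie on a straight line
    between their two neighbours, keeping only the real corners."""
    corners = []
    count = len(outline)
    for i, (x, y) in enumerate(outline):
        px, py = outline[i - 1]            # previous point (wraps around)
        nx, ny = outline[(i + 1) % count]  # next point (wraps around)
        # The cross product of the incoming and outgoing edge is zero
        # when both edges are parallel, i.e. the line just continues.
        cross = (x - px) * (ny - y) - (y - py) * (nx - x)
        if cross != 0:
            corners.append((x, y))
    return corners
```

An L-shaped outline with eight recorded points keeps only its six corners, so fewer line segments have to be drawn per sector.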

    The performance gain of this improved technique was not crucial but indeed measurable. There are other parts that could be optimized, but as this already improved one of the worst algorithms, they can wait a bit.

    [1]: Maze solving algorithm: http://en.wikipedia.org/wiki/Maze_solving_algorithm

    [2]: Convex Hull: http://en.wikipedia.org/wiki/Convex_hull

    Categories: 1.20, Fort battles Tags:

    Global rollout: 1.20 + hotfixes

    July 24th, 2009 3 comments

    The global rollout went through without major or even minor problems during the update phase. This pleased us a bit; after the final rush and some of the problems we reported about, it was not easy to make a prediction. It just can’t be as bad as the 1.18 rollout, but after such an experience you never know 😀

    After we spent some time hardening the 1.20 release and crushing the most common bugs, we have now rolled out a hotfixed version. We fixed some user-reported bugs as well as those we caught with our central syslog monitoring tool, where every unexpectedly raised exception and error is logged. Using this technique and fighting every single entry has improved the code quality a lot and helps us to get a global overview of what’s going on. But now I want to tell you what exactly we fixed:

    1. fixed a bug in the town name validation method that allowed two towns with the same name to be founded under special conditions.
    2. fixed a bug that could leave connections to the memcached backend open and possibly hit the max-connections limit. We now close the connection in the destructor of our memcached backend to hopefully fix this glitch. I will do some testing with pooled and persistent memcached connections in the future and maybe improve the implementation and technical deployment in one of the following releases.
    3. added a fort monitoring script that reports invalid states right away so we can react very fast and try to track down the problem.
    4. added a notification in the GUI if the player’s email is not valid or not confirmed yet. The sad news is that, in the same breath, we limited the game interaction and functionality for accounts without confirmed emails. Sorry guys, we really have to 🙁
    5. fixed some wrongly named images, which showed up as missing building images in the fort overview.
    6. fixed a problem with terminating the PHP process that is called from Java. Hopefully. This could be the fix for the never-ending fort battle glitch where players get stuck.
    7. last but not least I improved our deployment scripts with some nice and really useful colored output. It’s much easier to determine what’s going on during a global update if you only have to watch the colors in the prompt at first. It was really confusing before, just seeing a huge pile of white text lines. I had to perform some manual eye-powered RegEx matching to catch possible update and deployment error messages. Now it’s much more relaxed to just hit enter and look at the nice colors. Awesome. Just some small pleasures of life 🙂 But sorry guys, too bad that I guess none of you would ever see this, so here it is:

    west update screenshot
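
Colored status lines of this kind need nothing more than ANSI escape codes. The following Python sketch only illustrates the idea and is not taken from the deployment scripts; the level names and the helper functions are made up for the example.

```python
import sys

RESET = "\033[0m"
# Standard ANSI foreground colours: green, yellow, red
COLORS = {"ok": "\033[32m", "warn": "\033[33m", "error": "\033[31m"}


def colorize(level: str, message: str, use_color: bool = True) -> str:
    """Wrap a status line in an ANSI colour code for the given level."""
    if not use_color or level not in COLORS:
        return message
    return f"{COLORS[level]}{message}{RESET}"


def report(level: str, message: str) -> None:
    """Print a status line, colouring it only when stdout is a terminal."""
    print(colorize(level, message, use_color=sys.stdout.isatty()))
```

Checking `isatty()` keeps the escape codes out of log files when the script output is redirected.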

    Categories: Uncategorized Tags:

    Bandays on beta

    July 24th, 2009 1 comment
    Don’t be fooled by this screenshot: we banned more than just one player

    Due to a recent ugly takeover of a town on the beta, we devs have been busy cleaning up the beta. It seems a multi took a town and kicked out all other players. We won’t restore ownership of that town, but we have now got rid of a couple of players on the beta. Apparently quite a few multis / password sharers had gathered on the beta, and it turned out that no one had been responsible for that job until now. We are going to take care of that so that our precious development time is not spent on managing the beta world. Although it’s been quite an interesting break. And I guess that some players are now at least a bit satisfied (or not…).

    Anyway, I had no experience with banning players, so I took a lesson on how to hunt multis today. Don’t say you haven’t been warned.

    By the way, this is not what I was referring to when I said that I would appreciate more cheaters on the beta. Multi accounts, password sharing and stuff like that are still bannable offenses.

    And meanwhile … I haven’t been blogging much recently because there was not much to blog about, but I have a couple of topics I am working on… for example the one on “how to cheat”. But that’ll need some time. But it’s on my list, really!

    Categories: Uncategorized Tags:

    Player limit on beta raised

    July 10th, 2009 4 comments

    Just a small announcement: We have raised the limit on the beta world to 2000 players.

    http://public.beta.the-west.net/?page=register

    Categories: Uncategorized Tags:

    Thursday chaos

    July 9th, 2009 3 comments

    I originally intended to write a post on cheating (and partially it’s done), but due to heavy workload and other stuff (I got a brand new computer that I assembled and tested during the last two evenings), I think I should give you a small update. And why not now, after such an ugly buggy Thursday like today was?

    So one week after the 1.20 release we ran into serious trouble today on 1.20.

    The reason is that the fort building stuff is still more at a beta level than we expected it to be, probably because of our beta testing configuration. As someone pointed out in a comment here, it’s not exactly easy to test building forts on the beta world. That’s really true and we should have thought of that earlier. Well, this now hits us hard, especially those playing on world 1.

    What happened?

    Well, first there was some buggy code for limiting the building levels in the fort: there were only a few limits for the buildings, and wrong ones. This caused, for example, a bug that you couldn’t sleep in the barracks once it was above level 6. That shouldn’t have happened. Congratulations to the town that managed to achieve that. We should hand out awards for such feats.

    Anyway, that required us to fix the data in the database, which involved a lot of new lines of code and update scripts. Technically the update scripts worked, except for a mistake in the headquarter level and a bug in the building process which sometimes reset a building to level 0. That was really bad. I would love to say that the players simply used materials of such poor quality during the building process that the buildings occasionally collapsed, but everyone knows that it was just some bad code on our side.

    Well, sorry on our behalf. We spent some time on fixing the results by extracting the building levels from a backup of the database table, which repaired at least most of the broken buildings, but some remain below their previous levels. That’s ugly and shouldn’t happen on the production worlds. Yet it is hard to avoid, especially after introducing a huge amount of code after quite a long time without a public release. At least it only hit the new feature; strictly speaking, it could have been much, much worse.

    Speaking of worse: we will work on ironing out the other bugs tomorrow. We aim to update the other worlds sometime next week, so we will only try to fix the most important things. For example, I don’t think that I will fix the animation bugs in the Flash client. The animation might be ugly at some points, but it is still bearable, while the fort building process simply needs more attention right now.

    I just hope that the Friday will be better than today.

    Categories: 1.20, beta Tags:

    The 1.20 release

    July 4th, 2009 19 comments

    1.20 has been released on the German world 1, quite successfully I would say, as there haven’t been too many problems, although quite a few bugs of varying severity were not found during the beta testing and we had / have to fix a lot of them.

    I have to say that I’ve been a bit disappointed by the beta testers regarding cheating: I would expect beta players to try to cheat in the game wherever possible. For example, it was possible for attackers to set their starting position on the flag points just by modifying the requests that are sent to the server. I only became aware of that when some players were affected by another bug that occurred when they tried to join more than one fort fight. That happened only after the 1.20 release on world 1.

    That was quite a bad bug that shouldn’t have been released, yet it got through. Over the months of development I always just “moved” the responsibility for checking illegal positions to other parts of the code that I was not working on at that moment: when I took the values from the players and wrote them into the database, I expected the fort battle server to filter illegal positions. When I implemented the fort battle server, I expected the values in the database to be correct. Because I wrote these parts at different times, I never implemented that check anywhere. Nice, isn’t it? Such glitches are normally found quite quickly by evil cheaters, who can use them to their advantage. Just imagine a group of attackers spawning on the flag just when the battle starts. Wouldn’t be so nice for the defenders…
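
The missing check boils down to validating every client-supplied position on the server before it is stored or used. Here is a minimal Python sketch of that idea; the start zones, the side names and the exception type are invented for illustration, since the real fort maps are not described in this post.

```python
from typing import Dict, Set, Tuple

Position = Tuple[int, int]

# Hypothetical start zones; the real fort layouts are not part of this post.
START_ZONES: Dict[str, Set[Position]] = {
    "attacker": {(0, y) for y in range(10)},
    "defender": {(x, y) for x in range(12, 16) for y in range(3, 7)},
}


class IllegalStartPosition(ValueError):
    """Raised when a client asks for a start cell its side may not use."""


def validate_start_position(side: str, position: Position) -> Position:
    """Check a client-supplied start position against the side's start zone."""
    allowed = START_ZONES.get(side)
    if allowed is None:
        raise IllegalStartPosition(f"unknown side {side!r}")
    if position not in allowed:
        raise IllegalStartPosition(f"{side} may not start at {position}")
    return position
```

Calling the same function both when the request is written to the database and when the battle server loads the positions would have closed the gap described above, at the price of checking twice.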

    So I would like to remind our beta testers: try to break our rules on purpose! Don’t avoid unfair play, embrace it! You won’t get banned for that (unlike on the public game worlds). Some of you have already done some testing like that, but we need more of it (like turning fort member towns into ghost towns (sorry for not having fixed that issue on the beta server yet, but we are not community managers and we had quite busy weeks…)).

    Anyway, I wanted to write here now about what all went wrong during the release and how it happened.

    Thursday, 7:45 am

    I turned up at work at that time because the plan was to make the release in the morning so that the first forts would be finished around 4 pm / 5 pm. We thus expected that the first battles would happen on Friday around 4 pm / 5 pm. I started working on several minor bugs, like fixing a glitch in the fort management and a problem with one particular piece of JavaScript when executed in Internet Explorer.

    9:30 am

    At about 9:30 am Anthraxx pushed the button to make the release. At about 9:31 am Eiswiesel ran into our office and yelled “Build time for Forts is 30 seconds”. That’s how it had been on the beta worlds. Oops. We immediately fixed that wrong value, but by then 3 forts had already been built by players. 8 others followed because we forgot to change the values in the task queue entries that had already been inserted into the database. These players of course had a minor advantage over other players since they were the first to have forts. Some players asked us to undo that unfair development, but that is not a simple request. We just decided that these lucky ones were the winners in the fort build lottery. Their prize was the immediate attack of other players.

    And that is how it came that the first fort battle was to be fought at 9:37 am on Friday instead of 4 pm as we had planned. But that was not really to our disadvantage.

    10 am

    A few minutes later the first bug reports dropped in. It was my change from that morning that on the one hand stabilized things in the IE browser but on the other hand introduced a bug that made it impossible to invite other towns. But other bugs became visible too: IE 6 users were unable to build forts because the name field for the founded fort did not turn up on their screens. A little later other players reported that a county had 4 forts while the one below had 3. Oops. Another oops when players found a cozy little location of another fort:

    fort los angelus

    Quite a nice location I’d say, but also quite wrong. Later it also turned out that the county with 4 forts had not 4 but 3 forts, and that one fort shown in the minimap was missing on the map. The other county, with 2 forts in the minimap, actually had 3 forts, and the 3rd one doesn’t show up in the minimap.

    We still have no clue what to do with Fort Los Angelus, nor do we know what is causing the minimap to behave so strangely.

    Thursday afternoon

    Ok, so the first battle was to be fought on Friday morning. I checked if the policy port (1028) was reachable and after a few tweaks it worked. I should have checked the other port as well then (1578? Too lazy to look it up now), but I didn’t. Instead I spent some time on implementing battle logs so that we can later replay the battles. Don’t ask for a playback function, I still don’t know if the recorded actions are complete since I lack a playback function myself. I did also fix some other errors, just like my 3 colleagues in the office.

    In the late afternoon a bug turned up that had always existed but had never become visible: due to the fort battles, for the first time in The West it made sense for players to gather at a single location. When about 150 players were lurking at the first fort that war was declared on, the player list functionality broke because of too many players in one spot. That was eventually fixed in the afternoon as well.

    When I left the office I had an intensive 11 hour day behind me and I had a bad feeling about the 200 players who were about to fight the first fort battle on a public server. Due to the mishap in the morning time, I had to turn up around 8 am on Friday as well (I normally come in around 10 am).

    Friday, 9 am

    I still had not checked the second port when the first battle was about to be fought. I just expected that if the policy port was running that the other port would be forwarded as well. Well, checking the port was not that simple because I had no application to test that in a valid way anyway. Had no time to write such a tool.

    And this is what went wrong when the battle started at 9:37:

    We instantly noticed that we couldn’t connect to the game with Flash. That was bad. Looking into the server logs we saw that the server was bombarded with policy file requests, which were answered promptly. But no connect request came through. After a couple of WTFs all over the place, hasty hacking in console windows and connectivity checks, we realized that of the 3 ports the server opened, only 2 were forwarded correctly: the policy port and the maintenance port, which shouldn’t be reachable from the Internet at all. The game port that is used for the Flash communication had simply not been forwarded.

    After we figured that out we were able to fix it in time and about 10 minutes later the connection worked to my great relief. We could view the battle taking place and the final report to be sent to the spectators.
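
A tool for the check that was skipped here does not need to be big. The following Python sketch simply tries a TCP connect on each port; it is an illustration, not something that existed at the time, and the host name in the usage note is a placeholder.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int],
                timeout: float = 3.0) -> Dict[int, bool]:
    """Try a TCP connect to each port and report which ones accept it."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[port] = False
    return results
```

Run from a machine outside the server's network, something like `check_ports("game.example.org", [1028, game_port])` would have shown the policy port as reachable and the game port as closed. A successful connect only proves that the port is forwarded, not that the service behind it speaks the right protocol.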

    Later on Friday players reported that they were placed at the wrong locations when they joined multiple battles. We also noticed that the final report was not successfully sent to the users (the players didn’t notice because nothing told them that a report should be shown). It turned out that the reports had become larger than 65kb, which was the upper limit for reports at that point. I fixed that bug along with the bug of illegal player positions.

    Over the day several battles were fought and the battle server stabilized. We made a few minor changes then – and now I am curious how stable the java server is going to be over the weekend. That is quite a stress test that is currently running. But I am quite optimistic that since the server has survived now for quite some time that it will run stable for some time. Yet I am not sure if we fixed all memory leak issues and so I don’t know if the server is not going down within the next 48 hours. I can’t predict anything there yet, but we’ll see.

    I hope that the next week is going to be a bit more calm so that we can fix a couple of other problems and do a bit planning for upcoming features. I would expect that after this huge version release that 1.21 is not going to introduce such large features but will only provide minor features that we couldn’t get into 1.20 yet – but we’ll have to see what we will work on for that release.

    We will also see next week when we are going to roll out 1.20 on the other worlds and in other languages. This could maybe take the whole week, but I am not sure about that. I hope that the players of other worlds are patient; it shouldn’t take much longer.

    PS: I am constantly getting requests to increase the number of players on the beta world. I am aware that a lot of players want to get into the public beta, but you have to understand that I can neither change the player limit nor make that decision myself. It has however been decided that the player limit is going to be raised in the future. I can’t really tell you more about that because I simply don’t know more about it myself.

    Categories: 1.20, Fort battles, The West Tags:

    upcoming release of 1.20

    July 1st, 2009 13 comments

    So, today we made the decision that 1.20 is to be released on the German world 1 during the morning time of Thursday. Other worlds will follow if it turns out to be running stable. Of course, since we have the beta world we are a bit more confident among the devs, but to be honest: There are a lot of places where we fear that something could go wrong.

    For my part, I expect the fort battle server to have trouble in the first fight. When the first battle is fought, a lot of players will be involved and a lot of spectators will try to watch it. However, there is a limit on how many players will be able to watch. We’ve spent quite some time making sure that a flood of incoming connections cannot bring down the server. Instead, it should simply deny new connections if too many are already open. We have a limit of 64 spectators per battle and a limit of 512 connections in total. So in theory two battles could be fought at the same time, or 4-5 when fewer people are involved.
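
The two limits can be pictured as a small admission check in front of the connection handling. This Python sketch is only an illustration of that idea, not the battle server's code: the class and method names are invented, and it assumes that spectators count towards the total of 512 as well.

```python
from collections import defaultdict
from typing import DefaultDict

MAX_SPECTATORS_PER_BATTLE = 64
MAX_CONNECTIONS_TOTAL = 512


class ConnectionLimiter:
    """Admission control: refuse new connections instead of overloading."""

    def __init__(self) -> None:
        self.total = 0
        self.spectators: DefaultDict[int, int] = defaultdict(int)

    def try_accept(self, battle_id: int, is_spectator: bool) -> bool:
        """Return True and count the connection if it fits both limits."""
        if self.total >= MAX_CONNECTIONS_TOTAL:
            return False
        if is_spectator and self.spectators[battle_id] >= MAX_SPECTATORS_PER_BATTLE:
            return False
        self.total += 1
        if is_spectator:
            self.spectators[battle_id] += 1
        return True

    def release(self, battle_id: int, is_spectator: bool) -> None:
        """Give the slot back when a connection closes."""
        self.total -= 1
        if is_spectator:
            self.spectators[battle_id] -= 1
```

Rejecting the 65th spectator or the 513th connection right at the door keeps the cost of an overload constant, instead of letting every extra client slow down the battle for everyone.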

    During the week I worked not only on the server but also a bit on the client here and there. I added some artwork from our artists and also added a feature to the client that shows who is actively playing in the game and who is not connected. There are now also popups with information on the map so that it is easier to identify players. Besides that, we fixed bugs, bugs and more bugs. Have I mentioned fixing bugs?

    I am really curious how the release is going tomorrow, especially when the first fort battle is starting.