Rebel Racing Clubs discovery
Clubs are a popular part of Rebel Racing, but how do we ensure players can search for them efficiently? Martin, one of our senior engineers at Hutch, talks tech stacks!
Hi, I’m Martin Wong and I’ve been at Hutch for 4 years. Previously I worked on PC and console. My life at Hutch started on mobile as a client engineer, but these days I work on the server backend, solving interesting problems and trying not to break the internet!
This blog covers how a design requirement to let users search for and be recommended clubs led us to adopt the Elastic Stack as part of our tech stack, and some of the interesting problems we solved along the way.
Show Me Some Clubs
The design spec called for a feature that lets users search for clubs they could join, recommends clubs to the player based on certain criteria, and also allows users to set filters to help narrow down their search.
From a technical point of view, this meant we needed a solution that was fast, supported powerful searches and gave us control over the ranking of results. Our games at Hutch are published worldwide, so text searches would need to support multiple languages. Server engineers here, even though assigned to a game project, still aim to build core features that can be shared across projects, and this was a prime example. The solution was guided by detailed game designs from two of our titles, but we envisioned it would also be used for future projects, so it needed to be flexible yet scalable.
What Tech Solution to Choose?
None of our existing tech solutions were fit for the job; both traditional and NoSQL databases were just not going to cut it. It was time to bring in some new tech.
After some research I quickly zeroed in on a few solutions that could do the job. They had a few things in common: they were based on inverted indices and used the Lucene query language under the hood. An inverted index makes it possible to search just about any indexed data very quickly. Indexing involves breaking a document down and storing it in a way that allows very fast look-ups, essentially the same technique used by web search engines. Lucene then complements this by providing the ability to perform powerful search queries.
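As a toy illustration (nothing like Lucene's actual implementation, which also handles tokenization, scoring and much more), an inverted index simply maps each term to the set of documents containing it, so a query is a dictionary look-up rather than a scan over every document:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, term):
    """No document scan at query time: just one dictionary look-up."""
    return index.get(term.lower(), set())

# Illustrative club names, not real Rebel Racing data.
clubs = {
    1: "Street Kings racing club",
    2: "Desert Rebels",
    3: "Racing Legends",
}
index = build_inverted_index(clubs)
```

Here `search(index, "racing")` returns the ids of clubs 1 and 3 without touching club 2 at all, which is what makes the approach fast at scale.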
Three solutions were evaluated: Azure Cognitive Search, Elasticsearch (self-managed) and Elastic Cloud (managed). The clubs search problem was applied to each solution: we looked at how well it solved the problem and whether any features were missing that would need extra work. This involved mocking up and indexing over a million clubs, performing load tests and carrying out scaling operations.
In the end, the Elastic Cloud managed service was chosen. All three were similar in their ability to meet the criteria in the design spec, but the deciding factors came down to performance, cost, ease of maintenance and scaling, and flexibility. As a shared piece of tech, engineering time was also a factor: we wanted something that was relatively easy to learn, maintain and develop further, so that other engineers could adopt the tech and help spread the knowledge.
Searching in Different Languages
One of the interesting problems we encountered was multi-language text searching. To search for words in a language effectively, the correct language analyser should be used. Language analysers support features such as stemming, a technique that reduces a word to its stem (similar to its root), allowing a search term to match a word in a document even when the exact form differs. For example,
Race, racing, races
All of these variations share the same stem, race. This means that if any word in the list, e.g. races, appears when a club document is indexed, it will be normalized by stemming and indexed as race. When it comes to searching, even if a different variation such as racing is used, it is also normalized to race, and so matches the indexed document.
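The matching behaviour can be sketched with a toy normalizer. A real analyser derives stems algorithmically (e.g. with a Porter-style stemmer); here a hard-coded table stands in for that step purely for illustration:

```python
# Toy stand-in for a language analyser's stemming step. A real stemmer
# computes these reductions algorithmically; this table is illustrative only.
STEMS = {"race": "race", "racing": "race", "races": "race", "raced": "race"}

def normalize(word):
    """Reduce a word to its stem (falling back to the word itself)."""
    word = word.lower()
    return STEMS.get(word, word)

def matches(indexed_word, query_word):
    """Both index-time and query-time words are normalized the same way,
    so any variation matches any other variation of the same stem."""
    return normalize(indexed_word) == normalize(query_word)
```

Because normalization is applied symmetrically at index time and query time, a document indexed with "races" is found by a query for "racing".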
The problem was that language analysers cannot be assigned to a field dynamically: you can't simply say "use the correct analyser for the club name based on the language of the text".
There are various ways of getting around this restriction, the solution we went for involved using an Ingest Pipeline, which is a set of rules that are automatically applied when data is ingested during the indexing process.
The solution involved 2 steps:
When creating a mapping file (a schema defining which fields are searchable, their data types and which language analyser is associated with a field), we define some extra fields for text search fields, e.g. the club name and description fields. For each of these fields, we additionally define language variants and assign the relevant language analyser. E.g. for name, we also define name_fr and assign it a French analyser.
When it comes to ingesting and indexing, we set up rules: for example, if the locale field contains fr, then copy the contents of name into name_fr so the relevant language analyser is used.
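The two steps above might look roughly like this, expressed as the JSON bodies sent to Elasticsearch (field names follow the name_fr example; the pipeline uses the `set` processor with `copy_from` and a Painless `if` condition, so exact syntax may vary with Elasticsearch version, and this is our illustration rather than the actual Hutch configuration):

```python
# Step 1: index mapping with per-language variant fields, each assigned
# the matching built-in language analyser.
clubs_mapping = {
    "mappings": {
        "properties": {
            "locale":  {"type": "keyword"},
            "name":    {"type": "text", "analyzer": "standard"},
            "name_fr": {"type": "text", "analyzer": "french"},
            "name_de": {"type": "text", "analyzer": "german"},
        }
    }
}

# Step 2: ingest pipeline rules that copy `name` into the variant field
# matching the document's locale, so the right analyser indexes it.
clubs_pipeline = {
    "processors": [
        {"set": {"field": "name_fr", "copy_from": "name",
                 "if": "ctx.locale == 'fr'"}},
        {"set": {"field": "name_de", "copy_from": "name",
                 "if": "ctx.locale == 'de'"}},
    ]
}
```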
For searching, a similar step is applied. The game client performs a search with a requirement, for example: give me all clubs containing the word racing in the name field. The client is not burdened with any knowledge of language analysers; the server backend builds the Elasticsearch query against the correct field based on the locale value passed from the client.
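That server-side redirection can be sketched with a hypothetical helper (not the actual Hutch backend code) that builds a multi_match query over the generic field plus the locale's analysed variant:

```python
# Locales we built analysed variant fields for; purely illustrative.
VARIANT_LOCALES = ("fr", "de")

def build_club_search(term, locale):
    """Build an Elasticsearch query body for a club-name search.

    The client only supplies the search term and its locale; the server
    decides which language-specific field to include.
    """
    fields = ["name"]
    if locale in VARIANT_LOCALES:
        fields.append(f"name_{locale}")
    return {"query": {"multi_match": {"query": term, "fields": fields}}}
```

A French client's query for "racing" would then hit both name and name_fr, while an unsupported locale simply falls back to the generic field.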
This turned out to be a good solution. In terms of space efficiency, whilst multiple mapping rules are created for the various language variations, a document describing a club generally only contains the original generic field plus the one language variant field we copied into. The locale field used by the ingest rules defaults to the locale the user's game client is set to, which we assume is the predominant language used in text searches. While not perfect, this is fine for most cases.
Keeping Things Synchronized
For the best user experience we decided to run multiple Elasticsearch instances globally, so users perform queries against the closest instance for the fastest response. This led to another interesting problem: ensuring data integrity and keeping everything synchronized.
Here is the configuration we went with. Two regions are shown to illustrate cross-region behaviour, but in practice a game runs in several regions.
When clubs are created or modified, they are written to Cosmos DB, which is the source of truth. As shown in the diagram, Cosmos DB is set up in multiple regions, enabling automatic geo-replication of clubs for faster access. To simplify things, writes only happen in a single region, avoiding complicated cross-region merge rules. Optimistic concurrency via an ETag mechanism ensures that club records stay correct.
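The ETag mechanism can be sketched with an in-memory stand-in for Cosmos DB (hypothetical code, not the actual client or implementation): every write must present the ETag it read, and a mismatch means another writer got there first, so the caller re-reads and retries:

```python
class ConflictError(Exception):
    """Raised when a write's ETag no longer matches the stored document."""

class ClubStore:
    """In-memory stand-in for an ETag-based store such as Cosmos DB."""

    def __init__(self):
        self._docs = {}  # club_id -> (etag, data)

    def read(self, club_id):
        return self._docs[club_id]  # returns (etag, data)

    def write(self, club_id, data, if_match=None):
        current = self._docs.get(club_id)
        if current is not None and current[0] != if_match:
            raise ConflictError("ETag mismatch: document changed underneath us")
        new_etag = current[0] + 1 if current else 1
        self._docs[club_id] = (new_etag, data)
        return new_etag

def update_with_retry(store, club_id, mutate, attempts=3):
    """Optimistic read-modify-write loop: on conflict, re-read and retry."""
    for _ in range(attempts):
        etag, data = store.read(club_id)
        try:
            return store.write(club_id, mutate(dict(data)), if_match=etag)
        except ConflictError:
            continue
    raise ConflictError("gave up after retries")
```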
Cosmos takes care of the club records in storage, but for the regional Elasticsearch indices, background processor jobs periodically ingest the club data from Cosmos into Elastic. Some processors ingest from various sources. For example, users can earn club points for their club, and when searching for clubs to join it is desirable to get an indication of how strong a club is, so the total club points across all the members is shown. This value can change very frequently, but updating an index too often has performance and cost implications. To mitigate this, we have processors that update point totals periodically. This is acceptable because a slightly out-of-date point total still gives a relative indication of club strength; it does not need to be a real-time value.
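One pass of such a background processor might look like the sketch below, with the Cosmos DB and Elasticsearch clients injected as plain callables (hypothetical names, not the real jobs):

```python
import time

def sync_clubs(read_changed_since, bulk_index, last_sync):
    """One pass of a periodic sync job.

    Pulls clubs changed since the previous run from the source of truth
    and pushes them into the regional search index in a single batch,
    then returns the new watermark for the next run.
    `read_changed_since` and `bulk_index` are stand-ins for the Cosmos DB
    and Elasticsearch clients.
    """
    now = time.time()
    changed = read_changed_since(last_sync)
    if changed:
        bulk_index(changed)  # one bulk call per pass, not one per change
    return now
```

Batching all changes since the last watermark into one bulk index call is what keeps frequently changing values, like point totals, from hammering the index.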
We could enhance the clubs experience even further. There are other Elasticsearch features we have yet to explore, for example the built-in support for synonyms, allowing custom alternatives to words. Perhaps a user searching for clubs mentioning Beetle cars would also like to be shown clubs mentioning the well-known movie featuring a certain famous Beetle... Herbie!
Now that we have the Elastic Stack all set up and at our fingertips, further opportunities open up. One natural application would be to pump our logs into Elastic and make them searchable. Others might include analytics to gain useful insights from data that our current tools don't have access to. An even more fascinating question: if we stored certain information on all the existing players in a game, could we somehow use it to feed back into the game?
Look out for more Tech blog posts coming soon!
And if you're a Server Engineer or a Unity Games Engineer looking to start a new chapter in your career, check out our current vacancies here: hutch.io/careers. We'd love to hear from you!