Activision has been taking deep dives into its matchmaking process for Call of Duty multiplayer in an effort to enable players to understand why they end up in the matches they do.
In April, Activision released a white paper that focused on how ping times (or the speed of your connection) matter in matchmaking. And today it’s dropping another white paper on how skill matters in core multiplayer matchmaking.
The developers said today that Call of Duty will always strive to improve its approach to matchmaking to ensure that players at all skill levels have the best possible experience. A healthy player population is good for everyone in the community. I like the openness around the development that the team is sharing; recently, Activision also allowed Modulate to share results on its efforts to use AI to curb toxicity in voice chat in Call of Duty.
Skill is one consideration as part of the process, but not the primary factor. The illustration on the top of this story shows what else matters in the matchmaking algorithm. Activision ran tests to figure out how players behaved with variations in the matchmaking essentials.
As a player, this is interesting to me. I play a lot of Call of Duty every year and I’m at level 167 in multiplayer this year (I haven’t played as much as usual). That’s equivalent to about 33 hours of multiplayer alone. During the pandemic, I really enjoyed chatting with three other friends while in Warzone matches, but my skill levels fluctuate based on what happens in the game. When Activision changed Warzone to two-shot kills for snipers, I suffered, as I was competent at sniping. When they changed it back to more one-shot kills, things got better for me.
Ping is king. Connection time is the most critical and heavily weighted factor in matchmaking. If it takes too long for a response to happen when you pull a trigger, players will notice the lag and will drop out. Time to match is the second most critical factor in matchmaking, as nobody wants to wait for a match to start. Also important is playlist diversity, as players want to play on different maps.
Players can edit which maps or modes they want to play in Quick Play settings. I don’t care so much for Search and Destroy, but I love Team Deathmatch, Domination, and Hardpoint. Skill and performance matter as players want the chance to have an impact on the game in every match.
The input device matters too, as you can choose a controller or mouse and keyboard. Lastly, platform matters as well, whether it’s a PC or console. You can also play with voice chat enabled or disabled and this can affect play.
Skill gap
If there is a wide skill gap between the best players and the worst players, Activision found, to no surprise, that there was an increased likelihood of a player quitting. The likelihood for a player to quit during a match increased significantly across 80% of players within the study.
Within the sample study group, Activision found that 90% of players in the loosened skill group did not return at a higher rate than the control group. This indicated that low- to mid-skill players left the game in greater numbers.
It also found player pool changes have a negative impact on matchmaking for players of all skill levels as players can only match with and against who is online, regardless of any system designed to form lobbies.
Another example was a test to tighten skill in Call of Duty: Modern Warfare III. This had inverse results consistent with the results of the loosening test. Activision saw negative impacts for high-skilled players. As a result, this change was not rolled out as a standard approach in Call of Duty: Modern Warfare III, as the company strived for a balance in its approach to matchmaking.
Historical record
Skill as a consideration has been a factor in matchmaking for Call of Duty from as early as Call of Duty 4: Modern Warfare, which came out in 2007. In the early years of the franchise, the dev’s ability to formally experiment was limited, and so the company iterated game by game on its matchmaking approach.
That has changed. Call of Duty has over the last five years done a lot of work to test, examine, and refine its matchmaking process.
Since the release of Call of Duty: Modern Warfare (2019), the company’s testing capabilities have improved substantially. Activision is now able to run experiments with modern testing methodologies, which the company will explore in an upcoming entry in its white paper series later this year (target timing may shift).
The next Matchmaking Series white paper will focus on Ranked Play later this summer.
Skill defined
For the purposes of core multiplayer matchmaking, Activision defines skill as how well a player can perform against the rest of the population in a given game mode, based on their previously observed performance. At a technical level, devs are interested in a value with the following properties:
- It should be constrained between two numbers, otherwise it is difficult to reason about the space of all possible skill values, making analysis of the distribution more difficult.
- It should be highly predictive: if we base your skill on a specific in-match performance metric (such as “kills per unit time”), it should also be a reliable predictor of your future performance as measured by this metric.
- It should be summable such that the average skill of multiple players is predictive of their combined skill. This allows for very efficient and predictive team balancing. Team balancing is very important for forming games where the outcome is unpredictable. Blowouts result in players leaving the game which adversely affects the player pool. Team balance itself is covered in more detail later in the document.
- It should be capable of adapting to a player’s ever-changing performance quickly.
- It should be resilient: the overall skill distribution should remain accurate in all situations. Simple skill algorithms can shift, inflate, deflate and even collapse when exposed to large population changes such as influxes of fresh players.
How skill is calculated
In Call of Duty, devs calculate skill based on a player’s relative performance on a specific metric. After each match, they compute this performance metric for each player. All players in the match are then compared to one another, regardless of team. Based on these comparisons each player’s recorded skill value is then updated.
The value of this skill adjustment is inversely proportional to the likelihood of a player achieving the outcome they did against the other players in the lobby. Note that the performance metric used only ever involves match performance; player progression or total time spent playing the game are not factored into skill.
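To make that idea concrete, here is a rough sketch of my own. It is not Activision's algorithm. It reads "inversely proportional to the likelihood" loosely, in the style of an Elo update: every player is compared with every other player in the lobby, and unlikely results move skill more than expected ones. The logistic curve, the `scale` and `rate` parameters, and the clamp to the range -1.0 to 1.0 are all assumptions.

```python
import math

def expected_win(skill_a: float, skill_b: float, scale: float = 0.4) -> float:
    """Assumed logistic model: chance that A beats B on the performance metric."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b) / scale))

def update_skills(skills: dict[str, float], metric: dict[str, float],
                  rate: float = 0.05) -> dict[str, float]:
    """Compare each player with every other player in the lobby, regardless of team."""
    if len(skills) < 2:
        return dict(skills)  # nobody to compare against
    updated = {}
    for a in skills:
        delta = 0.0
        for b in skills:
            if a == b:
                continue
            if metric[a] > metric[b]:
                actual = 1.0
            elif metric[a] < metric[b]:
                actual = 0.0
            else:
                actual = 0.5
            # An unlikely result (low expected chance, yet achieved) contributes the most.
            delta += actual - expected_win(skills[a], skills[b])
        step = rate * delta / (len(skills) - 1)
        # Keep the value bounded between two numbers, as the desired properties require.
        updated[a] = max(-1.0, min(1.0, skills[a] + step))
    return updated

# Hypothetical lobby: prior skill values and one match's performance metric.
lobby = {"underdog": -0.5, "average": 0.0, "favorite": 0.5}
result = {"underdog": 2.1, "average": 1.0, "favorite": 1.4}
new_skills = update_skills(lobby, result)
```

In this toy lobby the underdog outperforms both opponents, so its skill rises, while the favorite loses ground for finishing behind a player it was expected to beat.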
This skill calculation involves several carefully selected parameters to achieve the five desired properties referenced above. Seemingly sufficient performance metrics can have large downsides, which the company explores next.
Here’s an evaluation of some simple performance metrics and their potential pitfalls when applied to Team Deathmatch:
Match Total Kills. This value tells us how well a player did relative to the other opponents in their lobby at the main objective of the game. However, it has poor cardinality, as many players can achieve the same number of kills. This makes updating skill difficult, as many players will appear equally good based on this performance metric. It also does not reflect a player’s ability to survive, which is an important outcome in Call of Duty as well. For example, a player with 10 kills and one death is better than a player with 10 kills and 20 deaths.
Kill / Death Ratio. This value has much better cardinality, and it reflects both the primary and secondary objective of the game mode. However, it does not account for self-kills, which are an easy mechanism for reverse boosting (artificially dropping your skill to get easier matches).
Kills / (Deaths by enemy). This value ensures players cannot artificially drop their skill by simply self-killing. However, a large problem remains; the magnitude of this value is the same for a player with 10 kills and one death regardless of if they played the full match or joined in the last minute.
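The three metrics are easy to compare side by side. The sketch below is mine, with hypothetical stat lines; the `MatchLine` type and the convention of treating zero deaths as one (to avoid dividing by zero) are assumptions, not something the white paper specifies.

```python
from dataclasses import dataclass

@dataclass
class MatchLine:
    kills: int
    deaths_by_enemy: int
    self_kills: int

def total_kills(line: MatchLine) -> float:
    return float(line.kills)

def kill_death_ratio(line: MatchLine) -> float:
    # Counts every death, so self-kills drag the value down.
    deaths = line.deaths_by_enemy + line.self_kills
    return line.kills / max(deaths, 1)

def kills_per_enemy_death(line: MatchLine) -> float:
    # Ignores self-kills, so they can no longer be used to lower the value.
    return line.kills / max(line.deaths_by_enemy, 1)

survivor = MatchLine(kills=10, deaths_by_enemy=1, self_kills=0)
reckless = MatchLine(kills=10, deaths_by_enemy=20, self_kills=0)
reverse_booster = MatchLine(kills=10, deaths_by_enemy=1, self_kills=19)

for name, line in [("survivor", survivor), ("reckless", reckless),
                   ("reverse_booster", reverse_booster)]:
    print(name, total_kills(line), kill_death_ratio(line), kills_per_enemy_death(line))
```

Total kills cannot tell the three players apart, and plain K/D rates the reverse booster the same as the reckless player. Only the last metric sees through the self-kills, and none of the three knows how long each player was actually in the match.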
The devs need to adjust for all the factors that contribute to or detract from a team’s performance while being resilient towards gaming the system. To achieve this, the devs are constantly iterating on their performance metrics to optimize the player experience per game mode.
How does skill change over time?
Player skill can vary over time for a variety of reasons. I actually get better if I play every day. Sadly, this does not happen.
Skill variation may happen because someone is experimenting with a new loadout, they haven’t played recently, or they are simply tired or distracted. It is therefore important that a player’s skill value is updated on an ongoing basis, and that it can be updated and reach equilibrium quickly.
Overcorrection can lead to large fluctuations in the skill of players that someone is matched with and against and can result in unfair matches. However, when a player’s abilities are stable, it is equally important that skill calculations find the stable midpoint quickly.
These two goals, stability and rapid correction, are largely at odds. A balance must be found between the stability and flexibility that best suits each core multiplayer game mode in Call of Duty.
However, even if skill could be tracked perfectly and all matches were made with completely equal opponents, many players will still experience significant loss or win streaks. For instance, in any string of five perfectly equal games, the equivalent of a binomial distribution of a coin flipped five times, about 3% of players will experience a five-game loss streak, and about 3% will experience a five-game win streak.
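The 3% figure is just the coin-flip arithmetic. A minimal check, assuming each game is an independent 50/50 outcome (the function name is mine):

```python
def streak_probability(games: int, win_chance: float = 0.5) -> float:
    """Chance of winning (or, with win_chance = 0.5, losing) every one of `games` matches."""
    return win_chance ** games

five_game_streak = streak_probability(5)  # 0.5 ** 5 = 0.03125, about 3%
either_streak = 2 * five_game_streak      # a win streak or a loss streak: 0.0625
```

So in any given run of five perfectly even games, roughly one player in 16 sees either all wins or all losses.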
Why even track skill?
One of the core design principles of Call of Duty is Player First. Players of all levels should have a fun and competitive experience with the game. Team balance is the first and most important reason to track skill. If devs don’t know how players will perform in a match, then they can’t provide a balanced in-match experience for players.
This results in blowouts, which we know are not fun for players on the losing end. Activision found that balancing skill against other matchmaking factors quantifiably increases the extent to which most players play and enjoy Call of Duty. When skill is utilized in matchmaking, 80% to 90% of players experience better end-of-match placement, stick with the game longer and quit matches less frequently. And of course, the goal is to keep us all coming back.
All these factors strongly encourage the long-term health of the Call of Duty player base, helping the title avoid the feedback loop of low-to-average skill players continually leaving the game as the average skill of the population rises. By avoiding this feedback mechanism, the remaining 10% to 20% of the player population benefits.
If low skill players engage with titles less, then higher and higher skilled players become the new low skill players (relatively speaking). As a result, they then experience the negative outcomes of being the lowest skilled players in the core multiplayer population, likely resulting in those players then returning at reduced rates.
This ultimately becomes a feedback loop, likely resulting in a player population of only the best of the best, and a very unwelcoming experience for any new players. As this would adversely impact the overall player pool, the net result would be a negative experience for all players. It would be a lion’s den.
Team balance
Team balance is vital for ensuring games are fair for players. The goal is to make the outcome of a match as unpredictable as possible. This reduces the probability of blowouts occurring, which are known to negatively correlate with self-reported “fun.”
In the absence of team balance, larger parties end up with a significant advantage, where even a slightly above average party would be statistically likely to be above the average team sampled from the population. For instance, a six-player party who are all in the 60th skill percentile would be rated higher than approximately 80% of randomly sampled six-player teams.
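The 80% figure can be sanity-checked with a little probability. This is my own back-of-the-envelope sketch, not the white paper's method. It assumes teams are compared by average skill percentile and that a random team is six independent draws from a uniform percentile distribution. Under those assumptions, the sum of the six percentiles follows the Irwin-Hall distribution, whose cumulative distribution has a closed form.

```python
from math import comb, factorial, floor

def irwin_hall_cdf(x: float, n: int) -> float:
    """P(sum of n independent Uniform(0, 1) values <= x)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)

party_size = 6
party_percentile = 0.60  # every member sits at the 60th percentile

# Share of random six-player teams whose average percentile is below the party's.
share_below = irwin_hall_cdf(party_size * party_percentile, party_size)
print(f"{share_below:.3f}")
```

The result comes out a little under 0.80, which lines up with the "approximately 80%" in the white paper.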
Figure 2 shows the observed win rate of a team in TDM given the differential between the sum skill of both teams, with the X axis in raw skill. Win rates are significantly affected by small team skill differences. For instance, let’s consider a lobby of 12 50th percentile players. If just one of those players was an 80th percentile player instead, corresponding to a 0.1 increase in raw team skill, that player’s team would have a 70+% chance of winning.
What impact does skill have on the player experience?
Activision is always working to improve the quality of matchmaking in Call of Duty and relies on data-driven approaches to evaluate success. There are two primary methods by which the devs have come to understand the impact of skill in matchmaking:
- Testing of different skill matching approaches
- Comparing match outcomes between titles in the Call of Duty franchise that have different skill implementations
When discussing this, the devs talk about tightening and loosening skill constraints. This means adjusting two parameters in the system.
One is allowing for the average skill of a party being added to a lobby to be farther from the average skill of the lobby (loosening) or requiring it to be closer (tightening).
The other is allowing for a lobby’s percentile skill disparity to drift further as a result of adding a new party (loosening) or restricting how far we let this drift (tightening).
Activision continually runs tests on various parts of the matchmaking system to find optimal configurations to improve fun while maintaining efficient matchmaking. While there is no direct measure for ‘fun’, the devs use data that indicates that players are enjoying the game, such as how long they continue to play the game, match-level quit rates, player surveys and match outcomes.
Call of Duty is a game where players can play together in parties. In some experimentation methodologies, this results in some mixing of the cohorts and the analysis of results can be complex. Testing matchmaking at this scale is a very interesting subject, independent of the discussion of skill. This is a topic the devs will discuss further in a future white paper focusing on experimentation methods.
As an example, in early 2024, Activision ran the Deprioritize Skill Test in Call of Duty: Modern Warfare III, where the team used an A/B test framework to loosen the constraints on skill in matchmaking. It’s important to note that skill, as a factor in matchmaking, was decreased for this test, but not removed entirely from the matching algorithm.
Based on the history of testing, completely removing skill from matchmaking would amplify the observed effects. This experiment is a repeat of a type of test that the devs have run at various times throughout the last five years. The devs ran the 2024 test in North America and established a treatment group of 50% of the population. For the treatment group, they loosened the skill constraints. The other half of the population was left with the standard configuration.
After a month of running this test, the devs categorized the treatment population into 10 equally sized groups across the skill population. Each bar represents the change in the labeled KPI for that 10th of the skill distribution compared to the control group. The skill distribution is determined using an internal skill algorithm that tracks how good the devs believe a player to be, as described above.
As player skill is always fluctuating, the devs take the average of the skill values each user had during the test, then calculate the percentiles from these averaged values. For example, a player represented in the top 10% group in Figure 3, had an average skill value in the top 10% of all players seen during the experiment.
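The grouping step described here can be sketched in a few lines. This is an illustration with made-up data: the `decile_groups` name, the equal-count split by rank, and the way ties fall out of the sort are my assumptions.

```python
from statistics import mean

def decile_groups(skill_history: dict[str, list[float]]) -> dict[str, int]:
    """Map each player to a decile from 1 (bottom 10%) to 10 (top 10%),
    ranked by the average of the skill values seen during the test."""
    averages = {player: mean(values) for player, values in skill_history.items()}
    ranked = sorted(averages, key=averages.get)  # lowest average first
    count = len(ranked)
    return {player: (index * 10) // count + 1 for index, player in enumerate(ranked)}

# Hypothetical test population: 20 players, two skill readings each.
history = {f"p{i}": [i / 20 - 0.5, i / 20 - 0.4] for i in range(20)}
groups = decile_groups(history)
```

Each KPI, such as the 14-day return rate, can then be compared between treatment and control within each of the ten groups.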
You can observe the percent difference in the number of players returning after 14 days between the treatment and control groups. With deprioritized skill, returning player rate was down significantly for 90% of players.
The 10% of highest skilled players came back in increased numbers, but in aggregate, the devs see meaningfully fewer players coming back to the game. This effect may appear small, but this change was observable within the duration of the test. This will compound over time, just like interest, and will have a meaningful impact on our player population.
This is a concern for all players, including the top 10%, because if this pattern is allowed to continue, players will exit the game in increased numbers. Eventually a top 10% player will become a top 20% player, and eventually a top 30% player, until only the very best players remain playing the game. Those original top players will become increasingly likely to not return to the game. Ultimately, this will result in a worse experience for all players, as there will be fewer and fewer players available to play with. Also, as noted above, this test only deprioritized skill in the matching rules. If it were completely removed, the devs would expect to see the player population erode rapidly in the span of a few months, resulting in a negative outcome for all players.
The devs have also run experiments to tighten skill beyond our current configuration. This had inverse results, negatively impacting the high skill cohort. This change was not rolled out as a standard approach, as we continue to strive for a balance in our approach to matchmaking.
Quit rate
Quit rate is the likelihood for a player to quit throughout a match. Activision observed that the quit rate significantly increases across 80% of players, and only the top 10% see a meaningful decrease in quit rates. The devs have historically found that quit rates have a strong negative correlation with self-reported “fun” gathered through player surveys.
This will only be a short-term benefit for the top 10% of players, however. As the accelerated departure of players in the lower skill brackets takes hold, top 10% players will eventually drift down the skill distribution (as originally top 10% players will make up a larger and larger portion of the player base). As a result, the devs expect to see former top 10% players quit games at increasing rates as they become 50th percentile players after much of the lower skill population has left the game.
Activision’s data showed how you can see the difference in the rate of blowouts occurring in TDM. A blowout is when a team in a lobby wins with a score delta greater than 30. This has increased for all players and has also been established as having a negative correlation with self-reported “fun.” The devs see similar results in other game modes.
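The blowout definition is simple enough to write down. In this sketch the threshold of 30 comes from the white paper, while the final scores are hypothetical.

```python
BLOWOUT_DELTA = 30  # a TDM blowout is a win by a score delta greater than this

def is_blowout(score_a: int, score_b: int) -> bool:
    return abs(score_a - score_b) > BLOWOUT_DELTA

def blowout_rate(matches: list[tuple[int, int]]) -> float:
    """Fraction of matches that ended as blowouts."""
    return sum(is_blowout(a, b) for a, b in matches) / len(matches)

# Hypothetical TDM final scores.
matches = [(75, 40), (75, 70), (75, 45), (60, 75)]
rate = blowout_rate(matches)  # only the 75-40 match qualifies
```

Note that a 75-45 result is not counted, because the delta has to be greater than 30, not equal to it.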
Kills Per Minute is down significantly for the bottom 20% to 30% of players. The next 60% of players have no significant change, and the top 10% see significantly higher KPM. As with the other KPIs, the accelerated rate of low-skill players not returning to the game will result in players shifting to the left on this distribution over time.
The use of killstreaks and increased KPM and SPM shows that the wider lobby skill percentile disparity is disproportionately leveraged by the top 10% of players. Unfortunately, this increased performance comes at the cost of much greater impact to the much larger 30% of the population toward the bottom of the skill distribution.
Comparing match outcomes between different Call of Duty titles
The devs’ other opportunity to measure the impact of using skill as a factor in matchmaking is from one game to another in Call of Duty. There’s variability in core multiplayer skill tightness/looseness across titles in the franchise, because they tune for skill differently relative to the other matching criteria.
The developers compare across games by amalgamating player match outcomes between two different games with different approaches to skill: one tighter, one looser.
Match outcome is a broad metric encompassing many factors, including leaderboard placement regardless of team, interactions with game systems like killstreaks, and interactions with the objective, such as hardpoints.
The devs can then look at match outcome differences between two games, across the skill distribution.
The range of potential outcomes an individual player achieves is widened in the title with tighter skill. In the title with low skill matching, a bottom decile player will place in the bottom half of a TDM match close to 90% of the time. In the title with tighter skill, a bottom decile player only experiences this about 75% of the time.
Using skill in matchmaking does not necessarily flatten the outcome graph, it reduces the severity of the slope. A completely flat outcome is included as reference. Even with more consideration for skill in matchmaking, higher skill players perform better than lower skill players by a significant margin, and still perform far better than they would if skill disparity was the top priority of the Call of Duty matchmaking system, which it is not.
Other historical testing
The above are just two examples of ways the devs can see the impact of skill on Call of Duty matchmaking. Skill as a consideration has been a factor in matchmaking for Call of Duty from as early as Call of Duty 4: Modern Warfare. In the early years of the franchise the ability to formally experiment was limited, and so the devs iterated game by game on matchmaking approach.
Since the release of Call of Duty: Modern Warfare (2019) our testing capabilities have improved substantially. Activision is now able to run experiments with modern testing methodologies, which the devs will explore in an upcoming entry in our white paper series, later this year (target timing may shift).
The developers can see that loosening skill negatively impacts our ability to keep players interested in the game. In a test similar to the Deprioritized Skill Test discussed above, the devs were able to see a significant decrease in the number of players playing Call of Duty: Modern Warfare (2019) and an increase in the overall match quit rate, when treated with a looser skill matching.
Subsequent attempts to protect only the bottom 25% of players and allow for looser matchmaking for the remaining 75% of players also had clear negative effects on player counts in two weeks, with increased quit rates and reductions in total hours played, both of which are well established as negative indicators of self-reported “fun.”
Another example was a test to tighten skill in Call of Duty: Modern Warfare III. This had inverse results consistent with the results of the loosening test. Quit rate was down for 90% of players and we saw other improvements in the experience of low-skill players (KPM and SPM). However, we observed negative impacts for high-skill players. As a result, this change was not rolled out as a standard approach in Call of Duty: Modern Warfare III, as we continue to strive for a balance in our approach to matchmaking.
Our goal has always been to make Call of Duty as enjoyable for as many players as we can, and we’ll continue to experiment with how we can provide a better experience for all our players.
How is skill incorporated into matchmaking?
Matchmaking targets are loosened over time in a pattern. The devs call these loosening patterns backoffs. As a search ages, the system becomes more willing to accept looser restrictions across all dimensions, as the absence of a match over time is an indicator that not enough players are available to form a match with the current targets.
The rate of these backoffs and the volume of available searches determines time to match. The devs have always backed off on skill more quickly than other matchmaking dimensions like Delta Ping, as outlined in the first white paper. Exactly how much is dependent on the game mode and game type.
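Here is how I picture a backoff, as a sketch only. The linear widening, the caps, and every number below are hypothetical; the white paper says only that targets loosen as a search ages and that skill backs off faster than Delta Ping.

```python
def allowed_gap(base: float, rate_per_second: float, cap: float,
                search_age_s: float) -> float:
    """Widest gap a search will accept after waiting search_age_s seconds
    (assumed linear backoff up to a cap)."""
    return min(cap, base + rate_per_second * search_age_s)

# Hypothetical tuning: skill loosens four times faster than delta ping.
SKILL = {"base": 5.0, "rate_per_second": 1.0, "cap": 50.0}   # percentile points
PING = {"base": 10.0, "rate_per_second": 0.25, "cap": 60.0}  # milliseconds

for age in (0, 20, 100):
    skill_gap = allowed_gap(search_age_s=age, **SKILL)
    ping_gap = allowed_gap(search_age_s=age, **PING)
    print(f"{age:>3}s  skill gap <= {skill_gap}  delta ping <= {ping_gap} ms")
```

With these made-up settings, a 20-second search already accepts a skill gap five times wider than at the start, while the ping tolerance has grown by only half.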
Below is a detailed description of how skill is used in the matchmaking algorithm.
Skill percentiles
The devs refer to the skill values used during team balancing as a player’s Raw Skill. Raw skill is a normal distribution between -1.0 and 1.0, but for the purpose of skill grouping the devs would rather have a normalized uniform value. This can be achieved by converting raw skill into a percentile. A system constantly tracks population skill values, and the devs convert each player’s skill to a corresponding skill percentile.
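Converting a raw skill value into a percentile amounts to asking what share of the tracked population sits at or below it. A minimal sketch, assuming the population's raw skill values are kept in a sorted list (the data and function name are hypothetical):

```python
from bisect import bisect_right

def skill_percentile(raw_skill: float, sorted_population: list[float]) -> float:
    """Share of the tracked population with raw skill <= raw_skill, from 0 to 100."""
    return 100.0 * bisect_right(sorted_population, raw_skill) / len(sorted_population)

# Hypothetical raw skill values, all between -1.0 and 1.0.
population = sorted([-0.6, -0.3, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 0.5, 0.8])
percentile = skill_percentile(0.1, population)  # 6 of 10 players are at or below 0.1
```

Because percentiles are uniform by construction, a rule such as "accept parties within 10 percentile points" covers the same share of the population wherever a player sits in the distribution.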
The benefits of a skill percentile are that by default any matchmaking rules based on these values apply to all players equally, e.g. the bottom 30% and top 30% of the skill population get similar matchmaking times.
The downside of skill percentiles is that they are less indicative of a player’s skill level, so raw skill is used for team balancing.
Skill grouping
Skill grouping is a key factor to matchmaking with subtle differences across all game modes. The goal of skill grouping is to keep similarly skilled players together and to find optimal opponents for parties that have large skill differences in the best and worst players.
Call of Duty imposes no restrictions on how wide this skill gap can be for parties outside of ranked play and thus the best and worst players in the world can group up and the objective is to deliver a fun, fair match.
The devs have three overlapping systems that attempt to optimize skill grouping: A heuristic selection process, a skill grouping rule, and a skill disparity minimization rule. The combination of these systems achieves the intended goal. The skill similarity rule aims to keep the effective skill of parties in the lobby similar and the skill disparity rule tries to group parties with similar skill disparity.
Skill similarity rules
These matchmaking rules attempt to minimize the difference in the average skill of parties in a lobby. In effect this acts to ensure the skill distribution in a lobby is roughly centered on the average skill of the lobby.
As with delta ping this is a constraint that is loosened the longer a player’s search runs. The amount of skill similarity a search will accept is also modified by other factors such as which game mode is selected, if there is a high-quality lobby open for joins, and how many players are playing in a specific region.
The rules exist primarily to account for parties and to aid team balancing. Parties with high disparity are difficult to match fairly. Take, for instance, a party of two players, Alice and Bob: Bob is an average player at the 50th skill percentile, and Alice is an elite player at the 99th skill percentile.
If the devs matchmake them with Bob’s skill, Alice is practically guaranteed to be the best player in every lobby they join, more so than if she played solo. If the devs matchmake on Alice’s skill, then Bob will likely be the worst player in every lobby they join. Thus, the devs must match them in the middle such that the worst player gets some opponents of equal footing, while minimizing the inherent advantage of the better player.
Skill disparity rules
The skill disparity rules are concerned with minimizing the difference between the worst and the best player in a lobby. This rule works in tandem with the skill similarity rule to group parties with high disparity together where possible. As discussed above, parties with high skill disparity are inherently difficult to matchmake fairly, thus the more we can group similarly disparate parties together the less of an effect they have on the less disparate population.
The skill disparity rules loosen over time using the same mechanism as skill similarity and delta ping. Even though we can track and predict how long it may take to form a desirable match, this prediction can be off when fewer players search than expected. When this happens, looser constraints aid in the formation of a match in a reasonable length of time.
The skill disparity rule stops some searches from being included in a forming lobby with relatively high disparity.
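Putting the two rules together, a join check might look something like the sketch below. This is my simplification, not the real system: it works on skill percentiles, uses hypothetical thresholds, and leaves out the heuristic selection process, ping, and everything else the matchmaker weighs.

```python
def can_join(lobby: list[list[float]], party: list[float],
             max_similarity_gap: float, max_disparity: float) -> bool:
    """Check a party against a forming lobby (a list of parties, as skill percentiles)."""
    lobby_players = [skill for members in lobby for skill in members]
    lobby_average = sum(lobby_players) / len(lobby_players)
    party_average = sum(party) / len(party)

    # Similarity rule: the party's average should sit near the lobby's average.
    similarity_gap = abs(party_average - lobby_average)

    # Disparity rule: limit the spread between the best and worst player
    # once the new party has been added.
    combined = lobby_players + party
    disparity = max(combined) - min(combined)

    return similarity_gap <= max_similarity_gap and disparity <= max_disparity

lobby = [[45.0, 55.0], [50.0], [40.0, 60.0]]  # hypothetical forming lobby
alice_and_bob = [99.0, 50.0]                  # the high-disparity party from the example

fresh_search = can_join(lobby, alice_and_bob, max_similarity_gap=10.0, max_disparity=30.0)
aged_search = can_join(lobby, alice_and_bob, max_similarity_gap=30.0, max_disparity=60.0)
```

With tight thresholds, Alice and Bob are kept out of this fairly even lobby. After the thresholds have loosened, the same party is allowed in.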
Ranked Play
Skill is not isolated as a factor in matchmaking for Ranked Play chiefly due to game design. Ranked Play is designed to deliver an expressly competitive environment; accordingly, players must qualify for access to Ranked Play modes.
Many players who have qualified for Ranked Play still choose to enter the game in non-ranked playlists. For new players and those who do not participate in Ranked Play, it’s important they can contribute meaningfully to their team and their own personal in-game achievements. The next Matchmaking Series white paper will further detail Ranked Play.
How does skill impact other matchmaking KPIs across the skill spectrum?
One of the goals of the system is to give everyone a relatively similar matchmaking experience as mentioned in the introduction; a fair shot at achieving and experiencing the range of outcomes and events in Call of Duty. However, the population is highly asymmetric, with most parties, particularly disparate ones, sitting higher in the skill distribution.
The practical result of this is that matchmaking at the higher skill level requires more population to form equivalently equitable matches. Note that in the previous white paper of the Matchmaking Series, discussing the role of Ping, we stated that skill level has no impact on the latency experience. This was an oversimplification and should be clarified.
Skill level has a small impact on matchmaking outcomes, including Delta Ping and search time, but it is minor and not strictly linear.
The rules of the Call of Duty core multiplayer matchmaking system are applied consistently across the entire skill distribution to provide as fun and fair an experience as possible. While this approach has been shown to support the long-term quality of our players’ experience, the devs are always looking for ways to improve and will continue to experiment in this area.