# Aircraft and COVID-19: a Data Science Business Case -- without Data

*More than a year ago, in October 2020, COVID-19 confronted us with more
challenges than we could have imagined. The leading question was: how can we,
the MMU, continue our flying operations effectively and efficiently? I was happy
to provide a little scientific insight into this question.*

## The challenge

It is usually not a problem to gather an aircrew of – let’s say – ten people to operate an airplane. In 2020, though, this became quite a challenge. Most countries imposed travel restrictions on crews. The most pertinent one was the requirement of a negative COVID-19 test no older than 48 or 72 hours.

These restrictions posed a handsome math problem. Among ten people, one might be infectious and test positive. In that case, the aircrew is incomplete, and the flight mission has to be canceled. The question was: how could we keep flying?

## Tackling the Challenge with Probability

Most people in our unit looked at the question like this: how should we schedule the tests of an aircrew so that, if one member tests positive, a replacement can still be found? Actually, this was the question they asked me. At work, I was tasked with arranging COVID-19 testing with a laboratory. For those who have forgotten: back then, PCR test results took at least 24 hours, even in the fastest labs.

As is often the case with me, I ignored what they told me and listened to what they wanted instead. Rather than getting the laboratory on the phone, I tried to view the question from another angle.

I asked myself: how many persons should we test so that at least ten of them almost certainly test negative? My idea was that, instead of waiting for the test results and then sending in another person to be tested, we could test more people right from the beginning.

## Revisiting Probability Class

Luckily, my part-time PhD keeps me in touch with probability theory (see what my research is all about). So I was quite familiar with the kind of math you need to solve this question. If you read the next few paragraphs, you will see that it is not even sophisticated. Indeed, this math is usually first taught in 9th grade.

Let us first look at only one person. There is a chance that the person tests positive, and a chance that he or she does not. The hard part was estimating the probability of either case. An incidence of about 300 infections per 100,000 inhabitants corresponds to 0.3%. To stay on the safe side, I assumed the chance of being infected to be about 1%.

If we only look at this one person, we have a Bernoulli-Experiment. We define a failure as the person testing positive. By an educated guess, the chance of this is 1%. The chance that the person tests negative is thus 99%. We call a negative test a success.

The next step is to combine multiple such Bernoulli-Experiments. One Bernoulli-Experiment followed by another is called a Bernoulli-Chain. Let us start our Bernoulli-Chain with two persons; let’s call them Julie and Mike. There is a chance that both Mike and Julie test negative. Assuming that one result does not affect the other, the chance of this is 99% * 99% = 98.01%: we multiplied “success times success”. All four combinations are shown in the table below.

| Julie / Mike | Mike tested positive (1%) | Mike tested negative (99%) |
|---|---|---|
| Julie tested positive (1%) | 1% * 1% = 0.01% | 99% * 1% = 0.99% |
| Julie tested negative (99%) | 1% * 99% = 0.99% | 99% * 99% = 98.01% |

## Meet the Binomial Distribution

Based on this table for two persons, we can answer how likely it is that both persons test negative: 98.01%. More interesting: how likely is it that at least one person tests negative? In other words, what is the combined probability of Julie testing negative and Mike positive, the other way around, and both testing negative? The answer is 99.99%. You just need to sum these three probabilities in the table above (0.99% + 0.99% + 98.01%).
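For readers who prefer code to tables, the following Python sketch enumerates the four outcomes for Julie and Mike and sums the three cases with at least one negative test. It assumes, as the table does, a 1% chance of a positive test per person and independent results; the constant and function names are made up for illustration.

```python
from itertools import product

P_POSITIVE = 0.01  # assumed chance that a single person tests positive


def outcome_probabilities(p_positive: float) -> dict[tuple[str, str], float]:
    """Probability of each (Julie, Mike) outcome, assuming independent tests."""
    single = {"positive": p_positive, "negative": 1 - p_positive}
    # Independence lets us multiply the individual probabilities.
    return {
        (julie, mike): single[julie] * single[mike]
        for julie, mike in product(single, repeat=2)
    }


probs = outcome_probabilities(P_POSITIVE)
for (julie, mike), prob in probs.items():
    print(f"Julie {julie:8} | Mike {mike:8} | {prob:.2%}")

# At least one negative test: every outcome except "both positive".
at_least_one_negative = sum(
    prob for (julie, mike), prob in probs.items() if "negative" in (julie, mike)
)
print(f"At least one negative: {at_least_one_negative:.2%}")
```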

I rephrased the question once more to make the math easier. The new question is: what is the probability that only Julie tests positive, only Mike does, or neither of them does? If we need at least one person, this combination of probabilities gives the answer.

If we view the question like this, we can use the cumulative distribution function of the binomial distribution. Note: the binomial distribution is the distribution we use to model a Bernoulli-Chain. Its cumulative form looks like this:

$$P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} \, p^{i} \, (1-p)^{n-i}$$

Here, *n* represents the number of persons we send to the test facility. In
terms of probability, this is the number of trials. *k* stands for the maximum
number of positive tests we can tolerate. In other words, there should be no
more than *k* positive tests. *p* is the probability that one test turns out
positive. Note that the binomial distribution counts “successes”, so in this
formula a positive test plays the role of the success (the event being
counted), even though we called it a failure in the single-person view above.
Accordingly, *k* is the number of successes and *p* is the success probability.

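In our example, with *n* = 2, *k* = 1, and *p* = 0.01, it looks like this:

$$P(X \le 1) = \sum_{i=0}^{1} \binom{2}{i} \, 0.01^{i} \, 0.99^{2-i} = 0.9801 + 0.0198 = 0.9999$$
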
This calculation returns the chance that for Julie and Mike (n=2) either one or none of them is tested positive (k=1).

For the details of the formula: the sum runs from “no one is sick” (*i*=0) to
“one person is sick” (*i*=1). Then comes the binomial coefficient. Remember that
one positive test could be either Mike’s or Julie’s? Both tests have the same
probability of being positive, and we assume that one result does not influence
the other, which is what statisticians call independence. So, both cases have
the same probability, and we can compute it once and multiply it by two. This is
exactly what the binomial coefficient does: it tells how many ways there are to
get exactly *i* positive tests. Finally, there is the product of the probability
of being tested positive (0.01), raised to the power *i*, and the probability of
being tested negative (0.99 = 1 - 0.01), raised to the power *n* - *i*. If you
want further details, Wikipedia does a great job on that one; see the Wikipedia
article on the binomial distribution.
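The formula translates almost term by term into code. The sketch below is plain Python using only `math.comb` from the standard library; the function name `binomial_cdf` is my own choice, and, as in the formula, a positive test counts as the “success”.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): at most k 'successes' in n trials.

    Here a 'success' is a positive test, as in the formula above.
    """
    return sum(
        # ways to get exactly i positives * probability of each such way
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k + 1)  # i = 0, 1, ..., k
    )


# Julie and Mike: n = 2 people, at most k = 1 positive, p = 0.01 per test.
print(binomial_cdf(k=1, n=2, p=0.01))
```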

## Modeling this in Excel

For better or worse, you do not need to know many of the details of the formula above. In Excel, it is implemented by the function BINOM.DIST, which you can look up in the Microsoft documentation. As you can see there, the function takes four arguments.

Luckily, the arguments are well named. The first argument, *number_s*, is *k*,
the number of successes. The next one is *trials*, our *n*: the number of,
well, trials we had. Then there is *probability_s*, the probability of one
success, our *p*. The last argument, *cumulative*, is a switch. We want the
combined probability of *k*=0 and *k*=1, so setting *cumulative* to *TRUE*
makes BINOM.DIST return the cumulative distribution (at most *k* successes)
instead of the probability of exactly *k* successes.
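To make the mapping between the Excel arguments and the formula explicit, here is a hypothetical Python function `binom_dist` that mimics the argument order of BINOM.DIST. It is a sketch for illustration, not Microsoft’s implementation, and it skips the input validation Excel performs.

```python
from math import comb


def binom_dist(number_s: int, trials: int, probability_s: float, cumulative: bool) -> float:
    """Hypothetical Python mirror of Excel's BINOM.DIST (same argument order).

    cumulative=False: probability of exactly number_s successes.
    cumulative=True:  probability of at most number_s successes.
    """
    def exactly(i: int) -> float:
        return comb(trials, i) * probability_s**i * (1 - probability_s) ** (trials - i)

    if cumulative:
        return sum(exactly(i) for i in range(number_s + 1))
    return exactly(number_s)


# Corresponds to =BINOM.DIST(1, 2, 0.01, TRUE) in Excel.
print(round(binom_dist(1, 2, 0.01, True), 4))   # 0.9999
# Only the k = 1 term: exactly one positive test out of two.
print(round(binom_dist(1, 2, 0.01, False), 4))  # 0.0198
```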

If we have a look at the Excel screenshot from above, we see that the
probability of having no more than one person tested positive is *0.9999*.
This equals the chance of having at least one person tested negative, which is
what we need.

We can now apply this to our problem. Recall that we want to know the
probability that at least *10* persons out of a pool of *10*, *11*, …, *15*
people test negative. To get the numbers right, we need to translate this into
math as we did previously. Let’s start with *10* people. If I have *10*
people, how many of them may test positive at most before we have to cancel
the flight? The answer is *0*. If only one of them is sick, the mission will
be canceled. Next, how many people may test positive at most if I have *11*
people? No more than *1* of them. In general, with *n* people tested, *k* = *n* - 10.
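This translation can be automated. The following sketch computes, for pools of 10 to 15 people, the probability that at least 10 of them test negative, assuming independent tests and the 1% positivity rate from above. The helper names are hypothetical.

```python
from math import comb

CREW_SIZE = 10     # people needed for the flight
P_POSITIVE = 0.01  # assumed chance that one person tests positive


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(at most k positives among n independent tests)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def mission_probability(tested: int, crew_size: int = CREW_SIZE, p: float = P_POSITIVE) -> float:
    """Chance that at least crew_size of the tested people are negative,
    i.e. that at most (tested - crew_size) of them are positive."""
    return binomial_cdf(tested - crew_size, tested, p)


for tested in range(CREW_SIZE, 16):
    print(f"test {tested:2d} people -> P(at least {CREW_SIZE} negative) = {mission_probability(tested):.4%}")
```

Under these assumptions, the numbers should come out at roughly 90.4% for 10 people, 99.5% for 11, and 99.98% for 12.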

The screenshot above shows the probabilities for the different numbers of
people tested. It shows that by testing *11* instead of *10* people, the chance
of a successful mission goes up considerably: with *p* = 1%, from about 90% to
more than 99%.

## Finding a preliminary answer: test 12 people

The original goal was this: we want to make sure the aircraft takes off. For
that, we need at least *10* people with a negative test. We want to reach that
goal with as much certainty as makes economic sense. From the table above, I
concluded that we should test *12* persons.

But is this a good choice? The fact that I ask suggests it probably is not. To understand why, we need to inspect the assumptions behind the math we used. One critical assumption behind Bernoulli-Chains is independence.

Independence means that the events never influence one another. So, if Julie is infected, Mike still stays uninfected with the same probability as before. For our aircrew, this would mean that they all work from home and never meet each other. However, this is, of course, not the case. They work together. They fly together in an airplane. They hang out together. Thus, if one is infected, others are more likely to be infected, too.

I could have tackled this independence issue either the easy way or the hard way. The hard way is to identify a proper statistical model, which is a nice but time-intensive task. The easy, heuristic way is to simply adjust the probability of a positive test. I prefer heuristics. Instead of 1%, I assumed a test to be positive with a chance of 5%. The results are in the next screenshot.

We see that a good recommendation would be to test between 12 and 14 people.
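The heuristic of inflating the positivity rate is easy to replay in code. The sketch below recomputes the pool probabilities with *p* = 5% and adds a hypothetical helper, `smallest_pool`, that returns the smallest pool reaching a target probability. The 99% target is an assumption for illustration; the original analysis did not use a fixed threshold.

```python
from math import comb
from typing import Optional


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(at most k positives among n independent tests)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def smallest_pool(crew_size: int, p: float, target: float, max_pool: int = 50) -> Optional[int]:
    """Hypothetical helper: smallest number of people to test so that at least
    crew_size of them test negative with probability >= target.
    Returns None if even max_pool people are not enough."""
    for tested in range(crew_size, max_pool + 1):
        if binomial_cdf(tested - crew_size, tested, p) >= target:
            return tested
    return None


CREW_SIZE = 10
P_INFLATED = 0.05  # heuristic from the text: 5% instead of 1% to account for dependence
TARGET = 0.99      # assumed threshold, for illustration only

for tested in range(CREW_SIZE, 16):
    prob = binomial_cdf(tested - CREW_SIZE, tested, P_INFLATED)
    print(f"test {tested:2d} people -> {prob:.2%}")

print("p = 1%:", smallest_pool(CREW_SIZE, 0.01, TARGET))
print("p = 5%:", smallest_pool(CREW_SIZE, P_INFLATED, TARGET))
```

With a 99% target, this sketch picks 11 people for *p* = 1% and 13 people for *p* = 5%, which falls inside the range of 12 to 14 suggested above.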

## But I was all wrong

I presented the results to my colleagues, and I will never forget their reaction. They highly appreciated the work I had put into this. But they also told me that I was all wrong.

And that is where we all began to learn something. The numbers above are correct – given the assumptions. As always in statistics and mathematical modeling, the assumptions need to be examined.

My colleagues pointed out that an essential assumption was off. They told me that an aircrew is not just a bunch of ten people. Instead, an aircrew needs to cover various tasks, and some of those tasks can only be taken over by specially trained persons. At that stage, some of those persons were almost irreplaceable.

Combining the math with their domain knowledge, we came up with various plans. Some plans aimed to train more people in the skills that were rare at the time. Other plans addressed how many people to test. Still other plans defined what to do with a positive test result at different stages of a flight mission.

## The Takeaway

The takeaway for us was twofold. Based on the math, we could also justify economically that testing more persons than needed was worth the effort. Where testing more people was not an option, we analysed the composition of the aircrew and designed plans to overcome bottlenecks.

Would we have gotten there without my calculations? I am quite sure we would have. However, I doubt the discussion would have been as intense. I still remember the uproar after my e-mail explaining the above. People put their heads together and examined the issues.

Math could not (and can’t!) solve such issues. But math can get the conversation started.