*More than a year ago, in October 2020, COVID-19 made us face more
challenges than we could have imagined. The leading question was: how can we, the MMU,
continue our flying operations effectively and efficiently? I was happy
to provide a little scientific insight into this question.*

It is usually not a problem to gather an aircrew of – let’s say – ten people to operate an airplane. In 2020, though, this became quite a challenge. Most countries imposed several travel restrictions on aircrews. The most pertinent one was a negative COVID-19 test no older than 48 or 72 hours.

These restrictions formed a neat math problem. Among ten people, one of them could be infectious and test positive. In that case, the aircrew is incomplete, and the flight mission would have to be canceled. The question was: how could we keep flying?

Most people in our unit looked at the question like this: how should we schedule the tests of an aircrew so that if one person tests positive, a replacement can be found? Actually, this was the question they asked me. At work, I was tasked to find an arrangement with a laboratory for COVID-19 testing. For those who forgot: back then, a PCR test result was only available after 24 hours – even in the fastest labs.

As is often the case with me, I ignored what they told me. I rather listened to what they wanted. Instead of getting the laboratory on the phone, I tried to view the question from another angle.

I asked myself this question: how many persons should we test so that ten people will pass the test almost certainly? I assumed that instead of waiting for the test results and then sending in another person to test, we could start by testing more people right from the beginning.

Luckily, I have to stay in touch with probability theory due to my part-time PhD (see my post on what my research is all about). Therefore, I was quite acquainted with the kind of math you need to solve this question. If you read the next few paragraphs, you will see that it is not even a sophisticated thing. Indeed, the math is usually taught for the first time in the 9th grade.

Let us first look at only one person. There is a chance that the person tests positive, and a chance that he or she does not. The hard part was estimating the probability of either case. Looking at an incidence of about 300 infections per 100,000 inhabitants (that is, 0.3%), I assumed that the chance of being infected should be about 1% – just to stay on the safe side.

If we only look at this one person, we have a Bernoulli-Experiment. Let us call it a failure whenever the person tests positive. The chance for this is – by an educated guess – 1%. The chance that the person tests negative is thus 99%. We will call a negative test a success.

The next step is about combining multiple such Bernoulli-Experiments. One Bernoulli-Experiment followed by another one is called a Bernoulli-Chain. Let us form our Bernoulli-Chain with two persons for the beginning. Let’s call them Julie and Mike. There is a chance that both Mike and Julie are tested negative. The chance of this is 99% * 99% = 98.01%. So we multiplied “success times success”. For all these combinations, have a look at the table below.

| Julie / Mike | tested positive (1%) | tested negative (99%) |
|---|---|---|
| tested positive (1%) | 1% * 1% = 0.01% | 99% * 1% = 0.99% |
| tested negative (99%) | 1% * 99% = 0.99% | 99% * 99% = 98.01% |

Based on this table for two persons, we could answer how likely it is that two out of two are tested negative. The probability for this is 98.01%. More interesting, how likely is it that (at least) one person is tested negative? In other words, what is the combined probability of Julie being tested negative and Mike positive, the other way around, and that both are tested negative? The answer to this would be 99.99%. You just need to sum the probabilities in the table above.
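
The summing step can be sketched in a few lines of Python. This is only an illustration under the assumptions already made in the text (a 1% chance of a positive test, Julie and Mike independent of each other); the variable names are made up for this example.

```python
from itertools import product

P_POSITIVE = 0.01  # assumed chance of a positive test, as in the text

# Each person either tests positive or negative.
outcomes = {"positive": P_POSITIVE, "negative": 1 - P_POSITIVE}

# Enumerate all four (Julie, Mike) combinations and multiply their chances.
table = {
    (julie, mike): outcomes[julie] * outcomes[mike]
    for julie, mike in product(outcomes, repeat=2)
}

# Sum every cell in which at least one of the two tests negative.
p_at_least_one_negative = sum(
    prob for result, prob in table.items() if "negative" in result
)

print(f"both negative:         {table[('negative', 'negative')]:.4f}")
print(f"at least one negative: {p_at_least_one_negative:.4f}")
```

Enumerating every cell works for two people, but the number of cells doubles with each additional person, which is why a closed formula becomes attractive for larger groups.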

I rephrased the question once more to make the math easier. The new question is: what is the probability of Julie being tested positive, or Mike, or none of them? Because, if we need at least one person, this combination of probabilities will give the answer.

If we view the question like this, we can use a so-called cumulative probability distribution for the Binomial Distribution. Note: the Binomial Distribution is the distribution we can use to model a Bernoulli-Chain. It looks like this:
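
In its standard textbook form, writing *X* for the number of positive tests (a symbol introduced here only for convenience), the cumulative distribution function reads:

$$
P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} \, p^{i} \, (1-p)^{n-i}
$$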

Here, *n* represents the number of persons we send to the test facility. In
terms of probability, this is the number of trials. *k* stands for the maximum
number of positive tests that may occur. In other words, there should
be no more than *k* positive tests. In the language of the distribution, this is the
number of successes. Note that for the formula a “success” is a positive test,
the opposite of how we labeled Julie's and Mike's tests. *p* is then the probability that one test turns out
positive. In terms of probability, this is the success probability.

In our example, it would look like this:
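
Plugging Julie's and Mike's numbers (*n* = 2, *k* = 1, *p* = 0.01) into the standard cumulative Binomial formula, with *X* as an illustrative symbol for the number of positive tests, gives:

$$
P(X \le 1) = \sum_{i=0}^{1} \binom{2}{i} \, 0.01^{i} \, 0.99^{2-i}
= \underbrace{1 \cdot 0.99^{2}}_{i=0} + \underbrace{2 \cdot 0.01 \cdot 0.99}_{i=1}
= 0.9801 + 0.0198 = 0.9999
$$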

This calculation returns the chance that for Julie and Mike (n=2) either one or none of them is tested positive (k=1).

For the details of the formula: the sum runs from “no one is sick” (i=0) to “one
person is sick” (i=1). Then, there comes the binomial coefficient. You
remember that to have one positive test, it could be either Mike or Julie?
From my point of view, the event of Mike being sick has the same probability as
Julie being sick. This is what statisticians call identically distributed. So, we can
compute the probability of one of them and multiply it by two. This is exactly
what the binomial coefficient is doing. It tells how many ways there are to
have exactly *i* positive tests. Then, there comes the product of the
probability of being tested positive (0.01), once for each of the *i* positive
tests, and the probability of not being tested positive (0.99 = 1 - 0.01), once
for each of the remaining tests. If you want further details, Wikipedia does a
great job on that one; see the Wikipedia article on the Binomial Distribution.

For better or worse, you do not need to know many of the details of the formula above. In Excel, this formula is behind BINOM.DIST. You can look it up in the Microsoft documentation. As you will see, the function in Excel has four arguments.

Luckily, the arguments are well named. The first argument *number_s* is *k*,
so the number of successes. The next one is *trials*, formerly *n*, so the
number of – yeah – trials we had. Then there is *probability_s*, the
probability for one success. The last argument is rather an option. We
want the combined probability of zero positive tests and of one positive test. Setting *cumulative* to
*True* makes BINOM.DIST return the cumulative function.

If we have a look at the Excel screenshot from above, we see that the
probability of having not more than one person tested positive is *0.9999*.
This is equal to the chance of having at least one person tested negative –
what we need.

We can now apply this to our problem. Recall, we want to know the probability
that at least *10* persons out of a pool of *10*, *11*, …, *15* people are
tested negative. To get the numbers right, we need to translate this into math
as we did previously. Let’s start with the *10* people. If I have *10*
people, how many people may maximally be tested positive before we have to
cancel the flight? The answer is *0*. If even one of them is sick, the
mission will be canceled. Next, how many people may be maximally tested
positive if I have *11* people? No more than *1* of them.

The screenshot above shows the probabilities for the different numbers of
people tested. If you look at it, then you see that by testing not *10* but *11*
people, the chance for a successful mission goes way up.
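
The numbers behind this comparison can be reproduced without Excel. The following is a minimal sketch, assuming the 1% chance of a positive test and independent tests as in the text; `prob_at_most` is a hypothetical helper written for this illustration that plays the role of the cumulative BINOM.DIST.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Probability of at most k positive tests among n independent tests."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


REQUIRED_CREW = 10  # people needed with a negative test
P_POSITIVE = 0.01   # assumed chance of a positive test

# For a pool of n people, up to n - 10 positive tests are tolerable.
mission_chance = {
    pool: prob_at_most(pool - REQUIRED_CREW, pool, P_POSITIVE)
    for pool in range(10, 16)
}

for pool, chance in mission_chance.items():
    allowed = pool - REQUIRED_CREW
    print(f"{pool} tested, up to {allowed} positive allowed: {chance:.6f}")
```

With exactly ten people the chance is simply 0.99 to the power of ten, roughly 90%; every additional person tested adds one more tolerated positive result.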

The original goal was this: we want to make sure the aircraft takes off. For
that, we need at least *10* people with a negative test. We want to reach that
goal with as much certainty as makes economic sense. I conclude
from the above table that we should test *12* persons.

But is this a good choice? If I ask this, it probably is not. To understand why, we need to inspect the assumptions behind the math we were using. One critical assumption behind Bernoulli-Chains is independence.

Independence means that the events never influence one another. So, if Julie was infected, then Mike could still be uninfected with the same probability. In the case of our aircrew, this would mean that they all work only from home and never meet each other. However, this is – of course – not the case. They work together. They fly together in an airplane. They hang out together. Thus, if one is infected, there is a higher probability for others to be infected, too.

I could have tackled this independence issue in an easy or a hard way. The hard way is to identify a proper statistical model. That is a nice, time-intensive task. The easy, heuristic way is to just adapt the probability of a positive test. I prefer heuristics. Instead of 1%, I assumed the test to be positive with a chance of 5%. The results are in the next screenshot.

We see that a good recommendation would be to test between 12 and 14 people.
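
This recommendation can be cross-checked with a small search for the smallest pool that reaches a desired certainty. It is a sketch under the heuristic 5% assumption from the text; the function names and the three target levels (95%, 99%, 99.9%) are illustrative choices, not figures from the original analysis.

```python
from math import comb


def prob_enough_negatives(pool: int, required: int, p_positive: float) -> float:
    """Chance that at least `required` of `pool` people test negative."""
    allowed_positives = pool - required
    return sum(
        comb(pool, i) * p_positive**i * (1 - p_positive) ** (pool - i)
        for i in range(allowed_positives + 1)
    )


def min_pool_size(required: int, p_positive: float, target: float,
                  max_pool: int = 100) -> int:
    """Smallest number of people to test so the mission chance reaches `target`."""
    for pool in range(required, max_pool + 1):
        if prob_enough_negatives(pool, required, p_positive) >= target:
            return pool
    raise ValueError("target not reachable within max_pool")


# Illustrative certainty levels (assumptions, not taken from the original analysis).
for target in (0.95, 0.99, 0.999):
    pool = min_pool_size(required=10, p_positive=0.05, target=target)
    print(f"target {target:.1%}: test {pool} people")
```

Under these assumptions the search returns 12, 13 and 14 people for the three targets, which matches the range above. How much certainty is worth the extra tests remains an economic judgment.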

I presented the results to my colleagues – and I will never forget their reactions. They were highly appreciative of the work I had put into this. But they also told me that I was all wrong.

And that is where we all began to learn something. The numbers above are correct – given the assumptions. As always in statistics and mathematical modeling, the assumptions need to be examined.

My colleagues pointed out that an essential assumption was off. They told me that an aircrew is not just a bunch of ten people. Instead, an aircrew needs to cover various tasks. Some of those tasks can only be taken over by specially trained persons. At the stage we were in, some of those persons were almost irreplaceable.

Combining the math and their domain knowledge, we came up with various plans. Some plans aimed to train more people, so that more people would have the skills that were then rare. Other plans were about how many people to test. Yet other plans defined what to do with a positive test result at different stages of a flight mission.

The takeaway for us was thus twofold. Based on the math, we could make the economic case that testing more persons than needed was worth the effort. Where testing more people was not an option, we analysed the composition of the aircrew and designed plans to overcome bottlenecks.

Would we have gotten there without my calculations? I am quite sure. However, I doubt that the intensity would have been the same. I still remember the uproar over my e-mail in which I had explained the above. People put their heads together and examined the issues.

Math could not (and can’t!) solve such issues. But math can get the conversation started.

After finishing my master’s degree, I wanted to continue to work scientifically. Not only that, I wanted to go as analytical as possible because number crunching is what I adore. So, I applied at the chair of business analytics and management science of my university; since January 2017, I have been one of their external members. Our field of research is also commonly known as operations research.

My research subject is Shared Mobility Systems (SMS). Most people probably know one or have already used one. The most famous one is likely ShareNow, which emerged from a merger between Car2Go (aka Mercedes) and DriveNow (aka BMW and Sixt).

If you have never heard of SMS, though, you can imagine them as a highly flexible car rental. Within a city, an SMS allows you to rent a vehicle of their fleet at any time, at any place. You usually only pay a price per minute of use.

Apart from DriveNow, there are several other providers and, moreover, kinds of SMS. Some SMS rent bicycles (like Velib, oBike, or Uber Bike). Others rent scooters. Some allow their users to do almost anything with their vehicles. Others are very restrictive on their use and pricing. However, most of them suffer from one problem.

The one problem they all have is this: they suffer from really low revenues. It is so bad that numerous SMS either chose or were forced to close their business. To give you a hint, even a well-established SMS with more than 100 scooters in one of Europe's capitals will make less than 100,000 EUR a year.

Sad, isn’t it? Now we have a technology – SMS – which reduces CO2 emissions, helps the poor, and frees up space in cities. But no one will offer this technology, because it is not yet worthwhile to do so.

This is what I and others research: how can we make SMS more profitable? While there are many different levers to drive profits, I focus on the price per minute. In detail, I try to find prices per minute so that the revenue goes up.

As you can imagine, finding the best price is a daunting task. You could set prices based on time. You could vary them based on location. You could combine those two factors. You could furthermore distinguish prices based on how long a user rents a vehicle. Or, you could change the price depending on the itinerary of the customers. Options are endless.

I first tried modeling and solving this challenge with a “classic” mathematical model. I implemented this in Python with Pyomo and GLPK. I failed horribly. The model takes way too long to solve, even for commercial solvers like Gurobi and a hardcore server with 224 cores. Been there, done that. By the way, this summarises two years of research.

Thus, I opted for a simulation approach. Voilà, this is where operations research meets machine learning. I am still using Python, but this time to write a simulation of an SMS. Then, I test out different pricing strategies in this simulation. As I have laid out, there are myriads of possibilities to set prices – so I apply methods from machine learning to find the right prices (a lot) faster.

Lastly, you may wonder, why Python? The reason is historical. Before I was doing all my work in Python, I had been working in R and Clojure for a long time. Since there were no real optimisation libraries available for either R or Clojure, I went over to the Python club. Since then I have grown fond of Python and even Object Oriented Programming. Still, I miss the days of writing a concise chain of piped functions and working on only a single type of data structure.

Indeed, for visualisations I still use R, especially ggplot2. I have to admit, I never really looked into the Python equivalent, matplotlib. Saving my files as CSV or SQLite and then analysing them in R is just so much more comfortable to me. Apart from that, I prefer the style of ggplot2.

My hope is that by the end of 2024 I will have finished all the writing and handed in my dissertation. But life is a miracle. There are no guarantees. When I started out in 2017, I thought that by now I would already have two publications. Life has been a rollercoaster both professionally and in private – so I learned not to stress myself. I try to work between seven and ten hours per week while being as effective as possible. There is not much more I can do, frankly.
