There is a pretty standard interview question that I am sure many of you have seen. Given a list of numbers from 1 to n with a single number missing, what is the missing number?

The standard solution is to recall that:

$\sum\limits_{i=1}^n i = n(n+1)/2$

So you find the sum from 1 to n, then subtract the sum of the given list and the result is the missing number.

A think a more interesting problem is given a list of numbers from 1 to n with a single number missing, randomized, and have no discernable spaces; what is the missing number?

So for the numbers 1 to 10, you would be given a string of numbers that was generated from the list of 1 to 10; with no spaces and in random order, but numbers are still intact. 1037845192 could be one such string. 1037845192 was generated from randomizing the list of numbers from 1 to 10, removing a random number, then stripping all of the spaces. Here we can easily tell that 6 is the missing number, but if the range of numbers was larger it would be much more difficult. For example 1 to 255 makes it considerably harder: 224168211247217223419510461241962061764110317111520375129189235242574250302431504918423680128112321162040281231221612116418011233635616110718758131531421521792623181137178471532521861812274312222634259784251248219246241138144163222021654351719313533791831097224938141147225441852132531721731451972552292151482381431612288899778260130365234213118718310748156220922049417413923266212690376755254205119239160781755116619911420859661251626519413620773223120239115950190157132209100391332496212154951062910454763115517014915201931022401018719111727200192869158221998182441461511341127218611168105188230218214198237113182245210222169458916764857019140177576114. Now it is not apparent where one number ends and another begins.

Given this long string of digits, how can we decide which number is missing?

I think the summation approach is still valid, but the we can't use the closed-form from above however because we don't number where numbers start. We can instead add the digits together.

Following the same 1 to 255 example:

$\sum\limits_{i=1}^{255} digits(i) = 2382$

Following the same approach as above, we sum the string of numbers

sum(224168211247217223419510461241962061764110317111520375129189235242574250302431504918423680128112321162040281231221612116418011233635616110718758131531421521792623181137178471532521861812274312222634259784251248219246241138144163222021654351719313533791831097224938141147225441852132531721731451972552292151482381431612288899778260130365234213118718310748156220922049417413923266212690376755254205119239160781755116619911420859661251626519413620773223120239115950190157132209100391332496212154951062910454763115517014915201931022401018719111727200192869158221998182441461511341127218611168105188230218214198237113182245210222169458916764857019140177576114) = 2377


If we take the difference of these two sums we get 2382 - 2377 = 5. The sum for this string of digits was 2377 (5 less than the total), so this means that the possible missing numbers are any digits >= 1 and <= 255 that sum to 5.Which makes our possible numbers: 5, 14, 41, 23, 32, 50, 131.

How can we determine now what numbers are missing? Well if we examine the frequencies of digits, we should be able to pick out the exact missing number.

Frequency of all digits from numbers 1 to 255:

    { '0' => 45,
'1' => 156,
'2' => 112,
'3' => 56,
'4' => 56,
'5' => 52
'6' => 45,
'7' => 45,
'8' => 45,
'9' => 45,
}


Frequence of all digits from numbers 1 to 255, with a number missing:

    { '0' => 45,
'1' => 154,
'2' => 112,
'3' => 55,
'4' => 56,
'5' => 52
'6' => 45,
'7' => 45,
'8' => 45,
'9' => 45,
}


If take the difference of these two frequencies we can see the digits that were missing:

    { '0' => 0,
'1' => 2,
'2' => 0,
'3' => 1,
'4' => 0,
'5' => 0
'6' => 0,
'7' => 0,
'8' => 0,
'9' => 0,
}


From this we can tell that there was exactly two missing 1s and one missing 3, the only number that matches from our possible list is 131. Which means that we can say with certainty that 131 is the missing number.

Aside:
What if the range of numbers was from 1 to 500 and the difference in sums was 5? Well then we would have both 131 and 311 in our possible numbers, and I can't come up with a way to say deterministically how I could choose one over the other. Any thoughts on the subject are welcome!