Using Data Proximity For Data Transmission

Part 1: Introduction

The current form of data transmission is to transfer the data bit-by-bit. The sender and receiver stay in contact throughout the entire process. What if the data could be generated on the receiver end, and the sender alert the receiver as the data reaches the proximity of the correct data?

This would be possible because every digital file is a number, which can be reached by simply counting up from zero. The same techniques that measure distance and time can also be used to measure the correctness of data.

All digital information that has ever existed, or ever will exist, currently exists on a one-dimensional line. The only difference between two pieces of information are their positions on the line. Every piece of digital information, regardless of its size or complexity, exists as a single point on that line.

Imagine yourself in an open field with a friend. You’ve hidden a key at a specific location in the field, exactly 2,174 ft. to the north of where you are currently standing, and you decide to send your friend out to retrieve it. You can tell your friend to start running north, but you know that it’s going to take some time for him to get anywhere near the location, so you don’t have to tell him to “keep running” after every step he takes. You might wait until you see him get within 2,000 ft. of the key before telling him that he’s getting close. Then tell him to “slow down” when he gets within 2,100 ft., and tell him to “stop” when he gets within 2,170 ft. That is the point where he would need to search for the exact location of the key.

In the same way that you told your friend to change his speed as he reached various points on his way to the location of the key, a computer can be told to change counting algorithms when it reaches certain points while counting to the number representing the desired data. The actual location of the data can be represented by a checksum hash that is significantly smaller than the size of the data, but still confirms with absolute certainty that the data has been found.

In order to send data through counting algorithms, I created a collection of functions and algorithms called Shifting Exponential Iteration Counting Algorithm Compression, also known as SEICAC (pronounced “sci-kak”). At the time of writing this, many of these concepts and algorithms are still experimental and have not been tested with all forms of real-world data. To simplify the initial development and understanding of the concepts, the data used to test the SEICAC algorithms are all base-10 integers.

A computer sending a piece of data using SEICAC can use certain properties of the data to tell the receiving computer which counting algorithms to use to replicate the data. I call these properties “the data between the data” because they do not add to the size of the data itself, but can be deduced from characteristics of the numbers that make of the data, such as their order, average, sum, etc.

Part 2: Knowing where to count

The first thing the receiving computer will need to know is the range of numbers in which it should be counting. A minimum and maximum value of a large number can be calculated by looking at the following properties: the number of digits, the value of the first digit, and the highest and lowest difference between two consecutive digits. The first two properties should be easy to determine by simply looking at the number, and the last two properties can be found using these two PHP functions:

function digitdiffmax($inputval="0") {
 // The maximum difference between two digits in a number
 if (!is_numeric($inputval) || $inputval < 10) {
  return 0;
 }
 $inputvalarray = str_split($inputval);
 $numdigits = count($inputvalarray);
 $maxdifference = -9;
 for ($i = 1; $i < $numdigits; $i++) {
  if (($inputvalarray[$i] - $inputvalarray[$i-1]) > $maxdifference) {
   $maxdifference = $inputvalarray[$i] - $inputvalarray[$i-1];
  }
 }
 return $maxdifference;
}

function digitdiffmin($inputval="0") {
 // The minimum difference between two digits in a number
 if (!is_numeric($inputval) || $inputval < 10) {
  return 0;
 }
 $inputvalarray = str_split($inputval);
 $numdigits = count($inputvalarray);
 $mindifference = 9;
 for ($i = 1; $i < $numdigits; $i++) {
  if (($inputvalarray[$i] - $inputvalarray[$i-1]) < $mindifference) {
   $mindifference = $inputvalarray[$i] - $inputvalarray[$i-1];
  }
 }
 return $mindifference;
}

Once those properties are known, SEICAC first determines the minimum value that a number could be, by using the number of digits, first digit, and minimum difference between two consecutive digits:

function findminvalue($numdigits, $firstdigit, $diffmin) {
 $currentnum = pow(10, $numdigits-1);
 $currentnum = $currentnum * $firstdigit;
 $currentnumstr = (string)$currentnum;
 $currentnumarray = str_split($currentnumstr);
 for ($i = 1; $i < $numdigits; $i++) {
  $nextdigit = $currentnumarray[$i-1] + $diffmin;
  if ($nextdigit < 0) {
   $nextdigit = 0;
  }
  $currentnumarray[$i] = $nextdigit;
 }
 return implode($currentnumarray);
}

After the minimum value is known, SEICAC then calculates the maximum value using a similar technique based on the properties: number of digits, first digit, and the maximum difference between two consecutive digits.

function findmaxvalue($numdigits, $firstdigit, $diffmax) {
 $currentnum = pow(10, $numdigits-1);
 $currentnum = $currentnum * $firstdigit;
 $currentnumstr = (string)$currentnum;
 $currentnumarray = str_split($currentnumstr);
 for ($i = 1; $i < $numdigits; $i++) {
  $nextdigit = $currentnumarray[$i-1] + $diffmax;
  if ($nextdigit > 9) {
   $nextdigit = 9;
  }
  $currentnumarray[$i] = $nextdigit;
 }
 return implode($currentnumarray);
}

Once the range is known, then the computer receiving the data can either start at the minimum value and count up until the actual number is found, or it can start at the maximum value and count down. By looking at the average difference of the digits of both the minimum and maximum values, SEICAC can often predict whether the actual number is closer to the minimum value or the maximum value.

By taking the average difference of the digits of the range values (excluding the first digit), SEICAC can estimate the accuracy of those values. The higher the absolute value of the average is, the more likely that the range value is accurate. This is calculated using the PHP code below. The minimum value of the range has been assigned to the variable $minval while the maximum value has been assigned to the variable $maxval.

function digitdiffavg($inputval="0") {
 if (!is_numeric($inputval) || $inputval < 10) {
  return 0;
 }
 $inputvalarray = str_split($inputval);
 $numdigits = count($inputvalarray);
 $differencearray = array();
 for ($i = 1; $i < $numdigits; $i++) {
  $differencearray[] = ($inputvalarray[$i] - $inputvalarray[$i-1]);
 }
 return array_sum($differencearray) / count($differencearray);
}
$maxdiffavg = digitdiffavg(substr($maxval, 1));
$mindiffavg = digitdiffavg(substr($minval, 1));

If you take the absolute value of $maxdiffavg, and it is greater than the absolute value of $mindiffavg, then the actual number is probably closer to the maximum value of the range. Therefore, it would be best to start with the maximum value and count down until the actual number is found. Otherwise, the actual number is probably closer to the minimum value, so it should start there and count up.

Part 3: Counting to the result

Once the computer receiving the data knows where to start counting, it can begin counting until it reaches the number that represents the data. After each count, it should compare the hash of the current number with the hash of the desired number, to see if it has located the data it was looking for. Instead of counting one number at a time, the algorithm should use the properties of the number to count in the largest increments possible without overshooting the number. As a simple example, if counting to an even number, the algorithm can count in multiples of 2.

By counting in the largest increments possible, the algorithm reduces the number of numbers that need to be checked. This not only decreases the amount of time required to reproduce the original number, but it also shortens the length of the hash that is required to prevent a collision during the counting process.

The first counting algorithm I created for SEICAC is Shifting Exponential Iteration, or SEI. SEI is a function that can perform a set number of mathematical operations on exponential numbers, while incrementing or decrementing the base and/or the power after each operation. An example would be:

2^5 + 4^6 + 6^7 = 284064

In this example, the base was set to 2 and the power was set to 5. The SEI function has been set to run an iteration until the base reaches 6. In each iteration, the base is increased by 2, and the power is increased by 1. The parameters of the SEI function can set the starting values of the base and power, along with different incremental or decremental values of each after every iteration. Another parameter can set the type of operation done between each iteration, such as addition, subtraction, multiplication, division, or exponential. There are other operations that can decrease the incremental rates as the result approaches larger values. SEI is able to produce a variety of very large numbers from small input values.

Although SEI has the potential to count to very large numbers, I’m still figuring out how to use SEI to replicate specific large numbers so that it can be used as a practical compression algorithm. In the meantime, I created a second counting algorithm named CAC that uses a hash and properties of a specific number to replicate that number.

CAC uses some of the same properties that were used to determine the maximum and minimum values for the counting range: the number of digits, and the highest and lowest difference between two consecutive digits. The hash should be the smallest possible hash that can validate the desired number while avoiding a collision with any of the numbers that will be counted by CAC when using the properties of the desired number.

As explained in detail earlier, CAC will either start at the minimum value that a number can be and count up, or it will start at the maximum value and count down. The counting algorithm is a recursive function that fills in the possible digits from left to right, based on the known difference ranges between the digits. If starting from the minimum value and counting up, the function would start with the left digit, and then add the minimum difference to the digit next to it. It would then increment that digit until it reaches the maximum difference, and then it will move on to the next digit. After every increment, the function will take the hash of the current number, and compare it to the hash of the desired number.

If the hash has not been matched when the algorithm reaches the maximum value that the desired number could have been, then either the provided properties or the hash must have been invalid, and the function should return an error. The PHP code of the CAC function, along with its recursive incrementing function, is as follows:

function cac($hash, $numdigits, $firstdigit, $diffmax, $diffmin) {
 // Reconstructs a number by counting until the hash value is matched
 $hashsize = strlen($hash);
 $result = "0";
 // Find the maximum and minimum values that the number could be
 $maxval = findmaxvalue($numdigits, $firstdigit, $diffmax);
 $minval = findminvalue($numdigits, $firstdigit, $diffmin);
 // Estimation of the accuracy of maxval and minval
 $maxdiffavg = digitdiffavg(substr($maxval, 1));
 $mindiffavg = digitdiffavg(substr($minval, 1));
 if (abs($maxdiffavg) > abs($mindiffavg)) {
  // The number is probably closer to the max value
  // Start there and count down until the number is found
  $currentnumarray = str_split($maxval);
  while(implode($currentnumarray) >= $minval && $result == "0") {
   countdown($currentnumarray, $result, $hash, $hashsize, $numdigits,
 $diffmax, $diffmin, 2);
  }
 } else {
  // The number is probably closer to the min value
  // Start there and count up until the number is found
  $currentnumarray = str_split($minval);
  while(implode($currentnumarray) <= $maxval && $result == "0") {
   countup($currentnumarray, $result, $hash, $hashsize, $numdigits,
 $diffmax, $diffmin, 2);
  }
 }
 if (md5truncate($result,$hashsize) == $hash) {
  return $result;
 } else {
  return false;
 }
}
function countup(&$numarray, &$result, $hash, $hashsize, $numdigits,
 $diffmax, $diffmin, $position) {
 // Incrementing counter used within cac()
 $enddigit = $numarray[$position-1]+$diffmax;
 if ($enddigit > 9) {
  $enddigit = 9;
 } elseif ($enddigit < 0) {
  $enddigit = 0;
 }
 while($numarray[$position] <= $enddigit) {
  if ($position == $numdigits-1) {
   if (md5truncate(implode($numarray),$hashsize) == $hash) {
    $result = implode($numarray);
   }
   $numarray[$position] = $numarray[$position] + 1;
  } else {
   countup($numarray, $result, $hash, $hashsize, $numdigits, $diffmax,
 $diffmin, $position+1);
  }
 }
 $startdigit = $numarray[$position-1]+$diffmin;
 if ($startdigit > 9) {
  $startdigit = 9;
 } elseif ($startdigit < 0) {
  $startdigit = 0;
 }
 $numarray[$position] = $numarray[$position] = $startdigit;
 $numarray[$position-1] = $numarray[$position-1] + 1;
 return true;
}

The decrementing countdown function has been omitted because it is essentially the same as the incrementing function, but counts in the opposite direction.

One possible way to optimize the CAC function is to include another property of the desired number: a digit count table. This is the number of times that each numerical value occurs within the number, and SEICAC includes a function that can build a digit count table:

function digitcounttable($inputval="0") {
 // Builds a table of the number of times each digit between
 // 0-9 is in a number.  Returns the table as a 10 digit number
 // as an array.
 // The sum of the digits in the result should equal the
 // number of digits in the input number.
 // Example 1: 234370 would return 1012100100
 // Example 2: 501523373 would return 1113020100
 $inputvalstr = (string)$inputval;
 $inputvalarray = str_split($inputvalstr);
 $result = array();
 for ($i = 0; $i <= 9; $i++) {
  $result[$i] = count(array_keys($inputvalarray, $i));
 }
 return $result;
}

The CAC function could use this information to skip many of the possible digit values, even if they fit between the minimum and maximum difference between two consecutive digits. For example, if the digit count table indicates that there are two 3’s in a five-digit number, and the number currently being iterated by CAC is 3153x, then the unknown digit could not be a 3. SEICAC determines the possible values of an unknown digit using the following PHP function:

function findpossibledigits($inputval="0", $counttable=array()) {
 $inputvalstr = (string)$inputval;
 $inputvalarray = str_split($inputvalstr);
 $result = array();
 for ($i = 0; $i <= 9; $i++) {
  $actualcount = count(array_keys($inputvalarray, $i));
  $tablecount = $counttable[$i];
  if ($actualcount < $tablecount) {
   $result[] = $i;
  }
 }
 return $result;
}

The digit count table property can reduce the number of iterations required by CAC, and therefore the length of the required hash, but it is additional data that the sending computer will need to transmit to the receiving computer in order to reconstruct the number. I have not yet determined how effective a digit count table would be in compressing real-world data.

At this time, the algorithms and techniques I’ve created with SEICAC are experimental and additional testing is required to determine if they could be used to store or transmit real-world data. Any piece of digital data can be reconstructed in isolation by counting to the number representing that data. Finding the best algorithm to count to that number, and the properties to verify that the desired number has been located, is the key to making reconstruction of the data more efficient than transmitting the data bit-by-bit.