
Dale Cooper

macrumors regular
Original poster
Sep 20, 2005
Hi,
I'm trying to find a way to pack four 2-bit values into one byte. I have a file made up of four different 1-byte ASCII characters, but since the "alphabet" has only 4 letters, I can in theory encode four of them into a single byte.

I've made a method that takes four characters at a time and "codes" them into one integer between 0 and 255. Since 1 byte can represent 256 different values, and a C char is one byte, I was under the impression that I could convert the (larger) integer to a 1-byte char somehow, but I can't find out how to do this!? Is this possible, and if so - how??

Any suggestions would be greatly appreciated!
 
Thanks so much for your reply! I'm not sure I completely understand though :eek:

Say I have an alphabet with the four chars, "\n", "h", "3" and "9".
Do I do like this?
Code:
#include <stdio.h>

int main(void) {
	char test[4];
	unsigned char c;
	int i;

	test[0] = '\n';
	test[1] = 'h';
	test[2] = '3';
	test[3] = '9';

	for (i = 0; i < 4; i++) {
		test[i] | c;
		c<<2;
	}

	printf("%d\n", c);
}
This makes c null/0!

Or do I first have to assign one combination of the four chars to an integer between 0 and 255, and then do the operations on that one?

For example
"\n" "\n" "\n" "\n" --> 0
"\n" "\n" "\n" "h" --> 1
[...]
"9" "9" "9" "9" --> 255

and then somehow convert each of these integers to a specific char? (This was my original plan).
 
If it is truly the same four characters across ALL input files ...

const char encode_match[] = { '\n', 'h', '3', '9' };

for each input character search for a match within encode_match[]

if the input character matches an entry in encode_match[], do a bitwise OR (as suggested above) of the INDEX of the matched character, not the character itself.
 
Try this:

Code:
#include <stdio.h>

unsigned char charValue(char);

int main(int argc, char *argv[]) {
	char test[4];
	unsigned char c = 0;
	int i;

	test[0] = '\n';
	test[1] = 'h';
	test[2] = '3';
	test[3] = '9';

	for (i = 0; i < 4; i++) {
		c <<= 2;          /* make room for the next 2-bit value */
		c |= charValue(test[i]);
	}

	printf("%d\n", c);
	return 0;
}

unsigned char charValue(char x) {
	unsigned char result = 0;
	switch(x) {
		case '\n':
			result = 0;
			break;
		case 'h':
			result = 1;
			break;
		case '3':
			result = 2;
			break;
		case '9':
			result = 3;
			break;
	}
	return result;
}

You need to get the 2-bit value first; you can't just OR in the character values of your "indicators", since they will not be 0-3.

-Lee

EDIT: Note that anything other than these 4 values passed into charValue will return 0, which matches the result for '\n'. Without additional error parameters, etc., there's not much to be done about this. I suppose charValue could call exit(-1) or something and kill the program, but that seems sort of extreme.
 
I prefer a lookup-table method. It lets you change which characters are encoded and decoded with a single line of code.

The encode_char function's return value signals success, while the 2-bit encoded result is placed into the unsigned char whose address was passed in 'address_encode_result'.

Code:
#include <stdbool.h>
#include <string.h>

static const char encode_table[] =
{
  '\n', 'h', '3', '9'
};

int encode_char(char ch, unsigned char* address_encode_result)
{
  if ( address_encode_result )
  {
    for ( int i = 0; i < sizeof(encode_table); i++ )
    {
      if ( ch == encode_table[i] )
      {
        *address_encode_result = i;
        return true;
      }
    }
  }
  
  return false;
}

int main()
{
  unsigned char encoded_temp;
  unsigned char encoded;
  char      signature[] = { '3', '3', '9', 'h' };
  
  // encode 'signature' ...
  encoded = 0;
  for ( int i = 0; i < sizeof(signature); i++ )
  {
    encoded <<= 0x2;
    if ( encode_char(signature[i], &encoded_temp) == false )
    {
      return 1; // failure, non-encodable 'signature' character
    }

    encoded |= encoded_temp;
  }
  
  // ... clear 'signature' for reconstruction ...
  memset(signature, 0, sizeof(signature));
  

  //  ... reconstruct 'signature' from encoded form

  // don't print result as 'alphabet' contains non-printable 'ASCII'
  for ( int i = (sizeof(signature)-1); i >= 0; --i )
  {
    signature[i] = encode_table[encoded & 0x03];
    encoded >>= 2;
  }

  return 0;
}
 
I really appreciate your help guys, thank you so much! I now have both the encoding and decoding working:)
 