I'm getting into C++. I just wrote a Ruby script that takes over 20 minutes to process a 136MB file. Too slow. And I have bigger files to process. So, it's off to C++ I go!
For a test, I wrote a small C++ app that simply reads in the file I need to process and writes it back out. The input comes off a flash drive, and the output goes to my hard drive.
The first time I run it, it takes about 2-3 minutes. This is the same 136MB file I used with Ruby. Kinda slow, I thought. Running it again immediately, it finishes in about 3 seconds. Perfectly acceptable!
If I unplug the flash drive and plug it back in, I'm back to the glacially slow process. (OK, 2-3 minutes is better than 20...) I'll attribute this to the poor data transfer rate of the flash drive, which is plugged into my keyboard.
Copying from the hard drive to the same hard drive is very fast - just a few seconds. Not sure whether I'm getting any benefit from operating system I/O buffering or not, but I could reboot and test again, I guess.
I've only got one book on C++, and it doesn't get into multithreading or advanced I/O buffering, so I'll be looking into those topics as well. For this file, and the larger ones still to come, I'd like a dedicated process reading as fast as it can, feeding a data-conversion process, and finally a third process writing out the new file.
Reader process - read as fast as it can, not stopping for anything except out-of-memory conditions, and if that happens, start reusing buffers that have already been emptied.
Converter process - when data exists in a buffer, convert it and place it into a second buffer for output.
Writer process - when data starts showing up in the output buffer, start writing. Or, wait for a larger chunk of data, then go back to sleep until the next chunk arrives. A rough sketch of this pipeline follows.
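Here's the shape of what I have in mind, as an untested sketch using C++11 std::thread. The file names, the chunk size, and the pass-through conversion step are all placeholders, not working code for my real format:

Code:
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

typedef std::vector<char> Chunk;

// Thread-safe queue of chunks; an empty chunk signals end-of-stream.
struct ChunkQueue {
    std::queue<Chunk> q;
    std::mutex m;
    std::condition_variable cv;

    void push(Chunk c) {
        std::lock_guard<std::mutex> lock(m);
        q.push(std::move(c));
        cv.notify_one();
    }
    Chunk pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this]{ return !q.empty(); });
        Chunk c = std::move(q.front());
        q.pop();
        return c;
    }
};

int main() {
    ChunkQueue raw, converted;

    // Reader: pull chunks off the input as fast as possible.
    std::thread reader([&]{
        std::ifstream in("input.bin", std::ios::binary);    // placeholder name
        Chunk buf(4096 * 10);
        while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
            raw.push(Chunk(buf.begin(), buf.begin() + in.gcount()));
        raw.push(Chunk());                                  // done
    });

    // Converter: transform each chunk (pass-through for now).
    std::thread converter([&]{
        for (Chunk c = raw.pop(); !c.empty(); c = raw.pop())
            converted.push(std::move(c));                   // convert() would go here
        converted.push(Chunk());
    });

    // Writer: write chunks as they show up.
    std::thread writer([&]{
        std::ofstream out("output.bin", std::ios::binary);  // placeholder name
        for (Chunk c = converted.pop(); !c.empty(); c = converted.pop())
            out.write(c.data(), c.size());
    });

    reader.join();
    converter.join();
    writer.join();
    return 0;
}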
One last comment / question.
In my first tests with this homemade file-copier, I simply output the first 3 records (128 bytes each) of non-displayable binary data to the Xcode console.
I then tweaked my program to write these same 3 records to a second file on my hard drive.
Changing my program again to read from the file I just created and output to the console, the non-displayable binary data was different. The newly created file appears to have extra data prefixed to the original data. Does anyone have an explanation for this?
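For reference, a stripped-down version of the record-dump test looks something like this (a sketch, not my exact code - here I print the bytes as hex so they're displayable, and "records.bin" stands in for the real file):

Code:
#include <cstdio>
#include <fstream>

int main() {
    std::ifstream in("records.bin", std::ios::in | std::ios::binary);
    char rec[128];                          // one 128-byte record
    for (int r = 0; r < 3 && in.read(rec, sizeof rec); ++r) {
        std::printf("record %d:", r);
        for (int b = 0; b < (int)sizeof rec; ++b)
            std::printf(" %02x", (unsigned)(unsigned char)rec[b]);
        std::printf("\n");
    }
    return 0;
}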
No rocket science here:
Code:
#include <iostream>
#include <fstream>
using namespace std;

// IFILE and OFILE are #define'd file paths (definitions not shown here).

int main (int argc, char * const argv[]) {
    int i ;
    char n[4096*10] ;                       // 40KB transfer buffer
    ofstream out(OFILE, ios::out | ios::binary) ;
    if (!out) {
        cout << "Cannot open output file " << OFILE << ".\n" ;
        return -1 ;
    }
    ifstream in(IFILE, ios::in | ios::binary) ;
    if (!in) {
        cout << "Cannot open input file " << IFILE << ".\n" ;
        return -1 ;
    }
    i = 0 ;
    // Loop on the read itself rather than on !in.eof(): read() sets the
    // stream state on the final short read, and gcount() tells us how many
    // bytes that read actually got.
    while (in.read(n, sizeof n) || in.gcount() > 0) {
        i += in.gcount() ;
        out.write(n, in.gcount()) ;
    }
    cout << i << " bytes copied.\n" ;       // report the total
    return 0 ;
}

Thanks, Todd