
stndn

macrumors member
Oct 22, 2006
80
1
earth
Here's the updated version of the PHP code. Note that it's quickly put together, so it might not be the best solution, but it works for now. I'm only splitting the text for you; how you process it later is up to you :-)

PHP:
<?php
# Note: I modified some of the texts for testing purpose (the 1-3 digits after Transports part)

$myText[]	= '06/07/07 02:44:28.917 INFO Servlet.Engine.Transports : 0     <FBIBLVI>';
$myText[]	= '06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 01         <client_group_id>xxxxxxxxxx</client_group_id>';
$myText[]	= '06/07/07 02:44:28.918 INFO Servlet.Engine.Transports : 04         <portal_id>xxxxxxxxxx</portal_id>';
$myText[]	= '06/07/07 02:44:28.919 INFO Servlet.Engine.Transports : 098         <incl_rules_table>N</incl_rules_table>';
$myText[]	= '06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 088 Type: TEST';
$myText[]	= '06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 013 Site User: QA_BPORTAL';
$myText[]	= '06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list';
$myText[]	= '06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 01 Retrieve ent names for user: 3771000016, contract: qa_bportal';

foreach ($myText as $val)
{
	preg_match ('/^([^\s]+)\s+([^\s]+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult);

	print_r ($myResult);
}

?>

For your reference, this is roughly what happens in the regex:

^ -> Start at the beginning of line
([^\s]+) -> Grab 1 or more non-space characters and store them in placeholder 1 (note the () )
\s+ -> Grab 1 or more whitespace characters
([^\s]+) -> Grab 1 or more non-space characters and store it in placeholder 2
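(.*?)\s+ -> Grab as few characters as possible and store them in placeholder 3. The ? makes the * lazy (non-greedy), so the match stops at the first whitespace that is followed by the rest of the pattern (here, after 'INFO'). Then grab 1 or more whitespace characters.
(Servlet.Engine.Transports : \d+) -> Grab the Servlet.(etc) text followed by one or more digits, and store it in placeholder 4. Note that I used a single space around the : sign, but if it can be any type of whitespace, substitute it with \s+ as before. Strictly speaking, the dots should be escaped as \. since an unescaped . matches any character; for these log lines it makes no difference.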
\s+ -> Grab 1 or more whitespace
(.*) -> Grab 0 or more characters of any kind and store them in placeholder 5.
$ -> Denotes the end of line.

The i at the end is for case insensitive matching. I put it there just to be safe.
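For illustration, here is a minimal sketch (not part of stndn's original post) of the same pattern with named groups, so each field comes back under a readable key instead of a placeholder number. The function name parseLogLine() and the keys date, time, level, seq and message are hypothetical; the dots are escaped, and the ':' may have any whitespace around it.

```php
<?php
// Hypothetical helper: split one log line into named fields.
// Returns null when the line does not look like a Transports log entry.
function parseLogLine(string $line): ?array
{
    $pattern = '/^(?<date>\S+)\s+(?<time>\S+)\s+(?<level>.*?)\s+'
             . 'Servlet\.Engine\.Transports\s*:\s*(?<seq>\d+)\s+(?<message>.*)$/i';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    // preg_match() also fills numeric keys; keep only the named captures.
    return [
        'date'    => $m['date'],
        'time'    => $m['time'],
        'level'   => $m['level'],
        'seq'     => $m['seq'],
        'message' => $m['message'],
    ];
}

print_r(parseLogLine('06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 013 Site User: QA_BPORTAL'));
```

Returning null for non-matching lines makes it easy to skip stray lines once this runs over a real log file.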

Hope it helps.

-stndn.
 

Soulstorm

macrumors 68000
Feb 1, 2005
1,887
1
Sorry for messing up the thread, but could someone give any idea on how to answer the original question in C++?
 

iBert

macrumors regular
Jul 14, 2004
148
0
I'd suggest using Perl or another scripting language that is better at working with text and strings, then executing that script and reading the new file in C++. I haven't used C++ in some time, but I remember how troublesome it is to work with strings in it. Just a suggestion!
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
No worries!

I didn't even consider using C++, as I was already familiar with the parsing power of PHP and Perl. I used those as a starting point and recently decided to go with PHP. I have been working on some code for a while, using the help I'm getting here as a starting point, so I wouldn't really want to go back.
 

krunk

macrumors regular
Jan 29, 2004
236
0
If it's a flat text file that is not changing (or you don't care about constant monitoring, just the data at a single point), this would suffice for lines like this one:

06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 013 Site User: QA_BPORTAL

C code
Code:
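#include <stdio.h>

int main(void) {
    FILE *fp = fopen("/path/to/log/file", "r");
    if (fp == NULL) { perror("fopen"); return 1; }

    char line[1024];
    char date[16], time[16], level[16], source[64], seq[16];
    int msgStart;

    // read a whole line, then split it; a * like %*s skips a field (here the lone ":")
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%15s %15s %15s %63s %*s %15s %n",
                   date, time, level, source, seq, &msgStart) == 5) {
            char *message = line + msgStart; // rest of the line, e.g. "Site User: QA_BPORTAL\n"
            // do stuff with your string values, output, write to file...including html
        }
    }

    fclose(fp);
    return 0;
}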

C++
Code:
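#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream f("/path/to/log/file");
    if (!f) { std::cerr << "could not open log file\n"; return 1; }

    std::string line;
    while (std::getline(f, line)) {
        // and so on.
    }
}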

In C you have to do lower level processing of the input.

If you wish to "monitor" the file, as in sit and watch it for changes and then do something when it changes, then event queues are more important. Event queues depend on the operating system: kqueue on BSD (including Mac OS X), inotify on Linux, and something like ReadDirectoryChangesW on Windows.

The bsd kqueue would look like:

Code:
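#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

void functionToProcessDataStream(int fd); /* your own handler for the new data */

int watchLog(const char *path) {
    int f = open(path, O_RDONLY);
    if (f == -1) { perror("open"); return -1; }
    int kq = kqueue();
    if (kq == -1) { perror("kqueue"); close(f); return -1; }

    struct kevent change, event;
    /* register once; EV_CLEAR resets the state after each event is delivered */
    EV_SET(&change, f, EVFILT_VNODE, EV_ADD | EV_ENABLE | EV_CLEAR, NOTE_WRITE | NOTE_EXTEND, 0, 0);
    if (kevent(kq, &change, 1, NULL, 0, NULL) == -1) { perror("kevent"); close(kq); close(f); return -1; }

    for (;;) {
        int nev = kevent(kq, NULL, 0, &event, 1, NULL); /* blocks until the file changes */
        if (nev == -1) { perror("kevent"); break; }
        if (nev > 0 && (event.fflags & (NOTE_WRITE | NOTE_EXTEND))) {
            functionToProcessDataStream(f);
        }
    }

    close(kq); /* only reached when kevent() fails */
    close(f);
    return -1;
}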

Those aren't debugged at all and are just meant to illustrate the basic pattern. If you go with the event queue, you should look into an Observer/Dispatcher design pattern, as it is particularly suited to such work.
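The Observer/Dispatcher idea can be sketched in PHP, the language the original poster settled on, as a dispatcher that hands every new log line to each registered listener. LogDispatcher, subscribe() and dispatch() are hypothetical names, the typed property needs PHP 7.4 or later, and where the lines come from (an event-queue loop, or simply reading the file) is left open.

```php
<?php
// Hypothetical Observer/Dispatcher: listeners register once, and every new
// log line is passed to each of them in the order they subscribed.
final class LogDispatcher
{
    /** @var callable[] */
    private array $listeners = [];

    public function subscribe(callable $listener): void
    {
        $this->listeners[] = $listener;
    }

    public function dispatch(string $line): void
    {
        foreach ($this->listeners as $listener) {
            $listener($line);
        }
    }
}

// Usage: one listener counts lines, another keeps only INFO entries.
$count = 0;
$infoLines = [];
$dispatcher = new LogDispatcher();
$dispatcher->subscribe(function (string $line) use (&$count): void {
    $count++;
});
$dispatcher->subscribe(function (string $line) use (&$infoLines): void {
    if (strpos($line, ' INFO ') !== false) {
        $infoLines[] = $line;
    }
});
$dispatcher->dispatch('06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 088 Type: TEST');
```

The point of the pattern is that the code watching the file never needs to know what the listeners do, so parsing, counting or database inserts can be added as separate listeners.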
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
OK I have been working on this for a while, and I have a decent script going, but I was wondering if anyone can help me out with my new problem.

I now need to take the script I have and do two things with it. First, it needs to take data in from a text file, in the format I have been using in this thread, but thousands of lines of it. So the parser needs to pull the data from this text file and do the parsing (that step I have under control); the second step is to load the results into a MySQL database.

Any ideas or starting points would be great.
Thanks!
 

iBert

macrumors regular
Jul 14, 2004
148
0
My first thought would be to read one line at a time, parse it and push it to the DB. The problem is that this may be too expensive in computing time. Since you want to upload this data to a DB, why not create an XML file for the data you wish to push? You parse the whole text file into an XML file and then tell the DB to read the XML file and store it.

A friend of mine works a lot with DBs, and I've heard him talk about how SQL can manipulate XML. I'm pretty sure MySQL can handle XML types. With the second idea you do less computing, or rather less work: in the first idea you read a line, parse it, then access the DB to store it, versus reading a line, parsing it, storing it in an XML file, and then dumping all of the XML into the DB at once.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
So instead of parsing directly to the DB, I could first parse to an XML file, and then parse from the XML to the DB? Would this be easier than going directly from text to the DB through PHP?

Would doing it one line at a time, without XML, take way too long? I am planning on running this at like midnight each night, and hoping for it to be available by like 7:00 AM.
 

iBert

macrumors regular
Jul 14, 2004
148
0
Don't expect this to take hours; I wouldn't. But I don't know how big a text file you are going to be parsing. What I said was more from an execution or algorithm-analysis perspective. Think about it as if you were doing the work by hand: if someone gives you a box full of things and you have to look for something specific to store elsewhere, would you organize all the stuff in the box first, or would you stop and store the things as you find them? Computers nowadays are quite powerful at manipulating data, although you should think about where the process is going to run: a server or a simple PC.

Remember that reading a file is one operation per line, then you run your parser, then you run your storing step. Alternatively, you read the whole file into memory and parse it all into an XML file so you can push the info to the DB. With the per-line approach you need a connection to the DB and then push that one line, so if you have, let's say, 100 lines to parse, that's 100 inserts you'll be executing. You won't necessarily need to connect and close the DB for each line, but keeping a connection open for a long time to do this job could be a security risk you might want to avoid, at least if this is running on a machine that's online. If you have an XML file with all your data, you can connect to the DB, tell it to read the XML and do the inserts, and close the connection.

The XML idea is worth considering if MySQL can handle XML types; that way you upload your XML like a variable and SQL handles the rest. Again, I haven't done anything with this, but my friend uses it a lot when handling XML. Note that he uses Microsoft's SQL Server, so I'm not sure whether this is only available in SQL Server or in any DBMS.

Another thing: these are just ideas that came to me as I read this post. Maybe wait to see if others give some input, or run this by anyone else you can talk to.
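If one INSERT per line is the concern, a middle ground between per-line inserts and an XML import is to keep one connection for the whole run and send rows in batches with a multi-row INSERT. The sketch below only builds the SQL text with ? placeholders; the helper name buildBatchInsert() and the table and column names are assumptions. Table and column names are pasted in unescaped, so they must come from your own code, never from the log.

```php
<?php
// Hypothetical helper: build one multi-row INSERT with ? placeholders,
// e.g. for 500 parsed lines at a time instead of 500 separate statements.
// $table and $columns are inserted verbatim: only pass trusted names.
function buildBatchInsert(string $table, array $columns, int $rowCount): string
{
    if ($rowCount < 1 || $columns === []) {
        throw new InvalidArgumentException('need at least one row and one column');
    }
    $rowPlaceholders = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $allRows = implode(', ', array_fill(0, $rowCount, $rowPlaceholders));

    return sprintf('INSERT INTO %s (%s) VALUES %s', $table, implode(', ', $columns), $allRows);
}

echo buildBatchInsert('log_entries', ['logged_at', 'level', 'message'], 2), "\n";
// prints: INSERT INTO log_entries (logged_at, level, message) VALUES (?, ?, ?), (?, ?, ?)
```

The row values would then be passed as one flat array, row after row, to PDOStatement::execute(). Keep batches moderate (a few hundred rows, say), since MySQL rejects statements larger than its max_allowed_packet setting.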
 

krunk

macrumors regular
Jan 29, 2004
236
0
On line by line entry into the DB:

I've done this quite a bit; a 10,000+ line file can be parsed and entered into the DB using Perl's DBI package in less than 5 minutes. So unless you're dealing with 100,000 or more lines, I'd say direct entry into MySQL should be fine.
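In PHP, the rough equivalent of the DBI approach is PDO with one prepared statement that is reused for every row, all inside a single transaction. This is a sketch under assumptions: the function name importRows(), the log_entries table and its columns are hypothetical, and each row is expected to be an array with date, time, level and message keys.

```php
<?php
// Hypothetical import step: one connection, one prepared statement,
// one transaction for the whole batch of parsed rows.
function importRows(PDO $db, iterable $rows): int
{
    $stmt = $db->prepare(
        'INSERT INTO log_entries (log_date, log_time, level, message) VALUES (?, ?, ?, ?)'
    );
    $inserted = 0;

    $db->beginTransaction();
    try {
        foreach ($rows as $row) {
            $stmt->execute([$row['date'], $row['time'], $row['level'], $row['message']]);
            $inserted++;
        }
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();   // roll back the partial import if any insert fails
        throw $e;
    }

    return $inserted;
}

// Connecting to MySQL would look something like this (credentials are placeholders):
// $db = new PDO('mysql:host=localhost;dbname=logs;charset=utf8mb4', 'user', 'password',
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// $count = importRows($db, $parsedRows);
```

With InnoDB tables, committing once at the end rather than after every row usually helps a lot; MyISAM tables ignore transactions, so there the rollback gives no protection.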
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
I will be dealing with considerably more than 100,000 lines. Suggestions?
 

krunk

macrumors regular
Jan 29, 2004
236
0
As a previous poster mentioned, you may want to do it in two stages then: create a text file with the data in a "nice" format like XML, then load that into MySQL.

If I'm not mistaken, though, XML import support in MySQL is pretty new, and most people parse the XML file and then insert... much like you're already doing.

CSV might be a better storage format, as it is well supported by mysqlimport.
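A sketch of the CSV route: write each parsed line as one CSV record, then bulk-load the file. The function name writeCsv() and the column order are assumptions (the order has to match the target table's columns), and the empty escape argument needs PHP 7.4 or later. fputcsv() quotes fields that contain commas or spaces, so a message such as "Retrieve ent names for user: 3771000016, contract: qa_bportal" stays in one field.

```php
<?php
// Hypothetical export step: one CSV record per parsed log line.
/**
 * @param resource $handle an open, writable stream
 * @param iterable $rows   arrays of field values, in table-column order
 */
function writeCsv($handle, iterable $rows): int
{
    $written = 0;
    foreach ($rows as $row) {
        // Explicit delimiter and enclosure; '' disables PHP's non-standard escape character.
        if (fputcsv($handle, $row, ',', '"', '') === false) {
            throw new RuntimeException('could not write CSV record');
        }
        $written++;
    }
    return $written;
}

$out = fopen('php://memory', 'w+');
writeCsv($out, [
    ['06/07/07', '02:44:27.585', 'INFO', 'Type: TEST'],
    ['06/07/07', '02:44:27.619', 'INFO', 'Retrieve ent names for user: 3771000016, contract: qa_bportal'],
]);
rewind($out);
echo stream_get_contents($out);
fclose($out);
```

mysqlimport takes the table name from the file name, so a file called log_entries.csv would be loaded into a table named log_entries; its --fields-terminated-by and --fields-optionally-enclosed-by options describe this format. The SQL statement LOAD DATA INFILE does the same job from inside MySQL.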

The best thing to do is lurk on the MySQL user forums and search their documentation to find the method that best suits you.

Most of these alternative methods are about keeping your server from being overly taxed during the import. If that's not a concern, e.g. if you can set the script to run for an extended time (say, late at night) without worry... you might as well just insert as you go. There's nothing inherently bad about taking 30 minutes to an hour to import data if that's what it takes and it doesn't adversely affect users.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
OK, it's time to test. I need a way to test my PHP and MySQL code. How would you suggest doing this? I have installed XAMPP, but can't figure out how to use it for PHP. Any other suggestions?

Oh, and remember, I am on Windows. Thanks.
 

krunk

macrumors regular
Jan 29, 2004
236
0
If you're integrating into an existing db, copy that db over to "existing_dbname_testing". If not, just create your db. Take out a snippet of the file, a few thousand lines, for testing.

Write code, test, debug, fix, test, debug, fix, and so on until you're sure it's good. Then run with it.

Make backups as needed.
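For the "take out a snippet" step, a small PHP script can copy the first few thousand lines of a real log into a test file. makeSample() is a hypothetical helper, and the Windows paths in the usage comment are placeholders.

```php
<?php
// Hypothetical helper: copy the first $maxLines lines of a large log into a
// smaller file for testing. Returns how many lines were actually copied.
function makeSample(string $source, string $target, int $maxLines): int
{
    $in = fopen($source, 'r');
    if ($in === false) {
        throw new RuntimeException("cannot read $source");
    }
    $out = fopen($target, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("cannot write $target");
    }

    $copied = 0;
    while ($copied < $maxLines && ($line = fgets($in)) !== false) {
        fwrite($out, $line);
        $copied++;
    }

    fclose($in);
    fclose($out);
    return $copied;
}

// e.g. makeSample('C:\\logs\\server.log', 'C:\\logs\\sample.log', 2000);
```

Because it reads line by line, this works the same on a log of any size; if the log is shorter than $maxLines, the whole file is copied.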
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Don't I need to install something, like MySQL or PHP? I have never done this before, so you all need to be pretty specific. Thanks
 

krunk

macrumors regular
Jan 29, 2004
236
0
Not to be vague, but you need to install what you need, hehe. If you're serving it, you'll need an httpd server.

If you're writing the script in PHP, you'll need PHP... if in Perl, you need to install Perl.

It's been a long time since I fussed with Windows, but you'll have to install all of these tools to do the job.

Perhaps a Windows user can chime in with some good tutorials on how to get these things up and running smoothly in that environment.

It bears mentioning that all these packages are included by default in OS X and easily installed in any Linux with a single command. Unless it's absolutely necessary to host on the Windows machine, I personally would simply share the log directory via SMB and do my work in a Unix rather than go through the trouble of installing and configuring for a one-shot job.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Well, I don't need to host it for anyone else to see; I just need a way to test. Hosting is handled by a different department.

Maybe a good php editor could do this. Does anyone know a free php editor that allows testing for windows? That would be awesome!
 

krunk

macrumors regular
Jan 29, 2004
236
0
The best way to test PHP is within a web server. For standalone scripts, ones that don't tie into server-specific functions, you can test by installing PHP and then running the script with the PHP CLI interpreter.

Looks something like this:
$ php -e myUberParsingScript.php
 

macfaninpdx

macrumors regular
Mar 6, 2007
198
0
If you have installed XAMPP, you have already installed your server testbed on your local machine. XAMPP includes Apache (for the http server), PHP and MySQL.

First, check your Start menu. Look for Program -> Apache Friends -> XAMPP -> XAMPP Control Panel. If you run this, a control panel should pop up telling you what is running. Click the Admin button next to Apache to see some links to a web-based administration for your installation. Follow the suggestions to get you started.

Then you can place your php folder (files) in c:\xampp\htdocs\ if you used the default location. Then fire up your browser of choice and head to http://localhost/<your_folder>/<your_file.php>.

Try to get that going before you dive into MySQL and setting up users, tables and permissions.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Awesome, got the php to work. Thanks for the idiot-proof instructions :)
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
stndn's parsing script above works perfectly. Now I need one more thing for this part of the process.

Right now the script parses sample lines of log data that I placed in it manually. I need it to pull the lines from the log file (plain text) and parse them, without my having to enter the lines by hand. So $myText has to be filled from the log file, not hard-coded. Is this possible?

The log data is in this format:
Code:
06/07/07 02:44:27.585 INFO Servlet.Engine.Transports : 0 Type: TEST
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Site User: QA_BPORTAL
06/07/07 02:44:27.586 INFO Servlet.Engine.Transports : 0 Product:ProductId: Broker Portal Name: Broker Portal
06/07/07 02:44:27.588 INFO Servlet.Engine.Transports : 0 Retrieving user ent list
06/07/07 02:44:27.619 INFO Servlet.Engine.Transports : 0 Retrieve ent names for user: 3771000016, contract: qa_bportal

A log file is just thousands of lines like these ones.

Thanks!
 

iBert

macrumors regular
Jul 14, 2004
148
0
Yes barr08, this should be quite simple. Look at PHP's file-reading functions (fopen() with fgets(), file(), or file_get_contents()), or google "read file php" or something like that.

You just need to learn how reading a file works and you should be set. This last part should be easy now that you've got the parsing worked out. There are a couple of ways to do this: read one line at a time, x lines at a time, or the whole document. Once you get the general idea of how PHP reads files and can apply your parser, test what works best. Hopefully reading a file of a few thousand lines into memory won't kill the server. :)
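The one-line-at-a-time option can be sketched with fopen() and fgets() wrapped in a generator, so only the current line is held in memory (file() reads the whole file into an array). The function name readLogLines() is hypothetical and '/path/to/log/file' is a placeholder.

```php
<?php
// Hypothetical reader: yields one log line at a time, without the line ending,
// so memory use stays flat no matter how large the log is.
function readLogLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = rtrim($line, "\r\n");   // drop "\n" or Windows "\r\n"
            if ($line !== '') {
                yield $line;                // skip blank lines
            }
        }
    } finally {
        fclose($handle);
    }
}

// Usage: foreach (readLogLines('/path/to/log/file') as $line) { /* preg_match(...) */ }
```

The trade-off is simple: file() is shorter to write, while the generator keeps memory constant for logs with hundreds of thousands of lines.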
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Thanks, I knew there was a name for it, I just couldn't remember it. I'll look into this.
 

macfaninpdx

macrumors regular
Mar 6, 2007
198
0
PHP:
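// file() returns the whole log as an array, one element per line
$logfile = file("\\path\\to\\your\\file", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($logfile AS $linenum => $val) {
    // only print lines that match the pattern, instead of empty arrays for stray lines
    if (preg_match ('/^([^\s]+)\s+([^\s]+)\s+(.*?)\s+(Servlet.Engine.Transports : \d+)\s+(.*)$/i', $val, $myResult)) {
        print_r ($myResult);
    }
}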

NOTE: The preg_match line was taken from your previous post - I haven't looked at it.
 

barr08

macrumors 65816
Original poster
Aug 9, 2006
1,361
0
Boston, MA
Worked great, thanks a ton!
 