juzernejm

macrumors member
Original poster
Mar 13, 2015
51
3
Hello people. First discussion.. I hope someone will help me with this.

I have a list of jdf files in a folder (they are like html files: lots of code inside).
Is it possible to take values written inside these jdf files and put them into an excel file?
Maybe in a simple way like a script or in Automator maybe (I have no idea).

The content of the jdf files is something like this:

Code:
TimeStamp="2015-03-09T09:19:53+01:00"
DescriptiveName="Project Name"
Pages Actual="12"
SSi:Signature="Sig. Name"

The text that I want to copy is inside the quotes.
So in the case above on the excel file I would have something like this:

Code:
[	A	][	B	][	C	][	D	]
[1][2015-03-09…][Project Name][12][Sig. Name]
[2]
[3]

What would be a simple way to do this (hopefully on a Mac), considering my poor programming skills?

Thanks
 
Simplest would be to write an application in something like python to read those files in and then parse and spit out the details in a comma delimited format, like so:

Code:
[2015-03-09…],[Project Name],[12],[Sig. Name]
[2015-03-09…],[Project Name],[12],[Sig. Name]

Save the file as whatever.csv and open it in Excel, then save as an Excel file. (I think you could actually create a quick automator script for this bit. I am not 100% sure since I am working on a Mac without Office installed right now, but you can certainly write an automator script to open excel files and manipulate them. Let me double check)

There is no escaping the fact that you will have to write the parsing code to get the data out of TimeStamp="datadatadata", DescriptiveName="datadatadata" and so forth. But luckily you will find plenty of python code that does nearly exactly that. For example:

http://stackoverflow.com/questions/17105456/parsing-data-from-text-file

The above is not the exact solution to your problem and further work will be required.
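
For the comma delimited step, Python's built-in csv module can do the writing and takes care of quoting, which matters if one of the values ever contains a comma itself. A minimal sketch, not a finished tool: the function name write_rows and the file name whatever.csv are placeholders I made up, and the rows are assumed to be already parsed out of the jdf files.

Code:
```python
import csv


def write_rows(path, rows):
    """Write a list of rows (each row a list of strings) to a CSV file."""
    # newline="" is what the csv module documentation recommends for files it writes
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

Calling write_rows("whatever.csv", rows) with one list per jdf file should give a file that Excel can open directly.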
 
Thanks for the detailed answer TheSeb
I'll look into that link (even though it was scary to look at all that code.. I have no idea where to start, but that's my problem now)

You are right, a csv file would be ok. Even a simple txt file at this point.

The goal for me in doing this, is to have a list so that I could find for example: all the jdfs that have a certain word on one of those entries in quotes. For some reason in my computer I can't make it look inside the content of these jdf files... maybe that is possible on mac(?) who knows.

Another option could be to just copy those values in quotes and create with the copied values an empty file named with the values it copied. In this way I could do a simple search (cmd+F) and find what I'm looking for.
 
I created something to get you going in a couple of minutes in python. The matching is very rudimentary. It looks at each line in the input file and then looks for ="something" and pulls out something

Code:
import re
import sys

# print() with a single string behaves the same under Python 2 and Python 3
print('\nArguments supplied: ' + str(sys.argv))

# Append mode: output.txt is created if missing, otherwise new lines are added to it
with open('output.txt', 'a') as outputFile:
    # sys.argv[0] is the script itself, so the input files start at index 1
    for fileNumber in range(1, len(sys.argv)):
        print('\nProcessing file: ' + sys.argv[fileNumber])
        values = []
        with open(sys.argv[fileNumber], 'r') as inputFile:
            for line in inputFile:
                # Capture the text between ="..."; only the first match on a line is kept
                matches = re.findall(r'="(.*?)"', line)
                if matches:
                    values.append(matches[0])

        # One comma separated line per input file
        outputText = ','.join(values)
        print('File: ' + str(fileNumber) + ' output text = ' + outputText)
        outputFile.write(outputText + '\n')

I used the following two test files to check if it works

input.jdf
Code:
TimeStamp="2015-03-09T09:19:53+01:00"
DescriptiveName="Project Name"
Pages Actual="12"
SSi:Signature="Sig. Name"
nlsh nlshdhfdjdhf nls
dfhjdhjf djfhjdfhjd

input2.jdf
Code:
TimeStamp="2015-03-09T09:19:53+01:02"
DescriptiveName="Project Name2"
Pages Actual="13"
SSi:Signature="Sig. Name2"

It creates an output file called output.txt, which you can rename to output.csv or whatever. If the file output.txt does not exist, it will create one first. Otherwise it keeps appending to the file, so if you run the script again and output.txt exists, it will append more stuff to it. If you run it multiple times using the same input files, you will have duplicate lines in the output.

Code:
2015-03-09T09:19:53+01:00,Project Name,12,Sig. Name
2015-03-09T09:19:53+01:02,Project Name2,13,Sig. Name2

Copy the .py file that I linked to the directory where the jdf files are. The way to run this program for the input files specified above is to open terminal on your Mac and then type in

Code:
python match2.py input.jdf input2.jdf

In your case you would have to substitute input.jdf and input2.jdf with whatever files you want to read in. I could make it so that it reads in all of the jdf files in the current directory, but I didn't because I am lazy.
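
For anyone who does want every jdf file in the current directory picked up automatically, the glob module is one way to do it. This is only a sketch using the same rudimentary first-match-per-line logic as the script above; the function names first_values and collect are made up for illustration, and it prints the lines instead of appending to output.txt.

Code:
```python
import glob
import re


def first_values(path):
    """Return the first ="..." value found on each line of one file."""
    values = []
    with open(path, "r") as f:
        for line in f:
            matches = re.findall(r'="(.*?)"', line)
            if matches:
                values.append(matches[0])
    return values


def collect(pattern="*.jdf"):
    """Build one comma separated line per file matching the pattern."""
    return [",".join(first_values(p)) for p in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    # With no arguments this looks at *.jdf in the current directory
    for row in collect():
        print(row)
```

Saved under a name of your choice and run from the folder with the jdf files, it needs no file names on the command line.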

The script does not modify your jdf files; however, you use it at your own risk with no warranty whatsoever. I am not responsible for any harm that this application may cause to your data, your computer, you, your dog, or anyone else, nor for any associated trauma.

Right click, save link as
https://dl.dropboxusercontent.com/u/25622670/match2.py

Edit: if a line does not have ="something" in it, then this will crash. Give me a moment to fix it. Ok, it's fixed. That should be more robust, but I am only going by what I can see in the data you have provided in your first post.
 
Hey Seb I'm speechless. Thanks so much for helping.
I'm a graphic designer, don't hesitate to ask if you need anything.

Today I'm not at the computer with the jdfs but I can test it with a jdf I create from scratch also I guess.. but, again this looks perfect to me. You definitely were not lazy :D

Just out of curiosity..
Would it be more difficult to write something that does this same thing (copying from a jdf file and list it to another text file) but as soon as a jdf is created? Or it's complicated?
 
:) Don't thank me yet, because I just looked at a real jdf file example and this won't work exactly like you were hoping. Unfortunately I am completely ignorant of jdf files and didn't realise they were actually xml.

For example, in a line such as the below, the code won't work, because of the space between = and ".

Code:
JDF xmlns = "http://www.CIP4.org/JDFSchema_1_1" ID = "ColorTest" JobID = "ColorJob"

That's an easy fix, however, if all of that info is on one line, then the above script will pick up the first instance of "data" only, which in this case would be "http://www.CIP4.org/JDFSchema_1_1"
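
To illustrate both points, here is a small sketch of a pattern that tolerates spaces after the equals sign and returns every quoted value on a line rather than just the first. The names PATTERN and all_values are only for illustration, and it still treats the file as plain text, not as xml.

Code:
```python
import re

# \s* allows optional whitespace between the equals sign and the opening quote
PATTERN = re.compile(r'=\s*"(.*?)"')


def all_values(line):
    """Return every quoted attribute value on the line, in order."""
    return PATTERN.findall(line)
```

On the example line above this should return all three values, with the schema URL first.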

The good news is that because this is xml, parsing it in python is actually pretty easy, but the code above would need to be changed quite a lot to get at the exact elements like JobID or whatever and then spit them out. The quick and dirty option is to modify the above code to simply spit out all of the instances of "data" on each line into a text file, which would only take 1 or 2 extra lines of code. This would work, in theory, if every jdf file is produced to look exactly the same, but I have no idea if that is true, because I have never dealt with these files. Otherwise the comma delimited output is likely to be useless, so the xml parsing option is the best one, because it gives you full control.
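
As a rough idea of what the xml route could look like, the standard library's xml.etree.ElementTree can read a file and hand back named attributes of the top-level element. This is a sketch under assumptions: the function name root_attributes is made up, and it only looks at the outermost JDF element, not at anything nested inside it.

Code:
```python
import xml.etree.ElementTree as ET


def root_attributes(path, names):
    """Return the requested attributes of the top-level element of the file."""
    root = ET.parse(path).getroot()
    # Plain (unprefixed) attributes such as JobID are stored under their bare name;
    # a missing attribute comes back as an empty string
    return [root.get(name, "") for name in names]
```

For example, root_attributes("some.jdf", ["JobID", "Status", "DescriptiveName"]) would give one row per file, where some.jdf stands in for a real file name.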

Just out of curiosity..
Would it be more difficult to write something that does this same thing (copying from a jdf file and list it to another text file) but as soon as a jdf is created? Or it's complicated?

Conceptually it's pretty easy. There are many ways to achieve this, but the simplest would be to let this script run in the background continuously, wake up every minute, or whatever, and check if any new files are in the "inbox" directory. Then the script would process the data, spit out the info into the output file and move the jdf file into the "done" directory.

However, again this is where a choice needs to be made on how robust this solution has to be. There could be a situation (very small chance, but it is possible) that the script tries to open the file when it is still being written to the "inbox" directory. But because it's xml and it has the <audit> elements, that issue should be easy enough to deal with.
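
A bare-bones sketch of that polling idea is below. Everything in it is an assumption for illustration: the function names, the inbox and done folders being passed in as paths, and the use of "does the file parse as xml yet" as the test for a file that is still being written. A file locked by another program could raise other errors that this does not handle.

Code:
```python
import os
import shutil
import time
import xml.etree.ElementTree as ET


def process_inbox(inbox, done, handle):
    """Handle each complete .jdf file in inbox, then move it to done."""
    processed = []
    for name in sorted(os.listdir(inbox)):
        if not name.lower().endswith(".jdf"):
            continue
        path = os.path.join(inbox, name)
        try:
            # A half-written xml file fails to parse; leave it for the next pass
            ET.parse(path)
        except ET.ParseError:
            continue
        handle(path)
        shutil.move(path, os.path.join(done, name))
        processed.append(name)
    return processed


def watch(inbox, done, handle, interval=60):
    """Wake up every `interval` seconds and process whatever is new."""
    while True:
        process_inbox(inbox, done, handle)
        time.sleep(interval)
```

Here handle is whatever function extracts the values and writes them to the output file; watch never returns, so it would be left running in a terminal window.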

Do you only need some of the elements (like "Status" and "JobPartID") to go into the csv file? If you list them, then I can modify the script above to at least do some of them and explain how to modify it for your own needs. Also, I would need at least one of your actual jdf files. Make sure to remove any personal/private stuff and replace it with any data.
 
This is a jdf file I found online, just to give you an idea.
But the one I have, even though it is structured like this one, doesn't have the four values I put on the first post of this thread (which were the actual values I wanted to be listed out of all that code).

Code:
<?xml version="1.0" encoding="UTF-8"?>
<JDF ID="ID_040050" Type="Product" Version="1.2" Status="Waiting" JobID="Sample_GrayBox" JobPartID="Apogee Prepress Minimal Graybox" DescriptiveName="Apogee Prepress Minimal Graybox" xmlns="http://www.CIP4.org/JDFSchema_1_1" xmlns:jdftyp="http://www.CIP4.org/JDFSchema_1_1_Types" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.CIP4.org/JDFSchema_1_1
D:\shared\Software\JDFSchema_1_2\JDF.xsd">
	<CustomerInfo CustomerID="100002">
		<Contact ContactTypes="Customer Administrator">
			<Company OrganizationName="DIOXINUS BV"/>
			<Person FamilyName="MESTDAGH">
			</Person>
		</Contact>
	</CustomerInfo>
	<Comment Name="JobDescription">6 page brochure v5</Comment>
	<AuditPool>
		<Created Author="Miss Prepress" TimeStamp="2004-02-18T22:24:09+01:00" AgentName="MIS" AgentVersion="v1.0"/>
	</AuditPool>
	<ResourcePool>
		<ColorPool Class="Parameter" ID="ColorPool" DescriptiveName="Colors for the job" Status="Available" Locked="false">
			<Color CMYK="1 0 0 0" Name="Cyan"/>
			<Color CMYK="0 1 0 0" Name="Magenta"/>
			<Color CMYK="0 0 1 0" Name="Yellow"/>
			<Color CMYK="0 0 0 1" Name="Black"/>
		</ColorPool>
		<ColorantControl Class="Parameter" ID="ColorantControl" DescriptiveName="Colors of the job" Locked="false" Status="Available">
			<ColorantParams>
				<SeparationSpec Name="Cyan"/>
				<SeparationSpec Name="Magenta"/>
				<SeparationSpec Name="Yellow"/>
				<SeparationSpec Name="Black"/>
			</ColorantParams>
			<ColorPoolRef rRef="ColorPool"/>
		</ColorantControl>
		<Media Class="Consumable" ID="Plate000001" Brand="SuperPlates" DescriptiveName="Plate" MediaType="Plate" Status="Available"/>
		<Layout Class="Parameter" ID="LAY000" Status="Incomplete" Name="CoverLayout">
			<Signature Name="SIG1059600001">
				<Sheet Name="SHT1059600001">
					<Surface Side="Front"/>
				</Sheet>
			</Signature>
		</Layout>
		<ExposedMedia Class="Handling" PartIDKeys="SignatureName SheetName Side Separation" PartUsage="Implicit" ID="ExposedMedia" DescriptiveName="Plates" Status="Unavailable" Locked="false">
			<MediaRef rRef="Plate000001"/>
			<ExposedMedia SignatureName="SIG1059600001">
				<ExposedMedia SheetName="SHT1059600001">
					<ExposedMedia Side="Front">
						<ExposedMedia Separation="Cyan" ProductID="0001"/>
						<ExposedMedia Separation="Magenta" ProductID="0002"/>
						<ExposedMedia Separation="Yellow" ProductID="0003"/>
						<ExposedMedia Separation="Black" ProductID="0004"/>
					</ExposedMedia>
				</ExposedMedia>
			</ExposedMedia>
		</ExposedMedia>
		<Component ID="Component_Link001" Class="Quantity" ComponentType="FinalProduct" Status="Unavailable"/>
	</ResourcePool>
	<ResourceLinkPool>
		<ComponentLink rRef="Component_Link001" Usage="Output"/>
	</ResourceLinkPool>
	<JDF ID="PRE000" JobPartID="PRE000" Type="ProcessGroup" Category="PrePress" Status="Waiting" DescriptiveName="PrePress Folder">
		<ResourcePool>
			<RunList Class="Parameter" ID="RNL000_D" Status="Unavailable"/>
			<RunList Class="Parameter" ID="RNL000_M" Status="Unavailable"/>
			<RunList Class="Parameter" ID="RNL000" Status="Available" NPage="8"/>
		</ResourcePool>
		<JDF ID="PPP000" JobPartID="PPP000" Type="ProcessGroup" Types="PrePressPreparation" Category="PrePressPreparation" DescriptiveName="GB PrePressPreparation" Status="Waiting">
			<ResourceLinkPool>
				<RunListLink rRef="RNL000" ProcessUsage="Document" Usage="Input"/>
				<RunListLink rRef="RNL000_D" ProcessUsage="Document" Usage="Output"/>
			</ResourceLinkPool>
		</JDF>
		<!-- no input resources so operator must select a template (+associated marks) -->
		<JDF ID="STR000" JobPartID="STR000" Type="ProcessGroup" Types="ImpositionPreparation" Category="ImpositionPreparation" DescriptiveName="GB ImpositionPreparation" Status="Waiting">
			<NodeInfo/>
			<!-- This tells the Apogee Prepress operator which impositon to choose  -->
			<Comment Name="Instruction">Imposition: folder with 6 pages folded using F6-1, WorkAndTurn</Comment>
			<ResourceLinkPool>
				<LayoutLink rRef="LAY000" Usage="Output"/>
				<RunListLink rRef="RNL000_M" ProcessUsage="Marks" Usage="Output"/>
			</ResourceLinkPool>
		</JDF>
		<!-- this creates the plates and previews -->
		<JDF ID="IMS000" JobPartID="IMS000" Type="ProcessGroup" Types="Imposition RIPing ImageSetting" Category="FinalImaging" Status="Waiting" DescriptiveName="GB PlateMaking">
			<NodeInfo/>
			<ResourceLinkPool>
				<RunListLink rRef="RNL000_D" ProcessUsage="Document" Usage="Input"/>
				<RunListLink rRef="RNL000_M" ProcessUsage="Marks" Usage="Input"/>
				<LayoutLink rRef="LAY000" Usage="Input"/>
				<ColorantControlLink rRef="ColorantControl" Usage="Input"/>
				<MediaLink rRef="Plate000001" Usage="Input"/>
				<ExposedMediaLink rRef="ExposedMedia" Usage="Output">
					<Part SignatureName="SIG1059600001" SheetName="SHT1059600001"/>
				</ExposedMediaLink>
			</ResourceLinkPool>
		</JDF>
	</JDF>
</JDF>

Here's the story behind these jdfs..
They are files that are used in the printing industry. Every time a new job gets processed, the computer generates these jdf files where it collects all the (technical) infos about the job to be printed (like number of pages, of plates.. etc).
Although I work with these files on a mac, they are actually on a Windows machine. There is a folder on that machine (let's call it "Jobs"), and inside that folder there are sub-folders for every new job I do. Each of these job folders has its jdf file inside.

My thinking was to do a search (filtering it to only give me the jdfs); copy all these jdf files somewhere else and do what we are trying to do here.

I don't know if I was clear in explaining.. but if you understood it and think there are better ways..
 
This is where things become a little bit more complicated and hence why I would need a sample of your files to be able to do this.

For example, you say you want DescriptiveName, but which descriptive name do you want?

There is a DescriptiveName in <JDF> and in <ColorPool> and in other places.

The timestamp is another good example. Looking at the example you have posted, there is only one timestamp, yet in other examples that I am looking at there are multiple timestamp attributes, which makes sense, but the question would be, "Which one are you interested in?" Is it the last one? Or the first one? Or a particular status time stamp?

Code:
<AuditPool>
	<Created AgentName = "Rainer's JDFWriter 0.2000"
		TimeStamp = "2005-06-01T10:26:11+01:00" />
	<Modified AgentName = "EatJDF Complete: task=*"
		TimeStamp = "2005-06-01T10:26:57+01:00" />
	<PhaseTime End = "2005-06-01T10:26:57+01:00"
		Start = "2005-06-01T10:26:57+01:00"
		Status = "Setup"
		TimeStamp = "2005-06-01T10:26:57+01:00" />
	<PhaseTime End = "2005-06-01T10:26:57+01:00"
		Start = "2005-06-01T10:26:57+01:00"
		Status = "InProgress"
		TimeStamp = "2005-06-01T10:26:57+01:00" />
	<PhaseTime End = "2005-06-01T10:26:57+01:00"
		Start = "2005-06-01T10:26:57+01:00"
		Status = "Cleanup"
		TimeStamp = "2005-06-01T10:26:57+01:00" />
	<ProcessRun End = "2005-06-01T10:26:57+01:00"
		EndStatus = "Completed"
		Start = "2005-06-01T10:26:57+01:00"
		TimeStamp = "2005-06-01T10:26:57+01:00" />
</AuditPool>

I would need to understand your requirements in detail.
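
One way to answer the "which one?" question for a given file is to list every place an attribute shows up, together with the element it sits on. A small sketch with xml.etree.ElementTree; the function name occurrences is made up, and it assumes the file is well-formed xml.

Code:
```python
import xml.etree.ElementTree as ET


def occurrences(path, attribute):
    """List (element tag, value) for every element carrying the attribute."""
    found = []
    # iter() walks the whole tree in document order, starting with the root
    for element in ET.parse(path).getroot().iter():
        if attribute in element.attrib:
            # Drop the "{namespace}" prefix ElementTree puts in front of tag names
            tag = element.tag.split("}")[-1]
            found.append((tag, element.attrib[attribute]))
    return found
```

Calling occurrences on a jdf file with "DescriptiveName" or "TimeStamp" shows the candidates in the order they appear, which makes it easier to say whether the first, the last, or a specific element is the one wanted.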
 

Attachment: tree_swing_development_requirements.jpg
funny that image :D

You're right, I know.
The computers are not flexible so they need precise data.

I'll get to the jdfs monday.

Can't thank you enough for all you did, Sebastian.
Even if you are busy later and can't follow this discussion, it's ok. You told me all I needed to know. And I learned a lot. Thanks!
 
I have now the file. For privacy issues I would like to send it privately (if it's possible).

I did a search inside the jdf and you are right "TimeStamp=" and "DescriptiveName=" are repeated, the other two, are not. But in all cases the first one that the script finds, is the one to use.

Tell me if I can send it in a PM.
 
If all you need to do is find which files contain particular words, perhaps there's no reason at all to put the contents in a spreadsheet. Simply use something like 'grep' to search your files for a pattern.

E.g from a terminal session

Code:
grep -e "prepress" -f *.jdf
 
Can't make it work :(
I put the file on the desktop, went to the desktop folder in the terminal, and entered the line you wrote (changing what's in quotes to a text that was in the jdf file), but I get an empty line in the terminal and no result. Maybe I'm doing something wrong.. Anyway, it's good to know that grep can be used in the terminal. Thanks
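
A likely reason for the empty line: grep's -f option expects a file containing patterns, so in the command above the first jdf file is read as a pattern list, and with only one jdf file there is nothing left to search, so grep sits waiting for keyboard input. Dropping -f, as in grep "prepress" *.jdf, is probably what was intended, and grep -il "prepress" *.jdf lists just the matching file names while ignoring upper/lower case. The same search can also be sketched in python; the function name files_containing is made up, and it assumes the files are small enough to read whole.

Code:
```python
import glob


def files_containing(word, pattern="*.jdf"):
    """Return the names of files whose text contains word, ignoring case."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps odd bytes from stopping the search
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            if word.lower() in f.read().lower():
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Searches *.jdf in the current directory for the example word
    for path in files_containing("prepress"):
        print(path)
```

Run from the folder that holds the jdf files, this prints one matching file name per line.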
 